Obfuscate Python Code Effectively

In this article, we discuss how to obfuscate Python code effectively and why it is worth protecting Python code in the first place. Python is an interpreted language whose programs are usually distributed as plain source files, so anyone who obtains the files can read and understand them easily.

To obfuscate Python code effectively, we first need to understand what obfuscation is and why it matters.

What is Obfuscation?

Obfuscation, in general terms, is the practice of making code difficult to read. It is widely used in the industry before proprietary software is released publicly, since released code always faces the threat of being copied and cloned.

Obfuscation can be done in many ways, which we discuss later in this article. Typical techniques include renaming identifiers, encoding or encrypting strings, and splitting code or data into pieces.

Once code is obfuscated, it must be deobfuscated before it can be read again.

Importance of Code Protection

Proprietary software is built with a massive investment of time and money, so its code base is a precious asset for the company that owns it. Securing this asset is extremely important.

Hence, obfuscation is an important practice to follow before releasing software publicly. It makes the code base harder to understand, and therefore harder to copy than plain code.

Obfuscating Python Code

Obfuscation matters for Python in particular. As Python gains popularity, more and more proprietary projects are being developed in it, and distributing the code base of such software as plain, readable source can be risky for its developers.

Obfuscation provides only a layer of protection. There is no way to obfuscate Python code so that it can never be deobfuscated, but some protection is better than none.

We will look at ways to obfuscate Python code that give our code base some level of protection.

Obfuscation Techniques

Name Mangling

How does it work?

A very easy way to make code difficult to understand is to jumble the names of variables and functions so that their purpose is no longer obvious.

However, authorized personnel can be given a mapping of the original names so that they have no problem working with the code.

Name mangling, in this sense, is nothing more than renaming variables and functions. It should not be confused with Python’s built-in name mangling of class attributes that start with two underscores.
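As a rough illustration of renaming, the sketch below uses the standard ast module to rewrite identifiers according to a hand-written mapping. The function calculate_total, the short replacement names, and the Renamer class are all made up for this example, and ast.unparse assumes Python 3.9 or newer. A real tool would also have to deal with scoping, attributes, imports, and names referenced inside strings.

```python
import ast

# Hypothetical source code that we want to make harder to read.
SOURCE = '''
def calculate_total(price, quantity):
    subtotal = price * quantity
    return subtotal
'''

# Hand-written mapping from meaningful names to meaningless ones.
MAPPING = {
    "calculate_total": "f1",
    "price": "a",
    "quantity": "b",
    "subtotal": "c",
}


class Renamer(ast.NodeTransformer):
    """Rename function definitions, parameters, and variables."""

    def visit_FunctionDef(self, node):
        node.name = MAPPING.get(node.name, node.name)
        self.generic_visit(node)  # also visit the parameters and the body
        return node

    def visit_arg(self, node):
        node.arg = MAPPING.get(node.arg, node.arg)
        return node

    def visit_Name(self, node):
        node.id = MAPPING.get(node.id, node.id)
        return node


tree = Renamer().visit(ast.parse(SOURCE))
obfuscated_source = ast.unparse(tree)
print(obfuscated_source)
```

The printed function behaves exactly like the original, but its names no longer say anything about what it does. Keeping MAPPING in a safe place is what lets the team translate the names back.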

Benefits and Limitations

The benefit of this technique is that it is very easy to implement, and it creates some level of difficulty in understanding the code base.

The limitation is that, being easy to implement, it is also easy to reverse. If we try to increase the complexity by jumbling the names even more, we or the people on our team may no longer be able to work with the code base ourselves, which is a troublesome situation in a team environment.

Obfuscating String Literals

Obfuscating string literals is a crucial element in adding protection to our code. In this technique, we encode or encrypt the plain string literals that contain sensitive data, such as passwords, API keys, and other important strings, so that potential attackers cannot simply read them from the source.

There are various methods of obfuscating a string, out of which we will discuss a few:

String Segmentation

Consider a string that contains a very important password. Storing it as a plain string would be bad practice, as attackers could extract it very easily.

To avoid this, we save the string as small substrings. When we want to use the string, we concatenate the substrings in the right order so that they join to form the actual string.

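actual_string = "poiuytrewq>?<*&^%%"

# Segmentation: store the string as out-of-order pieces
s1 = "poi"
s2 = "^%%"
s3 = "uyt"
s4 = "<*&"
s5 = "rew"
s6 = "q>?"

# Concatenate the pieces in the correct order
new = s1 + s3 + s5 + s6 + s4 + s2

# actual_string is kept here only to verify the result
if new == actual_string:
    print("Yes")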

The above code shows an example of this technique.
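The same idea can be automated. The following is a minimal sketch, not a hardened implementation: the helper names segment and reassemble, the chunk size, and the fixed seed are assumptions chosen for illustration. Each chunk is stored together with its original position so that the pieces can be put back in order.

```python
import random


def segment(secret, size, seed=0):
    """Split `secret` into chunks of `size` characters and shuffle them.

    Each chunk is paired with its starting index so it can be restored.
    """
    chunks = [(i, secret[i:i + size]) for i in range(0, len(secret), size)]
    random.Random(seed).shuffle(chunks)
    return chunks


def reassemble(chunks):
    """Sort the chunks by their starting index and join them."""
    return "".join(text for _, text in sorted(chunks))


actual_string = "poiuytrewq>?<*&^%%"
pieces = segment(actual_string, 3)
restored = reassemble(pieces)
print(restored == actual_string)
```

In a real program only the shuffled pieces would be stored in the source; the original string appears here only so that the result can be checked.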

Base64 Encoding

To implement this method, we will first import the base64 module and then encode the string using the module.

import base64

# Actual string to be encoded
actual_string = "poiuytrewq>?<"

# Obfuscating the string using the imported module
obfuscated_string = base64.b64encode(actual_string.encode()).decode()
print(obfuscated_string)

Running the above code prints the Base64-encoded form of the string. Keep in mind that Base64 is an encoding, not encryption: anyone can decode it without a key, so it only hides the string from casual inspection.
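To use the string at run time, the program has to decode it again. The short sketch below shows the round trip with the standard base64 module; the variable names follow the example above, and recovered_string is introduced here only for illustration.

```python
import base64

actual_string = "poiuytrewq>?<"

# Encode: text -> bytes -> Base64 bytes -> printable text
obfuscated_string = base64.b64encode(actual_string.encode("utf-8")).decode("ascii")

# Decode: reverse the steps to get the original text back
recovered_string = base64.b64decode(obfuscated_string).decode("utf-8")

print(recovered_string == actual_string)
```

Because no key is involved, an attacker can run exactly the same decoding step.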

XOR Encryption

For this method, we choose an encryption key and XOR each character of our string with it to obfuscate the critical string.

# Actual string
actual_string = "poiuytrewq>?<"

# Obfuscation function: XOR every character with the key
def encrypt_actual_string(target, key):
    return "".join(chr(ord(c) ^ key) for c in target)

# Obfuscating the actual string
obfuscated_string = encrypt_actual_string(actual_string, 28)
print(obfuscated_string)

The above code is just a simple function that takes the target string and a key as parameters and returns the encrypted string.
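XOR is its own inverse, so applying the same function with the same key a second time gives the original string back. The sketch below shows this; the function name xor_string is a renamed copy of the function above, used here only to make clear that it works in both directions.

```python
def xor_string(target, key):
    """XOR every character of `target` with the integer `key`."""
    return "".join(chr(ord(c) ^ key) for c in target)


actual_string = "poiuytrewq>?<"
key = 28

obfuscated_string = xor_string(actual_string, key)    # obfuscate
recovered_string = xor_string(obfuscated_string, key)  # deobfuscate

print(recovered_string == actual_string)  # True
```

A single small integer key is easy to guess by trying every possible value, so this should be treated as light obfuscation rather than real encryption.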

Adding Junk Code and Comments

This method of obfuscation is quite simple.

To implement it, all we need to do is write misleading code and comments so that any unauthorized user or attacker is led astray.

This technique creates confusion by making the code look less straightforward than it really is.

Comments normally tell others what is happening in our code, and they play a vital role in communicating it. Attackers read them too, so misleading comments will send them in the wrong direction and trap them in code that is inconsequential to what the program really does.

Since this method is very easy to use, we may go overboard and mess up the entire code, so we should make sure that the core code stays intact and that we do not make things harder for ourselves and our team.
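As a small sketch of the idea, the function below hides a one-line computation behind a misleading name, a misleading comment, and a few junk statements. The name update_cache and everything inside it are invented for this example, and the annotations marked "junk" are there for the reader; real obfuscated code would of course leave them out.

```python
import math


def update_cache(values):
    # Deliberately misleading comment: "Refresh the cache timestamps."
    # There is no cache; the function simply returns the largest value.

    # Junk: a sum of non-negative numbers that is never used for the result.
    cache_marker = sum(math.floor(math.sqrt(v)) for v in values if v >= 0)

    # Junk: cache_marker is never negative, so this branch never runs.
    if cache_marker < 0:
        values = []

    # Real logic
    return max(values)


print(update_cache([3, 9, 4]))  # 9
```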

Advanced Obfuscation Techniques

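Conversion to Byte Code

The first method is to compile the file into byte code. Older Python versions stored optimized byte code in a ‘.pyo’ file; since Python 3.5, it is written to a ‘.pyc’ file inside the ‘__pycache__’ folder instead.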

For the demonstration, I will be creating a sample.py file which I will be using to demonstrate the obfuscation process.

To convert the code into byte code, we must first navigate in our terminal to the directory that contains the file. I am using PyCharm as the IDE for this article; the choice of IDE is entirely your decision.

If you wish to choose PyCharm as your IDE, you can follow the extensive article about installing and using PyCharm given in the link.

As you can see, I have navigated to that directory in my PyCharm terminal.

[Image: current directory in the PyCharm terminal]

Now that we are in the right directory, let’s run the command below to convert our code into byte code.

python -OO -m py_compile <your program.py>

Running the above command without any error creates a file containing the byte code: a ‘.pyo’ file on older Python versions, or a ‘.pyc’ file inside the ‘__pycache__’ folder on Python 3.5 and later.

This is one of the simplest methods, but it has clear limitations, as the code can be recovered easily by others.
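To see why byte code offers little protection, here is a minimal sketch that compiles a throwaway file and reads its names and constants straight back out of the compiled file. The file name sample.py, the API_KEY variable, and its value are made up for illustration, and skipping a 16-byte header assumes Python 3.7 or newer.

```python
import marshal
import py_compile
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Write a tiny, hypothetical source file.
    source_path = Path(tmp) / "sample.py"
    source_path.write_text('API_KEY = "secret-123"\n')

    # Compile it with the same optimization level as "python -OO".
    pyc_path = py_compile.compile(
        str(source_path),
        cfile=str(Path(tmp) / "sample.pyc"),
        doraise=True,
        optimize=2,
    )
    raw = Path(pyc_path).read_bytes()

# Skip the header (assumed to be 16 bytes) and load the code object.
code_obj = marshal.loads(raw[16:])

print(code_obj.co_names)   # names used by the module
print(code_obj.co_consts)  # constants, including string literals
```

Both the variable name and the string literal are still stored in readable form, which is why byte code alone should not be relied on to hide secrets.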

Using Third-Party Obfuscation Tools

Using ‘pyarmor’

First, we will install pyarmor using pip, which is Python’s package manager.

To know more about pip, check the article in this link.

To install ‘pyarmor’, run the below command in your terminal.

pip install pyarmor

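After pyarmor has been installed successfully, we can move ahead with obfuscating our Python code.

First, navigate in the terminal to the folder that contains the file and run the below command there. The commands shown here use the Pyarmor 7 syntax; Pyarmor 8 and later replaced ‘obfuscate’ with the ‘gen’ command.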

pyarmor obfuscate <filename.py>

The obfuscated file will be written to the dist folder. The above command usually obfuscates all the files in the directory. To avoid this and obfuscate a single file, use the below command.

pyarmor obfuscate --exact <filename.py>

However, even this process can be reversed, and the code can be recovered.

‘pyinstaller’

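‘pyinstaller’ is a freezing tool that creates standalone executables from Python scripts, and some obfuscation comes as a side effect of the freezing process. The standalone executable bundles the compiled code together with the Python interpreter, which makes it tougher to turn back into Python scripts than plain source files, although it is not impossible.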

To understand more about ‘pyinstaller’, please check out this link.

‘Cython’

This is not an obfuscation tool but a compiler that translates Python code into C. When the generated C code is compiled, it becomes machine code, which is not easily understandable.

For demonstration purposes, I will be using a Google Colab notebook.

First, we need to install cython on our machine. To install cython, run the below command in your terminal.

pip install cython

[Image: example of using cython]

‘Pyobfuscator’

This is a dedicated Python library used to obfuscate Python code. It has built-in methods to implement most of the obfuscation techniques that we have discussed above so that we don’t need to implement them from scratch.

To start using this library, first, install it using the below command.

pip install PyObfuscator

To explore this module in-depth, you can take a look at the official documentation.

Best Practices for Effective Obfuscation

Balancing Obfuscation and Maintainability

We have discussed obfuscation extensively, including the point that over-obfuscation can leave us unable to work with our own code. We must therefore learn to strike a balance between strong obfuscation and maintainability, so that our code base remains safe but is not too difficult to maintain and understand in the future.

For this purpose, we need to make sure that every encryption we apply has a decryption key that is kept safe with us and, if we work in a team, with our team members as well.

Testing Obfuscated Code

Obfuscation is done towards the later stages of the project cycle, when the code is already fully functional. Obfuscating means changing that code, and in the process we might break it or introduce errors.

Hence, it is important to test the code again after obfuscation so that any errors introduced along the way can be identified and debugged.
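One simple way to do this is to run the original and the obfuscated version on the same inputs and compare the results. The sketch below assumes a hypothetical function total_price and a hand-obfuscated copy named f1; in a real project the two versions would be imported from the original and the obfuscated modules.

```python
def total_price(price, quantity):
    """Original, readable version."""
    return price * quantity


def f1(a, b):
    """Obfuscated version with meaningless names."""
    c = a * b
    return c


# Inputs chosen to cover zero, positive, fractional, and negative values.
test_inputs = [(0, 0), (1, 5), (2.5, 4), (-3, 7)]

# Collect every input for which the two versions disagree.
mismatches = [args for args in test_inputs if total_price(*args) != f1(*args)]

print(mismatches)  # []
```

An empty list means the obfuscated function returned the same result as the original for every input that was tried.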

Conclusion

Risk of Obfuscating the Code

Obfuscating the code definitely has many advantages, but it needs to be done very carefully. We don’t want our code to get so complicated that we ourselves can no longer understand it, and we also need to make sure that people on our team can interpret it.

So we have to settle on a certain level of obfuscation and make sure that we do not go beyond it.

Importance of Obfuscating Python Code

Python is a very easy-to-understand language. Because its code is so easy to read, it is also less protected, so large software written in it should have a layer of protection against unauthorized access to the code.

Obfuscation does not entirely satisfy this need, but it provides some protection and reduces readability to some extent, which helps safeguard our valuable code base from unauthorized access.

Summary

To summarize, we now have an idea of what obfuscation is and how it can be applied to Python code.

We have seen why obfuscation is important and how it helps protect our valuable code bases from being copied or attacked. Lastly, we went through some practices to follow and finished by looking at the drawbacks of obfuscation.
