Python encode() and decode() Functions

Python’s encode() and decode() methods convert between strings and bytes using a given encoding: encode() turns a string into a sequence of bytes, and decode() turns a sequence of bytes back into a string. Let us look at these two methods in detail in this article.


Encode a given String

Every string object has an encode() method, which we call on the input string.

Format:

input_string.encode(encoding, errors)

This encodes input_string using encoding, where errors decides what happens if a character cannot be represented in that encoding. Both parameters are optional and default to 'utf-8' and 'strict', respectively.

encode() will result in a sequence of bytes.

inp_string = 'Hello'
bytes_encoded = inp_string.encode()
print(type(bytes_encoded))

This results in an object of <class 'bytes'>, as expected:

<class 'bytes'>

The encoding parameter specifies which encoding to use. There are many character encoding schemes, of which Python uses UTF-8 by default.
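As a small sketch of these defaults (the variable names are illustrative), calling encode() with no arguments gives the same result as passing 'utf-8' and 'strict' explicitly, either positionally or as keywords:

```python
text = 'Hello'

# No arguments: encoding defaults to 'utf-8', errors to 'strict'
default_bytes = text.encode()

# The same call with both parameters spelled out
positional_bytes = text.encode('utf-8', 'strict')
keyword_bytes = text.encode(encoding='utf-8', errors='strict')

print(default_bytes == positional_bytes == keyword_bytes)  # True
```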

Let us look at the encoding parameter using an example.

a = 'This is a simple sentence.'

print('Original string:', a)

# Encodes to utf-8 by default
a_utf = a.encode()

print('Encoded string:', a_utf)

Output

Original string: This is a simple sentence.
Encoded string: b'This is a simple sentence.'

NOTE: As you can observe, we have encoded the input string in the UTF-8 format. The output looks almost identical to the input, except that it is prefixed with a b. This means that the string has been converted to a sequence of bytes, which is how text is stored on any computer.

Python displays bytes that correspond to printable ASCII characters as those characters for readability, which is why the output resembles the original string. The b prefix denotes that it is not a string, but a sequence of bytes.
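To see what a bytes object actually holds, the following sketch (with illustrative variable names) inspects the individual byte values and compares how one non-ASCII character is encoded under two different schemes:

```python
ascii_text = 'Hi'
raw = ascii_text.encode('utf-8')

# A bytes object is a sequence of integers in the range 0-255
print(list(raw))  # [72, 105]

# A non-ASCII character needs more than one byte in UTF-8 ...
print('ö'.encode('utf-8'))    # b'\xc3\xb6'
# ... but fits in a single byte in Latin-1
print('ö'.encode('latin-1'))  # b'\xf6'
```

Bytes outside the printable ASCII range are shown as \xNN escapes, which is why only plain English text looks unchanged after encoding.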


Handling errors

The errors parameter accepts several values, some of which are listed below:

errors value        Behavior
strict              Default behavior, which raises UnicodeEncodeError on failure (UnicodeDecodeError when decoding).
ignore              Drops the un-encodable characters from the result.
replace             Replaces each un-encodable character with a question mark (?).
backslashreplace    Inserts a backslash escape sequence (\xNN, \uNNNN or \UNNNNNNNN) instead of each un-encodable character.

Let us look at the above concepts using a simple example. We will consider an input string in which not all characters can be encoded in ASCII (such as ö).

a = 'This is a bit möre cömplex sentence.'

print('Original string:', a)

print('Encoding with errors=ignore:', a.encode(encoding='ascii', errors='ignore'))
print('Encoding with errors=replace:', a.encode(encoding='ascii', errors='replace'))

Output

Original string: This is a bit möre cömplex sentence.
Encoding with errors=ignore: b'This is a bit mre cmplex sentence.'
Encoding with errors=replace: b'This is a bit m?re c?mplex sentence.'
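The example above does not exercise the other two handlers from the table. A minimal sketch, assuming the same ASCII target encoding and a shorter illustrative string, shows strict raising an exception and backslashreplace substituting an escape sequence:

```python
word = 'möre'

# errors='strict' (the default) raises UnicodeEncodeError
try:
    word.encode('ascii')
except UnicodeEncodeError as exc:
    print('strict failed:', exc.reason)

# errors='backslashreplace' keeps the information as an escape sequence
escaped = word.encode('ascii', errors='backslashreplace')
print(escaped)  # b'm\\xf6re'
```

Unlike ignore and replace, backslashreplace loses no information, since the escape sequence still records which character was there.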

Decoding a Stream of Bytes

Similar to encoding a string, we can decode a sequence of bytes to a string object, using the decode() method.

Format:

encoded = input_string.encode()
# Using decode()
decoded = encoded.decode(decoding, errors)

Since encode() converts a string to bytes, decode() simply does the reverse.

byte_seq = b'Hello'
decoded_string = byte_seq.decode()
print(type(decoded_string))
print(decoded_string)

Output

<class 'str'>
Hello

This shows that decode() converts bytes to a Python string.

As with encode(), the decoding parameter (named encoding in decode() itself) specifies the encoding from which the byte sequence is decoded. The errors parameter determines the behavior if the decoding fails, and it accepts the same values as in encode().
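As a sketch of how the errors parameter behaves on the decoding side (the byte string below is an illustrative UTF-8 encoding of 'möre'), note that strict raises UnicodeDecodeError rather than UnicodeEncodeError, and replace substitutes the Unicode replacement character U+FFFD instead of a question mark:

```python
data = b'm\xc3\xb6re'  # UTF-8 bytes for 'möre'

# Decoding as ASCII with errors='strict' (the default) fails
try:
    data.decode('ascii')
except UnicodeDecodeError as exc:
    print('strict failed at byte offset', exc.start)

ignored = data.decode('ascii', errors='ignore')
replaced = data.decode('ascii', errors='replace')
escaped = data.decode('ascii', errors='backslashreplace')

print(ignored)                        # mre
print(replaced == 'm\ufffd\ufffdre')  # True
print(escaped)                        # m\xc3\xb6re
```

Each invalid byte is handled separately, so the two-byte UTF-8 sequence for ö produces two replacement characters or two escape sequences.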


Importance of encoding

Since the result of encoding and decoding depends on the format used, we must be careful to use the same format in both directions. If we use the wrong format, we get the wrong output or an error.

The snippet below shows the importance of matching the encoding and decoding formats.

The first decoding is incorrect, as it tries to decode as ASCII a byte sequence that was encoded in UTF-8. The second one is correct since the encoding and decoding formats are the same.

a = 'This is a bit möre cömplex sentence.'

print('Original string:', a)

# Encoding in UTF-8
encoded_bytes = a.encode('utf-8', 'replace')

# Trying to decode via ASCII, which is incorrect
decoded_incorrect = encoded_bytes.decode('ascii', 'replace')
decoded_correct = encoded_bytes.decode('utf-8', 'replace')

print('Incorrectly Decoded string:', decoded_incorrect)
print('Correctly Decoded string:', decoded_correct)

Output

Original string: This is a bit möre cömplex sentence.
Incorrectly Decoded string: This is a bit m��re c��mplex sentence.
Correctly Decoded string: This is a bit möre cömplex sentence.
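A mismatch does not always announce itself with replacement characters. In the following sketch (Latin-1 is chosen here only for illustration), decoding UTF-8 bytes as Latin-1 succeeds without any error but silently yields the wrong text, while a matching round trip restores the original:

```python
original = 'möre'
utf8_bytes = original.encode('utf-8')

# Latin-1 maps every byte to a character, so no error is raised,
# but each 'ö' turns into two unrelated characters
garbled = utf8_bytes.decode('latin-1')
print(garbled)              # mÃ¶re
print(garbled == original)  # False

# Decoding with the same encoding restores the original
print(utf8_bytes.decode('utf-8') == original)  # True
```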

Conclusion

In this article, we learned how to use the encode() and decode() methods to encode an input string and decode an encoded byte sequence.

We also learned how they handle errors in encoding/decoding via the errors parameter. Note that character encoding is not encryption: anyone can decode the bytes without a key, so these methods should not be relied on to protect sensitive data such as passwords.
