Python's encode() and decode() methods are used to encode and decode the input string, using a given encoding. Let us look at these two methods in detail in this article.
Encode a given String
We use the encode() method on the input string, which every string object has. Its format is input_string.encode(encoding, errors).
The encoding parameter names the encoding to use, and errors decides the behavior to be followed if the encoding fails on the string.
encode() results in a sequence of bytes.
    inp_string = 'Hello'
    bytes_encoded = inp_string.encode()
    print(type(bytes_encoded))
As expected, this prints <class 'bytes'>, showing that the result is a bytes object.
The type of encoding to be followed is specified by the encoding parameter. There are various character encoding schemes, out of which UTF-8 is the one Python uses by default.
Let us look at the encoding parameter using an example.
    a = 'This is a simple sentence.'
    print('Original string:', a)

    # Encodes to UTF-8 by default
    a_utf = a.encode()
    print('Encoded string:', a_utf)
    Original string: This is a simple sentence.
    Encoded string: b'This is a simple sentence.'
NOTE: As you can observe, we have encoded the input string in the UTF-8 format. Although there is not much of a difference, the output is prefixed with a b. This means that the string has been converted to a sequence of bytes, which is how it is stored on any computer.
The bytes themselves are just numbers. Python displays those in the printable ASCII range as their characters for readability, and the b prefix denotes that the object is not a string, but a sequence of bytes.
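To make the point about bytes concrete, here is a minimal sketch that prints the numeric values behind a bytes object. The sample string and variable names are illustrative only and are not taken from the examples above.

```python
# A bytes object is a sequence of integers in the range 0-255.
text = 'Hi ö'
utf8_bytes = text.encode('utf-8')

# Printable ASCII bytes are displayed as characters; other bytes as \x escapes.
print(utf8_bytes)        # b'Hi \xc3\xb6'

# Converting to a list reveals the underlying integer values.
byte_values = list(utf8_bytes)
print(byte_values)       # [72, 105, 32, 195, 182]

# The string has 4 characters, but its UTF-8 encoding has 5 bytes,
# because 'ö' needs two bytes in UTF-8.
print(len(text), len(utf8_bytes))  # 4 5
```

The character count and the byte count differ as soon as the string contains anything outside the ASCII range.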
There are various values for errors, some of which are mentioned below:
|Type of Error|Behavior|
|strict|Default behavior, which raises UnicodeEncodeError on failure.|
|ignore|Ignores the un-encodable Unicode from the result.|
|replace|Replaces all un-encodable Unicode characters with a question mark (?).|
|backslashreplace|Inserts a backslash escape sequence (such as \xNN or \uNNNN) in place of un-encodable Unicode characters.|
Let us look at the above concepts using a simple example. We will consider an input string where not all characters are encodable in ASCII (such as ö).
    a = 'This is a bit möre cömplex sentence.'
    print('Original string:', a)

    print('Encoding with errors=ignore:', a.encode(encoding='ascii', errors='ignore'))
    print('Encoding with errors=replace:', a.encode(encoding='ascii', errors='replace'))
    Original string: This is a bit möre cömplex sentence.
    Encoding with errors=ignore: b'This is a bit mre cmplex sentence.'
    Encoding with errors=replace: b'This is a bit m?re c?mplex sentence.'
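The remaining two error modes, strict and backslashreplace, can be tried on the same string. The following is a minimal sketch; the names escaped and strict_failed are introduced here purely for illustration.

```python
a = 'This is a bit möre cömplex sentence.'

# errors='backslashreplace' keeps the un-encodable characters as escape sequences.
escaped = a.encode(encoding='ascii', errors='backslashreplace')
print(escaped)  # b'This is a bit m\\xf6re c\\xf6mplex sentence.'

# errors='strict' (the default) raises UnicodeEncodeError instead.
try:
    a.encode(encoding='ascii', errors='strict')
    strict_failed = False
except UnicodeEncodeError as exc:
    strict_failed = True
    print('Encoding failed:', exc.reason)
```

Unlike ignore and replace, backslashreplace does not lose information about which character was present, which can help when debugging.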
Decoding a Stream of Bytes
Similar to encoding a string, we can decode a stream of bytes to a string object, using the decode() method. Its format is:
    encoded = input_string.encode()
    # Using decode()
    decoded = encoded.decode(decoding, errors)
Since encode() converts a string to bytes, decode() simply does the reverse.
    byte_seq = b'Hello'
    decoded_string = byte_seq.decode()
    print(type(decoded_string))
    print(decoded_string)
    <class 'str'>
    Hello
This shows that decode() converts bytes to a Python string.
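Similar to those of encode(), the decoding parameter decides the type of encoding from which the byte sequence is decoded. In Python's actual signature this parameter is named encoding, so as a keyword argument it is written encoding='utf-8'. The errors parameter denotes the behavior if the decoding fails, and takes the same values as that of encode().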
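As a minimal sketch of how the errors parameter behaves when decoding, consider a byte sequence that is not valid UTF-8. The sample bytes and variable names below are made up for illustration.

```python
# The byte 0xff is never valid in UTF-8, so this sequence cannot be fully decoded.
raw = b'caf\xff'

# Drop the undecodable byte.
ignored = raw.decode(encoding='utf-8', errors='ignore')

# Substitute the Unicode replacement character U+FFFD.
replaced = raw.decode(encoding='utf-8', errors='replace')

# Keep the byte as a backslash escape sequence in the resulting string.
escaped = raw.decode(encoding='utf-8', errors='backslashreplace')

print(ignored)   # caf
print(escaped)   # caf\xff
print(replaced == 'caf\ufffd')  # True
```

Note that when decoding, replace inserts U+FFFD rather than the question mark used when encoding.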
Importance of encoding
Since encoding and decoding an input string depend on the encoding used, we must be careful to use the same one in both directions. If we use the wrong encoding, the output will be wrong, and errors can be raised.
The below snippet shows the importance of encoding and decoding.
The first decoding is incorrect, as it tries to decode as ASCII a byte sequence that was encoded in the UTF-8 format. The second one is correct since the encoding and decoding formats are the same.
    a = 'This is a bit möre cömplex sentence.'
    print('Original string:', a)

    # Encoding in UTF-8
    encoded_bytes = a.encode('utf-8', 'replace')

    # Trying to decode via ASCII, which is incorrect
    decoded_incorrect = encoded_bytes.decode('ascii', 'replace')
    decoded_correct = encoded_bytes.decode('utf-8', 'replace')

    print('Incorrectly Decoded string:', decoded_incorrect)
    print('Correctly Decoded string:', decoded_correct)
    Original string: This is a bit möre cömplex sentence.
    Incorrectly Decoded string: This is a bit m��re c��mplex sentence.
    Correctly Decoded string: This is a bit möre cömplex sentence.
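The snippet above passes errors='replace', which hides the failure behind replacement characters. As a minimal sketch of what happens otherwise, the following uses the default strict mode and also tries Latin-1, an encoding chosen here only for illustration because it accepts every byte value.

```python
a = 'This is a bit möre cömplex sentence.'
encoded_bytes = a.encode('utf-8')

# Decoding with the same encoding restores the original string exactly.
round_trip = encoded_bytes.decode('utf-8')
print(round_trip == a)  # True

# With the default errors='strict', a mismatched decoding raises an exception.
try:
    encoded_bytes.decode('ascii')
    error_position = None
except UnicodeDecodeError as exc:
    error_position = exc.start
    print('Cannot decode byte at position', error_position)

# Some mismatches raise no error at all and silently produce garbled text.
garbled = encoded_bytes.decode('latin-1')
print(garbled)  # This is a bit mÃ¶re cÃ¶mplex sentence.
```

The silent case is the harder one to notice, since no exception points at the problem.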
Conclusion
In this article, we learned how to use the encode() and decode() methods to encode an input string and decode an encoded byte sequence.
We also learned how they handle errors in encoding/decoding via the errors parameter. Note that encoding is not encryption: it only converts between text and bytes, for tasks such as writing text to a file or sending it over a network, and it does not protect sensitive data such as passwords.