Hello everyone! In today’s article, we’ll be taking a look at the gzip module in Python.
This module gives us an easy way to deal with gzip files (.gz). It works very similarly to the Linux utility commands gzip and gunzip.
Let’s look at how we can use this module effectively, using some illustrative examples!
Using the gzip Module in Python
This module provides us with high-level functions such as open(), compress() and decompress(), for quickly dealing with these compressed files.
Essentially, working with a .gz file this way is much like opening a regular file!
To import this module, you need the below statement:
import gzip
There is no need to pip install this module since it is a part of the standard library! Let’s get started with dealing with some gzip files.
Writing to a compressed file
We can use the gzip.open() function to directly open .gz files and write to these compressed files!
import gzip
import os
import io

name = 'sample.txt.gz'

with gzip.open(name, 'wb') as output:
    # We cannot directly write Python objects like strings!
    # We must first encode them into bytes, which io.TextIOWrapper() does for us
    with io.TextIOWrapper(output, encoding='utf-8') as encode:
        encode.write('This is a sample text')

# The file is closed now, so let's print the updated file stats
print(f"The file {name} now contains {os.stat(name).st_size} bytes")
Here, note that we cannot directly write Python strings to a file opened in binary-write mode (wb), since it only accepts bytes.
That's why we wrap the file object in io.TextIOWrapper(), which encodes each string into bytes (UTF-8 here) before it is compressed and written.
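As an alternative to the wrapper, gzip.open() also accepts text modes such as 'wt', in which case it takes care of the encoding itself and strings can be written directly. Here is a minimal sketch of that approach; the file name sample_text.txt.gz is only an illustrative choice and is not used elsewhere in this article.

```python
import gzip
import os

# Hypothetical file name, chosen only for this example
name = 'sample_text.txt.gz'

# 'wt' opens the compressed file in text mode, so str objects
# are encoded (as UTF-8 here) and compressed as they are written
with gzip.open(name, 'wt', encoding='utf-8') as output:
    output.write('This is a sample text')

# The file is closed at this point, so its size on disk is final
print(f"The file {name} now contains {os.stat(name).st_size} bytes")
```

Both approaches should produce a valid .gz file; the text mode version simply saves us from creating the wrapper ourselves.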
If you run the program, you’ll get the below output.
Output
The file sample.txt.gz now contains 57 bytes
Also, you will observe that the file sample.txt.gz is created in your current directory. Alright, so we've successfully written to this compressed file.
Let's now try to decompress it and read its contents.
Reading Compressed Data from the gzip file
Now, just as we called write() through the wrapper, we can call read() through the same kind of wrapper. This time, we open the file in binary-read mode (rb).
import gzip
import io

name = 'sample.txt.gz'

with gzip.open(name, 'rb') as ip:
    # The wrapper decodes the decompressed bytes back into a string
    with io.TextIOWrapper(ip, encoding='utf-8') as decoder:
        # Let's read the content using read()
        content = decoder.read()
        print(content)
Output
This is a sample text
Indeed, we were able to get back the same text we wrote initially!
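For larger files, reading everything at once with read() may not be ideal. As a rough sketch, we can instead open the file in text mode ('rt') and loop over it line by line, just like a regular text file. The file name lines_example.txt.gz and its contents are made up for this example, and the snippet writes the file first so that it can run on its own.

```python
import gzip

# Hypothetical file name and contents, used only for this example
name = 'lines_example.txt.gz'

# Create a small compressed file with two lines of text
with gzip.open(name, 'wt', encoding='utf-8') as output:
    output.write('first line\nsecond line\n')

lines = []
# 'rt' decompresses and decodes on the fly, one line per iteration
with gzip.open(name, 'rt', encoding='utf-8') as ip:
    for line in ip:
        lines.append(line.rstrip('\n'))

print(lines)  # ['first line', 'second line']
```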
Compressing Data
Another useful feature of this module is that we can compress data in memory, without going through a file.
If we have a lot of byte content as input, we can use the gzip.compress() function to compress it.
import gzip
ip = b"This is a large wall of text. This is also from AskPython"
out = gzip.compress(ip)
In this case, the bytes object ip is compressed using gzip.compress(), and out holds the resulting compressed bytes.
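To get the original data back, we can pass the compressed bytes to gzip.decompress(). The sketch below repeats the sample text 100 times (an arbitrary choice for this example) so that the effect of compression is visible, and then checks that the round trip returns the input unchanged.

```python
import gzip

# Repeat the text so that there is enough redundancy to compress well
ip = b"This is a large wall of text. This is also from AskPython" * 100

out = gzip.compress(ip)          # bytes in, compressed bytes out
restored = gzip.decompress(out)  # reverses gzip.compress()

print(f"Original: {len(ip)} bytes, compressed: {len(out)} bytes")
print(restored == ip)  # True
```

Very short inputs may not shrink at all, since the gzip format adds a header and a trailer to the compressed data.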
Conclusion
In this article, we learned how to use the gzip module in Python to read and write .gz files, and how to compress data in memory.