Python HowTo – Using the gzip Module in Python

Gzip Module in Python

Hello everyone! In today’s article, we’ll be taking a look at the gzip module in Python.

This module gives us an easy way to deal with gzip files (.gz). This works very similarly to the Linux utility commands gzip and gunzip.

Let’s look at how we can use this module effectively, using some illustrative examples!


Using the gzip Module in Python

This module provides us with high-level functions such as open(), compress() and decompress(), for quickly dealing with these file extensions.

Essentially, this will be simply opening a file!

To import this module, you need the below statement:

import gzip

There is no need to pip install this module since it is a part of the standard library! Let’s get started with dealing with some gzip files.

Writing to a compressed file

We can use the gzip.open() method to directly open .gz files and write to these compressed files!

import gzip
import os
import io

name = 'sample.txt.gz'

with gzip.open(name, 'wb') as output:
        # We cannot directly write Python objects like strings!
        # We must first convert them into a bytes format using io.BytesIO() and then write it
        with io.TextIOWrapper(output, encoding='utf-8') as encode:
            encode.write('This is a sample text')

# Let's print the updated file stats now
print(f"The file {name} now contains {os.stat(name).st_size} bytes")

Here, note that we cannot directly write Python objects like strings!

We must first convert them into a bytes format using io.TextIOWrapper() and then write it using this wrapper function. That’s why we open the file in binary-write mode (wb).

If you run the program, you’ll get the below output.

Output

The file sample.txt.gz now contains 57 bytes

Also, you would observe that the file sample.txt.gz is created on your current directory. Alright, so we’ve successfully written to this compressed file.

Let’s now try to decompress it and read it’s contents.

Reading Compressed Data from the gzip file

Now, similar to the write() function via the wrapper, we can also read() using that same function.

import gzip
import os
import io

name = 'sample.txt.gz'

with gzip.open(name, 'rb') as ip:
        with io.TextIOWrapper(ip, encoding='utf-8') as decoder:
            # Let's read the content using read()
            content = decoder.read()
            print(content)

Output

This is a sample text

Indeed, we were able to get back the same text we wrote initially!

Compressing Data

Another useful feature of this module is that we can effectively compress data using gzip.

If we have a lot of byte content as input, we can use the gzip.compress() function to compress it.

import gzip

ip = b"This is a large wall of text. This is also from AskPython"
out = gzip.compress(ip)

In this case, the binary string will be compressed using gzip.compress.


Conclusion

In this article, we learned about how we can use the gzip module in Python, to read and write to .gz files.

References

  • Python gzip module Documentation
  • JournalDev article on Python gzip module