Pandas to_pickle(): Pickle (serialize) object to File

Pickle is Python’s standard way to serialize and deserialize Python object structures. In other words, pickling transforms a Python object into a byte stream so that it can be stored in a file or database, have its state preserved across sessions, or be sent over a network. Unpickling that byte stream recreates the original object hierarchy.
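The round trip described above can be sketched with Python's built-in pickle module alone. The record dictionary below is made-up sample data, used only to show the byte stream and the rebuilt object.

```python
import pickle

# A small nested Python object to serialize (made-up sample data)
record = {"fruit": "apple", "prices": [100, 120], "organic": True}

# Pickling: turn the object into a byte stream
payload = pickle.dumps(record)

# Unpickling: rebuild an equal object from the byte stream
restored = pickle.loads(payload)

print(type(payload))       # <class 'bytes'>
print(restored == record)  # True
print(restored is record)  # False: an equal copy, not the same object
```

The unpickled object compares equal to the original but is a separate object in memory.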

In this tutorial, you will learn about the Pandas to_pickle() function and how to use it to serialize a Pandas object.

Prerequisites of to_pickle()

You need to have Python and Pandas installed on your computer and your favorite IDE set up to start coding.

If you don’t have Pandas installed, you can install it using the command:

pip install pandas

If you are using the Anaconda distribution, use the command:

conda install pandas

Syntax of Pandas to_pickle()

Let’s look at the function’s syntax before moving on to the examples.

DataFrame.to_pickle(path, compression='infer', protocol=5, storage_options=None)

Parameter	Description
path	File path where the pickled object will be stored.
compression	A string that sets the type of compression used on the output. The default, 'infer', picks it from the file extension.
protocol	An integer that selects which pickle protocol version the pickler uses.
storage_options	Additional options that let you save to certain storage connections, like S3.
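As a rough illustration of the protocol parameter, the sketch below writes the same small data frame with two protocol versions and reads back the first two bytes of each file. It assumes an uncompressed file (a .pkl extension triggers no compression under 'infer') and relies on a detail of the pickle format: protocol 2 and later start with a PROTO opcode followed by the version number. The file names and the temporary directory are illustrative only.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"fruit": ["apple", "cherry"], "price": [100, 120]})

headers = {}
with tempfile.TemporaryDirectory() as tmp:
    for protocol in (4, 5):
        # Hypothetical file name; a .pkl extension means no compression is inferred
        path = os.path.join(tmp, f"fruit_p{protocol}.pkl")
        df.to_pickle(path, protocol=protocol)
        with open(path, "rb") as fh:
            # Byte 0 is the PROTO opcode (0x80), byte 1 is the protocol number
            headers[protocol] = fh.read(2)

print(headers)
```

A lower protocol can be useful when the file has to be read by an older Python version that does not support protocol 5.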

Note: The compression parameter can take the following values:

infer
gzip
zip
bz2
xz
tar
zstd
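With the default compression='infer', the method is chosen from the file extension. The sketch below, which uses a hypothetical file name ending in .gz inside a temporary directory, checks that the written file really is gzip data by looking at the two-byte gzip signature.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"fruit": ["apple", "cherry"], "price": [100, 120]})

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file name; the .gz suffix is what 'infer' looks at
    path = os.path.join(tmp, "fruit_pickle.pkl.gz")
    df.to_pickle(path)  # compression='infer' is the default

    with open(path, "rb") as fh:
        # gzip files start with the bytes 0x1f 0x8b
        is_gzip = fh.read(2) == b"\x1f\x8b"

    # read_pickle() also infers gzip from the extension
    restored = pd.read_pickle(path)

print(is_gzip)
print(restored.equals(df))
```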

Implementing Pandas to_pickle()

Now that you understand the syntax of the to_pickle() function, let us create a sample data frame and try it out!

Creating a Pandas DataFrame

import pandas as pd 

# creating a data frame
data = {
    'fruit': ['apple', 'cherry', 'banana', 'watermelon', 'grapes'],
    'colour': ['red', 'red', 'yellow', 'green', 'black'], 
    'price': [100, 120, 60, 110, 80]
}

df = pd.DataFrame(data)

df

This data frame has 3 columns and 5 records.

Convert a DataFrame to a Pickle File

Use the DataFrame.to_pickle() method to create a pickle file from this data frame. The only required argument is path. You can give a full path to the destination file, or just a file name to store the file in the current working directory.

import pandas as pd 

# creating a data frame
df = pd.DataFrame({
    'fruit': ['apple', 'cherry', 'banana', 'watermelon', 'grapes'],
    'colour': ['red', 'red', 'yellow', 'green', 'black'], 
    'price': [100, 120, 60, 110, 80]
})

df.to_pickle('fruit_pickle.pkl')

‘pkl’ is a commonly used extension for pickle files.

Now, the data frame has been serialized to a pickle file named ‘fruit_pickle.pkl’.

Pickling stores the data frame in its current state, so any transformations already applied to it are preserved. Anyone with the file can unpickle it to get the data back in this form. Only unpickle files from sources you trust, because loading a pickle can execute arbitrary code.

Unpickle a File

You can also reverse the process, that is, unpickle a file. The Pandas read_pickle() function does this.

unpickled_fruit = pd.read_pickle('fruit_pickle.pkl')

unpickled_fruit

This returns a data frame with the same contents as the one that was pickled earlier.
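One way to check the round trip is to compare the original and the unpickled data frame with DataFrame.equals(). The sketch below uses a temporary directory and an extra price_with_tax column, both introduced here purely for illustration, to show that a transformation applied before pickling survives.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    "fruit": ["apple", "cherry", "banana"],
    "price": [100, 120, 60],
})
# A transformation applied before pickling (hypothetical extra column)
df["price_with_tax"] = df["price"] * 1.1

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "fruit_pickle.pkl")
    df.to_pickle(path)
    restored = pd.read_pickle(path)

print(restored.equals(df))   # True: same values, dtypes, and labels
print(restored is df)        # False: a new object built from the file
print(list(restored.columns))
```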

Add Compression when Pickling a Pandas DataFrame

As described earlier in this article, the pickled DataFrame can be compressed in several formats. The compression parameter lets you choose which one.

The code below shows how to add zip compression to the pickled file.

df.to_pickle('compressed_fruit_pickle.pkl', compression='zip')

Note that 'infer' detects compression from the file extension. Because this file name ends in .pkl rather than .zip, you must pass the same compression value to read_pickle() when reading the file back.
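A minimal sketch of reading such a file back is shown below. It writes the zip-compressed pickle into a temporary directory (an assumption made here so that the example cleans up after itself), confirms with the standard zipfile module that the file is a zip archive despite its .pkl name, and then passes compression='zip' to read_pickle().

```python
import os
import tempfile
import zipfile

import pandas as pd

df = pd.DataFrame({
    "fruit": ["apple", "cherry", "banana"],
    "price": [100, 120, 60],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "compressed_fruit_pickle.pkl")
    df.to_pickle(path, compression="zip")

    # The file is a zip archive even though its name ends in .pkl
    is_zip = zipfile.is_zipfile(path)

    # State the compression explicitly, since .pkl gives 'infer' nothing to detect
    restored = pd.read_pickle(path, compression="zip")

print(is_zip)
print(restored.equals(df))
```

Naming the file with a .zip suffix instead would let both to_pickle() and read_pickle() infer the compression without the extra argument.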

Convert a Column of the DataFrame to a Pickle File

You can also serialize just part of a Pandas DataFrame instead of the whole thing. To do so, call the to_pickle() method on the column you need.

df['colour'].to_pickle('colour_pickle.pkl')

This code serializes only the ‘colour’ column from the DataFrame. You can deserialize and view it as shown below:

unpickled_colour = pd.read_pickle('colour_pickle.pkl')

unpickled_colour

Output:

0       red
1       red
2    yellow
3     green
4     black
Name: colour, dtype: object
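As the output suggests, a single column is pickled as a Series, so that is what read_pickle() returns. The sketch below contrasts this with pickling several columns at once, which keeps the result a DataFrame. The second file name and the temporary directory are assumptions made for illustration.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    "fruit": ["apple", "cherry", "banana", "watermelon", "grapes"],
    "colour": ["red", "red", "yellow", "green", "black"],
    "price": [100, 120, 60, 110, 80],
})

with tempfile.TemporaryDirectory() as tmp:
    series_path = os.path.join(tmp, "colour_pickle.pkl")
    subset_path = os.path.join(tmp, "fruit_price_pickle.pkl")  # hypothetical name

    df["colour"].to_pickle(series_path)            # single label selects a Series
    df[["fruit", "price"]].to_pickle(subset_path)  # list of labels selects a DataFrame

    colour = pd.read_pickle(series_path)
    fruit_price = pd.read_pickle(subset_path)

print(type(colour).__name__)       # Series
print(type(fruit_price).__name__)  # DataFrame
print(list(fruit_price.columns))   # ['fruit', 'price']
```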

Conclusion

In this tutorial, you learned about the DataFrame.to_pickle() method and saw how to use it to serialize an object. You covered the function’s syntax, how to apply compression to the file, and how to serialize just part of a Pandas DataFrame.

Reference

Pandas to_pickle Official Documentation