Pandas to_csv(): Write an object to a CSV File

Pandas To Csv Cover Image

Pandas is a popular Python library used for working with data. It provides methods that are suitable for cleaning, analyzing and manipulating data sets. 

Pandas has a data structure called a data frame which represents data in the form of rows and columns or tables. It is fast and efficient to use data frames in terms of computation.

Pandas enables you to read the data from various file formats like JSON, CSV, HTML, etc. Just like we can read data from any format into a data frame, we can also export data frames into our desired formats like CSV, JSON, HTML, etc. 

In this article, you will learn about the pandas.to_csv() function which lets you pass a file object to save it as a Comma Separated Value, i.e. CSV file.

Also read: How to Read CSV with Headers Using Pandas?


Prerequisites of to_csv()

You need to have Python and Pandas installed on your computer and your favourite IDE set up to start coding. 

If you don’t have Pandas installed, you can install it using the command:

pip install pandas

If you are using the Anaconda distribution use the command:

conda install pandas

Syntax of pandas.to_csv()

DataFrame.to_csv(path_or_buf=None, 
sep=',', na_rep='', 
float_format=None, 
columns=None, 
header=True, 
index=True,
index_label=None,
mode='w', 
encoding=None, 
compression='infer', 
quoting=None, 
quotechar='"', 
lineterminator=None, 
chunksize=None, 
date_format=None, 
doublequote=True, 
escapechar=None, 
decimal='.', 
errors='strict', 
storage_options=None)

Some of the important parameters are:

ParameterDescription
path_or_bufFile path or object. If None, the result is returned as a string
sepDefault ‘,’. The delimiter for the output file. It should be a string of length 1.
columnsColumns that are to be included in the output file.
headerDefault True. Takes boolean values or a list of strings. If True, the header names are included else not. If a list of strings is given, it is used as an alias for column names and its length should be the same as the number of columns desired in the output file.
indexDefault True. Takes boolean values. If True, writes index in the output else no index is included in the output CSV file.

Exporting a data frame as a CSV File

Let us first create a data frame.

import pandas as pd

# creating a data frame
data = {
    'ID': [3, 1, 4],
    'Name': ['John', 'Mary', 'Sherlock'],
    'Designation': ['Salesperson', 'CEO', 'Software Developer'],
    'Salary': [30000, 80000, 55000]
}

df = pd.DataFrame(data)

df
To Csv Df

Now, you can export this data frame as a Comma Separated File by using the pandas.to_csv() as shown below.

ap_csv_data = df.to_csv()
print('CSV String:\n', ap_csv_data)

Output:

CSV String:
 ,ID,Name,Designation,Salary
0,3,John,Salesperson,30000
1,1,Mary,CEO,80000
2,4,Sherlock,Software Developer,55000

To save the file, provide a name for it as a parameter to the function.

df.to_csv('ap_csv_file.csv')

Output:

Ap Csv File

Specifying delimiters for the output

You can provide any string of length 1 as a delimiter for the output to the to_csv() function using the parameter ‘sep’. The default delimiter is comma ‘,’.

In the below example, the symbol backslash i.e. ‘/’ is used as a delimiter.

csv_delimited_data = df.to_csv(sep='/')
print(csv_delimited_data)

Output:

/ID/Name/Designation/Salary
0/3/John/Salesperson/30000
1/1/Mary/CEO/80000
2/4/Sherlock/Software Developer/55000

Selecting only specific columns for the CSV output

Using the ‘columns’ parameter, you can provide a list of column headers that you want to include in your CSV output. By default, all the columns in the data frame are included in the output.

The below code shows that only the columns ‘Name’ and ‘Designation’ are included in the CSV output.

csv_specified_cols = df.to_csv(columns=['Name', 'Designation'])
print(csv_specified_cols)

Output:

,Name,Designation
0,John,Salesperson
1,Mary,CEO
2,Sherlock,Software Developer

Specifying headers for the CSV output

You can specify if you want the headers of the columns to be there in the CSV output or not. You can also change the existing column headers/names to new headers by providing them as a list of strings to the function using the ‘header’ parameter.

Removing headers:

csv_header = df.to_csv(header=False)
print(csv_header)

Output:

0,3,John,Salesperson,30000
1,1,Mary,CEO,80000
2,4,Sherlock,Software Developer,55000

Providing new headers to the output CSV:

csv_header_list = df.to_csv(header=['New ID', 'New Name', 
'New Designation', 'New Salary'])
print(csv_header_list)

Output:

,New ID,New Name,New Designation,New Salary
0,3,John,Salesperson,30000
1,1,Mary,CEO,80000
2,4,Sherlock,Software Developer,55000

The above output shows the new headers specified in the list.


Specifying index for the CSV Output

Just like the ‘header’ parameter, the ‘index’ parameter lets you specify if you want to include the index in the output or not. You can also specify a custom index for the output using the ‘index_label’ parameter as shown below.

Removing the index:

csv_index = df.to_csv(index=False)
print(csv_index)

Output:

ID,Name,Designation,Salary
3,John,Salesperson,30000
1,Mary,CEO,80000
4,Sherlock,Software Developer,55000

Specifying custom index:

csv_custom_index = df.to_csv(index_label='Sr. No.')
print(csv_custom_index)

Output:

Sr. No.,ID,Name,Designation,Salary
0,3,John,Salesperson,30000
1,1,Mary,CEO,80000
2,4,Sherlock,Software Developer,55000

The above output shows the new index as ‘Sr. No.’ as specified in the ‘index_label’ parameter in the pandas.to_csv() function.


Conclusion

The Pandas to_csv() function lets you convert any file object or a data frame into the Comma Separated Values format. It also provides you with many options to customize your CSV output based on your requirements by using the different parameters discussed earlier in this article.


Reference