Often while working with CSV files, we need to deal with large datasets. Depending on the requirements of the data analysis, we may find that all the required data is not present in a single CSV file. Then the need arises to merge multiple files to get the desired data. However, copy-pasting the required columns from one file to another and that too from large datasets is not the best way to around it.
To solve this problem, we will learn how to use the append
, merge
and concat
methods from Pandas to combine CSV files.
Combining Multiple CSV Files together
To begin with, let’s create sample CSV files that we will be using.



Notice that, all three files have the same columns or headers i.e. ‘name’, ‘age’ and ‘score’. Also, file 1 and file 3 have a common entry for the ‘name’ column which is Sam, but the rest of the values are different in these files.
Note that, in the below examples we are considering that all the CSV files are in the same folder as your Python code file. If this is not the case for you, please specify the paths accordingly while trying out the examples by yourself.
All the examples were executed in a Jupyter notebook.
Different Ways to Combine CSV Files in Python
Before starting, we will be creating a list of the CSV files that will be used in the examples below as follows:
import glob
# list all csv files only
csv_files = glob.glob('*.{}'.format('csv'))
csv_files
Output:
['csv_file_1.csv', 'csv_file_2.csv', 'csv_file_3.csv']
Method 1: append()
Let’s look at the append method here to merge the three CSV files.
import pandas as pd
df_csv_append = pd.DataFrame()
# append the CSV files
for file in csv_files:
df = pd.read_csv(file)
df_csv_append = df_csv_append.append(df, ignore_index=True)
df_csv_append
Output:

The append
method, as the name suggests, appends each file’s data frame to the end of the previous one. In the above code, we first create a data frame to store the result named df_csv_append. Then, we iterate through the list and read each CSV file and append it to the data frame df_csv_append.
Method 2: concat()
Another method used to combine CSV files is the Pandas concat()
method. This method requires a series of objects as a parameter, hence we first create a series of the data frame objects of each CSV file and then apply the concat()
method to it.
import pandas as pd
df_csv_concat = pd.concat([pd.read_csv(file) for file in csv_files ], ignore_index=True)
df_csv_concat
An easier-to-understand way of writing this code is:
l = []
for f in csv_files:
l.append(pd.read_csv(f))
df_res = pd.concat(l, ignore_index=True)
df_res
Both the above codes when executed produce the same output as shown below.

Notice that the resulting data frame is the same as that of the append()
method.
Method 3: merge()
The merge
method is used to join very large data frames. A join can be performed on two data frames at a time. We can specify the key based on which the join is to be performed.
It is a good practice to choose a key that is unique for each entry in the data frame, in order to avoid duplication of rows. We can also specify the type of join we wish to perform i.e. either of ‘inner’, ‘outer’, ‘left’, ‘right’ or ‘cross’ join.
We need to first read each CSV file into a separate data frame.
import pandas as pd
df1 = pd.read_csv('csv_file_1.csv')
df2 = pd.read_csv('csv_file_2.csv')
df3 = pd.read_csv('csv_file_3.csv')



Joining df1 and df2:
df_merged = df1.merge(df2, how='outer')
df_merged
Output:

Joining df1 and df3 based on the key ‘name’.
df_merged = df1.merge(df3, on="name", how='outer')
df_merged
Output:

df1 and df3, both have an entry for the name ‘Sam’ and the age and score values for both of them are different. Hence, in the resulting data frame, there are columns for representing the entries from both df1 and df3. Since John and Bob are not common in the data frames df1 and df3, their values are NaN wherever applicable.
Conclusion
In this article, we learned about the Pandas methods namely concat, merge and append and how to use them to combine CSV files using Python.
References
- Pandas append official documentation
- Pandas concat official documentation
- Pandas merge official documentation