Hello Readers! In this tutorial, we are going to learn the different ways to combine DataFrames in Python.
What are DataFrames in Python?
In Python, a DataFrame is a structured, two-dimensional object that stores data in tabular form, i.e. in rows and columns. To work with DataFrames, we need the pandas Python module. We can create a pandas DataFrame from various Python objects such as a list, a dictionary, a NumPy ndarray, or another DataFrame, using the pandas.DataFrame() constructor. The following command installs the pandas Python module:
C:\Users\Guest> pip install pandas
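To make the point about input types concrete, here is a minimal sketch that builds the same small table from a dictionary of lists, a list of lists, a NumPy ndarray, and another DataFrame. The variable names and sample values are made up for illustration, and NumPy is assumed to be available (it is installed as a dependency of pandas).

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, used only for this illustration
columns = ["roll_no", "marks"]

# From a dictionary of lists: the keys become the column names
from_dict = pd.DataFrame({"roll_no": [1, 2, 3], "marks": [70, 85, 90]})

# From a list of lists: each inner list becomes one row
from_list = pd.DataFrame([[1, 70], [2, 85], [3, 90]], columns=columns)

# From a NumPy ndarray: each row of the array becomes one row
from_array = pd.DataFrame(np.array([[1, 70], [2, 85], [3, 90]]), columns=columns)

# From another DataFrame: copy=True requests an independent copy
from_frame = pd.DataFrame(from_dict, copy=True)

print(from_dict)
print(from_list)
print(from_array)
print(from_frame)
```

All four objects hold the same rows and columns; only the way the data was supplied differs.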
Let’s create two pandas DataFrames that we will use throughout the rest of this tutorial. Python code to create the pandas DataFrames:
# Import pandas Python module
import pandas as pd
# Create two datasets, each as a dictionary of lists
data1 = {"name": ['Sanjay Kumar', 'Shreya Mohan', 'Abhishek Kumar', 'Sameer Singh', 'Sumit Kumar'],
         "roll_no": [101, 102, 103, 104, 105]}
data2 = {"state": ['Bihar', 'Jharkhand', 'Maharashtra', 'Haryana', 'Punjab'],
         "City": ['Nalanda', 'Deoghar', 'Pune', 'Kunjpura', 'Jalandhar']}
# Create DataFrame-1
df1 = pd.DataFrame(data1)
print("This is DataFrame-1:")
print(df1)
# Create DataFrame-2
df2 = pd.DataFrame(data2)
print("This is DataFrame-2:")
print(df2)
Output:

Methods to Combine DataFrames in Python
Combining two or more DataFrames along either axis is one of the core data preprocessing techniques in data analysis. Data scientists and data analysts frequently have to combine data held in pandas DataFrames, using different methods. This step becomes crucial when the data is collected from multiple sources and arrives in different formats. Now that we have created our two pandas DataFrames, let’s discuss the different methods to combine DataFrames in Python one by one.
Method 1: Using concat() function
In Python, the concat() function is defined in the pandas module and is used to combine two or more pandas DataFrames along the specified axis. With axis = 0 (the default), the DataFrames are stacked vertically, one below the other; with axis = 1, they are placed side by side horizontally.
The function returns a new DataFrame object containing the concatenated data. It can be used to add either the rows or the columns of one DataFrame to another. Let’s write the Python code to use the concat() function on pandas DataFrames.
# Combine the DataFrame-1 & DataFrame-2
# along horizontal axis using concat() function
df = pd.concat([df1, df2], axis = 1)
print("The resultant DataFrame:")
print(df)
Output:

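One detail worth seeing in code is that concat() with axis = 1 aligns rows by their index labels rather than by position. The sketch below uses two small made-up DataFrames (left and right are hypothetical names) whose indexes only partly overlap, and compares the default join="outer" with join="inner".

```python
import pandas as pd

# Hypothetical frames for illustration; their indexes overlap only on labels 1 and 2
left = pd.DataFrame({"a": [1, 2, 3]})                        # index 0, 1, 2
right = pd.DataFrame({"b": [10, 20, 30]}, index=[1, 2, 3])   # index 1, 2, 3

# Default join="outer": keep every index label, fill the gaps with NaN
side_by_side = pd.concat([left, right], axis=1)

# join="inner": keep only the index labels present in both frames
common_only = pd.concat([left, right], axis=1, join="inner")

print(side_by_side)
print(common_only)
```

In the tutorial's own example df1 and df2 share the same default index, so every row lines up and no missing values appear.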
Method 2: Using append() function
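In pandas, append() is a DataFrame method that combines DataFrames by adding the rows of one DataFrame to the end of another, i.e. along the vertical axis only. Note that DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0; on those versions, pd.concat() gives the same result. Let’s see how rows are appended to a pandas DataFrame through the Python code.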
# Create a new DataFrame-3
df3 = pd.DataFrame({"name": ['Ravi', 'Shantanu', 'Shiv'],
"roll_no": [106, 107, 108],
"state": ['Bihar', 'UP', 'Bihar'],
"City": ['Muzaffarpur', 'Agra', 'Bakhtiarpur']},
index = [5, 6, 7])
print("This is DataFrame-3:")
print(df3)
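# Combine this newly created DataFrame-3
# with the existing DataFrame along the vertical axis.
# On pandas < 2.0 this can be written as df = df.append(df3);
# append() was removed in pandas 2.0, so the equivalent
# pd.concat() call is used here.
df = pd.concat([df, df3])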
print("The resultant DataFrame:")
print(df)
Output:

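Since DataFrame.append() is not available in pandas 2.0 and later, rows are usually added with pd.concat() instead. The following is a minimal sketch on made-up data (first and extra are hypothetical names) showing two behaviours that matter when appending rows: columns are matched by name, and ignore_index=True renumbers the result.

```python
import pandas as pd

# Hypothetical frames for illustration; "extra" has one column that "first" lacks
first = pd.DataFrame({"name": ["A", "B"], "roll_no": [1, 2]})
extra = pd.DataFrame({"name": ["C"], "roll_no": [3], "state": ["Bihar"]})

# Stack the rows vertically; columns are aligned by name and
# ignore_index=True replaces the original labels with 0, 1, 2, ...
stacked = pd.concat([first, extra], ignore_index=True)

print(stacked)
```

The rows that came from first have no value for state, so pandas fills those cells with NaN.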
Method 3: Using merge() function
In Python, the pandas module provides the merge() function to combine DataFrames using database-style joins. By default, it performs an “inner join”, which keeps only the rows whose key values appear in both DataFrames.
It can combine DataFrames on a column name or an index level, but the column name or index level passed through the on parameter must be present in both DataFrames. Let’s understand the Python code to use the merge() function on pandas DataFrames.
# Create a new DataFrame-4
df4 = pd.DataFrame({"roll_no": [101, 102, 103, 104, 105, 106, 107, 108],
"cgpa": [8.15, 8.18, 9.41, 8.56, 7.67, 9.36, 9.52, 7.35]})
print("This is DataFrame-4:")
print(df4)
# Combine this newly created DataFrame-4
# with the existing DataFrame along the horizontal axis
# using the merge() function
df = pd.merge(df, df4, on = "roll_no")
print("The resultant DataFrame:")
print(df)
Output:

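The effect of the join type is easiest to see when the key columns do not match exactly. The sketch below uses made-up students and grades DataFrames (hypothetical names and values) and compares the default inner join with how="left" and how="outer"; indicator=True adds a _merge column recording where each row came from.

```python
import pandas as pd

# Hypothetical frames for illustration; roll_no 1 has no grade, roll_no 4 has no student
students = pd.DataFrame({"roll_no": [1, 2, 3], "name": ["A", "B", "C"]})
grades = pd.DataFrame({"roll_no": [2, 3, 4], "cgpa": [8.1, 9.0, 7.5]})

# Default how="inner": only keys found in both frames
inner = pd.merge(students, grades, on="roll_no")

# how="left": every row of the left frame, NaN where no match exists
left = pd.merge(students, grades, on="roll_no", how="left")

# how="outer": every key from either frame, with the source recorded in "_merge"
outer = pd.merge(students, grades, on="roll_no", how="outer", indicator=True)

print(inner)
print(left)
print(outer)
```

In the tutorial's example every roll_no in df also appears in df4, so the default inner join keeps all eight rows.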
Method 4: Using join() function
In pandas, join() is a DataFrame method that combines two or more pandas DataFrames by joining them either on a specified column or on the index. By default, it joins the DataFrame objects on their index and performs a left join. Let’s see the Python code to use the join() method on pandas DataFrames.
# Create a new DataFrame-5
df5 = pd.DataFrame({"branch": ['ECE', 'ECE', 'CSE', 'EE', 'ICE', 'ME', 'TT', 'CHE'],
"year": [3, 3, 2, 1, 1, 4, 2, 3]})
print("This is DataFrame-5:")
print(df5)
# Combine this newly created DataFrame-5
# with the existing DataFrame along the horizontal axis
# using the join() function
df = df.join(df5)
print("The resultant DataFrame:")
print(df)
Output:

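Besides the default index-to-index join, join() can also match a column of the calling DataFrame against the index of the other one through its on parameter. Here is a minimal sketch with made-up students and branches DataFrames (hypothetical names and values).

```python
import pandas as pd

# Hypothetical frames for illustration
students = pd.DataFrame({"name": ["A", "B", "C"],
                         "branch_code": ["ECE", "CSE", "ECE"]})

# The lookup table is indexed by the branch code
branches = pd.DataFrame({"branch_name": ["Electronics", "Computer Science"]},
                        index=["ECE", "CSE"])

# Match students["branch_code"] against the index of branches (left join by default)
joined = students.join(branches, on="branch_code")

print(joined)
```

This pattern is handy when one DataFrame acts as a lookup table keyed by its index.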
Conclusion
In this tutorial, we have learned the following things:
- What a DataFrame object is in Python
- The importance of combining pandas DataFrames
- Different methods to combine pandas DataFrames