How to combine DataFrames in Python?


Hello Readers! In this tutorial, we are going to learn the different ways to combine DataFrames in Python.


What are DataFrames in Python?

In pandas, a DataFrame is a structured, two-dimensional Python object that stores data in tabular format, i.e. in rows and columns. To work with DataFrames, we need the pandas Python module. We can create a pandas DataFrame from various Python objects, such as a list, a dictionary, a NumPy ndarray or another DataFrame, using the pandas.DataFrame() constructor. The following command installs the pandas Python module:

C:\Users\Guest> pip install pandas
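The paragraph above lists several kinds of objects that pandas.DataFrame() accepts. Below is a minimal sketch of each one. The frame names and sample values are made up for illustration and are not part of the tutorial's data. NumPy is installed automatically as a pandas dependency.

```python
import numpy as np
import pandas as pd

# From a dictionary of lists: the keys become the column labels
df_from_dict = pd.DataFrame({"name": ["Asha", "Ravi"], "roll_no": [201, 202]})

# From a list of lists: each inner list is one row, column labels are passed explicitly
df_from_list = pd.DataFrame([["Asha", 201], ["Ravi", 202]], columns=["name", "roll_no"])

# From a two-dimensional NumPy ndarray
df_from_array = pd.DataFrame(np.array([[1, 2], [3, 4]]), columns=["a", "b"])

# From another DataFrame: the result is a new DataFrame object
# (depending on the pandas version, the underlying data may be shared, not copied)
df_copy = pd.DataFrame(df_from_dict)

print(df_from_list)
print(df_from_array)
```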

Let’s create two pandas DataFrames that we will use throughout the rest of this tutorial. Here is the Python code to create them:

# Import pandas Python module
import pandas as pd 

# Create two datasets, each a dictionary of lists
data1 = {"name": ['Sanjay Kumar', 'Shreya Mohan', 'Abhishek Kumar', 'Sameer Singh', 'Sumit Kumar'],
        "roll_no": [101, 102, 103, 104, 105]}

data2 = {"state": ['Bihar', 'Jharkhand', 'Maharashtra', 'Haryana', 'Punjab'],
        "City": ['Nalanda', 'Deoghar', 'Pune', 'Kunjpura', 'Jalandhar']}

# Create DataFrame-1
df1 = pd.DataFrame(data1)
print("This is DataFrame-1:")
print(df1)

# Create DataFrame-2
df2 = pd.DataFrame(data2)
print("This is DataFrame-2:")
print(df2)

Output:

[Image: output of the DataFrame creation code]

Methods to Combine DataFrames in Python

Combining two or more DataFrames along either axis is one of the core data preprocessing steps in data analysis. Data scientists and data analysts frequently need to combine data held in pandas DataFrames, and this step becomes crucial when the data is collected from multiple sources and arrives in different formats. Now that we have created our two pandas DataFrames, let’s discuss the different methods to combine DataFrames in Python one by one.

Method 1: Using concat() function

In Python, the concat() function is defined in the pandas module and combines two or more pandas DataFrames along a specified axis. With axis = 0 (the default), the DataFrames are stacked vertically, i.e. the rows of one DataFrame are placed below the rows of the other. With axis = 1, they are placed side by side horizontally, i.e. the columns are combined and the rows are aligned on their index labels.

The function returns a new DataFrame object and leaves the input DataFrames unchanged. Let’s write the Python code to apply the concat() function to our pandas DataFrames.

# Combine the DataFrame-1 & DataFrame-2
# along horizontal axis using concat() function
df = pd.concat([df1, df2], axis = 1)
print("The resultant DataFrame:")
print(df)

Output:

[Image: output of the concat() example]
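To make the axis parameter concrete, here is a small sketch that contrasts axis=0 with axis=1. It also shows the NaN filling that happens when the column labels of the inputs do not match. The frames top, bottom and extra are hypothetical examples with made-up values. ignore_index=True is a standard concat() parameter that discards the original index labels and renumbers the result from 0.

```python
import pandas as pd

# Two small DataFrames with the same column labels (sample values)
top = pd.DataFrame({"name": ["A", "B"], "roll_no": [1, 2]})
bottom = pd.DataFrame({"name": ["C"], "roll_no": [3]})

# axis=0 (the default): stack the rows, then renumber the index 0..n-1
rows = pd.concat([top, bottom], axis=0, ignore_index=True)
print(rows)

# axis=1: place the frames side by side, aligning rows on their index labels
extra = pd.DataFrame({"city": ["Pune", "Agra"]})
cols = pd.concat([top, extra], axis=1)
print(cols)

# axis=0 with different column labels: the result has the union of the columns,
# and cells that have no source value are filled with NaN
mixed = pd.concat([top, extra], axis=0, ignore_index=True)
print(mixed)
```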

Method 2: Using append() function

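The DataFrame.append() method was another way to combine pandas DataFrames. It appends the rows of one DataFrame to the end of another, i.e. along the vertical axis only, and returns a new DataFrame. Note that append() was deprecated in pandas 1.4 and removed in pandas 2.0. On current versions of pandas, the same result is obtained with pd.concat(). Let’s implement this through the Python code.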

# Create a new DataFrame-3
df3 = pd.DataFrame({"name": ['Ravi', 'Shantanu', 'Shiv'],
                    "roll_no": [106, 107, 108],
                    "state": ['Bihar', 'UP', 'Bihar'],
                    "City": ['Muzaffarpur', 'Agra', 'Bakhtiarpur']},
                    index = [5, 6, 7])
print("This is DataFrame-3:")
print(df3)

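# Combine this newly created DataFrame-3
# with the existing DataFrame along the vertical axis.
# On pandas < 2.0 this could be written as: df = df.append(df3)
# append() was removed in pandas 2.0, so pd.concat() is used here
df = pd.concat([df, df3])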
print("The resultant DataFrame:")
print(df)

Output:

[Image: output of the append example]
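DataFrame.append() no longer exists in pandas 2.0, so the following sketch shows pd.concat() equivalents for two common append patterns. The first adds the rows of another DataFrame. The second adds a single record stored in a dict. The names base, new_rows and record are hypothetical.

```python
import pandas as pd

base = pd.DataFrame({"name": ["Sanjay Kumar", "Shreya Mohan"], "roll_no": [101, 102]})
new_rows = pd.DataFrame({"name": ["Ravi"], "roll_no": [106]})

# Equivalent of the old base.append(new_rows, ignore_index=True)
combined = pd.concat([base, new_rows], ignore_index=True)

# A single record held in a dict: wrap it in a one-row DataFrame first
record = {"name": "Shiv", "roll_no": 108}
combined = pd.concat([combined, pd.DataFrame([record])], ignore_index=True)
print(combined)
```

Collecting all new pieces in a list and calling pd.concat() once is usually cheaper than concatenating repeatedly inside a loop, because each call builds a new DataFrame.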

Method 3: Using merge() function

In Python, the pandas module provides the merge() function to combine DataFrames using database-style joins. By default, it performs an inner join, which keeps only the rows whose key values are present in both DataFrames.

It can combine DataFrames on columns or on index levels. The column or index level names passed to its on parameter must be present in both DataFrames. Let’s look at the Python code to apply the merge() function to our pandas DataFrames.

# Create a new DataFrame-4
df4 = pd.DataFrame({"roll_no": [101, 102, 103, 104, 105, 106, 107, 108],
                    "cgpa": [8.15, 8.18, 9.41, 8.56, 7.67, 9.36, 9.52, 7.35]})
print("This is DataFrame-4:")
print(df4)

# Combine this newly created DataFrame-4
# with the existing DataFrame along the horizontal axis
# using the merge() function on the common roll_no column
df = pd.merge(df, df4, on = "roll_no")
print("The resultant DataFrame:")
print(df)

Output:

[Image: output of the merge() example]
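The inner join is only the default. The sketch below uses small hypothetical frames (students, grades) to show how the how parameter changes which keys are kept. It also shows how indicator=True records the origin of each row, and how left_on / right_on handle key columns with different names. All of these are documented merge() parameters. The data values are made up.

```python
import pandas as pd

students = pd.DataFrame({"roll_no": [101, 102, 103], "name": ["Sanjay", "Shreya", "Abhishek"]})
grades = pd.DataFrame({"roll_no": [102, 103, 104], "cgpa": [8.18, 9.41, 8.56]})

# how="inner" (default): keep only roll numbers present in both frames
inner = pd.merge(students, grades, on="roll_no")

# how="left": keep every student, cgpa is NaN where no grade exists
left = pd.merge(students, grades, on="roll_no", how="left")

# how="outer" with indicator=True: keep all keys and add a _merge column
# saying whether each row came from the left frame, the right frame, or both
outer = pd.merge(students, grades, on="roll_no", how="outer", indicator=True)
print(outer)

# Key columns with different names: use left_on / right_on (both key columns are kept)
grades_renamed = grades.rename(columns={"roll_no": "id"})
by_name = pd.merge(students, grades_renamed, left_on="roll_no", right_on="id")
print(by_name)
```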

Method 4: Using join() function

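In Python, the pandas module provides the join() function, which combines the columns of two or more pandas DataFrames by joining them on the index or on a specified column. By default, it joins on the index and performs a left join: rows with the same index label are placed side by side, and every row of the calling DataFrame is kept. With the on parameter, a column of the calling DataFrame can be matched against the index of the other DataFrame instead. Let’s see the Python code to apply the join() function to our pandas DataFrames.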

# Create a new DataFrame-5
df5 = pd.DataFrame({"branch": ['ECE', 'ECE', 'CSE', 'EE', 'ICE', 'ME', 'TT', 'CHE'],
                    "year": [3, 3, 2, 1, 1, 4, 2, 3]})
print("This is DataFrame-5:")
print(df5)

# Combine this newly created DataFrame-5
# with the existing DataFrame along the horizontal axis
# using the join() function (rows are matched by index label)
df = df.join(df5)
print("The resultant DataFrame:")
print(df)

Output:

[Image: output of the join() example]
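The example above joins purely on the index. The sketch below, on hypothetical frames with made-up values, shows two other documented join() options. The first uses on= to match a column of the caller against the other frame's index, with the how parameter. The second uses lsuffix / rsuffix, which are required when both frames share a column name.

```python
import pandas as pd

students = pd.DataFrame({"roll_no": [101, 102, 103], "name": ["Sanjay", "Shreya", "Abhishek"]})
# The other frame is indexed by roll number
branches = pd.DataFrame({"branch": ["ECE", "CSE"]}, index=[101, 103])

# on="roll_no" matches the caller's roll_no column against branches' index;
# how="left" is the default, so roll number 102 gets NaN for branch
left_join = students.join(branches, on="roll_no")
print(left_join)

# how="inner" keeps only the rows whose key appears in both frames
inner_join = students.join(branches, on="roll_no", how="inner")

# Overlapping column names must be disambiguated with lsuffix / rsuffix
other_names = pd.DataFrame({"name": ["S. Kumar", "S. Mohan", "A. Kumar"]})
with_suffixes = students.join(other_names, lsuffix="_left", rsuffix="_right")
print(with_suffixes.columns.tolist())
```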

Conclusion

In this tutorial, we have learned the following things:

  • What’s a DataFrame object in Python
  • Importance of combining pandas DataFrames
  • Different methods to combine pandas DataFrames