Pandas merge_ordered – Perform a merge for ordered data with optional filling/interpolation

pandas merge_ordered

Today we will discuss a pandas method, Pandas merge_ordered() used to merge ordered data. As pandas is a python package it provides a convenient data structure to process relational data similarly merge_orderd() offers multiple manipulations to get the expected result. The sorted result makes it easy to work on time series and ordered data.

Syntax of Pandas merge_ordered

pandas.merge_ordered(left, right, on=None, left_on=None, right_on=None, left_by=None, right_by=None, fill_method=None, suffixes=(‘_x’, ‘_y’), how=’outer’)

Parameters

ParameterDescription
leftleft dataframe
rightright dataframe
onlabels of the column to join on
left_oncolumn name on the left to perform join
right_oncolumn name on the right to perform join
left_bycolumn name on the left to expand
right_bycolumn name on the right to expand
fill_methodto fill the non-existing data with either nan or previous non-nan value (ffill)
suffixesStrings, used for distinguishing overlapping column labels.
howTo specify which type of join to perform with default value = outer
syntax parameters

Return Value

Merged DataFrame

Examples of Pandas merge_ordered

We import the pandas library, define and print two DataFrames:- ‘ customer’ and ‘product’ containing the customer details which are the ID, name, age, and product details ie Price, brought by, and product name respectively.

import pandas as pd

df_customer =pd.DataFrame({ "ID" : [101,102,103,104] ,
                            "Name" : ["Jake","Ethan","Reece","Jacob"],
                            "Age" : [28,32,33,54]})

df_product= pd.DataFrame({  "Price" : [90000,22000,12000,35000],
                            "bought_by": ["Ethan","Reece", "Jake","Jacob"],#
                            "product": ["computer", "smartphone", "speakers","laptop"]}) #

print(df_customer)
print("\n")      
print(df_product)

Output:

    ID   Name  Age
0  101   Jake   28
1  102  Ethan   32
2  103  Reece   33
3  104  Jacob   54


   Price bought_by     product
0  90000     Ethan    computer
1  22000     Reece  smartphone
2  12000      Jake    speakers
3  35000     Jacob      laptop

Example 1: Basic implementation of Pandas merge_ordered

Below we perform an outer join on ‘customer’ and ‘ product’ data frames based on columns ‘bought_by and Name’.

merge_df=pd.merge_ordered(df_product,df_customer,left_on="bought_by", right_on="Name", how="outer")
print(merge_df)
Basic Usage 1

If we interchange the sequence of occurrence of the data frames we get a different output.

merge_df=pd.merge_ordered(df_customer,df_product,left_on="Name", right_on="bought_by", how="outer")
print(merge_df)
Basic Usage 2

Since we mentioned the customer data frame before the product data frame the details of that data frame are displayed accordingly.

Example 2: Specifying left_by and right_by

By setting left_by="product" we repeat each product item for every row and map NaN for non-existing data.

Note: The value of left_by must be the column name of the first mentioned data frame ie the left data frame. Here we have df_product as our left data frame.

merge_df=pd.merge_ordered(df_product,df_customer,left_on="bought_by", right_on="Name", how="outer",left_by="product")                
print(merge_df) 
Left By

When we set right_by="Name" we repeat each name for every row. Other values of that data frame column are left at none since they have no common data in the product data frame. Eg. Since Jake has bought only speakers the columns for computer, laptop, and smartphone are left, NaN.

merge_df=pd.merge_ordered(df_product,df_customer,left_on="bought_by", right_on="Name", how="outer",right_by="Name")                
print(merge_df) 
Right By

Example 3: Specifying fill_method.

We consider the following data frames:

                      
    ID   Name  Age
0  101   Jake   28
1  102  Ethan   32
2  103  Reece   33
3  104  Jacob   54

   Price bought_by     product
0  90000       Rob    computer
1  22000     Reece  smartphone
2  12000      Jake    speakers
3  35000     Jacob      laptop

The default value for fill_method =None thus resulting Nan for any non-existing data

merge_df=pd.merge_ordered(df_product,df_customer,left_on="bought_by", right_on="Name", how="outer")
print(merge_df)
Fill Method Default

In the customer data frame there is no data about the ‘Rob’ customer thus leaving it to none by default.

To fill those NaN we set the fill_method=ffill and they are replaced by previous non-nan values.

merge_df=pd.merge_ordered(df_product,df_customer,left_on="bought_by", right_on="Name", how="outer",fill_method="ffill")
print(merge_df)
Fill Method Ffill

Note: This is an illustration of the working of filling and practical use is not preferred.

Example 4: Specifying suffixes.

suffixes is used to distinguish overlapping column labels between two data frames.

Consider the following data frames

    ID   Name  Age
0  101   Jake   28
1  102  Ethan   32
2  103  Reece   33
3  104  Jacob   54

     ID bought_by     product
0  1001     Ethan    computer
1  1062     Reece  smartphone
2  1037      Jake  headphones
3  1054     Jacob      laptop

In suffixes=[_x,_y], _x is appended to the overlapping column in a left data frame and _y to the right data frame. The default values suffixes are _x,_y

merge_df=pd.merge_ordered(df_product,df_customer,left_on="bought_by", right_on="Name", how="outer")
print(merge_df)
Suffix Default

We can replace the default values by specifying our own suffixes. eg we replaced ID_x to ID_product .

merge_df=pd.merge_ordered(df_product,df_customer,left_on="bought_by", right_on="Name", how="outer",suffixes=["_product","_cust"])
print(merge_df)
Suffix Value

We can also pass None instead of a string.

merge_df=pd.merge_ordered(df_product,df_customer,left_on="bought_by", right_on="Name", how="outer",suffixes=[None,"_cust"])
print(merge_df)
Suffix None

Example 5: Specifying on.

To merge two ordered data frames with similar number of data we can make use of on. Consider below data frames

    ID   Name  Age
0  101   Jake   28
1  102  Ethan   32
2  103  Reece   33
3  104  Jacob   54

     ID bought_by     product
0  1001     Ethan    computer
1  1062     Reece  smartphone
2  1037      Jake  headphones
3  1054     Jacob      laptop

Below we set the value of on=ID to merge it via the ID column.

merge_df =pd.merge_ordered(df_customer,df_product,on="ID")  
print(merge_df)
On ID

Conclusion

We implemented merge_ordered on two data frames and implemented various examples by manipulating multiple parameters to get desired output. Browse more articles at AskPython.

Reference

Official documentation