Pandas merge_ordered - Perform a merge for ordered data with optional filling/interpolation

Today we will discuss a pandas method, Pandas merge_ordered() used to merge ordered data. As pandas is a python package it provides a convenient data structure to process relational data similarly merge_orderd() offers multiple manipulations to get the expected result. The sorted result makes it easy to work on time series and ordered data.

Syntax of Pandas merge_ordered

pandas.merge_ordered(left, right, on=None, left_on=None, right_on=None, left_by=None, right_by=None, fill_method=None, suffixes=(‘_x’, ‘_y’), how=’outer’)

Parameters

Parameter	Description
left	left dataframe
right	right dataframe
on	labels of the column to join on
left_on	column name on the left to perform join
right_on	column name on the right to perform join
left_by	column name on the left to expand
right_by	column name on the right to expand
fill_method	to fill the non-existing data with either `nan` or previous `non-nan` value (`ffill`)
suffixes	Strings, used for distinguishing overlapping column labels.
how	To specify which type of join to perform with default value = `outer`

syntax parameters

Return Value

Merged DataFrame

Examples of Pandas merge_ordered

We import the pandas library, define and print two DataFrames:- ‘ customer’ and ‘product’ containing the customer details which are the ID, name, age, and product details ie Price, brought by, and product name respectively.

import pandas as pd

df_customer =pd.DataFrame({ "ID" : [101,102,103,104] ,
                            "Name" : ["Jake","Ethan","Reece","Jacob"],
                            "Age" : [28,32,33,54]})

df_product= pd.DataFrame({  "Price" : [90000,22000,12000,35000],
                            "bought_by": ["Ethan","Reece", "Jake","Jacob"],#
                            "product": ["computer", "smartphone", "speakers","laptop"]}) #

print(df_customer)
print("\n")      
print(df_product)

Output:

    ID   Name  Age
0  101   Jake   28
1  102  Ethan   32
2  103  Reece   33
3  104  Jacob   54


   Price bought_by     product
0  90000     Ethan    computer
1  22000     Reece  smartphone
2  12000      Jake    speakers
3  35000     Jacob      laptop

Example 1: Basic implementation of Pandas merge_ordered

Below we perform an outer join on ‘customer’ and ‘ product’ data frames based on columns ‘bought_by and Name’.

merge_df=pd.merge_ordered(df_product,df_customer,left_on="bought_by", right_on="Name", how="outer")
print(merge_df)

If we interchange the sequence of occurrence of the data frames we get a different output.

merge_df=pd.merge_ordered(df_customer,df_product,left_on="Name", right_on="bought_by", how="outer")
print(merge_df)

Since we mentioned the customer data frame before the product data frame the details of that data frame are displayed accordingly.

Example 2: Specifying left_by and right_by

By setting left_by="product" we repeat each product item for every row and map NaN for non-existing data.

Note: The value of left_by must be the column name of the first mentioned data frame ie the left data frame. Here we have df_product as our left data frame.

merge_df=pd.merge_ordered(df_product,df_customer,left_on="bought_by", right_on="Name", how="outer",left_by="product")                
print(merge_df)

When we set right_by="Name" we repeat each name for every row. Other values of that data frame column are left at none since they have no common data in the product data frame. Eg. Since Jake has bought only speakers the columns for computer, laptop, and smartphone are left, NaN.

merge_df=pd.merge_ordered(df_product,df_customer,left_on="bought_by", right_on="Name", how="outer",right_by="Name")                
print(merge_df)

Example 3: Specifying fill_method.

We consider the following data frames:

                      
    ID   Name  Age
0  101   Jake   28
1  102  Ethan   32
2  103  Reece   33
3  104  Jacob   54

   Price bought_by     product
0  90000       Rob    computer
1  22000     Reece  smartphone
2  12000      Jake    speakers
3  35000     Jacob      laptop

The default value for fill_method =None thus resulting Nan for any non-existing data

merge_df=pd.merge_ordered(df_product,df_customer,left_on="bought_by", right_on="Name", how="outer")
print(merge_df)

In the customer data frame there is no data about the ‘Rob’ customer thus leaving it to none by default.

To fill those NaN we set the fill_method=ffill and they are replaced by previous non-nan values.

merge_df=pd.merge_ordered(df_product,df_customer,left_on="bought_by", right_on="Name", how="outer",fill_method="ffill")
print(merge_df)

Note: This is an illustration of the working of filling and practical use is not preferred.

Example 4: Specifying suffixes.

suffixes is used to distinguish overlapping column labels between two data frames.

Consider the following data frames

    ID   Name  Age
0  101   Jake   28
1  102  Ethan   32
2  103  Reece   33
3  104  Jacob   54

     ID bought_by     product
0  1001     Ethan    computer
1  1062     Reece  smartphone
2  1037      Jake  headphones
3  1054     Jacob      laptop

In suffixes=[_x,_y], _x is appended to the overlapping column in a left data frame and _y to the right data frame. The default values suffixes are _x,_y

merge_df=pd.merge_ordered(df_product,df_customer,left_on="bought_by", right_on="Name", how="outer")
print(merge_df)

We can replace the default values by specifying our own suffixes. eg we replaced ID_x to ID_product .

merge_df=pd.merge_ordered(df_product,df_customer,left_on="bought_by", right_on="Name", how="outer",suffixes=["_product","_cust"])
print(merge_df)

We can also pass None instead of a string.

merge_df=pd.merge_ordered(df_product,df_customer,left_on="bought_by", right_on="Name", how="outer",suffixes=[None,"_cust"])
print(merge_df)

Example 5: Specifying on.

To merge two ordered data frames with similar number of data we can make use of on. Consider below data frames

    ID   Name  Age
0  101   Jake   28
1  102  Ethan   32
2  103  Reece   33
3  104  Jacob   54

     ID bought_by     product
0  1001     Ethan    computer
1  1062     Reece  smartphone
2  1037      Jake  headphones
3  1054     Jacob      laptop

Below we set the value of on=ID to merge it via the ID column.

merge_df =pd.merge_ordered(df_customer,df_product,on="ID")  
print(merge_df)

Conclusion

We implemented merge_ordered on two data frames and implemented various examples by manipulating multiple parameters to get desired output. Browse more articles at AskPython.