Today we will discuss a pandas method, Pandas merge_ordered() used to merge ordered data. As pandas is a python package it provides a convenient data structure to process relational data similarly merge_orderd() offers multiple manipulations to get the expected result. The sorted result makes it easy to work on time series and ordered data.
Syntax of Pandas merge_ordered
pandas.merge_ordered(left, right, on=None, left_on=None, right_on=None, left_by=None, right_by=None, fill_method=None, suffixes=(‘_x’, ‘_y’), how=’outer’)
Parameters
Parameter | Description |
left | left dataframe |
right | right dataframe |
on | labels of the column to join on |
left_on | column name on the left to perform join |
right_on | column name on the right to perform join |
left_by | column name on the left to expand |
right_by | column name on the right to expand |
fill_method | to fill the non-existing data with either nan or previous non-nan value (ffill ) |
suffixes | Strings, used for distinguishing overlapping column labels. |
how | To specify which type of join to perform with default value = outer |
Return Value
Merged DataFrame
Examples of Pandas merge_ordered
We import the pandas library, define and print two DataFrames:- ‘ customer’ and ‘product’ containing the customer details which are the ID, name, age, and product details ie Price, brought by, and product name respectively.
import pandas as pd
df_customer =pd.DataFrame({ "ID" : [101,102,103,104] ,
"Name" : ["Jake","Ethan","Reece","Jacob"],
"Age" : [28,32,33,54]})
df_product= pd.DataFrame({ "Price" : [90000,22000,12000,35000],
"bought_by": ["Ethan","Reece", "Jake","Jacob"],#
"product": ["computer", "smartphone", "speakers","laptop"]}) #
print(df_customer)
print("\n")
print(df_product)
Output:
ID Name Age
0 101 Jake 28
1 102 Ethan 32
2 103 Reece 33
3 104 Jacob 54
Price bought_by product
0 90000 Ethan computer
1 22000 Reece smartphone
2 12000 Jake speakers
3 35000 Jacob laptop
Example 1: Basic implementation of Pandas merge_ordered
Below we perform an outer join on ‘customer’ and ‘ product’ data frames based on columns ‘bought_by and Name’.
merge_df=pd.merge_ordered(df_product,df_customer,left_on="bought_by", right_on="Name", how="outer")
print(merge_df)

If we interchange the sequence of occurrence of the data frames we get a different output.
merge_df=pd.merge_ordered(df_customer,df_product,left_on="Name", right_on="bought_by", how="outer")
print(merge_df)

Since we mentioned the customer data frame before the product data frame the details of that data frame are displayed accordingly.
Example 2: Specifying left_by and right_by
By setting left_by="product"
we repeat each product item for every row and map NaN for non-existing data.
Note: The value of left_by
must be the column name of the first mentioned data frame ie the left data frame. Here we have df_product as our left data frame.
merge_df=pd.merge_ordered(df_product,df_customer,left_on="bought_by", right_on="Name", how="outer",left_by="product")
print(merge_df)

When we set right_by="Name"
we repeat each name for every row. Other values of that data frame column are left at none since they have no common data in the product data frame. Eg. Since Jake has bought only speakers the columns for computer, laptop, and smartphone are left, NaN.
merge_df=pd.merge_ordered(df_product,df_customer,left_on="bought_by", right_on="Name", how="outer",right_by="Name")
print(merge_df)

Example 3: Specifying fill_method.
We consider the following data frames:
ID Name Age
0 101 Jake 28
1 102 Ethan 32
2 103 Reece 33
3 104 Jacob 54
Price bought_by product
0 90000 Rob computer
1 22000 Reece smartphone
2 12000 Jake speakers
3 35000 Jacob laptop
The default value for fill_method
=None
thus resulting Nan
for any non-existing data
merge_df=pd.merge_ordered(df_product,df_customer,left_on="bought_by", right_on="Name", how="outer")
print(merge_df)

In the customer data frame there is no data about the ‘Rob’ customer thus leaving it to none by default.
To fill those NaN
we set the fill_method=ffill
and they are replaced by previous non-nan values.
merge_df=pd.merge_ordered(df_product,df_customer,left_on="bought_by", right_on="Name", how="outer",fill_method="ffill")
print(merge_df)

Note: This is an illustration of the working of filling and practical use is not preferred.
Example 4: Specifying suffixes.
suffixes
is used to distinguish overlapping column labels between two data frames.
Consider the following data frames
ID Name Age
0 101 Jake 28
1 102 Ethan 32
2 103 Reece 33
3 104 Jacob 54
ID bought_by product
0 1001 Ethan computer
1 1062 Reece smartphone
2 1037 Jake headphones
3 1054 Jacob laptop
In suffixes=[_x,_y], _x is appended to the overlapping column in a left data frame and _y
to the right data frame. The default values suffixes
are _x,_y
merge_df=pd.merge_ordered(df_product,df_customer,left_on="bought_by", right_on="Name", how="outer")
print(merge_df)

We can replace the default values by specifying our own suffixes. eg we replaced ID_x to ID_product .
merge_df=pd.merge_ordered(df_product,df_customer,left_on="bought_by", right_on="Name", how="outer",suffixes=["_product","_cust"])
print(merge_df)

We can also pass None
instead of a string.
merge_df=pd.merge_ordered(df_product,df_customer,left_on="bought_by", right_on="Name", how="outer",suffixes=[None,"_cust"])
print(merge_df)

Example 5: Specifying on.
To merge two ordered data frames with similar number of data we can make use of on
. Consider below data frames
ID Name Age
0 101 Jake 28
1 102 Ethan 32
2 103 Reece 33
3 104 Jacob 54
ID bought_by product
0 1001 Ethan computer
1 1062 Reece smartphone
2 1037 Jake headphones
3 1054 Jacob laptop
Below we set the value of on=ID to merge it via the ID column.
merge_df =pd.merge_ordered(df_customer,df_product,on="ID")
print(merge_df)

Conclusion
We implemented merge_ordered on two data frames and implemented various examples by manipulating multiple parameters to get desired output. Browse more articles at AskPython.