Pandas merge_asof – Perform a merge by key distance.

Featured Image

Today we implement a pandas function merge_asof function to perform a merge asof. It is similar to the left join except it matches the value to the nearest key rather than the equal key within a given certain tolerance limit. This is mostly implemented to compare cumulative data from two different data frames.

Syntax of Pandas merge_asof

pandas.merge_asof(left, right, on=None, left_on=None, right_on=None, left_index=False, right_index=False, by=None, left_by=None, right_by=None, suffixes=(‘_x’, ‘_y’), tolerance=None, allow_exact_matches=True, direction=’backward’, )

Parameters

ParameterDescription
leftfirst dataframe
rightsecond dataframe
left_oncolumn to join on in left dataframe
right_oncolumn to join on in right dataframe
left_indexindex of left dataframe used as the join key
right_indexindex of the right dataframe used as the join key
bycolumns to be matched for merging
left_bycolumns to match on in left dataframe
right_bycolumn to match in right dataframe
suffixesvariables are given to similar column name
toleranceasof tolerance within this range and must be compatible with the merge index.
allow_exact_matchesTrue – allows matching with the same ‘on’ value
False – doesn’t match the same ‘on’ value
DirectionInforms whether to search previous, subsequent, or nearest value
syntax parameters

Return Value: Merged dataframe

Also read: Pandas read_sql: Read SQL query/database table into a DataFrame

Examples of Pandas merge_asof

We would consider the below date-time series data frames.

import pandas as pd
df1=pd.DataFrame({
    'time': [pd.Timestamp('20-01-2023 06:57:00'),
            pd.Timestamp('20-01-2023 06:57:02'),
            pd.Timestamp('20-01-2023 06:57:04'),
            pd.Timestamp('20-01-2023 06:57:06'),],
    
    'price': [12,14,8,7]})
print("df1 \n")
print(df1)

Output:

Merge Asof Df1
df2=pd.DataFrame({
    'time': [pd.Timestamp('20-01-2023 06:57:01'),
            pd.Timestamp('20-01-2023 06:57:03'),
            pd.Timestamp('20-01-2023 06:57:05'),
            pd.Timestamp('20-01-2023 06:57:07'),],
    
    'price': [5,7,9,12]})
print("df2 \n")
print(df2)

Output:

Merge Asof Df2 pandas merge_asof()

Example 1: Basic usage of Pandas merge_asof

merge_df=pd.merge_asof(df1,df2,on='time')
print(merge_df,"\n")

We input the first two arguments as the input data frames and specify on=time the label of the column we want to merge on. The time columns have timestamps of data frame 1 as we set this data frame as the first argument. The price columns have suffixes _x and _y as both have a similar label.

Output:

               time  price_x  price_y
0 2023-01-20 06:57:00       12      NaN
1 2023-01-20 06:57:02       14      5.0
2 2023-01-20 06:57:04        8      7.0
3 2023-01-20 06:57:06        7      9.0

The interesting column is the column price_y. We have specified the value of price_x for the timestamp 2023-01-20 06:57:02 although there is no specific value for price_y thus merge_asof performs a backward search and thus it takes the value assigned to backward nearest timestamp, here it would be 2023-01-20 06:57:01.

We have not assigned any timestamp value before 2023-01-20 06:57:00 and thus price_y is NaN.

Examples 2: Reversing the order of input values

We’ll now pass df2 and df1 instead of going serially like we did above. The function now gives us a different out.

merge_df=pd.merge_asof(df2,df1,on='time')
print(merge_df,"\n")

We interchange the dataframe position in this example thus price_x will contain values from df 2

               time  price_x  price_y
0 2023-01-20 06:57:01        5       12
1 2023-01-20 06:57:03        7       14
2 2023-01-20 06:57:05        9        8
3 2023-01-20 06:57:07       12        7

Example 3: Setting direction = forward

The default value for the search direction is backward, in this example, we specify direction=forward and get the resulting output

merge_df=pd.merge_asof(df2,df1,on='time',direction='forward')
print(merge_df,"\n")

price_y displays the value from df2 and since there is no value for timestamp 2023-01-20 06:57:08 in df2 we get NaN as output. The algorithm for the search is the same ie to return the nearest timestamp value we have only changed the direction from backward to forward.

Output:

               time  price_x  price_y
0 2023-01-20 06:57:01        5     14.0
1 2023-01-20 06:57:03        7      8.0
2 2023-01-20 06:57:05        9      7.0
3 2023-01-20 06:57:07       12      NaN

Example 4: Setting direction = nearest

Value for price_y is different compared to when direction =forward. For 06:57:01 df 2 has a value of 5 and for df1 the nearest value would be 06:57:00 which is 12, similarly for 06:57:03 df2 has the nearest value at 06:57:02 which is 14, and so on.

merge_df=pd.merge_asof(df2,df1,on='time',direction='nearest')
print(merge_df,"\n")

Output:

                 time  price_x  price_y
0 2023-01-20 06:57:01        5       12
1 2023-01-20 06:57:03        7       14
2 2023-01-20 06:57:05        9        8
3 2023-01-20 06:57:07       12        7

Example 5: Using the allow_exact_matches parameter of Pandas merge_asof

If we don’t want to include exact matches we can use the allow_exact_matches parameter which accepts a boolean value.

merge_df=pd.merge_asof(df2,df1,on='time',allow_exact_matches=False)
print(merge_df,"\n")

Output:

             time  price_x  price_y
0 2023-01-20 06:57:01        5       12
1 2023-01-20 06:57:03        7       14
2 2023-01-20 06:57:05        9        8
3 2023-01-20 06:57:07       12        7

Modify dataframes to add category

Let’s modify our dataframes to contain a category parameter that contains the data. We’ll use this dataframe for our final example of the Pandas merge_asof() function.

df1['category']= ['A','A','B','C']
print(df1)
df2['category']= ['A','B','A','C']
print(df2)

Output :

                time  price category
0 2023-01-20 06:57:00     12        A
1 2023-01-20 06:57:02     14        A
2 2023-01-20 06:57:04      8        B
3 2023-01-20 06:57:06      7        C
                 time  price category
0 2023-01-20 06:57:01      5        A
1 2023-01-20 06:57:03      7        B
2 2023-01-20 06:57:05      9        A
3 2023-01-20 06:57:07     12        C

Example 6: Merging the modified data frames

merge_df=pd.merge_asof(df1,df2,on='time',by='category')
print(merge_df,"\n")

The search is directed backwards but the values should be in the same category
Eg for 06:57:06 we have df1 value 7 and df2 9 for 06:57:05 but since this value is not in the ‘C’ category it is shown NaN

                 time  price_x category  price_y
0 2023-01-20 06:57:00       12        A      NaN
1 2023-01-20 06:57:02       14        A      5.0
2 2023-01-20 06:57:04        8        B      7.0
3 2023-01-20 06:57:06        7        C      NaN

If we change the category of 06:57:05 df2 9 from A to C we get the following result.

                 time  price_x category  price_y
0 2023-01-20 06:57:00       12        A      NaN
1 2023-01-20 06:57:02       14        A      5.0
2 2023-01-20 06:57:04        8        B      7.0
3 2023-01-20 06:57:06        7        C      9.0

Conclusion

In this article, we have implemented merge_asof a pandas function to get output based on timestamps. We have manipulated the direction parameter to either backward, forward, or nearest to get the desired output.
Browse more articles at AskPython .

Also read: Pandas merge_ordered – Perform a merge for ordered data with optional filling/interpolation

Reference

https://pandas.pydata.org/pandas-docs/version/0.25.0/reference/api/pandas.merge_asof.html