Today we implement a pandas function merge_asof
function to perform a merge asof. It is similar to the left join except it matches the value to the nearest key rather than the equal key within a given certain tolerance limit. This is mostly implemented to compare cumulative data from two different data frames.
Syntax of Pandas merge_asof
pandas.merge_asof(left, right, on=None, left_on=None, right_on=None, left_index=False, right_index=False, by=None, left_by=None, right_by=None, suffixes=(‘_x’, ‘_y’), tolerance=None, allow_exact_matches=True, direction=’backward’, )
Parameters
Parameter | Description |
left | first dataframe |
right | second dataframe |
left_on | column to join on in left dataframe |
right_on | column to join on in right dataframe |
left_index | index of left dataframe used as the join key |
right_index | index of the right dataframe used as the join key |
by | columns to be matched for merging |
left_by | columns to match on in left dataframe |
right_by | column to match in right dataframe |
suffixes | variables are given to similar column name |
tolerance | asof tolerance within this range and must be compatible with the merge index. |
allow_exact_matches | True – allows matching with the same ‘on’ value False – doesn’t match the same ‘on’ value |
Direction | Informs whether to search previous, subsequent, or nearest value |
Return Value: Merged dataframe
Also read: Pandas read_sql: Read SQL query/database table into a DataFrame
Examples of Pandas merge_asof
We would consider the below date-time series data frames.
import pandas as pd
df1=pd.DataFrame({
'time': [pd.Timestamp('20-01-2023 06:57:00'),
pd.Timestamp('20-01-2023 06:57:02'),
pd.Timestamp('20-01-2023 06:57:04'),
pd.Timestamp('20-01-2023 06:57:06'),],
'price': [12,14,8,7]})
print("df1 \n")
print(df1)
Output:

df2=pd.DataFrame({
'time': [pd.Timestamp('20-01-2023 06:57:01'),
pd.Timestamp('20-01-2023 06:57:03'),
pd.Timestamp('20-01-2023 06:57:05'),
pd.Timestamp('20-01-2023 06:57:07'),],
'price': [5,7,9,12]})
print("df2 \n")
print(df2)
Output:

Example 1: Basic usage of Pandas merge_asof
merge_df=pd.merge_asof(df1,df2,on='time')
print(merge_df,"\n")
We input the first two arguments as the input data frames and specify on=time the label of the column we want to merge on. The time columns have timestamps of data frame 1 as we set this data frame as the first argument. The price columns have suffixes _x and _y
as both have a similar label.
Output:
time price_x price_y
0 2023-01-20 06:57:00 12 NaN
1 2023-01-20 06:57:02 14 5.0
2 2023-01-20 06:57:04 8 7.0
3 2023-01-20 06:57:06 7 9.0
The interesting column is the column price_y. We have specified the value of price_x for the timestamp 2023-01-20 06:57:02 although there is no specific value for price_y thus merge_asof
performs a backward search and thus it takes the value assigned to backward nearest timestamp, here it would be 2023-01-20 06:57:01.
We have not assigned any timestamp value before 2023-01-20 06:57:00 and thus price_y is NaN.
Examples 2: Reversing the order of input values
We’ll now pass df2 and df1 instead of going serially like we did above. The function now gives us a different out.
merge_df=pd.merge_asof(df2,df1,on='time')
print(merge_df,"\n")
We interchange the dataframe position in this example thus price_x will contain values from df 2
time price_x price_y
0 2023-01-20 06:57:01 5 12
1 2023-01-20 06:57:03 7 14
2 2023-01-20 06:57:05 9 8
3 2023-01-20 06:57:07 12 7
Example 3: Setting direction = forward
The default value for the search direction is backward, in this example, we specify direction=forward
and get the resulting output
merge_df=pd.merge_asof(df2,df1,on='time',direction='forward')
print(merge_df,"\n")
price_y displays the value from df2 and since there is no value for timestamp 2023-01-20 06:57:08 in df2 we get NaN as output. The algorithm for the search is the same ie to return the nearest timestamp value we have only changed the direction from backward to forward.
Output:
time price_x price_y
0 2023-01-20 06:57:01 5 14.0
1 2023-01-20 06:57:03 7 8.0
2 2023-01-20 06:57:05 9 7.0
3 2023-01-20 06:57:07 12 NaN
Example 4: Setting direction = nearest
Value for price_y is different compared to when direction =forward. For 06:57:01 df 2 has a value of 5 and for df1 the nearest value would be 06:57:00 which is 12, similarly for 06:57:03 df2 has the nearest value at 06:57:02 which is 14, and so on.
merge_df=pd.merge_asof(df2,df1,on='time',direction='nearest')
print(merge_df,"\n")
Output:
time price_x price_y
0 2023-01-20 06:57:01 5 12
1 2023-01-20 06:57:03 7 14
2 2023-01-20 06:57:05 9 8
3 2023-01-20 06:57:07 12 7
Example 5: Using the allow_exact_matches parameter of Pandas merge_asof
If we don’t want to include exact matches we can use the allow_exact_matches parameter which accepts a boolean value.
merge_df=pd.merge_asof(df2,df1,on='time',allow_exact_matches=False)
print(merge_df,"\n")
Output:
time price_x price_y
0 2023-01-20 06:57:01 5 12
1 2023-01-20 06:57:03 7 14
2 2023-01-20 06:57:05 9 8
3 2023-01-20 06:57:07 12 7
Modify dataframes to add category
Let’s modify our dataframes to contain a category parameter that contains the data. We’ll use this dataframe for our final example of the Pandas merge_asof() function.
df1['category']= ['A','A','B','C']
print(df1)
df2['category']= ['A','B','A','C']
print(df2)
Output :
time price category
0 2023-01-20 06:57:00 12 A
1 2023-01-20 06:57:02 14 A
2 2023-01-20 06:57:04 8 B
3 2023-01-20 06:57:06 7 C
time price category
0 2023-01-20 06:57:01 5 A
1 2023-01-20 06:57:03 7 B
2 2023-01-20 06:57:05 9 A
3 2023-01-20 06:57:07 12 C
Example 6: Merging the modified data frames
merge_df=pd.merge_asof(df1,df2,on='time',by='category')
print(merge_df,"\n")
The search is directed backwards but the values should be in the same category
Eg for 06:57:06 we have df1 value 7 and df2 9 for 06:57:05 but since this value is not in the ‘C’ category it is shown NaN
time price_x category price_y
0 2023-01-20 06:57:00 12 A NaN
1 2023-01-20 06:57:02 14 A 5.0
2 2023-01-20 06:57:04 8 B 7.0
3 2023-01-20 06:57:06 7 C NaN
If we change the category of 06:57:05 df2 9 from A to C we get the following result.
time price_x category price_y
0 2023-01-20 06:57:00 12 A NaN
1 2023-01-20 06:57:02 14 A 5.0
2 2023-01-20 06:57:04 8 B 7.0
3 2023-01-20 06:57:06 7 C 9.0
Conclusion
In this article, we have implemented merge_asof
a pandas function to get output based on timestamps. We have manipulated the direction parameter to either backward, forward, or nearest to get the desired output.
Browse more articles at AskPython .
Also read: Pandas merge_ordered – Perform a merge for ordered data with optional filling/interpolation
Reference
https://pandas.pydata.org/pandas-docs/version/0.25.0/reference/api/pandas.merge_asof.html