Pandas isin() function – A Complete Guide

Hello everyone! In this tutorial, we will learn about the isin() method in the Pandas module and look at how it behaves when different types of values are passed to it. So let’s get started.

DataFrame.isin() method

The Pandas isin() method checks whether each element of a DataFrame is contained in a set of specified values. It returns a DataFrame of booleans with the same shape, index and columns as the original: a cell is True if the corresponding element is present in the specified values, and False otherwise. This boolean result can then be used to filter the DataFrame, as we will see in the examples below.

The syntax of the isin() method is shown below. It takes a single parameter:

DataFrame.isin(values)

Here, the values parameter can be any one of the following:

  • List or Iterable
  • Dictionary
  • Pandas Series
  • Pandas DataFrame
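
As a rough sketch of what the method returns, the snippet below uses a small made-up frame and checks that the result has the same shape and labels as the input, with boolean columns. The names frame and mask and the city data are illustrative assumptions, not part of this tutorial's dataset.

```python
import pandas as pd

# Hypothetical data, used only to illustrate the return value
frame = pd.DataFrame({'city': ['Pune', 'Oslo'], 'code': ['PN', 'OS']})

# Every cell is tested against the same list of values
mask = frame.isin(['Oslo', 'PN'])

print(mask)
print(mask.shape == frame.shape)  # the mask has the same shape as the input
print(mask.dtypes)                # every column of the mask is boolean
```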

Let’s see the result of the isin() method when values of each type are passed to it.

Examples of the isin() method

Let’s consider some examples of the isin() method by passing values of different types. For the examples below, we will use the following data:

import pandas as pd

data = pd.DataFrame({
  'Name': ['John', 'Sam', 'Luna', 'Harry'],
  'Age': [25, 45, 23, 32],
  'Department': ['Sales', 'Engineering', 'Engineering', 'Human Resource']
})

print(data)
    Name  Age      Department
0   John   25           Sales
1    Sam   45     Engineering
2   Luna   23     Engineering
3  Harry   32  Human Resource

isin() method when value is a List

When a list is passed as the value to the isin() method, it checks whether each element in the DataFrame, in every column, is present in the list, and marks it True if found. For example, if we pass a list containing some department names, the matching entries in the Department column will be marked as True. A list entry that appears nowhere in the DataFrame, such as 'Finance' here, simply matches nothing.

import pandas as pd
# Creating DataFrame
data = pd.DataFrame({
  'Name': ['John', 'Sam', 'Luna', 'Harry'],
  'Age': [25, 45, 23, 32],
  'Department': ['Sales', 'Engineering', 'Engineering', 'Human Resource']
})

# List of departments to filter
departments_to_filter = ['Engineering', 'Sales', 'Finance']

result = data.isin(departments_to_filter)

print(result)
    Name    Age  Department
0  False  False        True
1  False  False        True
2  False  False        True
3  False  False       False
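
The boolean DataFrame above can be reduced to one boolean per row and then used to select rows. The following is a minimal sketch of that idea, assuming we want every row in which at least one cell matched the list. The names row_mask and matching_rows are illustrative.

```python
import pandas as pd

data = pd.DataFrame({
  'Name': ['John', 'Sam', 'Luna', 'Harry'],
  'Age': [25, 45, 23, 32],
  'Department': ['Sales', 'Engineering', 'Engineering', 'Human Resource']
})

departments_to_filter = ['Engineering', 'Sales', 'Finance']

# any(axis=1) is True for a row when at least one of its cells matched
row_mask = data.isin(departments_to_filter).any(axis=1)

# Keep only the rows flagged by the mask
matching_rows = data[row_mask]

print(matching_rows)
```

Because the list is compared against every column, this approach would also pick up a row whose Name happened to equal one of the listed values. Testing a single column avoids that.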

In the same way, we can filter the rows of the DataFrame. For example, to find employees aged 20 to 30 (inclusive), we can call the isin() method on the Age column with a range of integers. Note that this only matches whole-number ages, since the range contains integers only.

import pandas as pd
# Creating DataFrame
data = pd.DataFrame({
  'Name': ['John', 'Sam', 'Luna', 'Harry'],
  'Age': [25, 45, 23, 32],
  'Department': ['Sales', 'Engineering', 'Engineering', 'Human Resource']
})

start_age = 20
end_age = 30
# Using isin() method to filter employees on age
age_filter = data['Age'].isin(range(start_age, end_age + 1))
# Using the filter to retrieve the data
result = data[age_filter]

print(result)
   Name  Age   Department
0  John   25        Sales
2  Luna   23  Engineering
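
Two small variations on this filter may be useful. This is a sketch rather than part of the original example, and the variable names are illustrative. The ~ operator inverts the mask to select the employees outside the age range. Series.between(), a separate Pandas method, expresses the same inclusive range check without building a list of integers, so it also works for fractional values.

```python
import pandas as pd

data = pd.DataFrame({
  'Name': ['John', 'Sam', 'Luna', 'Harry'],
  'Age': [25, 45, 23, 32],
  'Department': ['Sales', 'Engineering', 'Engineering', 'Human Resource']
})

in_range = data['Age'].isin(range(20, 31))

# ~ flips True and False, selecting employees NOT aged 20 to 30
outside_range = data[~in_range]
print(outside_range)

# between() is inclusive on both ends by default
same_mask = data['Age'].between(20, 30)
print(same_mask.equals(in_range))

# Hypothetical fractional ages, to show between() is not limited to integers
fractional = pd.Series([23.5, 30.0, 30.1]).between(20, 30)
print(fractional.tolist())
```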

isin() method when value is a Dictionary

When a dictionary is passed as the value to the isin() method, each key is a column name and each value is the list of values to search for in that column. Thus each column can have its own set of values to match, and any column not named in the dictionary (Age in the example below) is False throughout. For example, we can pass separate lists for Name and Department as shown below.

import pandas as pd
# Creating DataFrame
data = pd.DataFrame({
  'Name': ['John', 'Sam', 'Luna', 'Harry'],
  'Age': [25, 45, 23, 32],
  'Department': ['Sales', 'Engineering', 'Engineering', 'Human Resource']
})

# Dictionary data to filter DataFrame
dict_data_to_filter = {'Name': ['Sam', 'Harry'], 'Department': ['Engineering']}

result = data.isin(dict_data_to_filter)

print(result)
    Name    Age  Department
0  False  False       False
1   True  False        True
2  False  False        True
3   True  False       False
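
The per-column results can be combined to select rows. As a small sketch (the names both_match and either_match are illustrative), the & operator keeps rows where both conditions hold, while | keeps rows where at least one holds.

```python
import pandas as pd

data = pd.DataFrame({
  'Name': ['John', 'Sam', 'Luna', 'Harry'],
  'Age': [25, 45, 23, 32],
  'Department': ['Sales', 'Engineering', 'Engineering', 'Human Resource']
})

dict_data_to_filter = {'Name': ['Sam', 'Harry'], 'Department': ['Engineering']}
result = data.isin(dict_data_to_filter)

# Rows where the Name matched AND the Department matched
both_match = data[result['Name'] & result['Department']]
print(both_match)

# Rows where the Name matched OR the Department matched
either_match = data[result['Name'] | result['Department']]
print(either_match)
```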

isin() method when value is a Series

When a Pandas Series is passed as the value to the isin() method, the Series is aligned with the DataFrame by index. A cell is True only if its value equals the Series value at the same index label, so the position of each value in the Series becomes important. This comparison is made for every column of the DataFrame. Consider the example below.

import pandas as pd
# Creating DataFrame
data = pd.DataFrame({
  'Name': ['John', 'Sam', 'Luna', 'Harry'],
  'Age': [25, 45, 23, 32],
  'Department': ['Sales', 'Engineering', 'Engineering', 'Human Resource']
})

# Series data, with the positions of Sam and Luna swapped
series_data = pd.Series(['John', 'Luna', 'Sam', 'Harry'])

result = data.isin(series_data)

print(result)
    Name    Age  Department
0   True  False       False
1  False  False       False
2  False  False       False
3   True  False       False

Although the Series contains every name present in the data DataFrame, the result at index 1 and 2 is False because we swapped the positions of ‘Sam’ and ‘Luna’: at index 1 the DataFrame holds ‘Sam’ while the Series holds ‘Luna’, and the reverse is true at index 2. Hence the index matters when a Series is passed as the value.
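
If the order of the values should not matter, one option is to call isin() on a single column, since Series.isin() treats the passed values as a plain set and does not align on the index. Another is to convert the Series to a list first. The sketch below contrasts these with the index-aligned result. The variable names are illustrative.

```python
import pandas as pd

data = pd.DataFrame({
  'Name': ['John', 'Sam', 'Luna', 'Harry'],
  'Age': [25, 45, 23, 32],
  'Department': ['Sales', 'Engineering', 'Engineering', 'Human Resource']
})

series_data = pd.Series(['John', 'Luna', 'Sam', 'Harry'])

# DataFrame.isin() with a Series: values are compared index by index
aligned = data.isin(series_data)
print(aligned['Name'].tolist())

# Series.isin() with a Series: plain membership test, order is ignored
unordered = data['Name'].isin(series_data)
print(unordered.tolist())

# Converting to a list also gives a plain membership test on the DataFrame
as_list = data.isin(series_data.tolist())
print(as_list['Name'].tolist())
```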

isin() method when value is a DataFrame

When a Pandas DataFrame is passed as the value to the isin() method, a cell is marked True only if the passed DataFrame holds the same value under the same index label and the same column name. If the two DataFrames hold the same data but a column name differs, the result is False for that entire column. If the data is the same but the row order differs, the result is False for the rows whose values no longer line up. Thus both the index and the column labels matter when a DataFrame is passed. Consider the example below, which prints the result for each of these two cases in turn.

import pandas as pd
# Creating DataFrame
data = pd.DataFrame({
  'Name': ['John', 'Sam', 'Luna', 'Harry'],
  'Age': [25, 45, 23, 32],
  'Department': ['Sales', 'Engineering', 'Engineering', 'Human Resource']
})

# DataFrame to filter, here the column name Age is lowercased to age
df = pd.DataFrame({
  'Name': ['John', 'Sam', 'Luna', 'Harry'],
  'age': [25, 45, 23, 32],
  'Department': ['Sales', 'Engineering', 'Engineering', 'Human Resource']
})

result = data.isin(df)
print(result)

print("-----------------")

# DataFrame to filter, here last 2 rows are swapped
df = pd.DataFrame({
  'Name': ['John', 'Sam', 'Harry', 'Luna'],
  'Age': [25, 45, 32, 23],
  'Department': ['Sales', 'Engineering', 'Human Resource', 'Engineering']
})

result = data.isin(df)
print(result)
   Name    Age  Department
0  True  False        True
1  True  False        True
2  True  False        True
3  True  False        True
-----------------
    Name    Age  Department
0   True   True        True
1   True   True        True
2  False  False       False
3  False  False       False
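
When only the column names should line up and the row order should be ignored, one possible workaround is to convert the second DataFrame to a dictionary of lists with to_dict(orient='list') and pass that instead. This is a sketch of the idea, not the only way to do it. It tests each column independently, so it does not confirm that a whole row exists in the other DataFrame.

```python
import pandas as pd

data = pd.DataFrame({
  'Name': ['John', 'Sam', 'Luna', 'Harry'],
  'Age': [25, 45, 23, 32],
  'Department': ['Sales', 'Engineering', 'Engineering', 'Human Resource']
})

# Same data as above, but with the last 2 rows swapped
df = pd.DataFrame({
  'Name': ['John', 'Sam', 'Harry', 'Luna'],
  'Age': [25, 45, 32, 23],
  'Department': ['Sales', 'Engineering', 'Human Resource', 'Engineering']
})

# Index-aligned comparison: the swapped rows come out as False
aligned = data.isin(df)
print(aligned)

# {'Name': [...], 'Age': [...], 'Department': [...]} drops the index,
# so each column is tested for membership regardless of row order
column_values = df.to_dict(orient='list')
unordered = data.isin(column_values)
print(unordered)
```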

Conclusion

In this tutorial, we learned about the Pandas isin() method, how it behaves when a list, dictionary, Series or DataFrame is passed to it, and how it helps in filtering data from a DataFrame. You now know how to use the isin() method to filter a DataFrame easily, so congratulations!

Thanks for reading!!