Pandas DataFrame Indexing: Set the Index of a Pandas Dataframe

Hello Readers! In this tutorial, we are going to discuss the different ways to set the index of a Pandas DataFrame object in Python.


What do we mean by indexing of a Pandas Dataframe?

When we create a Pandas DataFrame object using the pd.DataFrame() function, Pandas automatically (by default) generates row labels and, unless we supply them, column labels. Together these labels act as an address for each data element in the DataFrame, and this labelling is what we mean by indexing.

The row labels are called the index of the DataFrame, and the column labels are simply called columns. The index of a Pandas DataFrame object identifies its rows; the labels are usually unique, although Pandas does not require this. Let’s start our core discussion about the different ways to set the index of a Pandas DataFrame object in Python.
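As a minimal sketch of the idea above (the two-row DataFrame here is made up for illustration), the row labels and column labels can be inspected through the index and columns attributes:

```python
import pandas as pd

# A small illustrative DataFrame created without an explicit index
df = pd.DataFrame({'Name': ['Rajan', 'Raman'], 'Marks': [93, 88]})

# Row labels: Pandas generates a default RangeIndex starting at 0
print(df.index)

# Column labels: taken from the dictionary keys
print(df.columns)

row_labels = list(df.index)
column_labels = list(df.columns)
```

A single element can then be addressed by its row label and column label, for example df.loc[1, 'Marks'].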

Set index of the DataFrame while creating

In Python, we can set the index of the DataFrame while creating it using the index parameter. In this method, we create a Python list and pass it to the index parameter of the pd.DataFrame() function to set it as the index. The list must contain one label per row. Let’s implement this through Python code.

# Import Pandas module
import pandas as pd 

# Create a Python dictionary
data = {'Name': ['Rajan', 'Raman', 'Deepak', 'David', 'Shivam'],
        'Marks': [93, 88, 95, 75, 99],
        'City': ['Agra', 'Pune', 'Delhi', 'Sivan', 'Delhi']}

# Create a Python list of Roll NOs
Roll = [11, 12, 13, 14, 15]

# Create a DataFrame from the dictionary
# and set the Roll list as the index
# using DataFrame() function with index parameter
df = pd.DataFrame(data, index=Roll)
print(df)

Output:

[Image: Set Index Using Index Parameter]
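One detail worth checking when using the index parameter is that the list has exactly one label per row. The sketch below uses a shortened, made-up version of the data to show what happens when the lengths differ, and then looks up a row by its roll number:

```python
import pandas as pd

data = {'Name': ['Rajan', 'Raman', 'Deepak'], 'Marks': [93, 88, 95]}

# Two labels for three rows: Pandas refuses to build the DataFrame
try:
    pd.DataFrame(data, index=[11, 12])
    length_mismatch_raised = False
except ValueError as err:
    length_mismatch_raised = True
    print("Could not build the DataFrame:", err)

# One label per row works, and rows can then be selected by label
df = pd.DataFrame(data, index=[11, 12, 13])
print(df.loc[12])
```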

Set index of the DataFrame using existing columns

In Python, we can easily set any existing column or columns of a Pandas DataFrame object as its index in the following ways.

1. Set column as the index (without keeping the column)

In this method, we pass the name of an existing column to the set_index() method of the DataFrame. By default the value of the drop parameter is True, so the column is removed from the DataFrame's columns once it becomes the index. The set_index() method also has an optional inplace parameter whose default value is False, which means the method returns a new DataFrame instead of modifying the original one. That is why we assign the result back to df here, so that the old index of the DataFrame is replaced by the existing column. Let’s implement this through Python code.

# Import Pandas module
import pandas as pd 

# Create a Python dictionary
data = {'Name': ['Rajan', 'Raman', 'Deepak', 'David'],
        'Roll': [11, 12, 13, 14],
        'Marks': [93, 88, 95, 75]}

# Create a DataFrame from the dictionary
df = pd.DataFrame(data)
print("\nThis is the initial DataFrame:")
print(df)

# Set the Roll column as the index
# using set_index() function
df = df.set_index('Roll')
print("\nThis is the final DataFrame:")
print(df)

Output:

[Image: Set Column As Index]
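To make the difference between reassigning and modifying in place concrete, here is a small sketch on made-up data. It assumes a Pandas version in which set_index() still accepts inplace=True:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Rajan', 'Raman', 'Deepak'],
                   'Roll': [11, 12, 13],
                   'Marks': [93, 88, 95]})

# Default behaviour: a new DataFrame is returned, df itself is unchanged
indexed = df.set_index('Roll')
original_columns = list(df.columns)
indexed_columns = list(indexed.columns)
print(original_columns)
print(indexed_columns)

# inplace=True modifies df directly and returns None
result = df.set_index('Roll', inplace=True)
print(df.index.name)
```

Reassigning the result (df = df.set_index('Roll')) and using inplace=True lead to the same final DataFrame; the reassignment form is the one used in this tutorial.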

2. Set column as the index (keeping the column)

In this method, we will make use of the drop parameter, which is an optional parameter of the set_index() method of the DataFrame. By default the value of the drop parameter is True. Here we set it to False so that the column which becomes the new index is not dropped from the DataFrame. Let’s implement this through Python code.

# Import Pandas module
import pandas as pd 

# Create a Python dictionary
data = {'Roll': [111, 112, 113, 114],
        'Name': ['Rajan', 'Raman', 'Deepak', 'David'],
        'Marks': [93, 88, 95, 75]}

# Create a DataFrame from the dictionary
df = pd.DataFrame(data)
print("\nThis is the initial DataFrame:")
print(df)

# Set the Name column as the index
# using set_index() function with drop
df = df.set_index('Name', drop = False)
print("\nThis is the final DataFrame:")
print(df)

Output:

[Image: Set Index Using Drop Parameter]
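A quick way to confirm the effect of drop=False is to check that the column name appears both as the index name and among the columns. The following is a sketch on a shortened version of the data:

```python
import pandas as pd

df = pd.DataFrame({'Roll': [111, 112, 113],
                   'Name': ['Rajan', 'Raman', 'Deepak'],
                   'Marks': [93, 88, 95]})

kept = df.set_index('Name', drop=False)

# 'Name' is now the index name and is still an ordinary column
print(kept.index.name)
print(list(kept.columns))

# Rows can be selected by name, and the Name column is still available
marks_of_raman = kept.loc['Raman', 'Marks']
print(marks_of_raman)
```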

3. Set multiple columns as the index of the DataFrame

In this method, we can set multiple columns of the Pandas DataFrame object as its index by creating a list of column names and passing it to the set_index() method. The resulting index has one level per column, which is why it is called a multi-index (MultiIndex). Let’s implement this through Python code.

# Import Pandas module
import pandas as pd 

# Create a Python dictionary
data = {'Roll': [111, 112, 113, 114],
        'Name': ['Rajan', 'Raman', 'Deepak', 'David'],
        'Marks': [93, 88, 95, 75],
        'City': ['Agra', 'Pune', 'Delhi', 'Sivan']}

# Create a DataFrame from the dictionary
df = pd.DataFrame(data)
print("\nThis is the initial DataFrame:")
print(df)

# Set the Roll & Name column as the multi-index
# using set_index() function and list of column names
df = df.set_index(['Roll', 'Name'])
print("\nThis is the final DataFrame:")
print(df)

Output:

[Image: Set Columns As Multi Index]
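Once a multi-index is in place, a row is addressed by a tuple with one value per level. The sketch below, on a shortened version of the data, shows the level names and a lookup by (Roll, Name):

```python
import pandas as pd

df = pd.DataFrame({'Roll': [111, 112, 113],
                   'Name': ['Rajan', 'Raman', 'Deepak'],
                   'Marks': [93, 88, 95]})

multi = df.set_index(['Roll', 'Name'])

# One index level per column that was passed to set_index()
level_names = list(multi.index.names)
print(level_names)

# Select a single value with a (Roll, Name) tuple
marks = multi.loc[(112, 'Raman'), 'Marks']
print(marks)

# reset_index() turns the index levels back into ordinary columns
restored = multi.reset_index()
print(list(restored.columns))
```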

Set index of the DataFrame using Python objects

In Python, we can set objects such as a Python list, a range, or a Pandas series as the index of the Pandas DataFrame object in the following ways.

1. Python list as the index of the DataFrame

In this method, we can set the index of the Pandas DataFrame object using the pd.Index() and set_index() functions. First, we will create a Python list of labels, then pass it to the pd.Index() function, which returns a Pandas index object. Then we pass the returned index object to the set_index() function to set it as the new index of the DataFrame. Let’s implement this through Python code.

# Import Pandas module
import pandas as pd 

# Create a Python dictionary
data = {'Roll': [111, 112, 113, 114, 115],
        'Name': ['Rajan', 'Raman', 'Deepak', 'David', 'Shivam'],
        'Marks': [93, 88, 95, 75, 99],
        'City': ['Agra', 'Pune', 'Delhi', 'Sivan', 'Delhi']}

# Create a DataFrame from the dictionary
df = pd.DataFrame(data)
print("\nThis is the initial DataFrame:")
print(df)

# Create a Python list of labels
# (named labels so that the built-in list type is not shadowed)
labels = ['I', 'II', 'III', 'IV', 'V']

# Create a DataFrame index object
# using pd.Index() function
idx = pd.Index(labels)

# Set the above DataFrame index object as the index
# using set_index() function
df = df.set_index(idx)
print("\nThis is the final DataFrame:")
print(df)

Output:

[Image: Set List As Index]
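The reason for wrapping the list in pd.Index() is that set_index() interprets a plain list of strings as column names. The sketch below, on a shortened made-up DataFrame, contrasts the two calls:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Rajan', 'Raman', 'Deepak'],
                   'Marks': [93, 88, 95]})
labels = ['I', 'II', 'III']

# A plain list is read as column names, which do not exist here
try:
    df.set_index(labels)
    plain_list_failed = False
except KeyError as err:
    plain_list_failed = True
    print("Plain list was treated as column names:", err)

# Wrapping the list in pd.Index() makes it the row labels
df_labeled = df.set_index(pd.Index(labels))
print(df_labeled)
```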

2. Python range as the index of the DataFrame

In this method, we can set the index of the Pandas DataFrame object using the pd.Index(), range(), and set_index() functions. First, we will create a Python sequence of numbers using the range() function, then pass it to the pd.Index() function, which returns a Pandas index object. Then we pass the returned index object to the set_index() function to set it as the new index of the DataFrame. Let’s implement this through Python code.

# Import Pandas module
import pandas as pd 

# Create a Python dictionary
data = {'Roll': [111, 112, 113, 114, 115],
        'Name': ['Rajan', 'Raman', 'Deepak', 'David', 'Shivam'],
        'Marks': [93, 88, 95, 75, 99],
        'City': ['Agra', 'Pune', 'Delhi', 'Sivan', 'Delhi']}

# Create a DataFrame from the dictionary
df = pd.DataFrame(data)
print("\nThis is the initial DataFrame:")
print(df)

# Create a DataFrame index object
# using pd.Index() & range() function
idx = pd.Index(range(1, 6, 1))

# Set the above DataFrame index object as the index
# using set_index() function
df = df.set_index(idx)
print("\nThis is the final DataFrame:")
print(df)

Output:

[Image: Set Range As Index]
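As a small sketch of what pd.Index() produces from a range (assuming a reasonably recent Pandas version), the result is a RangeIndex, which stores only the start, stop, and step rather than every label:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Rajan', 'Raman', 'Deepak'],
                   'Marks': [93, 88, 95]})

# Build the index from a range that starts at 1 instead of the default 0
idx = pd.Index(range(1, 4))
print(type(idx).__name__)

df_from_one = df.set_index(idx)
new_labels = list(df_from_one.index)
print(new_labels)
```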

3. Pandas series as the index of the DataFrame

In this method, we can set the index of the Pandas DataFrame object using the pd.Series() and set_index() functions. First, we will create a Python list and pass it to the pd.Series() function, which returns a Pandas series whose values can be used as the index labels. Then we pass the returned Pandas series to the set_index() function to set it as the new index of the DataFrame. Let’s implement this through Python code.

# Import Pandas module
import pandas as pd 

# Create a Python dictionary
data = {'Roll': [111, 112, 113, 114, 115],
        'Name': ['Rajan', 'Raman', 'Deepak', 'David', 'Shivam'],
        'Marks': [93, 88, 95, 75, 99],
        'City': ['Agra', 'Pune', 'Delhi', 'Sivan', 'Delhi']}

# Create a DataFrame from the dictionary
df = pd.DataFrame(data)
print("\nThis is the initial DataFrame:")
print(df)

# Create a Pandas series
# using pd.Series() function & Python list
series_idx = pd.Series([5, 4, 3, 2, 1])

# Set the above Pandas series as the index
# using set_index() function
df = df.set_index(series_idx)
print("\nThis is the final DataFrame:")
print(df)

Output:

This is the initial DataFrame:
   Roll    Name  Marks   City
0   111   Rajan     93   Agra
1   112   Raman     88   Pune
2   113  Deepak     95  Delhi
3   114   David     75  Sivan
4   115  Shivam     99  Delhi

This is the final DataFrame:
   Roll    Name  Marks   City
5   111   Rajan     93   Agra
4   112   Raman     88   Pune
3   113  Deepak     95  Delhi
2   114   David     75  Sivan
1   115  Shivam     99  Delhi
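One related detail, sketched below on shortened data: if the series has a name, that name becomes the name of the new index. The series name 'Rank' is a made-up label for illustration only.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Rajan', 'Raman', 'Deepak'],
                   'Marks': [93, 88, 95]})

# A named series; its values become the row labels in order
rank = pd.Series([3, 2, 1], name='Rank')

ranked = df.set_index(rank)
print(ranked.index.name)
print(list(ranked.index))

# The first row now carries label 3, the last row label 1
top_name = ranked.loc[1, 'Name']
print(top_name)
```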

4. Set index of the DataFrame keeping the old index

In this method, we will make use of the append parameter, which is an optional parameter of the set_index() method of the DataFrame. By default the value of the append parameter is False. Here we set it to True so that the old index of the DataFrame is kept and the new index passed to the set_index() function is added to it as an additional level. Let’s implement this through Python code.

# Import Pandas module
import pandas as pd 

# Create a Python dictionary
data = {'Roll': [111, 112, 113, 114, 115],
        'Name': ['Rajan', 'Raman', 'Deepak', 'David', 'Shivam'],
        'Marks': [93, 88, 95, 75, 99],
        'City': ['Agra', 'Pune', 'Delhi', 'Sivan', 'Delhi']}

# Create a DataFrame from the dictionary
df = pd.DataFrame(data)
print("\nThis is the initial DataFrame:")
print(df)

# Set Roll column as the index of the DataFrame
# using set_index() function & append
df = df.set_index('Roll', append = True)
print("\nThis is the final DataFrame:")
print(df)

Output:

[Image: Set Index Using Append Parameter]
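With append=True the result is a two-level index: the old row labels form the first level and the new column forms the second. A short sketch on shortened data shows how to inspect it:

```python
import pandas as pd

df = pd.DataFrame({'Roll': [111, 112, 113],
                   'Name': ['Rajan', 'Raman', 'Deepak'],
                   'Marks': [93, 88, 95]})

appended = df.set_index('Roll', append=True)

# Two levels: the original unnamed index, then Roll
n_levels = appended.index.nlevels
level_names = list(appended.index.names)
print(n_levels, level_names)

# Rows are addressed by an (old label, Roll) tuple
name = appended.loc[(0, 111), 'Name']
print(name)
```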

Conclusion

In this tutorial we have learned the following things:

  • What is the index of a Pandas DataFrame object?
  • How to set index while creating a DataFrame?
  • How to set existing columns of DataFrame as index or multi-index?
  • How to set the Python objects like list, range, or Pandas series as index?
  • How to set new index keeping the older one?