How to Get Unique Values from a DataFrame in Python?


Hello, readers! In this article, we will be focusing on how to get unique values from a DataFrame in Python.

So, let us get started!


What is a Python DataFrame?

The Python pandas module offers various data structures and functions to store and manipulate large volumes of data.

A DataFrame is a two-dimensional data structure offered by pandas for working with large tabular datasets, such as data loaded from CSV or Excel files.

Because a DataFrame can hold a large volume of data, we often need to find the unique values in a dataset that contains redundant or repeated values.

This is where the pandas.unique() function, together with the equivalent Series.unique() method, comes into the picture.

Let us now look at how the unique() function works in the upcoming section.


Python pandas.unique() Function to Get Unique Values From a Dataframe

The pandas.unique() function returns the unique values present in a dataset.

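It uses a hash-table-based technique to return the non-redundant values from one-dimensional data, such as a Series. Because it relies on hashing rather than sorting, it is significantly faster than sorting-based approaches such as numpy.unique() on long sequences.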
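To make the hash-table idea concrete, here is a minimal pure-Python sketch. The function name unique_in_order is hypothetical, and this is not how pandas is implemented internally (pandas uses its own compiled hash-table code); it only shows why a single pass with a hash set is enough to drop repeated values while keeping their first-seen order.

```python
def unique_in_order(values):
    """Return the distinct items of `values`, keeping first-seen order.

    Illustrative only: a Python set plays the role of the hash table,
    which is the same idea pandas.unique() relies on, not its actual code.
    """
    seen = set()      # hash table of values already encountered
    result = []
    for value in values:
        if value not in seen:   # average O(1) membership test
            seen.add(value)
            result.append(value)
    return result


print(unique_in_order([1, 2, 3, 2, 4, 3, 2]))  # [1, 2, 3, 4]
```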

Let us try to understand the role of the unique() function through an example.

Consider a dataset containing the following values: 1, 2, 3, 2, 4, 3, 2

If we apply the unique() function to it, we obtain 1, 2, 3, 4: every repeated value is kept only once. This way, we find the unique values of the dataset easily.
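The same worked example can be run directly. This is a small sketch that wraps the values in a Series, since pandas.unique() expects one-dimensional array-like input.

```python
import pandas

# The dataset from the example above, stored as a one-dimensional Series
data = pandas.Series([1, 2, 3, 2, 4, 3, 2])

result = pandas.unique(data)
print(result)  # [1 2 3 4]
```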

Next, let us look at the syntax of the pandas.unique() function.


Syntax of Python unique() function

Have a look at the syntax below:

pandas.unique(data)

This syntax works on one-dimensional data, such as a Series or a NumPy array, and returns the unique values it contains.
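As a quick sketch, the call below is applied to two kinds of one-dimensional input; the sample values are made up. Recent pandas versions may emit a deprecation warning when a plain Python list is passed, so wrapping a list in a Series or NumPy array is the safer choice.

```python
import numpy as np
import pandas

# Two kinds of one-dimensional input (sample values are made up)
as_array = np.array([5, 5, 1, 7, 1])
as_series = pandas.Series(["x", "y", "x", "z"])

print(pandas.unique(as_array))   # [5 1 7]
print(pandas.unique(as_series))  # the strings 'x', 'y', 'z'
```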

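But what if the data has more than one dimension, i.e., rows and columns? In that case, we select a single column of the DataFrame and call unique() on it, as in the syntax below:

dataframe['column_name'].unique()

This syntax finds the unique values of one particular column of a dataset. If the column name is a valid Python identifier, the attribute form dataframe.column_name.unique() works as well.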
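Here is a minimal sketch of both column-access styles on a small DataFrame. The DataFrame and its column names (city, visits) are hypothetical, chosen only for illustration.

```python
import pandas

# A small, hypothetical DataFrame; the column names are made up
df = pandas.DataFrame({
    "city": ["Pune", "Delhi", "Pune", "Mumbai", "Delhi"],
    "visits": [3, 5, 3, 2, 8],
})

# Bracket notation works for any column name
print(df["city"].unique())

# Attribute notation works only when the name is a valid Python identifier
print(df.visits.unique())  # [3 5 2 8]
```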

unique() gives the most useful results on categorical or discrete data, such as labels or codes; on a continuous numeric column, nearly every value may be unique. Moreover, the values are returned in the order in which they first occur in the dataset, not in sorted order.
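To see the ordering behaviour, the sketch below compares pandas' unique() with numpy.unique(), which returns its result sorted. The Series holds made-up sample data.

```python
import numpy as np
import pandas

values = pandas.Series([3, 1, 3, 2, 1])

print(values.unique())    # [3 1 2]  order of first appearance
print(np.unique(values))  # [1 2 3]  NumPy sorts the result
```

If you need the unique values sorted, you can sort the result of unique() afterwards, for example with np.sort().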


Python unique() function with Pandas Series

In the example below, we create a list that contains repeated values.

We then convert the list into a Series, because the data has a single dimension, and finally apply the unique() function to fetch the unique values.

Example:

import pandas

lst = [1, 2, 3, 4, 2, 4]        # list with repeated values
series = pandas.Series(lst)     # 1-D data, so a Series is enough
print("Unique values:")
print(pandas.unique(series))

Output:

Unique values:
[1 2 3 4]
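The same result is available through the Series.unique() method. This short sketch shows that the method and the module-level function agree, and that Series.nunique() gives the count of those values.

```python
import pandas

series = pandas.Series([1, 2, 3, 4, 2, 4])

# Module-level function vs. the Series method
from_function = pandas.unique(series)
from_method = series.unique()

print(from_method)                           # [1 2 3 4]
print((from_function == from_method).all())  # True
print(series.nunique())                      # 4, the count of unique values
```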

Python unique() function with Pandas DataFrame

Let us first load the dataset into the environment as shown below–

import pandas
BIKE = pandas.read_csv("Bike.csv")


The DataFrame.nunique() method counts the unique values in each column of the DataFrame; by default, missing (NaN) values are not counted.

BIKE.nunique()

Output:

season          4
yr              2
mnth           12
holiday         2
weathersit      3
temp          494
hum           586
windspeed     636
cnt           684
dtype: int64
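Since Bike.csv is not reproduced here, the sketch below uses a tiny hypothetical DataFrame to show what nunique() counts, including its dropna parameter, which controls whether missing values count as a distinct value.

```python
import numpy as np
import pandas

# Hypothetical data with a missing value in column "b"
df = pandas.DataFrame({
    "a": [1, 1, 2, 3],
    "b": ["x", np.nan, "x", "y"],
})

print(df.nunique())              # counts: a=3, b=2 (NaN ignored by default)
print(df.nunique(dropna=False))  # counts: a=3, b=3 (NaN counted as a value)
```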

Next, we display the unique values present in the ‘season’ column using the following code:

BIKE.season.unique()

Output:

array([1, 2, 3, 4], dtype=int64)
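If you need the unique values of every column rather than just one, a minimal sketch is a dictionary comprehension over the DataFrame's columns. The small DataFrame below is a hypothetical stand-in for BIKE, and the variable name uniques is illustrative.

```python
import pandas

# Hypothetical stand-in for a dataset such as BIKE
df = pandas.DataFrame({
    "season": [1, 1, 2, 4, 3, 2],
    "holiday": [0, 0, 1, 0, 0, 0],
})

# Map each column name to the array of its unique values
uniques = {column: df[column].unique() for column in df.columns}

for column, values in uniques.items():
    print(column, values)
```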

Conclusion

By this, we have come to the end of this topic. Feel free to comment below if you have any questions.

For more such posts related to Python, stay tuned, and till then, happy learning!! 🙂