Data analysis has become an important part of everyday life, and we constantly deal with data from many different domains. One of the major challenges in data analysis is the presence of missing (NA) values in the data. In this article, we will learn how to handle missing values in a dataset with the help of the pandas fillna() method. Let’s get started!
What Is the Pandas fillna() Method and Why Is It Useful?
The pandas fillna() method fills the missing (NA/NaN) values in your dataset. You can replace them with zero or with any other value you choose. This method often comes in handy when you are working with data loaded from CSV or Excel files.
Don’t confuse it with the dropna() method, which removes missing values. With fillna(), the missing values are replaced with zero or with another value that you supply.
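To make the contrast with dropna() concrete, here is a minimal sketch. The `scores` frame and its column names are made up for illustration and do not come from the examples in this article.

```python
import numpy as np
import pandas as pd

# Hypothetical data: each of the first two rows has one missing value.
scores = pd.DataFrame({"math": [90.0, np.nan, 75.0],
                       "art": [np.nan, 60.0, 80.0]})

dropped = scores.dropna()   # removes every row that contains a NaN
filled = scores.fillna(0)   # keeps every row and replaces NaN with 0

print(dropped)
print(filled)
```

Only the last row survives dropna(), while fillna(0) keeps all three rows.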
Let’s look at the syntax of the fillna() method. The exact parameter list varies between pandas versions.
DataFrame.fillna(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None, **kwargs)
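As a small illustration of the value parameter from the signature above, the sketch below passes a dict so that each column gets its own replacement. The frame and the replacement values are assumptions chosen only for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical three-column frame with gaps in every column.
df = pd.DataFrame({"a": [np.nan, 589.0],
                   "b": [300.0, np.nan],
                   "c": [np.nan, np.nan]})

# A dict maps column names to fill values.
# Column "c" is not listed, so its NaN values are left untouched.
per_column = df.fillna(value={"a": 0, "b": -1})

print(per_column)
```

By default fillna() returns a new object, so `df` itself still contains its original NaN values.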
Let’s look at some examples of how you can use the fillna() method in different scenarios.
Pandas DataFrame fillna() method
In the following example, we will replace the NaN values with zeros.
import pandas as pd
import numpy as np
df = pd.DataFrame([[np.nan, 300, np.nan, 330],
                   [589, 700, np.nan, 103],
                   [np.nan, np.nan, np.nan, 675],
                   [np.nan, 3, np.nan, np.nan]],  # last row written out in full
                  columns=list('abcd'))
print(df)
# Fill the NaN values with zeros.
print("\n")
print(df.fillna(0))
Output
       a      b    c      d
0    NaN  300.0  NaN  330.0
1  589.0  700.0  NaN  103.0
2    NaN    NaN  NaN  675.0
3    NaN    3.0  NaN    NaN

       a      b    c      d
0    0.0  300.0  0.0  330.0
1  589.0  700.0  0.0  103.0
2    0.0    0.0  0.0  675.0
3    0.0    3.0  0.0    0.0
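One way to confirm that nothing was missed is to count the missing values before and after the fill. The sketch below assumes the same data as the example above and uses isna().sum(), which counts NaN values per column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[np.nan, 300, np.nan, 330],
                   [589, 700, np.nan, 103],
                   [np.nan, np.nan, np.nan, 675],
                   [np.nan, 3, np.nan, np.nan]],
                  columns=list("abcd"))

missing_before = df.isna().sum()    # NaN count per column
filled = df.fillna(0)
missing_after = filled.isna().sum()

print(missing_before)
print(missing_after)
```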
Applying fillna() method to only one column
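import pandas as pd
import numpy as np
df = pd.DataFrame([[np.nan, 300, np.nan, 330],
                   [589, 700, np.nan, 103],
                   [np.nan, np.nan, np.nan, 675],
                   [np.nan, 3, np.nan, np.nan]],  # last row written out in full
                  columns=list('abcd'))
print(df)
# Fill the NaN values in column 'b' only; the result is a Series.
print("\n")
filled_b = df['b'].fillna(0)
print(filled_b)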
Output
       a      b    c      d
0    NaN  300.0  NaN  330.0
1  589.0  700.0  NaN  103.0
2    NaN    NaN  NaN  675.0
3    NaN    3.0  NaN    NaN

0    300.0
1    700.0
2      0.0
3      3.0
Name: b, dtype: float64
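The call above returns a new Series and leaves the DataFrame unchanged. If the goal is to update the column inside the DataFrame, one common approach, sketched here with the same data, is to assign the result back to the column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[np.nan, 300, np.nan, 330],
                   [589, 700, np.nan, 103],
                   [np.nan, np.nan, np.nan, 675],
                   [np.nan, 3, np.nan, np.nan]],
                  columns=list("abcd"))

# Overwrite column "b" with its filled version; other columns keep their NaN values.
df["b"] = df["b"].fillna(0)

print(df)
```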
You can also use the limit parameter to cap how many NaN values are filled in each column.
import pandas as pd
import numpy as np
df = pd.DataFrame([[np.nan, 300, np.nan, 330],
                   [589, 700, np.nan, 103],
                   [np.nan, np.nan, np.nan, 675],
                   [np.nan, 3, np.nan, np.nan]],  # last row written out in full
                  columns=list('abcd'))
print(df)
# Fill at most two NaN values per column.
print("\n")
print(df.fillna(0, limit=2))
Output
       a      b    c      d
0    NaN  300.0  NaN  330.0
1  589.0  700.0  NaN  103.0
2    NaN    NaN  NaN  675.0
3    NaN    3.0  NaN    NaN

       a      b    c      d
0    0.0  300.0  0.0  330.0
1  589.0  700.0  0.0  103.0
2    0.0    0.0  NaN  675.0
3    NaN    3.0  NaN    0.0
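In the above example, limit=2 means that at most two NaN values are replaced in each column, counting from the top. It does not restrict the fill to the first two rows: column a has three NaN values, so the third one (row 3) stays NaN, and column c has four, so the last two (rows 2 and 3) stay NaN.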
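To see the effect of limit in numbers, a quick sketch can compare the NaN counts per column before and after the call. It assumes the same data as the example above; `filled_per_column` is a name introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[np.nan, 300, np.nan, 330],
                   [589, 700, np.nan, 103],
                   [np.nan, np.nan, np.nan, 675],
                   [np.nan, 3, np.nan, np.nan]],
                  columns=list("abcd"))

limited = df.fillna(0, limit=2)

# Number of values that were actually filled in each column.
filled_per_column = df.isna().sum() - limited.isna().sum()

print(filled_per_column)
```

No column has more than two filled values, and columns with fewer than two NaN values are filled completely.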
Conclusion
In summary, we learned several ways to fill NaN values in a DataFrame with the fillna() method. These techniques will come in handy in your data analysis projects.