In this article, we look in detail at an important concept: converting the data type of columns in a DataFrame using the pandas astype() method.
Understanding Python astype() function
Before diving deep into data type conversion with the astype() method, let us first consider the following scenario.
In data science and machine learning, we often reach a stage where we need to pre-process and transform the data. In fact, transforming the data values is a key step towards modeling.
This is where converting the data type of columns comes into the picture.
The pandas astype() method lets us set or convert the data type of an existing column in a dataset or DataFrame.
With it, we can convert the values of a single column, or of multiple columns, to a different type altogether.
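As a minimal sketch of this idea, the snippet below converts a column of numeric strings to integers. The DataFrame and its 'qty' column are made up for illustration and are not part of the datasets used in this article.

```python
import pandas as pd

# Hypothetical data: quantities stored as text
orders = pd.DataFrame({"qty": ["1", "2", "3"]})

# astype() returns a new object; the original column is left unchanged
qty_as_int = orders["qty"].astype("int64")

print(qty_as_int.dtype)  # int64
print(qty_as_int.sum())  # 6
```

To keep the converted values, assign the result back to the column, for example orders["qty"] = orders["qty"].astype("int64").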
Let us now look at the syntax of the astype() function in detail.
Syntax – astype() function
Have a look at the syntax below:
DataFrame.astype(dtype, copy=True, errors='raise')
- dtype: The data type to apply to the entire DataFrame, or a dictionary mapping column names to data types when only specific columns should be converted.
- copy: When set to True (the default), the function returns a copy of the data with the changes applied.
- errors: When set to 'raise' (the default), the function raises an exception if a conversion fails. When set to 'ignore', exceptions are suppressed and the original object is returned on error.
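The two forms of the dtype argument and the default errors='raise' behaviour can be sketched as follows. The DataFrame and Series here are invented for this illustration.

```python
import pandas as pd

# Hypothetical data used only for this illustration
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# A single dtype is applied to every column
all_float = df.astype("float64")

# A dictionary converts only the columns it names
only_a = df.astype({"a": "float64"})

# With the default errors='raise', an impossible conversion raises an exception
bad = pd.Series(["1", "2", "x"])
try:
    bad.astype("int64")
    failed = False
except (ValueError, TypeError):
    failed = True

print(all_float.dtypes)
print(only_a.dtypes)
print(failed)  # True
```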
Having understood the syntax of the function, let us now focus on its implementation!
1. Python astype() with a DataFrame
In this example, we create a DataFrame from a dictionary using the pandas.DataFrame() constructor, as shown below.
Example:
import pandas as pd

# Build a small DataFrame from a dictionary of columns
data = {"Gender":['M','F','F','M','F','F','F'], "NAME":['John','Camili','Rheana','Joseph','Amanti','Alexa','Siri']}
block = pd.DataFrame(data)
print("Original Data frame:\n")
print(block)
# Show the data type of each column
print(block.dtypes)
Output:
Let us have a look at the original data frame and the data types of its columns.
Original Data frame:

   Gender    NAME
0       M    John
1       F  Camili
2       F  Rheana
3       M  Joseph
4       F  Amanti
5       F   Alexa
6       F    Siri
Gender    object
NAME      object
dtype: object
Next, we apply the astype() method to the 'Gender' column to change its data type to 'category'.
block['Gender'] = block['Gender'].astype('category')
print(block.dtypes)
Output:
Gender    category
NAME        object
dtype: object
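To see what the 'category' type stores, the sketch below rebuilds the Gender values as a standalone Series (an assumption made to keep the snippet self-contained) and inspects the distinct categories along with the integer code pandas keeps for each row.

```python
import pandas as pd

# Same Gender values as in the example above, as a standalone Series
genders = pd.Series(['M', 'F', 'F', 'M', 'F', 'F', 'F']).astype('category')

# The distinct labels found in the column
print(genders.cat.categories.tolist())  # ['F', 'M']

# Each row is stored as an integer code pointing at one of those labels
print(genders.cat.codes.tolist())       # [1, 0, 0, 1, 0, 0, 0]
```

Because each repeated label is stored once and referenced by a small integer, this type tends to suit columns with few distinct values.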
2. Implementing Python astype() with a Dataset
Here, we import the dataset using the pandas.read_csv() function. You can find the dataset here.
Example:
import pandas
BIKE = pandas.read_csv("Bike.csv")
print(BIKE.dtypes)
The original data types of the columns:
temp float64
hum float64
windspeed float64
cnt int64
season_1 int64
season_2 int64
season_3 int64
season_4 int64
yr_0 int64
yr_1 int64
mnth_1 int64
mnth_2 int64
mnth_3 int64
mnth_4 int64
mnth_5 int64
mnth_6 int64
mnth_7 int64
mnth_8 int64
mnth_9 int64
mnth_10 int64
mnth_11 int64
mnth_12 int64
weathersit_1 int64
weathersit_2 int64
weathersit_3 int64
holiday_0 int64
holiday_1 int64
dtype: object
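Now, we change the data types of the variables 'season_1' and 'temp' by passing a dictionary that maps column names to data types. Thus, with the astype() function, we can change the data types of multiple columns in a single go! Note that converting 'temp' from float64 to int64 discards the fractional part of each value.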
BIKE = BIKE.astype({"season_1":'category', "temp":'int64'})
print(BIKE.dtypes)
Output:
temp int64
hum float64
windspeed float64
cnt int64
season_1 category
season_2 int64
season_3 int64
season_4 int64
yr_0 int64
yr_1 int64
mnth_1 int64
mnth_2 int64
mnth_3 int64
mnth_4 int64
mnth_5 int64
mnth_6 int64
mnth_7 int64
mnth_8 int64
mnth_9 int64
mnth_10 int64
mnth_11 int64
mnth_12 int64
weathersit_1 int64
weathersit_2 int64
weathersit_3 int64
holiday_0 int64
holiday_1 int64
dtype: object
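Casting a float column such as 'temp' to int64 drops the fractional part of every value rather than rounding it. The sketch below shows this on a small made-up DataFrame that stands in for a few rows of the dataset; the values are assumptions and are not taken from Bike.csv.

```python
import pandas as pd

# Hypothetical stand-in for a few rows of the dataset
sample = pd.DataFrame({"temp": [0.34, 1.9, -1.9], "season_1": [1, 0, 1]})

# Same dictionary-style call as above: the fractional part is truncated
converted = sample.astype({"season_1": "category", "temp": "int64"})
print(converted["temp"].tolist())  # [0, 1, -1]

# Rounding first keeps the nearest whole number instead
rounded = sample["temp"].round().astype("int64")
print(rounded.tolist())            # [0, 2, -2]
```

If the fractional values matter for modeling, it may be better to leave such a column as float64.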
Conclusion
With this, we have come to the end of this topic. Feel free to comment below in case you come across any questions.
For more such posts related to Python, stay tuned, and till then, happy learning!! 🙂