Python astype() – Type Conversion of Data columns

In this article, we will look in detail at an important concept: converting the data types of columns in a DataFrame using the pandas astype() method.


Understanding Python astype() function

Before diving deep into the concept of Data type conversion with the Python astype() method, let us first consider the below scenario.

In the domain of Data Science and Machine Learning, we often come across a stage where we need to pre-process and transform the data. In fact, transforming the data values is a key step on the way to modeling.

This is where the conversion of data columns comes into the picture.

The astype() method, provided by the pandas library, enables us to set or convert the data type of an existing column in a dataset or a DataFrame.

With it, we can convert the values of a single column or of multiple columns to a different data type.

Let us now look at the syntax of the astype() function in detail.


Syntax – astype() function

Have a look at the below syntax!

DataFrame.astype(dtype, copy=True, errors='raise')
  • dtype: The data type to apply to the entire data frame, or a dictionary that maps column names to data types when only specific columns should be converted.
  • copy: When set to True, the function returns a new copy of the data with the changes applied.
  • errors: When set to 'raise', the function allows exceptions from invalid conversions to be raised. When set to 'ignore', exceptions are suppressed and the original object is returned on error.
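
To make the parameters above concrete, here is a minimal sketch. The frames `df` and `bad` and their column names are made up for illustration and are not part of the article's data. It shows a single dtype applied to every column, a dictionary applied to one column, and the default errors='raise' behaviour on a value that cannot be converted.

```python
import pandas as pd

# Hypothetical frame whose numbers are stored as strings.
df = pd.DataFrame({"a": ["1", "2", "3"], "b": ["4.5", "5.5", "6.5"]})

# A single dtype is applied to every column of the frame.
all_float = df.astype("float64")

# A dictionary converts only the columns it names; 'b' is left untouched.
only_a = df.astype({"a": "int64"})

# With the default errors='raise', an invalid value raises an exception.
bad = pd.DataFrame({"a": ["1", "x"]})
try:
    bad.astype("int64")
    raised = False
except ValueError:
    raised = True

print(all_float.dtypes)
print(only_a.dtypes)
print("conversion failed:", raised)
```

Passing a dictionary is usually the safer choice when the columns of a frame do not all share one target type.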

Having understood the syntax of the function, let us now focus on the implementation of the same!


1. Python astype() with a DataFrame

In this example, we create a DataFrame from a dictionary using the pandas.DataFrame() constructor, as shown below.

Example:

import pandas as pd

# Column names mapped to their values
data = {"Gender":['M','F','F','M','F','F','F'], "NAME":['John','Camili','Rheana','Joseph','Amanti','Alexa','Siri']}

block = pd.DataFrame(data)
print("Original Data frame:\n")
print(block)
# Print the data type of each column
print(block.dtypes)

Output:

Let us have a look at the data frame and the original data types of its columns.

Original Data frame:

  Gender    NAME
0      M    John
1      F  Camili
2      F  Rheana
3      M  Joseph
4      F  Amanti
5      F   Alexa
6      F    Siri

Gender    object
NAME      object
dtype: object

Now, we apply the astype() method to the 'Gender' column and change its data type to 'category'.

block['Gender'] = block['Gender'].astype('category')
block.dtypes

Output:

Gender    category
NAME        object
dtype: object
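
As a rough illustration of what the 'category' type stores, the sketch below rebuilds the 'Gender' values from the example above as a standalone Series. The variable names `gender` and `gender_cat` are introduced here only for illustration. It then inspects the distinct labels and the integer codes through the `.cat` accessor.

```python
import pandas as pd

# Same 'Gender' values as in the example above.
gender = pd.Series(['M', 'F', 'F', 'M', 'F', 'F', 'F'], name="Gender")
gender_cat = gender.astype('category')

# The distinct labels are stored once as the categories.
print(gender_cat.cat.categories)

# Each row then holds a small integer code that points at one of the labels.
print(gender_cat.cat.codes.tolist())
```

Because each label is stored only once, a categorical column tends to be a good fit for columns with few distinct values, such as 'Gender' here.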

2. Implementing Python astype() with a Dataset

Here, we have imported the dataset using pandas.read_csv() function. You can find the dataset here.

Example:

import pandas 
BIKE = pandas.read_csv("Bike.csv")
BIKE.dtypes

The original data types of the columns:

temp            float64
hum             float64
windspeed       float64
cnt               int64
season_1          int64
season_2          int64
season_3          int64
season_4          int64
yr_0              int64
yr_1              int64
mnth_1            int64
mnth_2            int64
mnth_3            int64
mnth_4            int64
mnth_5            int64
mnth_6            int64
mnth_7            int64
mnth_8            int64
mnth_9            int64
mnth_10           int64
mnth_11           int64
mnth_12           int64
weathersit_1      int64
weathersit_2      int64
weathersit_3      int64
holiday_0         int64
holiday_1         int64
dtype: object

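Now, we change the data types of the variables 'season_1' and 'temp' by passing a dictionary that maps each column name to its new type. In this way, the astype() function can change the data types of multiple columns in a single call. Note that converting the float column 'temp' to 'int64' discards the fractional part of each value.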

BIKE = BIKE.astype({"season_1":'category', "temp":'int64'}) 
BIKE.dtypes

Output:

temp               int64
hum              float64
windspeed        float64
cnt                int64
season_1        category
season_2           int64
season_3           int64
season_4           int64
yr_0               int64
yr_1               int64
mnth_1             int64
mnth_2             int64
mnth_3             int64
mnth_4             int64
mnth_5             int64
mnth_6             int64
mnth_7             int64
mnth_8             int64
mnth_9             int64
mnth_10            int64
mnth_11            int64
mnth_12            int64
weathersit_1       int64
weathersit_2       int64
weathersit_3       int64
holiday_0          int64
holiday_1          int64
dtype: object
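
Since Bike.csv is not reproduced here, the following sketch uses a small made-up frame named `demo`, with the same two column names, to show what the dictionary form does to the values. In particular, casting a float column to 'int64' truncates toward zero rather than rounding, so rounding first may be preferable when the nearest integer is wanted.

```python
import pandas as pd

# Hypothetical stand-in for two columns of the dataset; the values are invented.
demo = pd.DataFrame({"temp": [0.24, 1.99, -1.7], "season_1": [1, 0, 0]})

# Convert both columns in one call by passing a dictionary.
converted = demo.astype({"season_1": "category", "temp": "int64"})
print(converted.dtypes)

# The cast to int64 drops the fractional part of each value.
print(converted["temp"].tolist())

# Rounding before the cast gives the nearest integer instead.
rounded = demo["temp"].round().astype("int64")
print(rounded.tolist())
```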

Conclusion

With this, we have come to the end of this topic. Feel free to comment below in case you have any questions.

For more such posts related to Python, stay tuned, and till then, happy learning!! 🙂