Pandas to_numeric - Convert the argument to a numeric type.

In this article, let’s look at one of the general functions of the Pandas package, to_numeric(). Pandas is a Python package for manipulating and analyzing data. The name is derived from “panel data” and is also read as “Python Data Analysis.” It provides data structures and methods for working with numerical tables and time series, and it is free, open-source software.

This function converts the provided input to a numeric data type. Let’s go through its use cases, syntax, and implementation in Python.

Why is Pandas to_numeric used?

This function converts the passed argument to a numeric type. It can also downcast the result: for example, float64 data can be downcast to float32, the smallest float dtype the function will use. By default, the output dtype is float64 or int64, depending on the data supplied.
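As a quick illustration of these defaults, the sketch below converts plain Python lists of numeric strings (the sample values are made up for this example) and prints the resulting dtypes.

```python
import pandas as pd

# Whole-number strings are parsed as int64 by default.
ints = pd.to_numeric(["1", "2", "3"])

# A single decimal value makes the whole result float64.
floats = pd.to_numeric(["1.5", "2", "3"])

# downcast="float" asks for the smallest float dtype, float32.
small_floats = pd.to_numeric(["1.5", "2", "3"], downcast="float")

print(ints.dtype, floats.dtype, small_floats.dtype)  # int64 float64 float32
```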

Note that passing very large numbers may cause a loss of precision. Because of the internal limitations of the n-dimensional array (ndarray), numbers smaller than -9223372036854775808 (np.iinfo(np.int64).min) or larger than 18446744073709551615 (np.iinfo(np.uint64).max) will very likely be converted to floats so that they can be stored in an ndarray. Because Series uses an ndarray internally, the same caution applies to it.
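A related effect can be seen even with values that fit in int64 but exceed what float64 can represent exactly. In this sketch (the sample strings are illustrative, not from the pandas documentation), mixing a large integer with a decimal forces a float64 result, and the large value no longer round-trips.

```python
import pandas as pd

big = 2**53 + 1  # 9007199254740993, not exactly representable as float64

# The decimal "0.5" forces the whole result to float64.
converted = pd.to_numeric([str(big), "0.5"])

print(converted.dtype)
print(int(converted[0]) == big)  # False: precision was lost in the float
```

If exact large integers matter, it may be safer to keep such columns free of decimal values so that they stay in an integer dtype.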

Syntax of Pandas to_numeric

pandas.to_numeric(arg, errors='raise', downcast=None)

arg: scalar, list, tuple, 1-d array, or Series, Required

Input argument to be converted
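The kind of object returned follows the kind of input. The short sketch below, with made-up sample values, passes a scalar, a list, a tuple, and a Series to show this.

```python
import numpy as np
import pandas as pd

scalar_result = pd.to_numeric("3.5")                   # scalar in, scalar out
list_result = pd.to_numeric(["1", "2"])                # list in, ndarray out
tuple_result = pd.to_numeric(("1.5", "2"))             # tuple in, ndarray out
series_result = pd.to_numeric(pd.Series(["7", "8"]))   # Series in, Series out

for result in (scalar_result, list_result, tuple_result, series_result):
    print(type(result).__name__)
```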

errors: {ignore, raise, coerce}, default ‘raise’, Optional

If “raise” is set, invalid parsing raises an exception.
If “coerce” is set, invalid parsing is set to NaN.
If “ignore” is set, invalid parsing returns the input unchanged. Note that pandas 2.2 and later deprecate this option in favor of catching the exception explicitly.

downcast: str, default None, Optional

It can be set to ‘integer’, ‘signed’, ‘unsigned’, or ‘float’. If not None, and if the data has been successfully cast to a numerical dtype (or was numerical to begin with), the resulting data is downcast to the smallest numerical dtype possible according to the following rules:
- ‘integer’ or ‘signed’: smallest signed int dtype (minimum: np.int8)
- ‘unsigned’: smallest unsigned int dtype (minimum: np.uint8)
- ‘float’: smallest float dtype (minimum: np.float32)

Because this behavior is separate from the core conversion to numeric values, any errors raised during downcasting are surfaced regardless of the value of the ‘errors’ argument. Furthermore, downcasting only happens when the current dtype’s size is strictly larger than that of the dtype it would be cast to; if none of the dtypes checked satisfies that requirement, the data is left as it is.
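To make the “smallest dtype possible” rule concrete, here is a minimal sketch with three illustrative Series whose largest values cross the int8 and int16 limits. The variable names are made up for this example.

```python
import pandas as pd

samples = {
    "small": pd.Series([1, 100]),     # fits in int8
    "medium": pd.Series([1, 200]),    # 200 exceeds the int8 maximum of 127
    "large": pd.Series([1, 70000]),   # exceeds the int16 and uint16 maximums
}

results = {}
for name, s in samples.items():
    signed = pd.to_numeric(s, downcast="integer").dtype
    unsigned = pd.to_numeric(s, downcast="unsigned").dtype
    results[name] = (str(signed), str(unsigned))
    print(name, signed, unsigned)
```

Each Series ends up in the narrowest dtype that can hold all of its values, so the same call can produce different dtypes for different data.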

Implementing Pandas to_numeric

Make sure to import the Pandas package in your IDE before implementing the function. To do so, run the following code line first.

import pandas as pd

Example 1: passing series as the only parameter

Here, pd.Series() is used to create a Series of numeric strings.

x = pd.Series(['-5','0.00','5'])
pd.to_numeric(x)

Example: passing series as the only parameter

Example 2: passing downcast parameter

#converting into the smallest signed int dtype (int8)
x = pd.Series([3.0, -2, 7])
pd.to_numeric(x, downcast='signed')

Example: passing downcast ‘signed’ parameter

#converting into the smallest unsigned int dtype (minimum: uint8)
x_input = pd.Series([3.0, -2, 7])
x_output = pd.to_numeric(x_input, downcast='unsigned')

y_input = pd.Series([2,3,4])
y_output = pd.to_numeric(y_input, downcast='unsigned')

print("\nOUTPUT: 1\n",x_output)
print("\nOUTPUT: 2\n",y_output)

Example: passing downcast ‘unsigned’ parameter

Note that in the first case the input contains a negative value (-2), which cannot be stored in an unsigned type, so no downcasting is applied and the dtype stays float64. In the second case, all values are non-negative, so the dtype changes from the default int64 to the smallest unsigned integer dtype, uint8.
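The sketch below isolates the effect of the negative value. It uses two illustrative float Series that differ only in the sign of one element.

```python
import pandas as pd

with_negative = pd.Series([3.0, -2, 7])
non_negative = pd.Series([3.0, 2, 7])

# -2 cannot be stored in an unsigned dtype, so no downcast happens.
kept = pd.to_numeric(with_negative, downcast="unsigned")

# All values are whole and non-negative, so uint8 is enough.
shrunk = pd.to_numeric(non_negative, downcast="unsigned")

print(kept.dtype, shrunk.dtype)  # float64 uint8
```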

#downcasting of nullable integer and float data types
x = pd.Series([-1, 0, 1, 2], dtype="Float64")
x_out = pd.to_numeric(x, downcast='float')

y = pd.Series([5, 10, 15, 20], dtype="Int64")
y_out = pd.to_numeric(y, downcast='integer')

print(x_out, "\n", y_out)

Example: downcasting of nullable integer and float data type

Example 3: Passing the error parameter

#invalid parsing will raise an exception.
x = pd.Series(['-2','0.01','askPython'])
pd.to_numeric(x, errors="raise")

Example: invalid parsing will raise an exception.

# invalid parsing will return the input.
x = pd.Series(['-2','0.01','askPython'])
pd.to_numeric(x, errors="ignore")

Note: When the errors parameter is set to ‘ignore’ and parsing fails, the input is returned unchanged, so the result keeps the object (string) dtype instead of becoming numeric.

Example: invalid parsing will return the input.
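Since errors="ignore" is deprecated in recent pandas releases, the same “return the input on failure” behavior can be sketched with an explicit try/except. The helper name to_numeric_or_original is hypothetical and not part of pandas.

```python
import pandas as pd

def to_numeric_or_original(values):
    """Return the numeric conversion, or the input unchanged if parsing fails.

    Hypothetical helper written for this example; it is not a pandas function.
    """
    try:
        return pd.to_numeric(values)
    except (ValueError, TypeError):
        return values

bad = pd.Series(['-2', '0.01', 'askPython'])
good = pd.Series(['-2', '0.01', '7'])

bad_result = to_numeric_or_original(bad)    # the same Series, untouched
good_result = to_numeric_or_original(good)  # converted to a numeric dtype

print(bad_result.dtype, good_result.dtype)
```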

#invalid parsing will be set as NaN.
x = pd.Series(['-2','0.01','askPython'])
pd.to_numeric(x, errors="coerce")

Example: invalid parsing will be set as NaN.
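A common follow-up to errors="coerce" is to find out which entries could not be parsed. The sketch below, using made-up sample data, compares the original Series with the converted one to pick out the values that became NaN during conversion.

```python
import pandas as pd

raw = pd.Series(['-2', '0.01', 'askPython', None])
converted = pd.to_numeric(raw, errors="coerce")

# Entries that were present in the input but are NaN after conversion
# are the ones that failed to parse.
failed = raw[converted.isna() & raw.notna()]

print(failed.tolist())  # ['askPython']
```

Values that were already missing in the input are excluded from the list, so only real parsing failures are reported.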

Summary

Pandas helps you work with data efficiently, and to_numeric() is one of its general functions. It converts data to a numeric form, either integer or float, and can also downcast the result to a smaller dtype.

Reference

https://pandas.pydata.org/docs/reference/api/pandas.to_numeric.html