Pandas to_datetime - Convert argument to datetime.

In this article, let’s look at another general function of the Pandas package. Pandas is a Python library for manipulating and analyzing data. Its name is derived from “panel data” and is also a play on “Python data analysis.” It provides data structures and operations for working with numerical tables and time series.

One such function is to_datetime(), which converts its argument to the datetime type. It is especially useful when working with time series data.

Why is Pandas to_datetime used?

This function converts an input, which can be a scalar, array-like, Series, list, or dict-like value, into a datetime object. The return type depends on the input: a scalar returns a Timestamp (or datetime.datetime in some cases, such as dates outside the range a Timestamp can represent), an array-like returns a DatetimeIndex, and a Series or DataFrame returns a Series of datetime64 dtype.
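As a quick illustration of these return types, the sketch below passes a scalar, a list, and a DataFrame to to_datetime(). The variable names and sample dates are made up for this example.

```python
import pandas as pd

# Scalar string -> pandas Timestamp
scalar = pd.to_datetime("2023-01-18")

# List of strings -> DatetimeIndex
index = pd.to_datetime(["2023-01-01", "2023-01-02"])

# DataFrame with "year", "month" and "day" columns -> Series of datetimes,
# one assembled date per row
frame = pd.DataFrame({"year": [2022, 2023], "month": [12, 1], "day": [31, 18]})
assembled = pd.to_datetime(frame)

print(type(scalar).__name__)
print(type(index).__name__)
print(type(assembled).__name__, assembled.dtype)
```

The DataFrame case is the one that differs most from the others: the columns are combined row by row into a single date.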

The function raises a ValueError when a datetime conversion error occurs. For instance, this happens when any of the DataFrame’s “year,” “month,” or “day” columns is absent, or when a timezone-aware datetime.datetime is found in an array-like of mixed time offsets and utc=False.
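The missing-column case can be reproduced with a minimal sketch like the one below. The DataFrame and the variable names are hypothetical, and the exact wording of the error message may differ between Pandas versions.

```python
import pandas as pd

# The "day" column is deliberately left out of this DataFrame.
incomplete = pd.DataFrame({"year": [2023], "month": [1]})

try:
    pd.to_datetime(incomplete)
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```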

Also, check out the to_timedelta() function of the Pandas package. It is similar to to_datetime(); the only difference is that it converts the argument to a timedelta.
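For comparison, here is a small sketch of to_timedelta() next to to_datetime(). The sample values are assumptions chosen only to show that one function yields a duration and the other a point in time.

```python
import pandas as pd

# A duration string -> Timedelta
delta = pd.to_timedelta("1 days 06:00:00")

# A list of numbers counted in days -> TimedeltaIndex
span = pd.to_timedelta([1, 2, 3], unit="D")

# A Timedelta can be added to the Timestamp returned by to_datetime()
stamp = pd.to_datetime("2023-01-18") + delta

print(delta)
print(span)
print(stamp)
```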

Syntax of Pandas to_datetime

pandas.to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False, utc=None, format=None, exact=True, unit=None, infer_datetime_format=False, origin='unix', cache=True)

arg: int, float, str, datetime, list, tuple, 1-dimensional array, Series, DataFrame/dict-like
- The object to convert to a datetime. If a DataFrame is given, it must contain at least the columns “year,” “month,” and “day.”
errors: {‘ignore’, ‘raise’, ‘coerce’}, default ‘raise’
- If “raise,” invalid parsing raises an exception.
- If “coerce,” invalid parsing is set to NaT.
- If “ignore,” invalid parsing returns the input unchanged.
dayfirst: bool, default False
- If the argument is a string or list-like, specify a date parse order. Dates are parsed with the day first if this is True; for example, “28/04/20” is interpreted as 2020-04-28.
- dayfirst=True is not strict; it only prefers day-first parsing. A warning is shown if a delimited date string, such as in to_datetime(['31-12-2021']), cannot be parsed according to the given dayfirst option.
yearfirst: bool, default False
- If the argument is a string or list-like, specify a date parse order. If True, dates are parsed with the year first, so “10/11/12” is parsed as 2010-11-12. If both dayfirst and yearfirst are True, yearfirst takes precedence (same as dateutil).
utc: bool, default None
- Control the conversion, localization, and parsing of timezone information.
  - If True, the function always outputs a UTC-localized Timestamp, Series, or DatetimeIndex that is timezone aware. This is accomplished by localizing timezone-naive inputs as UTC and converting timezone-aware inputs to UTC.
  - When False (the default), inputs are not forced to use UTC. Timezone-naive inputs remain naive, while timezone-aware ones keep their time offsets. There are restrictions for mixed offsets (usually caused by daylight saving time).
format: str, default None
- The strftime format used to parse the time, for example “%d/%m/%Y.” Keep in mind that “%f” will parse all the way down to nanoseconds.
exact: bool, default True
- Control how the format is used. If True, an exact format match is required. If False, the format is allowed to match anywhere in the target string.
unit: str, default ‘ns’
- The unit of the argument (D, s, ms, us, ns) when it is an integer or float number. The value is counted from the origin. For instance, with unit='ms' and origin='unix', the argument is read as the number of milliseconds since the start of the Unix epoch.
infer_datetime_format: bool, default False
- If True and no format is specified, try to deduce the format from the first non-NaN element and, if successful, switch to a quicker method of parsing the datetime strings. This can sometimes speed up parsing by 5 to 10 times.
origin: scalar, default ‘unix’
- Define the reference date. Numeric values are parsed as the number of units (defined by unit) since this reference date.
  - If “unix” (or POSIX) time, the origin is set to 1970-01-01.
  - If “julian,” the unit must be “D,” and the origin is set to the beginning of the Julian Calendar. Julian day number 0 is assigned to the day starting at noon on January 1, 4713 BC.
  - If the value is convertible to a Timestamp (a Timestamp, datetime.datetime, np.datetime64, or date string), the origin is set to the Timestamp it identifies.
cache: bool, default True
- If True, use a cache of unique converted dates to perform the datetime conversion. This may give a noticeable speedup when parsing duplicate date strings, especially those with timezone offsets. The cache is used only when there are at least 50 values. Out-of-bounds values make the cache unusable and may slow down parsing.
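A few of the parameters above can be tried out in a short sketch. The sample strings and variable names are assumptions chosen for illustration; the calls use only errors, format, utc, and unit as described in the list.

```python
import pandas as pd

# errors="coerce": an entry that cannot be parsed becomes NaT instead of raising.
# An explicit format is given so that every entry is parsed the same way.
raw = ["2023-01-18", "not a date"]
coerced = pd.to_datetime(raw, format="%Y-%m-%d", errors="coerce")

# format: state exactly how the string is laid out (day/month/year here).
explicit = pd.to_datetime("18/01/2023", format="%d/%m/%Y")

# utc=True: a timezone-naive input is localized to UTC,
# while a timezone-aware input is converted to UTC.
naive = pd.to_datetime("2023-01-18 12:00", utc=True)
aware = pd.to_datetime("2023-01-18 12:00+02:00", utc=True)

# unit: an integer is counted in the given unit from the origin
# (1000 milliseconds after 1970-01-01 in this case).
from_ms = pd.to_datetime(1_000, unit="ms")

print(coerced)
print(explicit)
print(naive, aware)
print(from_ms)
```

Choosing errors="coerce" is convenient for messy data, but it hides bad values as NaT, so it is worth counting them afterwards, for example with coerced.isna().sum().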

Implementing Pandas to_datetime with Examples

Before calling the function, make sure you import the Pandas package. Run the following line of code first.

import pandas as pd

Example 1: Passing string input

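date_string = '2023-01-18'
input_type = type(date_string)
print(input_type)

# A scalar string is converted to a single pandas Timestamp
output = pd.to_datetime(date_string)
output_type = type(output)
print("Input: ", date_string, " ", input_type, "\nOutput: ", output, " ", output_type)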

Note that the return type here is Timestamp.

Example 2: Passing array-like input

dates = ['2023-01-01', '2023-01-02']
input_type = type(dates)

# A list of strings is converted to a DatetimeIndex
output = pd.to_datetime(dates)
output_type = type(output)
print("Input: ", dates, " ", input_type, "\nOutput: ", output, " ", output_type)

Note that the return type here is DatetimeIndex.

Example 3: Passing series input

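series = pd.Series(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04'])
input_type = type(series)

# A Series of strings is converted to a Series of datetime64 values
output = pd.to_datetime(series)
output_type = type(output)
print("Input: ", series, " Type: ", input_type, "\nOutput: ", output, " ", output_type)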

Note the return type here is Series.

Example 4: Passing other parameters

x = pd.to_datetime(1999, unit='D', origin='unix')        #origin is set to 1970-01-01
y = pd.to_datetime(1999, unit='D', origin='2000-04-10')  #providing reference date
print(x, " \n", y)

x = pd.to_datetime('23/01/02', dayfirst=True, utc=True)   #parses dates with the day first
y = pd.to_datetime('23/01/02', yearfirst=True, utc=True)  #parses dates with the year first
print(x, " \n", y)
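To see what unit and origin are doing in the first pair of calls, the sketch below compares them with plain date arithmetic. It also parses the same ambiguous string with two explicit formats, which is an alternative to dayfirst and yearfirst rather than what the example above uses. All variable names here are hypothetical.

```python
import pandas as pd

n_days = 1999

# Counting days from the Unix epoch is the same as adding a Timedelta to 1970-01-01.
from_epoch = pd.to_datetime(n_days, unit="D", origin="unix")
same_as_epoch_sum = from_epoch == pd.Timestamp("1970-01-01") + pd.Timedelta(days=n_days)

# A custom origin only moves the starting point of the count.
from_custom = pd.to_datetime(n_days, unit="D", origin="2000-04-10")
same_as_custom_sum = from_custom == pd.Timestamp("2000-04-10") + pd.Timedelta(days=n_days)

# An explicit format removes the ambiguity that dayfirst/yearfirst resolve.
day_first = pd.to_datetime("23/01/02", format="%d/%m/%y")   # day/month/year
year_first = pd.to_datetime("23/01/02", format="%y/%m/%d")  # year/month/day

print(from_epoch, same_as_epoch_sum)
print(from_custom, same_as_custom_sum)
print(day_first, year_first)
```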

Summary

When dealing with time series, to_datetime() is one of the essential functions of the Pandas package. It makes it easy to convert the date and time values in a dataset to a proper datetime type.

Reference

Official documentation