Implementing Pandas read_fwf() in Python

To read fixed-width formatted text files into a DataFrame, pandas provides a function called read_fwf(). So what are fixed-width formatted text files? They are files in which each field in a row occupies a fixed number of characters, so columns are identified by character position rather than by a delimiter such as a comma.

These types of files were primarily used in legacy systems for data storage and exchange. The read_fwf() function takes a file path or a file-like object as input and returns a DataFrame. It also supports optional parameters for customization, such as column positions or widths (colspecs, widths), the header row (header), column names (names), data types (dtype), and more.
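To make the idea concrete, here is a minimal sketch that reads a small fixed-width sample from an in-memory string instead of a file. The sample rows, the field widths (6 and 4 characters), and the column names are made up for illustration; only read_fwf() and its widths, header, and names parameters come from pandas.

```python
import io

import pandas as pd

# Hypothetical sample: a 6-character name field followed by a 4-character age field.
rows = [("Alice", 30), ("Bob", 25), ("Carol", 7)]
raw = "".join(f"{name:<6}{age:>4}\n" for name, age in rows)

# read_fwf() accepts a path or any file-like object, so StringIO works here.
people = pd.read_fwf(
    io.StringIO(raw),
    widths=[6, 4],          # one width per field, in characters
    header=None,            # the sample has no header row
    names=["name", "age"],  # illustrative column names
)
print(people)
```

The padding around each value is stripped during parsing, so the name column holds plain strings and the age column is parsed as integers.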

Advantages of Using read_fwf() Function

The read_fwf() function is well suited to data with a consistent layout: because every field sits at a known character position, the file can be parsed without any delimiter or quoting rules. For large files, the chunksize parameter lets you read the data in pieces instead of loading everything at once.

Implementing the read_fwf() function

In this section, we will demonstrate the implementation of the read_fwf() function through three examples. We will use a sample “data.txt” file containing fixed-width formatted values for illustration purposes. You can download the file from here:

Data.txt Download

This file contains a single column of fixed-width values, with a width of 10 characters per value.

Example 1: Reading a Single-Column Fixed-Width File

import pandas as pd

df = pd.read_fwf('data.txt', colspecs='infer', header=None)
print(df)

We import the pandas library as pd, then call pd.read_fwf() to read data.txt, a file of fixed-width formatted lines, into a DataFrame. The first argument is the name or path of the file to read. colspecs is an optional argument that specifies the column positions; its default value, 'infer', tells pandas to detect the column boundaries from the whitespace in the first rows of the file (100 rows by default, controlled by infer_nrows). header is also optional and indicates which row holds the column names; here it is None because the file has no header row. Finally, we print the result.
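As a rough check of what 'infer' does, the sketch below compares inferred column boundaries with an explicit width of 10 characters. The in-memory sample is a hypothetical stand-in for data.txt (its real contents are not shown here), and the variable names are illustrative.

```python
import io

import pandas as pd

# Hypothetical stand-in for data.txt: one value per line, padded to 10 characters.
raw = "".join(f"{value:<10}\n" for value in ["apple", "banana", "cherry"])

# Let pandas detect the column boundaries from the whitespace in the data.
inferred = pd.read_fwf(io.StringIO(raw), colspecs="infer", header=None)

# State the layout explicitly: a single field that is 10 characters wide.
explicit = pd.read_fwf(io.StringIO(raw), widths=[10], header=None)

print(inferred)
print(inferred.equals(explicit))
```

For a clean single-column file both calls should give the same DataFrame. Explicit widths are the safer choice when values may contain internal spaces, since inference relies on whitespace to find column boundaries.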

Example 2: Reading a Multi-Column Fixed-Width File

import pandas as pd

colspecs = [(0, 5), (5, 10), (10, 15)]  # (start, end) character positions; end is exclusive

df = pd.read_fwf('data.txt', colspecs=colspecs, header=None,
                 names=['col1', 'col2', 'col3'])
print(df)

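In this example, we specify the column positions with the colspecs parameter. We define a list of tuples, each holding the start and end character positions of one column in the input file; the start is inclusive and the end is exclusive, so (0, 5) covers the first five characters of every line. These positions assume each line holds three 5-character fields, so adjust them to match the actual layout of your file. As before, header is set to None to indicate that the file has no header row, and names=['col1', 'col2', 'col3'] supplies the column names for the resulting DataFrame. Finally, we print the result.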
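The half-open (start, end) intervals can be tried out on a small in-memory sample. The records, field widths, and column names below are hypothetical; the second call shows that colspecs does not have to cover every character, which makes it possible to skip a field.

```python
import io

import pandas as pd

# Hypothetical records laid out as three 5-character fields: id, name, price.
records = [(101, "Ann", 12.5), (102, "Bob", 7.0)]
raw = "".join(f"{rid:>5}{name:<5}{price:>5}\n" for rid, name, price in records)

# Each tuple is (start, end) with the end position excluded, like a Python slice.
colspecs = [(0, 5), (5, 10), (10, 15)]
full = pd.read_fwf(io.StringIO(raw), colspecs=colspecs, header=None,
                   names=["id", "name", "price"])

# Leaving out the (5, 10) interval skips the name field entirely.
subset = pd.read_fwf(io.StringIO(raw), colspecs=[(0, 5), (10, 15)], header=None,
                     names=["id", "price"])

print(full)
print(subset)
```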

Example 3: Specifying Data Types

import pandas as pd

colspecs = [(0, 5), (5, 10), (10, 15)]  # (start, end) character positions; end is exclusive

df = pd.read_fwf('data.txt', colspecs=colspecs, header=None,
                 names=['col1', 'col2', 'col3'], dtype={'col1': int, 'col2': float})
print(df)

In this code, we add the dtype parameter, which sets the data types of columns in the resulting DataFrame. The tuples (0, 5), (5, 10), and (10, 15) again give the start and end positions of each column in the input file. dtype takes a dictionary in which each key is a column name and each value is the corresponding data type: here col1 is read as int and col2 as float, while col3 is not listed, so pandas infers its type. Finally, we print the result.
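Skipping rows can be combined with dtype in the same call. The sketch below assumes a hypothetical file that starts with two free-text lines before the data; skiprows=2 drops them, and dtype keeps the third column as text so that its leading zeros survive. The sample content and variable names are illustrative only.

```python
import io

import pandas as pd

# Hypothetical input: two descriptive lines, then three 5-character fields per row.
preamble = "EXPORT FILE\ngenerated for illustration\n"
records = [(1, 2.5, "00042"), (2, 10.0, "00007")]
body = "".join(f"{a:>5}{b:>5}{c:>5}\n" for a, b, c in records)
raw = preamble + body

colspecs = [(0, 5), (5, 10), (10, 15)]  # (start, end), end exclusive
typed = pd.read_fwf(
    io.StringIO(raw),
    colspecs=colspecs,
    header=None,
    names=["col1", "col2", "col3"],
    skiprows=2,  # ignore the two preamble lines
    dtype={"col1": int, "col2": float, "col3": str},  # str preserves leading zeros
)
print(typed)
print(typed.dtypes)
```

Without the str entry, col3 would be parsed as an integer and values such as 00042 would become 42.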

Conclusion

In this article, we used the read_fwf() function in pandas, a convenient tool for reading fixed-width formatted files into a DataFrame. It makes it straightforward to parse data with a consistent structure, since columns are located by character position. We started with a brief overview and then worked through three examples of using it. What other ways can you apply read_fwf() in your data processing tasks?