How to Read and Parse a Text File in Python?

The text file is one of the oldest and simplest storage formats. In use long before what we now know as the Comma Separated Values (CSV) format, text files have remained popular because of their simplicity. They are saved with a .txt extension (for example, data.txt) and are often used to introduce file handling in various programming languages.

Not sure about File Handling? No worries!

How about we learn how to read text files and parse them in different formats using Python?

How Do We Read a Text File?

If you want to read a .txt file from your local storage and bring it into your coding environment for further tasks, the simplest approach is to open the file and call its read() method.

Before that, let us get familiar with the text file. The text file I created for this tutorial is called Details.txt and it looks something like this:

Sample Text File

To read this file, follow the code below.

# Open the file in read mode; the with block closes it automatically
with open("Details.txt", "r") as f:
    print(f.read())

We look for the file in our storage and open it. Then we read its contents with the help of the read() function.

Reading The Text File
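
As a minimal sketch of the same idea, the snippet below reads a file inside a with block and then splits the result into lines. The actual contents of Details.txt are only shown as a screenshot, so the sample rows and the temporary path used here are assumptions made for illustration.

```python
import os
import tempfile

# Hypothetical stand-in for Details.txt (the real file's rows are not known).
sample = (
    "Name,   Age,  Date,        Gender\n"
    "Asha,   29,   2021-04-12,  F\n"
    "Ravi,   34,   2020-11-03,  M\n"
)

# Write the sample to a temporary folder so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "Details.txt")
with open(path, "w") as f:
    f.write(sample)

# read() returns the whole file as one string; the with block closes the file.
with open(path, "r") as f:
    text = f.read()

# splitlines() breaks that string into one entry per line.
lines = text.splitlines()
print(text)
print(len(lines))
```

read() is convenient for small files because it loads everything at once; for large files, looping over the file object one line at a time is usually the safer choice.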

We have read the file into our environment successfully, but as you can see, it needs some cleaning. The white space between the columns is uneven. Let’s see how we can get rid of it.

fp = 'Details.txt'
with open(fp, 'r') as f:
    for line in f:
        # Split on commas, strip each field, and rejoin with a single comma
        cf = ','.join(part.strip() for part in line.split(','))
        print(cf)

First, we store the path to the file in a variable called fp. On the second line, we open the file in read mode and refer to it as f for convenience.

Since the extra spaces appear in every line of the text file, we process one line at a time: we split the line on commas, remove the extra spaces around each field with the strip function, and rejoin the fields with a comma. This result is stored in a variable called cf.

The cleaned text is printed on the last line.

Cleaning the text file
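
The cleaning step can also be wrapped in a small helper so it is easy to try on a single line. This is only a sketch: the function name clean_line and the sample row are hypothetical and not part of the original tutorial.

```python
def clean_line(line, sep=","):
    """Strip white space around each field and rejoin with the separator."""
    return sep.join(part.strip() for part in line.split(sep))


# Hypothetical row with uneven spacing, similar in shape to Details.txt.
raw = "Asha,   29,   2021-04-12,  F\n"
cleaned = clean_line(raw)
print(cleaned)  # Asha,29,2021-04-12,F
```

Because strip() also removes the trailing newline of the last field, the cleaned line comes back without a line break.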

Although text files are simple to use and understand, we sometimes need the data in a more structured way. Examples of structured storage formats are CSV, Data Frames, and JSON.

Let us see how to parse the text file as a data frame and a JSON string.

Parsing the Text File as a Data Frame

A data frame is a 2D labeled data structure whose columns can be of different types. It is the most used storage entity in the Pandas library.

Learn more about data frames here

We can use the code below to render a data frame from the text file.

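import pandas as pd
fp = 'Details.txt'
data = []  # one dictionary per row
with open(fp, 'r') as f:
    # The first line holds the column names; strip the padding around each one
    header = [h.strip() for h in f.readline().split(',')]
    for line in f:
        # Strip the padding around each value as well
        val = [v.strip() for v in line.split(',')]
        data.append(dict(zip(header, val)))
df = pd.DataFrame(data)
print(df)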

On the first line, we import the Pandas library under its alias pd.

Next, we store the path of our file in a variable called fp.

We initialize an empty list called data to hold the rows of the text file. Next, we read the header line, remove the surrounding white space, and split it on commas to get the column names. The same is done with the values on each remaining line. Then we zip the column names and values together into a dictionary and append it to data. The list of dictionaries is then converted into a data frame and printed on the last line.

Parsing the text file as a Data frame
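
To make the zipping step concrete, here is a small sketch in plain Python, without Pandas. The header and the two rows are made-up examples, since the exact contents of Details.txt are not reproduced in the text.

```python
# Hypothetical column names and rows, shaped like the tutorial's file.
header = ["Name", "Age", "Date", "Gender"]
rows = [
    "Asha,   29,   2021-04-12,  F",
    "Ravi,   34,   2020-11-03,  M",
]

records = []
for row in rows:
    # Split on commas and strip the padding around each value.
    values = [v.strip() for v in row.split(",")]
    # zip pairs each column name with the value in the same position.
    records.append(dict(zip(header, values)))

print(records[0])
```

Every value is still a string at this point, including Age. A list of dictionaries like records is the kind of object that is handed to pd.DataFrame in the code above.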

Parsing the Text File as JSON

JSON (JavaScript Object Notation) is a lightweight data-interchange format that stores data as key-value pairs (for example, "name": "xyz").

Read this article to learn how to read a JSON file.

Let us see the code to parse the text file as a JSON string.

import json
fp = 'Details.txt'
data = []
with open(fp, 'r') as f:
    header = f.readline()  # read and skip the header line
    for line in f:
        values = [v.strip() for v in line.split(',')]  # strip padding around each value
        name, age, date, gender = values
        columns = {
            'Name': name,
            'Age': int(age),
            'Date': date,
            'Gender': gender
        }
        data.append(columns)
jsonstr = json.dumps(data, indent=4)
print(jsonstr)

Observe the columns dictionary that starts on line nine. There we define the column names that must be included in the JSON string. Each dictionary of column names and values is appended to the list data, and then the json.dumps method is used to convert the list of dictionaries into a JSON string.

Parsing the text file as a JSON string
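
A quick way to check the result is to convert the JSON string back into Python objects and compare. The sketch below does this with a single made-up record; the values are assumptions, not rows from Details.txt.

```python
import json

# Hypothetical record in the same shape as the dictionaries built above.
records = [
    {"Name": "Asha", "Age": 29, "Date": "2021-04-12", "Gender": "F"},
]

# dumps turns the list of dictionaries into a JSON string.
jsonstr = json.dumps(records, indent=4)

# loads parses the string back into Python lists and dictionaries.
restored = json.loads(jsonstr)

print(jsonstr)
print(restored == records)
```

Since Age was converted with int before being stored, it is written to the JSON string as a number rather than as quoted text.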

Conclusion

To summarize, we have learned what a text file is and how it was widely used for storing data before more structured formats like CSV and JSON became common.

Then we saw how to open a text file and read it. We also saw how we can remove unnecessary white space present in the text file.

Lastly, we learned how to parse the text file we created as a data frame and JSON string.
