How to Read and Parse a Text File in Python?

I’ve parsed more text files than I can count over the years – logs, configs, CSV exports, API responses saved to disk. The job is always the same: open the file, read the content, turn raw text into something my program can actually work with. Python makes this straightforward, but there are a few patterns worth knowing cold.

This is a complete guide to reading and parsing text files in Python, covering everything from basic file opening to converting structured text into Pandas DataFrames and JSON.


TLDR

  • Use with open(file, "r") as f to guarantee files close properly, even when exceptions occur
  • Iterate directly over the file object with for line in f for memory-efficient line-by-line reading
  • Use path.read_text() from pathlib for a concise single-call read
  • Pass encoding explicitly to open() to avoid mismatches on non-UTF-8 files
  • Build lists of dictionaries for structured text files, then convert to JSON or Pandas as needed

How to Read a Text File in Python

The built-in open() function is the starting point. Pass the file path and "r" for read mode, then call read() on the file object to pull all content into memory at once.

Read an entire file into a single string.


with open("data.txt", "r") as f:
    content = f.read()
    print(content)

The with statement opens the file and guarantees it closes when the block exits, even if an exception fires. f.read() loads the entire file as one string. Without with, you must call f.close() manually or the handle may stay open.


Name,Age,City,Score
Alice,28,New York,92
Bob,35,Chicago,78
Carol,24,Boston,95

That reads the full file as a single string. Sometimes that is exactly what you need. More often, you need line-by-line access. There are three ways to do this.

Reading Lines One at a Time with readline()

The readline() method pulls exactly one line from the file, including the trailing newline character \n. Call it repeatedly to process the file piece by piece.

Pull lines one at a time using readline() in a while loop.


with open("data.txt", "r") as f:
    line = f.readline()
    while line:
        print(line.strip())
        line = f.readline()

readline() returns an empty string "" when it reaches the end of the file. The while loop stops when line is falsy. Each line gets .strip() called on it to remove the trailing newline.


Name,Age,City,Score
Alice,28,New York,92
Bob,35,Chicago,78
Carol,24,Boston,95

This approach works well when you want lazy processing without loading everything into memory. Only one line is held at a time, which keeps memory use low for large files.
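One detail worth checking for yourself: a blank line in the middle of a file comes back as "\n", which is truthy, so the while loop above does not stop early. The sketch below uses io.StringIO as a stand-in for an open file, an assumption made only so the example runs without touching disk.

```python
import io

# io.StringIO stands in for an open text file (assumption: no real file needed).
f = io.StringIO("first\n\nthird\n")

seen = []
line = f.readline()
while line:                  # "" (end of file) is falsy, "\n" (blank line) is truthy
    seen.append(line)
    line = f.readline()

print(seen)
```

Only the empty string signals end of file, so blank lines pass through the loop like any other line.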

Reading All Lines at Once with readlines()

The readlines() method reads every line and returns them as a list, where each element is one line including its newline character.

Load all lines into a list with readlines(), then iterate.


with open("data.txt", "r") as f:
    lines = f.readlines()

for line in lines:
    print(line.strip())

readlines() reads the entire file and returns a list of strings. This is convenient, but it loads the full file into RAM. Fine for small files, a problem for files that are hundreds of megabytes.


Name,Age,City,Score
Alice,28,New York,92
Bob,35,Chicago,78
Carol,24,Boston,95

Iterating Over a File with a for Loop

The cleanest and most Pythonic approach is to iterate directly over the file object. Python treats file objects as iterators, yielding one line at a time.

Iterate over the file object directly – the idiomatic way.


with open("data.txt", "r") as f:
    for line in f:
        print(line.strip())

This is equivalent to the readline() loop but without the boilerplate. The file object implements the iterator protocol, so the for loop asks it for the next line on each pass and stops at the end of the file. It is readable, concise, and memory-efficient.


Name,Age,City,Score
Alice,28,New York,92
Bob,35,Chicago,78
Carol,24,Boston,95
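When line numbers matter, for example to report where a bad record sits, direct iteration combines well with enumerate(). The following is a minimal sketch that writes a small sample file to a temporary directory first; the file contents and the blank-line rule are assumptions for illustration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.txt"            # hypothetical sample file
    path.write_text("Name,Age\nAlice,28\n\nBob,35\n", encoding="utf-8")

    numbered = []
    with open(path, "r", encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            text = line.strip()
            if not text:                     # skip blank lines, keep counting
                continue
            numbered.append((lineno, text))

for lineno, text in numbered:
    print(f"{lineno}: {text}")
```

The line numbers stay true to the file because enumerate() counts every line, including the ones that are skipped.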

Reading a File Without Newlines

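Lines read from a file include their trailing newline character \n (the last line may lack one if the file does not end with a newline). Calling .strip() removes it, along with any other leading or trailing whitespace. To drop only the line endings, read the whole file and call .splitlines().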

Read the whole file and split on newlines without keeping them.


with open("data.txt", "r") as f:
    lines = f.read().splitlines()

print(lines)

f.read() returns the full file as a string. .splitlines() breaks it at line boundaries, including \n, \r\n, and a bare \r, and returns a list of lines without the line endings.


['Name,Age,City,Score', 'Alice,28,New York,92', 'Bob,35,Chicago,78', 'Carol,24,Boston,95']
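The difference between these options is easiest to see on strings with deliberate padding and mixed line endings. This is a small sketch with made-up sample strings, not data from the file above.

```python
# Sample line with padding on both sides (made up for illustration).
line = "  Alice,28,New York,92  \n"

stripped = line.strip()          # removes whitespace on both ends
newline_only = line.rstrip("\n")  # removes only the trailing newline

# Mixed line endings: Windows (\r\n), old Mac (\r), and Unix (\n).
text = "a\r\nb\rc\n"
by_splitlines = text.splitlines()
by_split = text.split("\n")

print(repr(stripped))
print(repr(newline_only))
print(by_splitlines)
print(by_split)
```

If leading or trailing spaces inside a line are meaningful, rstrip("\n") is the safer choice than strip().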

Cleaning Text Files During Reading

Raw text files almost always need cleaning before parsing. Extra whitespace is the most common issue – inconsistent spacing between delimiters, leading or trailing blanks on values. Here is a pattern that normalizes comma-separated data on the fly.

Strip whitespace from each field in a CSV-like file while reading.


with open("data.txt", "r") as f:
    for line in f:
        cleaned = ",".join(part.strip() for part in line.split(","))
        print(cleaned)

Each line is split on commas, producing a list of field strings. .strip() removes leading and trailing whitespace from each field. The join() reassembles the line with single commas and no extra padding.


Name,Age,City,Score
Alice,28,New York,92
Bob,35,Chicago,78
Carol,24,Boston,95
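The sample file is already clean, so the effect of the pattern is hard to see from its output. Here is a sketch of the same idea applied to deliberately messy input, with blank lines skipped as well; the input text is invented, and io.StringIO stands in for an open file.

```python
import io

# Hypothetical messy input; io.StringIO stands in for an open file.
raw = io.StringIO(
    "Name , Age,City ,Score\n"
    " Alice,28 ,  New York,92\n"
    "\n"
    "Bob, 35,Chicago , 78\n"
)

cleaned_lines = []
for line in raw:
    if not line.strip():
        continue                                 # skip blank lines
    fields = [part.strip() for part in line.split(",")]
    cleaned_lines.append(",".join(fields))

print(cleaned_lines)
```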

Parsing a Text File into a Python Dictionary

When each line in a file represents a record with known fields, converting to a list of dictionaries is often the fastest path to something usable. The first line serves as column headers.

Parse a CSV-like file into a list of dictionaries, one per record.


records = []

with open("data.txt", "r") as f:
    header = f.readline().strip().split(",")
    for line in f:
        values = [v.strip() for v in line.strip().split(",")]
        record = dict(zip(header, values))
        records.append(record)

for r in records:
    print(r)

f.readline() reads the header row, which gets split into column names. Every subsequent line is split into values, zipped with the header to form key-value pairs, and appended to the list as a dictionary.


{'Name': 'Alice', 'Age': '28', 'City': 'New York', 'Score': '92'}
{'Name': 'Bob', 'Age': '35', 'City': 'Chicago', 'Score': '78'}
{'Name': 'Carol', 'Age': '24', 'City': 'Boston', 'Score': '95'}
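Splitting on commas by hand breaks as soon as a field itself contains a comma. As an alternative to the manual approach above (not a replacement the original code depends on), the standard library's csv.DictReader handles quoted fields and builds the dictionaries for you. The sketch below uses an invented sample with one quoted name, and it assumes Age and Score always hold integers.

```python
import csv
import io

# Hypothetical sample with a quoted field that contains a comma.
sample = io.StringIO(
    'Name,Age,City,Score\n'
    '"Smith, Alice",28,New York,92\n'
    'Bob,35,Chicago,78\n'
)

records = []
for row in csv.DictReader(sample):
    # Assumption: Age and Score are always whole numbers.
    row["Age"] = int(row["Age"])
    row["Score"] = int(row["Score"])
    records.append(row)

print(records[0])
```

With a real file, the csv documentation recommends opening it with newline="" so the reader can handle line endings itself.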

Parsing a Text File as a Pandas DataFrame

For anything beyond simple ad-hoc parsing, Pandas is worth reaching for. A DataFrame gives you a tabular view with built-in operations for filtering, grouping, and transforming.

Load a clean CSV file into a Pandas DataFrame.


import pandas as pd

df = pd.read_csv("data.txt")
print(df)

pd.read_csv() assumes the first row is a header, infers column types, and splits on commas automatically. For non-standard files, parameters like sep, header, and names handle custom formats.


    Name  Age      City  Score
0  Alice   28  New York     92
1    Bob   35   Chicago     78
2  Carol   24    Boston     95

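Here is how to handle a file with a different delimiter and no header row. This example assumes data.txt now holds the same three records separated by pipe characters, with no header line.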

Load a pipe-delimited file with no header, assigning column names.


import pandas as pd

df = pd.read_csv(
    "data.txt",
    sep="|",
    header=None,
    names=["Name", "Age", "City", "Score"]
)
print(df)

sep="|" splits on pipe characters instead of commas. header=None tells Pandas there is no header row. names provides column names explicitly.


    Name  Age      City  Score
0  Alice   28  New York     92
1    Bob   35   Chicago     78
2  Carol   24    Boston     95

Parsing a Text File as JSON

JSON is a standard format for structured data exchange. After parsing a text file into a list of dictionaries, converting to JSON is one json.dumps() call away.

Parse a CSV file into dicts and dump to a formatted JSON string.


import json

records = []

with open("data.txt", "r") as f:
    header = f.readline().strip().split(",")
    for line in f:
        values = [v.strip() for v in line.strip().split(",")]
        record = dict(zip(header, values))
        records.append(record)

json_output = json.dumps(records, indent=4)
print(json_output)

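The file is parsed the same way as before, producing a list of dictionaries. json.dumps() converts that list to a JSON string. indent=4 pretty-prints it with 4-space indentation. Every value is still a string, because nothing in the parsing step converts Age or Score to numbers.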


[
    {
        "Name": "Alice",
        "Age": "28",
        "City": "New York",
        "Score": "92"
    },
    {
        "Name": "Bob",
        "Age": "35",
        "City": "Chicago",
        "Score": "78"
    },
    {
        "Name": "Carol",
        "Age": "24",
        "City": "Boston",
        "Score": "95"
    }
]
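In the output above, Age and Score are quoted because the parsed values are strings. If the consumer of the JSON expects numbers, convert them before dumping. This is a minimal sketch that assumes those two columns always hold integers; NUMERIC_FIELDS is a name introduced here for illustration.

```python
import json

# Records shaped like the parsed output above (values are still strings).
records = [
    {"Name": "Alice", "Age": "28", "City": "New York", "Score": "92"},
    {"Name": "Bob", "Age": "35", "City": "Chicago", "Score": "78"},
]

NUMERIC_FIELDS = ("Age", "Score")   # assumption: these columns always hold integers

typed = [
    {key: int(value) if key in NUMERIC_FIELDS else value
     for key, value in record.items()}
    for record in records
]

json_output = json.dumps(typed, indent=4)
print(json_output)
```

A value that is not a valid integer would raise ValueError here, which is usually what you want to find out early.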

To write the JSON directly to a file instead of printing it, use json.dump(), which writes to a file object.

Write the parsed records directly to a JSON file on disk.


import json

records = []

with open("data.txt", "r") as f:
    header = f.readline().strip().split(",")
    for line in f:
        values = [v.strip() for v in line.strip().split(",")]
        record = dict(zip(header, values))
        records.append(record)

with open("output.json", "w") as f:
    json.dump(records, f, indent=4)

The second with open() block opens output.json in write mode. json.dump() writes the JSON representation directly to the file handle rather than building a string in memory first.


(Writes to output.json with formatted JSON content)
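A quick way to confirm the file was written correctly is to load it back with json.load() and compare. The sketch below writes to a temporary directory rather than the working directory, and the two-record list is a shortened stand-in for the parsed data.

```python
import json
import tempfile
from pathlib import Path

# Shortened stand-in for the parsed records.
records = [{"Name": "Alice", "Age": "28"}, {"Name": "Bob", "Age": "35"}]

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "output.json"   # temp location so the sketch cleans up after itself

    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(records, f, indent=4)

    with open(out_path, "r", encoding="utf-8") as f:
        loaded = json.load(f)

print(loaded == records)
```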

Reading Files with pathlib

The pathlib module, added in Python 3.4, provides an object-oriented interface for file paths. A Path object can read or write a file in a single call.

Read entire file content with Path.read_text().


from pathlib import Path

p = Path("data.txt")
content = p.read_text()
print(content)

Path.read_text() opens the file, reads all content, and closes it – all in one method call. It accepts an encoding parameter, which is worth passing explicitly because the default can depend on the platform's locale rather than always being UTF-8.


Name,Age,City,Score
Alice,28,New York,92
Bob,35,Chicago,78
Carol,24,Boston,95

You can also combine read_text() with .splitlines() to get a list without the trailing newlines.

Read file as text and split into lines with splitlines().


from pathlib import Path

p = Path("data.txt")
lines = p.read_text().splitlines()
print(lines)

read_text() returns the full file as a string. .splitlines() splits it on line boundaries, returning a list with no newline characters.


['Name,Age,City,Score', 'Alice,28,New York,92', 'Bob,35,Chicago,78', 'Carol,24,Boston,95']
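Path objects also offer an open() method, so the line-by-line pattern works with pathlib too. Here is a sketch showing both styles with the encoding stated explicitly; the sample file is created in a temporary directory and its contents are invented.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp) / "data.txt"               # hypothetical sample file
    p.write_text("Name,Age\nAlice,28\nBob,35\n", encoding="utf-8")

    # Whole file at once, with the encoding stated explicitly.
    lines = p.read_text(encoding="utf-8").splitlines()

    # Line by line, for files too large to read in one call.
    names = []
    with p.open("r", encoding="utf-8") as f:
        next(f)                              # skip the header row
        for line in f:
            names.append(line.split(",")[0])

print(lines)
print(names)
```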


FAQ

Q: What is the difference between read() and readlines()?

read() loads the entire file as a single string. readlines() returns a list where each element is one line from the file. Both load the full file into memory, but readlines() splits it into lines first.

Q: How do I handle encoding errors in text files?

Pass the encoding parameter explicitly to open(). Common values are "utf-8", "latin-1", and "cp1252". If Python raises a UnicodeDecodeError, try errors="ignore" to skip unrecognizable characters or errors="replace" to substitute them with placeholders. Both options discard information, so it is worth trying to identify the correct encoding first.
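To see what each option does, here is a small sketch that writes a Latin-1 encoded file and then reads it three ways. The file name and contents are invented for the demonstration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "latin1.txt"                  # hypothetical file
    path.write_bytes("café\n".encode("latin-1"))     # byte 0xE9 is not valid UTF-8 here

    # Reading with the wrong encoding raises UnicodeDecodeError.
    try:
        path.read_text(encoding="utf-8")
        failed = False
    except UnicodeDecodeError:
        failed = True

    # errors="replace" substitutes U+FFFD for the undecodable byte.
    replaced = path.read_text(encoding="utf-8", errors="replace")

    # The correct encoding recovers the original text.
    correct = path.read_text(encoding="latin-1")

print(failed, repr(replaced), repr(correct))
```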

Q: Can I read a file without the with statement?

Yes, but the file handle stays open until f.close() is called explicitly. If the script exits before that call, the handle may not be released promptly. The with statement guarantees closure even when exceptions occur.
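For comparison, this is roughly what the manual version looks like when written safely with try/finally, next to the with form. It is a sketch using a temporary sample file, not a recommendation to avoid with.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.txt"            # hypothetical sample file
    path.write_text("Alice,28\n", encoding="utf-8")

    # Manual version: close() must sit in finally to run on errors too.
    f = open(path, "r", encoding="utf-8")
    try:
        content = f.read()
    finally:
        f.close()
    closed_manually = f.closed

    # with version: the same guarantee in fewer lines.
    with open(path, "r", encoding="utf-8") as g:
        same_content = g.read()
    closed_by_with = g.closed

print(closed_manually, closed_by_with)
```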

Q: What is the most memory-efficient way to read a large text file?

Iterate over the file object directly with for line in f. This yields one line at a time without loading the full file into memory. Using readline() in a loop produces the same result with more boilerplate.

Q: How do I append to a text file instead of overwriting it?

Open the file with mode="a" instead of "r". Any write operations append to the end of the file without truncating existing content. Use mode="w" to overwrite.
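The difference between the two modes shows up clearly when both are used on the same file. A minimal sketch, using a temporary file with an invented name:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "log.txt"             # hypothetical file name

    with open(path, "w", encoding="utf-8") as f:   # "w" creates or truncates
        f.write("first\n")

    with open(path, "a", encoding="utf-8") as f:   # "a" adds to the end
        f.write("second\n")

    after_append = path.read_text(encoding="utf-8")

    with open(path, "w", encoding="utf-8") as f:   # "w" again wipes earlier content
        f.write("third\n")

    after_overwrite = path.read_text(encoding="utf-8")

print(repr(after_append), repr(after_overwrite))
```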

Text files predate every modern data format, and they still show up everywhere. Once you know how to open them correctly, read them efficiently, and transform raw text into structured data, you can handle logs, configs, exports, and flat-file databases without reaching for a library you do not need.

Vignya Durvasula