5 Techniques for Reading Multiple Lines from Files in Python

File handling is one of the most fundamental concepts taught in any programming course. Knowing how to open, read, modify, and close files is essential.

Domains like machine learning and data science often require us to read multiple files, sometimes all at once. Machine learning engineers and data scientists may need to read several files and save their outputs to new files. Hence, it is important to understand how file handling works.

Most programming languages provide built-in support for file handling. There are two main operations we can perform on a file: reading from it and writing to it. These operations are tied to three basic modes (read, write, and append), which some languages extend with further modes. In Python, the basic modes are 'r', 'w', and 'a', with variants such as 'r+' and 'rb'.
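To make the three modes concrete, here is a minimal sketch in Python. The file name modes_demo.txt and the temporary directory are assumptions made for this illustration only; they are not part of the dataset used later in the post.

```python
import os
import tempfile

# Hypothetical scratch file used only for this demonstration.
demo_path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

# 'w' creates the file (or empties an existing one) and writes to it.
with open(demo_path, "w") as f:
    f.write("first line\n")

# 'a' keeps the existing content and adds new text at the end.
with open(demo_path, "a") as f:
    f.write("second line\n")

# 'r' (the default mode) opens the file for reading only.
with open(demo_path, "r") as f:
    contents = f.read()

print(contents)
```

The with statement closes the file automatically at the end of each block, so no explicit close call is needed.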

Reading multiple lines from a file can be achieved through various methods, including a for loop, readlines, a list comprehension, the read method, and itertools’ islice. Each method has its own advantages, such as memory efficiency or concise code, which makes it suitable for different scenarios. This guide walks through these five techniques to strengthen your file handling skills in Python.

In this post, we will explore five ways to read multiple lines of a file in Python. Before that, let us take a look at the file we are going to use.

Our Sample Dataset

The file I have used for this demonstration is a collection of country names and codes associated with tweets that used the hashtags #coronavirus, #coronavirusoutbreak, #coronavirusPandemic, #covid19, and #covid_19. Such datasets are widely used for sentiment analysis and other natural language processing tasks, but here we use it only to demonstrate reading multiple lines.

Let us take a look at the file. Since it is a CSV file, a convenient way to inspect it is to load it into a pandas data frame.

import pandas as pd
df = pd.read_csv('/content/Countries.CSV')
df
Data frame

It has 230 rows and 2 columns, and in the next section, we are going to try to read multiple lines/rows of this file.

How to Read Multiple Lines of a File?

In this section, we are going to discuss five approaches to read multiple lines of the dataset at once and print them.

Method 1: Iterating with a For Loop

In this approach, we open the file in read mode, iterate over it line by line, and print each line to the screen.

file_path = '/content/Countries.CSV'
with open(file_path, 'r') as file:
    line_count = 0
    for line in file:
        if line_count < 25:
            print(line.strip())
            line_count += 1
        else:
            break

In addition to reading the file, we strip leading and trailing whitespace (including the newline character) from each line before printing it. The loop stops after the first 25 lines, so the rest of the file is never read.
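The manual counter can also be replaced with enumerate, which numbers the lines as we iterate. The following is a minimal sketch of that variant; the helper name print_first_lines and the small sample file are assumptions made for illustration, standing in for Countries.CSV.

```python
import os
import tempfile


def print_first_lines(path, limit):
    """Print at most `limit` lines from `path` and return them as a list."""
    shown = []
    with open(path, "r") as file:
        for index, line in enumerate(file):
            if index >= limit:
                break  # stop once `limit` lines have been printed
            clean = line.strip()  # drop the newline and surrounding spaces
            print(clean)
            shown.append(clean)
    return shown


# Hypothetical sample file standing in for Countries.CSV.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(sample_path, "w") as f:
    f.write("country,country_code\nAfghanistan,AF\nAlbania,AL\nAlgeria,DZ\n")

first_two = print_first_lines(sample_path, 2)
```

Wrapping the loop in a function makes the line limit a parameter, so the same code can be reused for 25 lines or any other count.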

Read the first 25 lines using for loop

Method 2: Utilizing the Readlines Function

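The readlines method reads all the lines in a file and returns them as a list of strings. Here we slice that list to keep only the first 25 lines; note that the whole file is still read into memory before the slice is applied.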

file_path = '/content/Countries.CSV'
with open(file_path, 'r') as f:
    lines = f.readlines()[:25]
print(lines)

When we print the lines, the output is a list containing the first 25 lines of the file, each still ending with its newline character.

Output:
['country,country_code\n', ',\n', 'Afghanistan,AF\n', 'Aland Islands,AX\n', 'Albania,AL\n', 'Algeria,DZ\n', 'American Samoa,AS\n', 'Andorra,AD\n', 'Angola,AO\n', 'Antarctica,AQ\n', 'Antigua and Barbuda,AG\n', 'Argentina,AR\n', 'Armenia,AM\n', 'Aruba,AW\n', 'Australia,AU\n', 'Austria,AT\n', 'Azerbaijan,AZ\n', 'Bahamas,BS\n', 'Bahrain,BH\n', 'Bangladesh,BD\n', 'Barbados,BB\n', 'Belgium,BE\n', 'Belize,BZ\n', 'Benin,BJ\n', 'Bermuda,BM\n']
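Because readlines loads the whole file before the slice is applied, it can use more memory than necessary on a large file. One possible alternative, sketched below, calls the related readline method (singular) once per line and stops early. The helper name read_first_lines and the sample file are assumptions made for this illustration.

```python
import os
import tempfile


def read_first_lines(path, limit):
    """Return up to `limit` lines from `path`, newlines included."""
    lines = []
    with open(path, "r") as f:
        for _ in range(limit):
            line = f.readline()
            if not line:  # readline returns '' at the end of the file
                break
            lines.append(line)
    return lines


# Hypothetical sample file standing in for Countries.CSV.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(sample_path, "w") as f:
    f.write("country,country_code\nAfghanistan,AF\nAlbania,AL\n")

first_two = read_first_lines(sample_path, 2)
print(first_two)
```

As with readlines, each returned string keeps its trailing newline, so the result has the same shape as the output above.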

Method 3: Streamlining with List Comprehension

A list comprehension is a concise way to build a list in a single expression, which reduces the number of lines of code. Like readlines, it gives us the output as a list.

file_path = '/content/Countries.CSV'
with open(file_path, 'r') as file:
    lines = [line.strip() for line in file]
print(lines[:25])

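Instead of writing a multi-line for loop to iterate through each line, we can do it in a single expression, as in line 3 of the code. Note that the comprehension reads every line of the file; the slice in the last line keeps only the first 25.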

['country,country_code', ',', 'Afghanistan,AF', 'Aland Islands,AX', 'Albania,AL', 'Algeria,DZ', 'American Samoa,AS', 'Andorra,AD', 'Angola,AO', 'Antarctica,AQ', 'Antigua and Barbuda,AG', 'Argentina,AR', 'Armenia,AM', 'Aruba,AW', 'Australia,AU', 'Austria,AT', 'Azerbaijan,AZ', 'Bahamas,BS', 'Bahrain,BH', 'Bangladesh,BD', 'Barbados,BB', 'Belgium,BE', 'Belize,BZ', 'Benin,BJ', 'Bermuda,BM']
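A list comprehension can also filter while it reads. The output above contains a row that is just a comma, which is an empty record in the CSV. The sketch below skips such rows with an if clause; the sample file is an assumption made for this illustration and mimics only the first few lines of Countries.CSV.

```python
import os
import tempfile

# Hypothetical sample file that includes an empty ',' row and a blank line.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(sample_path, "w") as f:
    f.write("country,country_code\n,\nAfghanistan,AF\n\nAlbania,AL\n")

with open(sample_path, "r") as file:
    # Keep a line only if something remains after removing
    # commas, spaces, and newlines from both ends.
    rows = [line.strip() for line in file if line.strip(",\n ")]

print(rows)
```

The condition runs once per line, so the filtering happens in the same pass that builds the list.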

Method 4: Employing the Read Method

Another traditional way is to use the read method, which returns the entire file content as a single string. We then call the string method splitlines to split that string into a list of lines, without the trailing newline characters.

file_path = '/content/Countries.CSV'
with open(file_path) as f:
    content = f.read()
    lines = content.splitlines()
print(lines[:100])

Output:
['country,country_code', ',', 'Afghanistan,AF', 'Aland Islands,AX', 'Albania,AL', 'Algeria,DZ', 'American Samoa,AS', 'Andorra,AD', 'Angola,AO', 'Antarctica,AQ', 'Antigua and Barbuda,AG', 'Argentina,AR', 'Armenia,AM', 'Aruba,AW', 'Australia,AU', 'Austria,AT', 'Azerbaijan,AZ', 'Bahamas,BS', 'Bahrain,BH', 'Bangladesh,BD', 'Barbados,BB', 'Belgium,BE', 'Belize,BZ', 'Benin,BJ', 'Bermuda,BM', 'Bhutan,BT', 'Bolivia,BO', '"Bonaire, Sint Eustatius and Saba",BQ', 'Bosnia and Herzegovina,BA', 'Botswana,BW', 'Brazil,BR', 'British Virgin Islands,VG', 'Brunei,BN', 'Bulgaria,BG', 'Burkina Faso,BF', 'Burundi,BI', 'Cambodia,KH', 'Cameroon,CM', 'Canada,CA', 'Cape Verde,CV', 'Cayman Islands,KY', 'Central African Republic,CF', 'Chad,TD', 'Chile,CL', 'Colombia,CO', 'Comoros,KM', 'Congo Brazzaville,CG', 'Costa Rica,CR', 'Cuba,CU', 'Curaçao,CW', 'Cyprus,CY', 'Czech Republic,CZ', 'Democratic Republic of Congo,CD', 'Denmark,DK', 'Djibouti,DJ', 'Dominica,DM', 'Dominican Republic,DO', 'East Timor,TL', 'Ecuador,EC', 'Egypt,EG', 'El Salvador,SV', 'Equatorial Guinea,GQ', 'Estonia,EE', 'Ethiopia,ET', 'Falkland Islands (Malvinas),FK', 'Faroe Islands,FO', 'Fiji,FJ', 'Finland,FI', 'Former Yugoslav Republic of Macedonia,MK', 'France,FR', 'French Guiana,GF', 'French Polynesia,PF', 'Gabon,GA', 'Gambia,GM', 'Georgia,GE', 'Germany,DE', 'Ghana,GH', 'Gibraltar,GI', 'Greece,GR', 'Greenland,GL', 'Grenada,GD', 'Guadeloupe,GP', 'Guam,GU', 'Guatemala,GT', 'Guernsey,GG', 'Guinea,GN', 'Guinea Bissau,GW', 'Guyana,GY', 'Haiti,HT', 'Hashemite Kingdom of Jordan,JO', 'Hellas,GR', 'Honduras,HN', 'Hong Kong,HK', 'Hungary,HU', 'Iceland,IS', 'India,IN', 'Indonesia,ID', 'Iraq,IQ', 'Ireland,IE', 'Islamic Republic of Iran,IR']
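It may help to see why splitlines is used here rather than split. The short sketch below compares the two on a small string that stands in for the result of f.read(); the string itself is made up for this illustration.

```python
# A small string standing in for the content returned by f.read().
text = "country,country_code\nAfghanistan,AF\nAlbania,AL\n"

# splitlines drops the line endings and adds no empty trailing item.
with_splitlines = text.splitlines()

# split("\n") leaves an empty string after the final newline.
with_split = text.split("\n")

# keepends=True keeps the newline on each line, like readlines does.
with_keepends = text.splitlines(keepends=True)

print(with_splitlines)
print(with_split)
print(with_keepends)
```

For a file that ends with a newline, split("\n") produces one extra empty entry, which is why splitlines is usually the more convenient choice.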

Method 5: Leveraging Itertools’ Islice

The islice function from the itertools module returns an iterator over a selected slice of another iterable. Because it stops after the requested number of items, it only reads the lines we ask for, which makes it a good choice when we need just part of a file.

from itertools import islice
file_path = '/content/Countries.CSV'
with open(file_path, 'r') as f:
    lines = islice(f, 25)
    for line in lines:
        print(line.strip())
Islice
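islice also accepts a start and a stop position, so it can pick lines from the middle of a file. The following is a minimal sketch under that assumption of use; the sample file is made up for this illustration and stands in for Countries.CSV.

```python
import os
import tempfile
from itertools import islice

# Hypothetical sample file standing in for Countries.CSV.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(sample_path, "w") as f:
    f.write("country,country_code\nAfghanistan,AF\nAlbania,AL\nAlgeria,DZ\nAndorra,AD\n")

with open(sample_path, "r") as f:
    # islice(f, 2, 4) skips the first two lines and yields the next two
    # (zero-based positions 2 and 3). It must be consumed while the
    # file is still open, so we build the list inside the with block.
    middle = [line.strip() for line in islice(f, 2, 4)]

print(middle)
```

Since islice is lazy, the lines are only read when the iterator is consumed; using it after the with block has closed the file would raise an error.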

Summary

In this post, we have discussed five ways to read multiple lines of a file in Python. We used a CSV file, but the same methods work for any plain text file, such as a .txt file. Would you like to try these methods on text files?

References

Dataset

Covid19 Tweets