While Python is well suited to analysing data, it does not come with the data sets that are to be analysed. These data sets have to be sourced from elsewhere and fed into Python for the magic to happen. This article explores one technique for importing data into Python using the Pandas library. The file of interest is also a bit specific – a CSV file with headers!
We shall demonstrate the sequence of operations using the following dataset, in which the entries in each row are separated from one another by a tab. Let’s get started!
Start by importing the Pandas library into the active Python window using the code below.
import pandas as pd
Press Enter once done and wait a few moments while Python loads the Pandas library in the background. In the interactive shell, the progress can be judged from the arrowheads (the >>> prompt) that precede every line of code: no new arrowheads appear until Pandas has fully loaded.
Once Pandas has loaded successfully, the arrowheads reappear, as shown in the image below.
Using read_csv() to read CSV files with headers
CSV stands for comma-separated values. Which values, you ask – those that are within the text file!
This means that the values within the text file are separated by commas to isolate one entry from another. Although the name mentions only the comma, the term CSV is broadly used for text files in which the values are separated by tabs, spaces, or even colons, to name a few.
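As a rough illustration of the point about separators, the sketch below parses the same small table written with three different delimiters. The sample values and variable names are made up for illustration, and io.StringIO stands in for a text file on disk.

```python
import io

import pandas as pd

# Hypothetical sample data: the same table written with three different separators.
samples = {
    ",": "Item,Units\nPen,10\nBook,4\n",
    "\t": "Item\tUnits\nPen\t10\nBook\t4\n",
    ":": "Item:Units\nPen:10\nBook:4\n",
}

frames = {}
for separator, text in samples.items():
    # io.StringIO lets read_csv() treat an in-memory string like a file.
    frames[separator] = pd.read_csv(io.StringIO(text), sep=separator)

# Show the separator, the column names, and the shape of each parsed table.
for separator, frame in frames.items():
    print(repr(separator), list(frame.columns), frame.shape)
```

Only the sep argument changes between the three calls; the rest of the read_csv() call stays the same.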
Following is the syntax of read_csv().
df = pd.read_csv("filename.txt", sep="x", header=y, names=["name1", "name2", …])
- df – the data frame that receives the imported data
- filename.txt – name of the text file that is to be imported
- x – type of separator used in the .csv file
  - "\t" – tab
  - "," – comma
  - " " – space, and so on
- y – type of header in the data
  - None – if the entries in the first row are not headers
  - 0 – if the entries in the first row are headers
- names – optional list of column names to use; it can be left out when the file’s own header row is to be used
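To make the effect of the header and names arguments more concrete, here is a minimal sketch that reads the same tab-separated text three ways. The data and the names "product" and "quantity" are hypothetical, chosen only for illustration.

```python
import io

import pandas as pd

# Hypothetical tab-separated data whose first row holds the column names.
text = "Item\tUnits\nPen\t10\nBook\t4\n"

# header=0: the first row becomes the column labels.
with_header = pd.read_csv(io.StringIO(text), sep="\t", header=0)

# header=None: the first row is kept as data and the columns are numbered 0, 1, ...
no_header = pd.read_csv(io.StringIO(text), sep="\t", header=None)

# header=0 together with names: the header row in the file is discarded
# and the supplied names are used instead.
renamed = pd.read_csv(
    io.StringIO(text), sep="\t", header=0, names=["product", "quantity"]
)

print(list(with_header.columns), with_header.shape)
print(list(no_header.columns), no_header.shape)
print(list(renamed.columns), renamed.shape)
```

Comparing the shapes shows the difference: with header=None the header row is counted as an ordinary data row, so that frame has one more row than the other two.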
Now we shall apply this syntax to import the data from the text file shown earlier in this article.
Here, "filename.txt" is replaced by "Sales Data.txt", x is replaced by "\t", and y is replaced by 0 (zero) since the data contains a header row. After these replacements, the resulting code is as follows,
df = pd.read_csv("Sales Data.txt", sep="\t", header=0)
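For readers who want to try the call without the original file, the following sketch writes a small tab-separated file to a temporary folder and reads it back with the same arguments. The rows are invented for illustration and are not the contents of the article's "Sales Data.txt".

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical contents; the article's real "Sales Data.txt" is not reproduced here.
rows = [
    "Region\tProduct\tUnits",
    "North\tPen\t10",
    "South\tBook\t4",
]

with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "Sales Data.txt"
    path.write_text("\n".join(rows) + "\n", encoding="utf-8")

    # Same call as in the article, pointed at the temporary file.
    df = pd.read_csv(path, sep="\t", header=0)

print(df)
```

The data frame remains available after the temporary folder is removed, because read_csv() loads the whole file into memory.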
Press Enter. If the arrowheads reappear after a few moments without any message, the code ran without errors.
The arrowheads indicate that the data has been successfully imported into Python, but that is hardly satisfying until we take a peek at it.
The print() function available in Python serves this purpose. The data frame into which the data was loaded using read_csv() can now be viewed using,
print(df)
Press Enter after typing the above, and the imported data shall appear as shown below.
Note that even if header=0 is left out of the code, read_csv() infers the header by default. As long as no names are passed, this is equivalent to header=0, so the first row is still treated as the header when the data is imported.
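The default behaviour can be checked with a short sketch such as the one below, which reads the same hypothetical text with and without the header argument and compares the results.

```python
import io

import pandas as pd

text = "Item\tUnits\nPen\t10\nBook\t4\n"  # hypothetical sample data

# First read: the header row is named explicitly.
explicit = pd.read_csv(io.StringIO(text), sep="\t", header=0)

# Second read: header is left at its default value.
implicit = pd.read_csv(io.StringIO(text), sep="\t")

# DataFrame.equals() compares column labels, values, and dtypes.
same = explicit.equals(implicit)
print(same)
```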
Now that we have reached the end of this article, we hope it has shown how to read CSV files with headers using Pandas in Python. Here’s another article that details the usage of the fillna() method in Pandas. AskPython has many other enjoyable and equally informative articles that may help those looking to level up in Python.