This article is a simple tutorial on reading text files with the Pandas library in Python. Text files are widely used to store raw information, and they are one of the simplest ways to access a particular piece of it. They can hold the following:
- Raw info
- Messages and many more
To manage this information, there are tools and techniques that let us easily extract what we need. One of them is computer programming. Let us see in more detail how it works.
Steps to Read Text Files using Python Pandas
A computer can store many kinds of files with many different extensions. In general, a file stores information about anything, so there is no single definition of its contents. The extension, however, says a lot about the file: each extension indicates a different kind of data stored in it.
For example, a source file in a programming language like Python has the extension .py. An extension simply shows what type of file it is and what data it represents.
Creating a sample.txt file in Windows
Creating a text file in Windows is very simple. Follow the steps below:
- Go to the Windows search bar, type in Notepad, and click on it.
- It opens with a blank page. There we can put any of the text or info we want and make changes to it anytime.
- After finishing the work, press Ctrl+S or go to the File option in the top left corner and click on Save to save the file in your desired location.
Read text files in Pandas
Pandas is a Python library for working with tabular data. It is mainly used in the fields of Data Science and Machine Learning. Like Python, it is an open-source project to which anyone can contribute.
See the official Pandas documentation for more info. Following are its uses:
- Data analysis
- Data preprocessing
- Data cleaning
- Data wrangling
- Accessing information from files embedded on external links
- Extracting data from JSON, SQL, Excel file formats.
Built on Python and other supporting libraries, it provides a convenient workspace for managing large amounts of data.
Text-file Methods in Python Pandas
In Data Science, the amount of information we fetch is huge, so it is usually collected in a file called a dataset. A dataset can have thousands of rows and columns with various inputs. Pandas provides many functions and methods for processing this data, including:
- read_excel() : read an excel file
- read_csv() : read a comma separated value file
- info() : display the information about all columns
- isna() : check the missing values
- sum() : sum of the values in each column
- dropna() : drop the rows (or columns) that contain missing values
- head() : return the first 5 rows of the dataset, or the first n rows when a number n is given inside the parentheses.
These are the main functions. To learn more about the library, see the getting started guide in the official Pandas documentation.
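The methods listed above can be tried on a small table. This is a minimal sketch: the DataFrame and its column names and values are made up for illustration and are not part of any dataset used in this tutorial.

```python
import pandas as pd

# Hypothetical sample data, made up for illustration only.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cara", "Dan"],
    "age": [28.0, None, 35.0, 41.0],
    "city": ["Pune", "Delhi", None, "Mumbai"],
})

df.info()                      # column names, dtypes and non-null counts
missing = df.isna().sum()      # number of missing values per column
cleaned = df.dropna()          # drop every row that has a missing value
first_two = df.head(2)         # first 2 rows instead of the default 5
total_age = df["age"].sum()    # sum of a numeric column, missing values are skipped

print(missing)
print(cleaned)
print(first_two)
print(total_age)
```

Note that isna() followed by sum() counts missing values, because each True is counted as 1.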
pip, the package installer for Python, makes it easy to install Pandas on any system, although there are some limitations. First, open the command prompt and check which version of Python is installed, for example with python --version.
Make sure you have Python 3.6 or later.
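The version requirement can also be checked from inside Python itself. A minimal sketch using only the standard library (the variable name is_supported is just an illustrative choice):

```python
import sys

# True when the running interpreter is Python 3.6 or newer.
is_supported = sys.version_info >= (3, 6)

print(sys.version)
print("Python 3.6 or later:", is_supported)
```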
Next, type pip install pandas in the command prompt and press Enter.
Installing Pandas using Anaconda
Note: For this you need Anaconda installed on your system.
Pandas comes preinstalled with Anaconda, but it is still useful to know how to add new libraries through the conda prompt.
So, open the Anaconda prompt and type in this command:
conda install pandas
This confirms that the library is already present in the conda environment.
Now that Pandas is installed and we have a rough idea of what it does, it is time to get more familiar with it. The first thing to do is import the library and check whether it is installed correctly.
If the import gives no error, the library is ready to use.
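A minimal way to run this check is to import the library and print its version. The exact version string depends on what is installed on your system.

```python
import pandas as pd

# If the import succeeds, pandas is installed; print its version string.
print(pd.__version__)
```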
Reading a file in pandas
Reading a file with Pandas is simple. We will look at three types of files:
- Comma Separated Value files
- Excel files
- Text files
There are special functions for reading each file type. As discussed earlier, Pandas has read_excel() and read_csv(). The examples here were run in Jupyter Notebook.
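A comma-separated file is read with read_csv(). The sketch below assumes a small made-up table held in memory, so no file on disk is needed. The column names and values are hypothetical.

```python
import io
import pandas as pd

# Hypothetical CSV content, held in memory so the example needs no file on disk.
csv_text = """name,age,city
Ann,28,Pune
Bob,31,Delhi
Cara,35,Mumbai
"""

# read_csv accepts a file path or any file-like object.
data_csv = pd.read_csv(io.StringIO(csv_text))
print(data_csv.head())
```

With a real file, the path would be passed in place of the io.StringIO object.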
Reading an excel file in Python:
Sample file used –“train.xlsx”
import pandas as pd
data = pd.read_excel('train.xlsx')
data.head()
Reading a text file in Python:
Sample file used – “titanic.txt”
import pandas as pd
# read_csv also reads plain text files; the default separator is a comma
data_1 = pd.read_csv('titanic.txt')
data_1.head()
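The call above works when the values in the text file are separated by commas. If a text file uses another separator, the sep parameter of read_csv() can be set to match. This is a minimal sketch that assumes a tab-separated file: the file name sample.txt and its contents are made up for illustration and are written to a temporary folder.

```python
import os
import tempfile
import pandas as pd

# Hypothetical tab-separated content, written to a temporary .txt file.
text = "name\tage\nAnn\t28\nBob\t31\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # sep tells read_csv which character separates the columns.
    data_tab = pd.read_csv(path, sep="\t")

print(data_tab.head())
```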
Here, we conclude this topic. In this way, we can read several kinds of files with Pandas and make our data science and machine learning journey smoother. I think this is a good way to get started with Pandas and set it up on your system.