How to read a CSV from a URL using Pandas?


Pandas is a widely used, easy-to-learn, open-source Python library for handling, cleaning, modifying, and analyzing data. This makes it especially useful when working with large datasets.

Pandas' built-in functions make it easy to read datasets in a variety of formats. With the read_csv() function, Python users can read CSV (comma-separated values) files in several ways. In this post, we will see how to read a CSV file from a given URL using read_csv().

Also check: How to Read CSV with Headers Using Pandas?

What is the read_csv() function?

It is one of the built-in functions of the Pandas package. It converts a .csv file into a Pandas DataFrame, which makes the data easy to work with in Python. The parameters passed to this function can be adjusted in many ways to get the output format the user wants.

One can also pass the URL of a dataset to this function and access the data directly in the working IDE. A URL (Uniform Resource Locator) is the address of a resource on the web, such as a webpage or a file.
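As a minimal sketch of what read_csv() returns, the snippet below parses a small piece of CSV text held in memory instead of a real file or URL. The sample rows are made up for illustration and are not taken from the dataset used later in this article.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the contents of a .csv file.
csv_text = "Name,Code,Amount\nAlice,A1,100\nBob,B2,250\n"

# read_csv() accepts a file path, a URL, or a file-like object such as StringIO.
df = pd.read_csv(io.StringIO(csv_text))

print(type(df).__name__)  # the class name of the returned object
print(df.shape)           # (number of rows, number of columns)
```

The same call works unchanged when the first argument is a URL string rather than a StringIO object.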

To have a better understanding of the read_csv() function, here is a detailed article on the same.

Prerequisites

To avoid errors while reading a CSV file from a URL, complete the following steps first. Before starting the implementation, install the Pandas package on your system and import it. You can do so as follows in your working IDE.

# To install the package
pip install pandas

# To import the package under the alias pd
import pandas as pd

Once you have imported Pandas, check the installed version, because the method discussed below requires Pandas 0.19.2 or above. To check the version, run one of the following.

# To print the version in a Colab Python notebook
print(pd.__version__)
# To check the version from the terminal
pip show pandas
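Comparing version strings as plain text can be misleading (for example, "0.9" sorts after "0.19" as text), so the check against 0.19.2 is better done on numbers. The helper below is only a sketch: the names version_tuple and meets_minimum are made up for this article and are not part of Pandas.

```python
import re


def version_tuple(version):
    """Turn a version string such as '1.5.3' into (1, 5, 3) for comparison."""
    parts = []
    for piece in version.split(".")[:3]:
        # Keep only the leading digits, so '0rc0' becomes 0.
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    # Pad short versions such as '1.5' to three components.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed, minimum="0.19.2"):
    """Return True if the installed version is at least the minimum version."""
    return version_tuple(installed) >= version_tuple(minimum)


print(meets_minimum("0.18.1"))
print(meets_minimum("2.1.0"))
```

With Pandas imported, the check becomes meets_minimum(pd.__version__).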

If the installed Pandas version is below 0.19.2, upgrade the package. In a Colab Python notebook, packages come pre-installed; if you need a different version, install it manually by specifying the version number, and then restart the runtime from the toolbar so that the change takes effect.

!pip install pandas==[version]
# Runtime -> Restart runtime (or press Ctrl + M, then .)

To upgrade the package from the terminal, run the following command:

pip install --upgrade pandas

(If the above command throws an error, try the one below.)
pip3 install --upgrade pandas

By combining functions from other packages, one can still read CSV files from URLs with an outdated version of Pandas (below 0.19.2). However, upgrading is the better option, because the method discussed here is more efficient and straightforward than the alternatives.
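For illustration, one such combination is sketched below: the standard-library urllib module downloads the text, and io.StringIO wraps it so that read_csv() can parse it. This is only one possible workaround; the function name read_csv_via_urllib and the assumption that the file is UTF-8 encoded are ours, not something the Pandas documentation prescribes.

```python
import io
import urllib.request

import pandas as pd


def read_csv_via_urllib(url, encoding="utf-8"):
    """Download the CSV text ourselves, then hand it to read_csv()."""
    with urllib.request.urlopen(url) as response:
        # Assumes the remote file uses the given text encoding.
        text = response.read().decode(encoding)
    # StringIO makes the downloaded text look like an open file.
    return pd.read_csv(io.StringIO(text))


# Example call (needs network access), using the URL from this article:
# df = read_csv_via_urllib(
#     "https://raw.githubusercontent.com/Tanishqa-10/AskPython/main/Sampledata.csv"
# )
```

On current Pandas versions, passing the URL straight to read_csv() does the same job in a single call.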

Implementation of the read_csv() function

We will use the following URL in this article to demonstrate the implementation:

https://raw.githubusercontent.com/Tanishqa-10/AskPython/main/Sampledata.csv

import pandas as pd

# assigning the URL to a variable
url = "https://raw.githubusercontent.com/Tanishqa-10/AskPython/main/Sampledata.csv"

# passing the URL to read_csv(), which returns a DataFrame
x = pd.read_csv(url)
print(x)

OUTPUT

Read CSV From URL Implementation
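Reading from a URL can fail for reasons unrelated to Pandas itself, such as a mistyped address or a missing file. The sketch below wraps read_csv() in a small helper that reports the problem and returns None instead of stopping the program. The name load_csv_or_none is hypothetical, and the set of exceptions handled is an assumption about the most likely failures, not a complete list.

```python
import urllib.error

import pandas as pd


def load_csv_or_none(url):
    """Return a DataFrame, or None if the URL cannot be fetched or parsed."""
    try:
        return pd.read_csv(url)
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        print(f"Server returned HTTP {err.code} for {url}")
    except urllib.error.URLError as err:
        # The resource could not be reached at all.
        print(f"Could not reach {url}: {err.reason}")
    except (pd.errors.EmptyDataError, pd.errors.ParserError) as err:
        # Something was downloaded, but it is not usable CSV.
        print(f"Downloaded content is not valid CSV: {err}")
    return None


# Example call (needs network access):
# x = load_csv_or_none(
#     "https://raw.githubusercontent.com/Tanishqa-10/AskPython/main/Sampledata.csv"
# )
```

Checking the result with "if x is not None:" before using it keeps the rest of the script safe.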

Customizing the Output

One can also pass additional parameters to the read_csv() function along with the URL to load the data in a particular way. For example, to load only certain columns, pass a list of their names through the usecols parameter. The code below shows how.

# names of the columns to load
columns = ['Name', 'Code', 'Amount']
url = "https://raw.githubusercontent.com/Tanishqa-10/AskPython/main/Sampledata.csv"

# usecols restricts the result to the listed columns
print(pd.read_csv(url, usecols=columns))

OUTPUT

Customizing Output

The read_csv() function provides many other parameters that can be used along with the URL to customize the output. Also, take a look at this article to understand how to deal with delimiters in a CSV file and make efficient use of this function.
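As a rough illustration of combining several parameters, the sketch below uses sep, usecols, nrows, and index_col together. The semicolon-delimited sample text and its City column are invented for this example, so the real Sampledata.csv may look different.

```python
import io

import pandas as pd

# Hypothetical semicolon-delimited data used in place of a real URL.
csv_text = (
    "Name;Code;Amount;City\n"
    "Alice;A1;100;Pune\n"
    "Bob;B2;250;Delhi\n"
    "Cara;C3;75;Agra\n"
)

df = pd.read_csv(
    io.StringIO(csv_text),       # a URL string could be passed here instead
    sep=";",                     # the delimiter used in the file
    usecols=["Name", "Amount"],  # load only these columns
    nrows=2,                     # read only the first two data rows
    index_col="Name",            # use the Name column as the row index
)
print(df)
```

Each parameter narrows what is loaded, which can save memory and time on large files.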

Conclusion

The tools that Pandas offers for reading and writing data are among its most essential features. The read_csv() function reads tabular data stored as a CSV file into a DataFrame in memory. In this article, we learned how to use this function to read a CSV file from a URL in Python. Because package-related failures are common when working in Python, we also covered the precautions (installing, importing, and upgrading Pandas) that help avoid errors during the implementation.

To learn from more such detailed and easy-to-understand articles on various topics related to the Pandas package and Python programming language, visit here.
