Pandas read_csv() With Custom Delimiters

In this article, we will learn how to use the read_csv() function with custom delimiters. Let’s start with the basics.

If you already know the basics, please skip to Using Custom Delimiters With read_csv().

What is Pandas?

Pandas is a very popular Python library for working with data. It is built mainly around two types of data structures:

  • Data frames
  • Series

Data frames are two-dimensional structures of rows and columns that store data in a table-like format. Every column in a data frame must have the same number of items.

A Series is a one-dimensional, labeled data structure, much like an array, that can store items of any data type. It is usually created with the pd.Series() constructor.
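To make the two structures concrete, here is a minimal sketch that builds one of each. The names `students` and `ages`, and the sample values, are made up for illustration.

```python
import pandas as pd

# A data frame: a table in which every column has the same number of items.
students = pd.DataFrame({
    "Name": ["Jay", "Shiv", "Abin"],
    "Age": [18, 18, 16],
})

# A Series: a single one-dimensional column of values.
ages = pd.Series([18, 18, 16], name="Age")

print(students.shape)  # (3, 2) -> 3 rows, 2 columns
print(len(ages))       # 3
```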

What is a CSV file?

CSV stands for comma-separated values.

For example, let’s say a file is filled with values that do not make any sense when read as one continuous run of text. Once the values are separated with commas, the file turns out to be a school record listing students along with their names, roll numbers, addresses, etc.

What is a delimiter?

A delimiter is a special character or punctuation mark that marks the boundary between two values, such as words or numbers. In most cases, commas are used as delimiters, but other characters can also be used.

As we saw in the example above, data with no apparent meaning starts to make sense once its values are separated by commas. In the same way, the commas in a .csv text file give it the form of a table: each line is a row, and each comma marks the boundary between two columns.

So, the process of splitting a file’s values into a table that makes sense is called delimiting.

Delimiting is generally done with commas, but in certain cases other punctuation marks and special characters, such as semicolons, vertical bars, colons, or tabs, are used instead.
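The idea of delimiting can be sketched with Python’s built-in csv module, before Pandas enters the picture. The text in `raw_text` is a made-up sample, not a file from this article.

```python
import csv
import io

# Hypothetical comma-delimited text; each line becomes one row.
raw_text = "Name,Age,Grade\nJay,18,12\nShiv,18,12\n"

# The delimiter tells the reader where one value ends and the next begins.
rows = list(csv.reader(io.StringIO(raw_text), delimiter=","))

for row in rows:
    print(row)
```

Each row comes back as a list of strings, one per delimited value.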

Now let’s understand what the read_csv() function is and how it works.

Using the Pandas read_csv() method

This Pandas function is used to read .csv files. It can also handle delimiters other than commas, which makes read_csv() a very handy tool: a file with almost any delimiter can be read with a single call.

Let’s look at some working code to understand how the read_csv() function is called to read a .csv file. We have a ready-made .csv file named ‘Car_sales.csv’ that contains car data from a number of car companies.

Example code

import pandas as pd

CarData = pd.read_csv('Car_sales.csv')

In the above code, we created a variable named ‘CarData’ and stored in it the data frame that read_csv() built from ‘Car_sales.csv’. The values in the .csv file are comma-separated, and the comma is the default delimiter, so we did not need to pass any extra arguments to read_csv().

The read_csv() function accepts a long list of optional parameters that can be used whenever necessary. Only one argument is mandatory: the file name or file path. (Note: When recreating the above code, you may need to give the full file path. A bare file name is resolved relative to the current working directory, so it only works when the Python script is run from the directory that contains the .csv file.)
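The sketch below shows one way to pass a full path instead of a bare file name. Since the real ‘Car_sales.csv’ is not reproduced here, the code writes a small stand-in file to a temporary directory first; the column names and values in `sample` are invented for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up stand-in rows; the article's actual Car_sales.csv is not shown.
sample = "Manufacturer,Model,Sales\nAcura,Integra,16.9\nAudi,A4,20.4\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    # Build a full path so the call works regardless of the working directory.
    csv_path = Path(tmp_dir) / "Car_sales.csv"
    csv_path.write_text(sample)
    car_data = pd.read_csv(csv_path)

print(car_data.shape)          # (2, 3)
print(list(car_data.columns))  # ['Manufacturer', 'Model', 'Sales']
```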

Using Custom Delimiters With read_csv()

Let’s now learn how to use a custom delimiter with the read_csv() function. We’ll show you how different commonly used delimiters can be used to read the CSV files. You can replace these delimiters with any custom delimiter based on the type of file you are using.

1. Semicolon delimiter

As we know, many special characters can be used as a delimiter. read_csv() provides a parameter, ‘sep’, that tells the parser to treat a character other than the comma as the delimiter. Let’s understand how we can use it.

Suppose we have a file named ‘Book1.csv’ with the following contents:

Name;Age;Grade
Jay;18;12
Shiv;18;12
Abin;16;10
Shweta;14;9
Shreya;10;5

Now, if we read the file the usual way, without specifying a delimiter:

import pandas as pd
df = pd.read_csv('Book1.csv')
print(df)

This will produce the following output, in which each line is read as a single column:

semicolon-nondelimited-output.png
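To check this behavior without creating a file, the sketch below feeds the same Book1 contents to read_csv() from an in-memory string. Using io.StringIO in place of a file on disk is an assumption made for convenience.

```python
import io

import pandas as pd

# The Book1.csv contents from above, held in memory instead of on disk.
book1_text = (
    "Name;Age;Grade\n"
    "Jay;18;12\n"
    "Shiv;18;12\n"
    "Abin;16;10\n"
    "Shweta;14;9\n"
    "Shreya;10;5\n"
)

# With the default comma delimiter, no line is split into separate values.
df_default = pd.read_csv(io.StringIO(book1_text))

print(df_default.shape)          # (5, 1)
print(list(df_default.columns))  # ['Name;Age;Grade']
```

The whole header line ends up as the name of the one and only column.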

But if we add ‘sep’ to the read_csv() call, the result changes, and the data is split into the Name, Age, and Grade columns:

Code:

import pandas as pd
df = pd.read_csv('Book1.csv', sep=';')
print(df)

Output:

semicolon-delimitation-output.png
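The same result can be verified with an in-memory copy of the Book1 data. As before, io.StringIO stands in for the file and is an assumption of this sketch.

```python
import io

import pandas as pd

# The Book1.csv contents from above, held in memory instead of on disk.
book1_text = (
    "Name;Age;Grade\n"
    "Jay;18;12\n"
    "Shiv;18;12\n"
    "Abin;16;10\n"
    "Shweta;14;9\n"
    "Shreya;10;5\n"
)

# sep=';' tells read_csv() to split each line on semicolons.
df = pd.read_csv(io.StringIO(book1_text), sep=";")

print(df.shape)            # (5, 3)
print(list(df.columns))    # ['Name', 'Age', 'Grade']
print(df["Age"].tolist())  # [18, 18, 16, 14, 10]
```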

2. Vertical Bar delimiter

If the values in a file are separated with vertical bars, instead of semicolons or commas, then that file can be read using the following syntax:

import pandas as pd
df = pd.read_csv('Book1.csv', sep='|')
print(df)
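Here is a small self-contained sketch of the vertical bar case, using made-up in-memory data in place of a file. It also shows that read_csv() accepts ‘delimiter’ as an alias for ‘sep’, so either spelling should give the same data frame.

```python
import io

import pandas as pd

# Hypothetical bar-delimited text standing in for a file on disk.
bar_text = "Name|Age|Grade\nJay|18|12\nShiv|18|12\n"

# 'sep' and 'delimiter' are two names for the same read_csv() option.
df_sep = pd.read_csv(io.StringIO(bar_text), sep="|")
df_delimiter = pd.read_csv(io.StringIO(bar_text), delimiter="|")

print(df_sep)
print(df_sep.equals(df_delimiter))  # True
```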

3. Colon delimiter

In a similar way, if a file is colon-delimited, we use the syntax:

import pandas as pd
df = pd.read_csv('Book1.csv', sep=':')
print(df)
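As a rough check that the choice of delimiter does not change the resulting table, the sketch below writes the same made-up rows with four different delimiters and reads each version back. The helper `read_with` and the `rows` data are hypothetical names introduced only for this illustration.

```python
import io

import pandas as pd

# Made-up sample rows, header first.
rows = [["Name", "Age", "Grade"], ["Jay", "18", "12"], ["Shiv", "18", "12"]]


def read_with(sep):
    """Join the rows with the given delimiter, then parse them back."""
    text = "\n".join(sep.join(row) for row in rows)
    return pd.read_csv(io.StringIO(text), sep=sep)


frames = {sep: read_with(sep) for sep in [",", ";", "|", ":"]}
colon_df = frames[":"]

print(colon_df)
# Every delimiter yields the same table as the colon-delimited version.
print(all(frame.equals(colon_df) for frame in frames.values()))  # True
```

This only holds when the delimiter does not also appear inside the values themselves; a colon-delimited file containing times such as 10:30 would need quoting or a different delimiter.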

Conclusion

Delimiters are an essential part of .csv files, and many .csv files use a delimiter other than the comma. This article covered the most commonly used delimiters to help you grasp the concept.

We touched on the very basics, starting with Pandas and CSV files, and then progressed to delimiters and how read_csv() handles them. We also learned about different kinds of delimiters: semicolons, commas, vertical bars, and colons.

I hope this article helped you in learning these concepts easily.