Pandas is an extensive Python library for loading, preprocessing, and cleaning data, and for building datasets of your own. It is one of the main packages used to prepare data for analysis.
One of its most useful features is that it can read large amounts of data, both from local files and from remote servers.
This is handy when collecting data from online sources, including web scraping workflows. This article covers one notable feature of the library: the Pandas shape attribute.
Before we start, let us check that the tools we need are in place.
Tools and technologies:
- Python: version 3.6 or above
- IDE: Jupyter Notebooks
- Browser: Google Chrome
- Environment: Anaconda
- Supportive packages: NumPy and Matplotlib
- A stable internet connection (necessary only to read data from the server).
What we'll cover in this article:
- What is the shape attribute in Pandas
- Reading a dataset
- Using shape in that dataset
Now we are ready, so let us jump right in!
What is the shape attribute in Pandas?
A data frame is a tabular representation of information about a specific topic. The data can come from many sources and industries; almost every individual and organization today maintains critical data of some kind. Its principal format is tabular, but tabular data is stored in various formats such as SQL, Excel, and JSON. The image below illustrates this:
A dataset can be small or large. In most cases, it is much larger than we expect, so counting rows and columns by hand invites human error.
To avoid this, the shape attribute in the pandas library reports the actual number of rows and columns in a dataset or data frame.
Syntax to read any dataset's shape:
data_variable.shape  # a tuple: (number of rows, number of columns)
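As a minimal sketch of what the attribute returns, the snippet below builds a small data frame in memory. The column names and values are made up for illustration and do not come from any dataset used in this article.

```python
import pandas as pd

# Hypothetical sample data: 3 rows, 2 columns
df = pd.DataFrame({
    "name": ["Asha", "Ben", "Chen"],
    "age": [31, 25, 40],
})

shape = df.shape
print(shape)        # (3, 2)
print(type(shape))  # <class 'tuple'>
```

Note that shape is an attribute, not a method, so it is written without parentheses.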
Reading a dataset in Pandas
Reading a dataset loads its contents into a data frame so we can see what it actually contains. Pandas does this through its family of read functions, with a different function for each file format. We will read three datasets and check the shape of each.
General syntax to read a dataset:
import pandas as pd
data_variable = pd.read_csv('filename.csv')
# read_csv is one example. Each file format has its own reader, such as pd.read_excel and pd.read_json.
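To keep things self-contained, the following sketch reads from in-memory text through io.StringIO rather than from files on disk. The CSV and JSON contents are invented for illustration; with a real file, you would pass its path in place of the buffer.

```python
import io
import pandas as pd

# Hypothetical file contents, held in memory for the sake of the example
csv_text = "name,salary\nAsha,52000\nBen,48000\nChen,61000\n"
json_text = '[{"city": "Pune", "population": 3}, {"city": "Oslo", "population": 1}]'

# Each format has its own reader function
csv_data = pd.read_csv(io.StringIO(csv_text))
json_data = pd.read_json(io.StringIO(json_text))

print(csv_data.shape)   # (3, 2) - the header line is not counted as a row
print(json_data.shape)  # (2, 2)
```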
In the image above, we can see how the shape attribute works. It returns a tuple with two values: the first is the number of rows and the second is the number of columns. Here, it tells us that the dataset is fairly large, with 2,671 rows and 10 columns.
The second dataset is salary.csv. Its shape is (16, 4), so it has 16 rows and 4 columns.
The third dataset is titanic.csv. The shape attribute shows that it has 418 rows and 12 columns.
Some different ways to use the shape attribute
Now that we have seen how to use shape through these three examples, here are two more things we can do with this attribute:
- To retrieve only row count.
- To retrieve only column count.
As we know, shape returns a tuple of (rows, columns), so we can use indexing to get each value. Tuples are immutable, but their elements are accessible by index, just as with lists. Let us see a code example:
tuple_1 = (12, 42, 45, 90)
print(tuple_1[0], tuple_1[1])  # prints: 12 42
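The immutability mentioned above can be seen in a short sketch: reading by index or slice works, while assigning to an index raises a TypeError. The variable names here are illustrative only.

```python
tuple_1 = (12, 42, 45, 90)

first = tuple_1[0]        # read a single element by index
first_two = tuple_1[:2]   # slicing returns a new tuple

try:
    tuple_1[0] = 99       # tuples do not support item assignment
except TypeError as err:
    print("Cannot modify a tuple:", err)

print(first, first_two)   # 12 (12, 42)
```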
To retrieve the row count, access index zero; for the column count, access index one:
data.shape[0] # returns number of rows
data.shape[1] # returns number of columns
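Here is a runnable sketch of both lookups on a small data frame. The data is hypothetical, and tuple unpacking is shown as an alternative way to get both counts at once.

```python
import pandas as pd

# Hypothetical data: 4 rows, 3 columns
data = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "score": [88.5, 92.0, 79.5, 95.0],
    "passed": [True, True, False, True],
})

row_count = data.shape[0]     # number of rows
column_count = data.shape[1]  # number of columns
rows, columns = data.shape    # unpack both values in one step

print(row_count, column_count)       # 4 3
print(len(data), len(data.columns))  # 4 3
```

The last line shows that len(data) and len(data.columns) give the same two numbers, which can be useful when only one of them is needed.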
This is how the shape attribute works in Pandas. It is one of the key attributes we use in data preprocessing.