The **Python Seaborn module** makes data visualization easier and more efficient. To represent the variations in a large data set, data visualization is considered one of the best ways to depict and analyze the data.

Seaborn offers a higher-level set of functions for data visualization than Matplotlib, the library it is built on. It works with NumPy and Pandas data structures to represent the data sets.

Before getting started with Seaborn, I strongly recommend that readers understand the Python Matplotlib module.

## Getting started with Python Seaborn

To get started with Seaborn, install the module in your environment using the command below:

```
pip install seaborn
```

Seaborn depends on NumPy, Pandas, and Matplotlib to work smoothly; pip installs them along with Seaborn if they are missing.

## Data Files Used Throughout the Tutorial

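We’ll be working with CSV files throughout the tutorial, so this section describes the files that we’ll be using.

Wherever you see a reference to the following file names, you can look back at this section to understand the data that’s being passed.

**Book1.csv:** a small data set with the columns `Name` and `Age`.

**tips.csv:** Seaborn's `tips` example data set, loaded in the examples with `seaborn.load_dataset("tips")`. The examples use its `tip` and `sex` columns.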
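
If you don't have Book1.csv at hand, a stand-in file can be sketched as follows. The names and ages here are invented for illustration and are not the article's actual data; only the column names `Name` and `Age` come from the examples in this tutorial.

```python
import pandas

# Hypothetical stand-in for Book1.csv: these rows are invented for
# illustration and are NOT the original file's contents.
book1 = pandas.DataFrame({
    "Name": ["Ann", "Ben", "Cara", "Dev", "Eli", "Fay", "Gus", "Hana"],
    "Age": [12, 21, 22, 23, 24, 25, 26, 27],
})

# Writing the frame to disk lets read_csv() load it like the original file.
book1.to_csv("Book1.csv", index=False)
csv = pandas.read_csv("Book1.csv")
print(csv.head())
```

With this file in the working directory, the examples can read `"Book1.csv"` instead of the `C:\` path.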

## Python Seaborn For Statistical Analysis

**Statistical analysis** is the estimation of certain parameters of a data set. Data visualization is one of the best ways to support statistical analysis, i.e. to predict an outcome or identify a cause based on plotted values.

The following functions can be used during statistical analysis:

- **seaborn.scatterplot()**
- **seaborn.lineplot()**

### 1. seaborn.scatterplot()

The `seaborn.scatterplot()` function depicts the relationship between two parameters on the given axes. Every point on the graph represents one observation.

**Syntax:**

```
seaborn.scatterplot(x=value, y=value, data=data)
```

**Example:**

```
import seaborn
import pandas
import matplotlib.pyplot as plt
csv = pandas.read_csv(r'C:\Book1.csv')
res = seaborn.scatterplot(x="Name", y="Age", data=csv)
plt.show()
```

In the above example, we imported the **Python Pandas module** in order to use the `read_csv()` function to read the contents of the data set.

The column ‘Name’ is represented on the x-axis and the column ‘Age’ on the y-axis.

**Output:**

### 2. seaborn.lineplot()

The `seaborn.lineplot()` function is useful when we need to check how one parameter depends on another in a continuous manner, typically over time.

**Syntax:**

```
seaborn.lineplot(x=value, y=value, data=data)
```

**Example:**

```
import seaborn
import pandas
import matplotlib.pyplot as plt
csv = pandas.read_csv(r'C:\Book1.csv')
res = seaborn.lineplot(x="Name", y="Age", data=csv)
plt.show()
```

**Output:**

## Categorical Scatter Plot

Categorical data divides into discrete groups, i.e. subsets of the original data.

Python Seaborn module contains the following methods to represent and visualize categorical data:

- **seaborn.catplot()**
- **seaborn.stripplot()**
- **seaborn.swarmplot()**

### 1. seaborn.catplot()

The `seaborn.catplot()` function, as mentioned above, is one of the techniques to analyze the relationship between a numeric value and a categorical group of values.

**Syntax:**

```
seaborn.catplot(x=value, y=value, data=data)
```

**Example:**

```
import seaborn
import pandas
import matplotlib.pyplot as plt
csv = seaborn.load_dataset("tips")
res = seaborn.catplot(x="tip", y="sex", data=csv)
plt.show()
```

**Output:**
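
As a side note, `catplot()` is a figure-level function: its `kind` parameter selects which categorical plot to draw, and it returns a `FacetGrid` rather than a single axes object. A minimal sketch, using an invented data set in place of `tips` so that it runs without a download:

```python
import pandas
import seaborn

# Hypothetical data: a numeric score for members of two groups.
scores = pandas.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "score": [1.0, 2.0, 3.0, 2.0, 4.0, 6.0],
})

# The default kind="strip" draws a categorical scatter plot.
strip_grid = seaborn.catplot(x="score", y="group", data=scores)

# kind="box" draws box plots through the same interface.
box_grid = seaborn.catplot(x="score", y="group", data=scores, kind="box")

# Both calls return a FacetGrid object.
print(type(box_grid).__name__)
```

Other values of `kind`, such as `"swarm"`, `"violin"`, or `"count"`, map onto the functions covered in the rest of this tutorial.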

### 2. seaborn.stripplot()

The `seaborn.stripplot()` function treats one of the input columns as categorical and plots the points for each category in order, regardless of the data type of that column.

**Syntax:**

```
seaborn.stripplot(x=value, y=value, data=data)
```

**Example:**

```
import seaborn
import pandas
import matplotlib.pyplot as plt
csv = seaborn.load_dataset("tips")
res = seaborn.stripplot(x="tip", y="sex", data=csv, jitter=0.05)
plt.show()
```

The `jitter` parameter is useful when the data set contains points that overlap. Setting a jitter value adds a small random offset along the categorical axis, which **spreads the points out** so that they are easier to tell apart.

**Output:**
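
The effect of `jitter` can be checked numerically. The sketch below uses an invented data set of twenty identical tips, so every point lands on the same spot unless jitter moves it; reading the positions back through `get_offsets()` is just one way to inspect the result.

```python
import numpy
import pandas
import seaborn
import matplotlib.pyplot as plt

# Hypothetical data: twenty identical tips from one group, so all points
# would overlap completely without jitter.
same_tips = pandas.DataFrame({"tip": [3.0] * 20, "sex": ["Female"] * 20})

fig, (ax_plain, ax_jitter) = plt.subplots(1, 2)
seaborn.stripplot(x="tip", y="sex", data=same_tips, jitter=False, ax=ax_plain)
seaborn.stripplot(x="tip", y="sex", data=same_tips, jitter=0.3, ax=ax_jitter)

# Column 1 of the offsets is each point's position on the categorical (y) axis.
plain_y = numpy.asarray(ax_plain.collections[0].get_offsets())[:, 1]
jitter_y = numpy.asarray(ax_jitter.collections[0].get_offsets())[:, 1]

print(numpy.ptp(plain_y))   # spread of positions with jitter disabled
print(numpy.ptp(jitter_y))  # spread of positions with jitter=0.3
```

With jitter disabled the spread is zero, while `jitter=0.3` scatters the points randomly within 0.3 units on either side of the category position.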

### 3. seaborn.swarmplot()

The `seaborn.swarmplot()` function resembles **seaborn.stripplot()** with a slight difference: it adjusts the positions of the points along the categorical axis so that they **do not overlap**.

**Syntax:**

```
seaborn.swarmplot(x=value, y=value, data=data)
```

**Example:**

```
import seaborn
import pandas
import matplotlib.pyplot as plt
csv = seaborn.load_dataset("tips")
res = seaborn.swarmplot(x="tip", y="sex", data=csv)
plt.show()
```

In the above example, I have passed the column ‘sex’ as the only categorical data and plotted it along the y-axis.

**Output:**

## Categorical Distribution Plots

**Categorical distribution data** refers to data where the result describes the probability that a random or chosen variable belongs to one of the given **possible categories**.

Python Seaborn has the following functions to represent categorically distributed data efficiently:

- **seaborn.violinplot()**
- **seaborn.boxplot()**
- **seaborn.boxenplot()**

### 1. seaborn.violinplot()

The `seaborn.violinplot()` function represents the underlying distribution of the data. It can show that distribution separately for each category of a categorical input.

**Syntax:**

```
seaborn.violinplot(x=value, y=value, data=data)
```

**Example:**

```
import seaborn
import pandas
import matplotlib.pyplot as plt
csv = pandas.read_csv("C:\\Book1.csv")
res = seaborn.violinplot(x=csv['Age'])
plt.show()
```

In the above example, we have plotted the distribution of the data in the column ‘Age’.

**Output:**

### 2. seaborn.boxplot()

The `seaborn.boxplot()` function represents the **categorical distribution** of data and allows comparison among the different categories.

The **‘box’** spans the **quartiles of the data** (from the first to the third quartile, with a line at the median), while the **‘whiskers’** (lines) extend to cover the rest of the **distribution**. **Outliers** are drawn as individual points, identified using a rule based on the **inter-quartile range**.
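
The quartile and outlier arithmetic can be worked through by hand. The sketch below uses invented ages (not the contents of Book1.csv) and the common 1.5 × IQR rule, which is the default whisker setting (`whis=1.5`) in both Seaborn and Matplotlib.

```python
import numpy

# Hypothetical ages, invented for illustration.
ages = numpy.array([12, 21, 22, 23, 24, 25, 26, 27])

q1, median, q3 = numpy.percentile(ages, [25, 50, 75])
iqr = q3 - q1

# Points further than 1.5 * IQR from the box edges count as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = ages[(ages < lower_fence) | (ages > upper_fence)]

print(q1, median, q3)            # box edges and the line inside the box
print(lower_fence, upper_fence)  # limits beyond which points are outliers
print(outliers)                  # values drawn as separate points
```

For these numbers the box runs from 21.75 to 25.25, the fences sit at 16.5 and 30.5, and only the age 12 falls outside them.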

**Syntax:**

```
seaborn.boxplot(x=value, y=value, data=data)
```

**Example:**

```
import seaborn
import pandas
import matplotlib.pyplot as plt
csv = pandas.read_csv("C:\\Book1.csv")
res = seaborn.boxplot(x=csv['Age'])
plt.show()
```

In the above example, we have used Book1.csv file as the input data set.

If you analyze the data set, you will find that the age 12 is an outlier, with the rest of the data ranging between 15 and 27. This is represented well by the **seaborn.boxplot()** function.

**Output:**

### 3. seaborn.boxenplot()

The `seaborn.boxenplot()` function is quite similar to the **seaborn.boxplot()** function, with a slight difference in the representation.

Instead of a single box, **seaborn.boxenplot()** draws a series of boxes for a **larger number of quantiles**, which makes it better suited to larger data sets. This gives us **more detailed information in a visual form** about the entire distribution of data.

**Syntax:**

```
seaborn.boxenplot(x=value, y=value, data=data)
```

**Example:**

```
import seaborn
import pandas
import matplotlib.pyplot as plt
csv = pandas.read_csv("C:\\Book1.csv")
res = seaborn.boxenplot(x=csv['Age'])
plt.show()
```

If you compare the output below with the input data set, you can see that **boxenplot** represents the entire distribution of the data points, which range between 12 and 27, with a large central box for the quartiles.

**Output:**

## Categorical estimate plots

Categorical estimate plots show an estimate, such as a count or an average, of a data variable for each category.

Python Seaborn has the following functions to be used for the estimation of categorical data:

- **seaborn.countplot()**
- **seaborn.barplot()**
- **seaborn.pointplot()**

### 1. seaborn.countplot()

The `seaborn.countplot()` function represents a categorical variable in terms of the frequency, or count, of each of its values.

**Syntax:**

```
seaborn.countplot(x=value, data=data)
```

**Example:**

```
import seaborn
import pandas
import matplotlib.pyplot as plt
csv = pandas.read_csv("C:\\Book1.csv")
res = seaborn.countplot(x=csv['Age'])
plt.show()
```

**Output:**

As seen in the above image, the **countplot() function** has counted the frequency of each value in the input data field and represented it along the y-axis, while the data field ‘Age’ is represented along the x-axis.
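
The counts behind the bars can be reproduced with pandas. This is a minimal sketch with invented ages (not the article's file); `value_counts()` builds the same frequency table that the bars display.

```python
import pandas
import seaborn
import matplotlib.pyplot as plt

# Hypothetical ages with repeats, invented for illustration.
people = pandas.DataFrame({"Age": [21, 21, 22, 24, 24, 24]})

# The frequency table that countplot() visualizes.
counts = people["Age"].value_counts().sort_index()
print(counts.to_dict())

fig, ax = plt.subplots()
seaborn.countplot(x=people["Age"], ax=ax)

# Each bar's height is the number of rows with that age.
bar_heights = sorted(
    int(round(bar.get_height())) for bar in ax.patches if bar.get_height() > 0
)
print(bar_heights)
```

The sorted bar heights match the values in the frequency table.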

### 2. seaborn.barplot()

The `seaborn.barplot()` function represents an estimate of central tendency (the mean, by default) for each category as the height of a bar.

**Example:**

```
import seaborn
import pandas
import matplotlib.pyplot as plt
csv = pandas.read_csv("C:\\Book1.csv")
res = seaborn.barplot(x=csv['Name'], y=csv['Age'])
plt.show()
```

**Output:**
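
To make "central tendency" concrete, the sketch below compares the bar heights with a plain group mean. The `team` and `score` columns are invented for illustration; in Book1.csv each name presumably appears once, so each bar would simply show that person's age.

```python
import pandas
import seaborn
import matplotlib.pyplot as plt

# Hypothetical data: several scores per team.
results = pandas.DataFrame({
    "team": ["red", "red", "red", "blue", "blue", "blue"],
    "score": [10, 20, 30, 40, 50, 60],
})

# The default estimator is the mean, so these are the expected bar heights.
team_means = results.groupby("team")["score"].mean()
print(team_means.to_dict())

fig, ax = plt.subplots()
seaborn.barplot(x="team", y="score", data=results, ax=ax)
bar_heights = sorted(float(bar.get_height()) for bar in ax.patches)
print(bar_heights)
```

Passing a different function through the `estimator` parameter, for example a median, changes what each bar represents.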

### 3. seaborn.pointplot()

The `seaborn.pointplot()` function represents an estimate of the central tendency of the distribution using scatter points with lines joining them.

**Syntax:**

```
seaborn.pointplot(x=value, y=value, data=data)
```

**Example:**

```
import seaborn
import pandas
import matplotlib.pyplot as plt
csv = pandas.read_csv("C:\\Book1.csv")
res = seaborn.pointplot(x=csv['Name'], y=csv['Age'])
plt.show()
```

**Output:**

## Customized Styles and Themes in Seaborn

Python Seaborn has built-in functions and themes to make visualizations more attractive.

The `seaborn.set()` function applies the **default** theme to the output visualization.

**Syntax:**

```
seaborn.set()
```

**Example:**

```
import seaborn
import pandas
import matplotlib.pyplot as plt
seaborn.set()
csv = pandas.read_csv("C:\\Book1.csv")
res = seaborn.pointplot(x=csv['Name'], y=csv['Age'])
plt.show()
```

**Output:**

Python Seaborn provides the following themes for visualizing data:

- `ticks`
- `whitegrid`
- `darkgrid`
- `dark`
- `white`

**Syntax:**

```
seaborn.set_style("theme-name")
```

**Example 1:** The dark theme

```
import seaborn
import pandas
import matplotlib.pyplot as plt
seaborn.set_style("dark")
csv = pandas.read_csv("C:\\Book1.csv")
res = seaborn.pointplot(x=csv['Name'], y=csv['Age'])
plt.show()
```

**Output:**

**Example 2:** The whitegrid theme

```
import seaborn
import pandas
import matplotlib.pyplot as plt
seaborn.set_style("whitegrid")
csv = pandas.read_csv("C:\\Book1.csv")
res = seaborn.pointplot(x=csv['Name'], y=csv['Age'])
plt.show()
```

**Output:**
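
To see what a theme actually changes, `seaborn.axes_style()` returns the Matplotlib settings behind a style name, and it can also be used in a `with` block to apply a style temporarily. A minimal sketch (the plotted values are invented):

```python
import seaborn
import matplotlib.pyplot as plt

# axes_style() returns the Matplotlib settings behind a named theme.
darkgrid = seaborn.axes_style("darkgrid")
dark = seaborn.axes_style("dark")

# The two themes differ in whether grid lines are drawn.
print(darkgrid["axes.grid"], dark["axes.grid"])

# Used as a context manager, the style applies only inside the with block.
with seaborn.axes_style("whitegrid"):
    fig, ax = plt.subplots()
    seaborn.pointplot(x=["a", "b", "c"], y=[1, 3, 2], ax=ax)
```

This is handy when only one figure in a script should use a different theme, whereas `set_style()` changes the setting for all later plots.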

## Multi-Plot grids in Seaborn

To represent a large data set with categorical values in a precise manner, we can draw **multiple plots of subsets of the data**.

**Syntax:**

```
seaborn.FacetGrid(data, col=value, col_wrap=value)
```

**Example:**

```
import seaborn
import pandas
import matplotlib.pyplot as plt
seaborn.set_style("whitegrid")
csv = pandas.read_csv("C:\\Book1.csv")
res = seaborn.FacetGrid(csv, col="Age", col_wrap=3)
res.map(seaborn.barplot, "Name", "Age")
plt.show()
```

The `FacetGrid` class represents the data as multiple plots, one for each subset of the data. The subsets can be laid out along the following dimensions:

- **row**
- **col**
- **hue**

The parameter `col_wrap` sets the maximum number of columns in the grid: the plots for the `col` variable wrap onto a new row after that many columns.

The `FacetGrid.map()` function applies a plotting function to every subset of the data.

**Output:**

## Plotting univariate distributions with Seaborn

**Univariate distribution** refers to the **distribution** of the data with respect to a **single random variable/data item**.

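Python Seaborn module’s `seaborn.distplot()` function can be used to represent the univariate distribution of a data set. Note that `distplot()` has been deprecated since Seaborn 0.11 in favor of `histplot()` and `displot()`, so newer releases show a warning when it is called.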

**Syntax:**

```
seaborn.distplot(data-column)
```

**Example:**

```
import seaborn
import pandas
import matplotlib.pyplot as plt
seaborn.set_style("whitegrid")
csv = pandas.read_csv("C:\\Book1.csv")
res=seaborn.distplot(csv['Age'])
plt.show()
```

**Output:**

## Depicting bivariate distributions with Seaborn

**Bivariate distribution** refers to the visualization of data with respect to **two data columns or items of the data set**.

The `seaborn.jointplot()` function can be used to depict the relationship between the two data variables.

**Syntax:**

```
seaborn.jointplot(x=variable1, y=variable2)
```

**Example:**

```
import seaborn
import pandas
import matplotlib.pyplot as plt
seaborn.set_style("darkgrid")
csv = pandas.read_csv("C:\\Book1.csv")
res=seaborn.jointplot(x=csv['Age'], y=csv['Age'])
plt.show()
```

In the above example, we have used ‘Age’ for both variables just for the sake of simplicity.

**Output:**

## Conclusion

Thus, in this article, we have understood the basic functionality offered by **Python Seaborn for data visualization**.
