Introduction to Visualization

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
%matplotlib inline
pd.plotting.register_matplotlib_converters()
print("Setup Complete")
Setup Complete
data = pd.read_csv('vtest.csv', index_col=0)
data.head()
pp1 pp2 pp3 pp4 pp5 pp6
0 -94 70 -53 38 -66 -56
1 -75 18 -38 88 35 -5
2 -21 -91 14 46 43 22
3 -22 65 3 -15 57 20
4 95 -42 -18 67 81 60
data.shape
(12, 6)
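
vtest.csv is not included here, so to follow along without it we can build a stand-in table. This is only a sketch: the column names and the shape come from the output above, while the value range of -100 to 100 and the random seed are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for vtest.csv: 12 rows, 6 integer columns named pp1..pp6.
# The value range [-100, 100] is a guess based on the rows shown by data.head().
rng = np.random.default_rng(seed=0)
columns = [f"pp{i}" for i in range(1, 7)]
data = pd.DataFrame(rng.integers(-100, 101, size=(12, 6)), columns=columns)

print(data.shape)  # (12, 6)
```

The index runs from 0 to 11, so it can play the role of the month on the horizontal axis in the charts that follow.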

Line Chart

# Set the width and height of the figure
plt.figure(figsize=(10, 6))

# Add title
plt.title("Line Chart")

# Plot line chart
sns.lineplot(data=data)
# Plot single columns and set label= explicitly so each line shows up in the legend;
# when a whole DataFrame is passed (as above), the column names are used as labels
plt.figure(figsize=(10, 6))
sns.lineplot(data=data.pp1, label="pp1")
sns.lineplot(data=data.pp2, label="pp2")

plt.xlabel("Month")

Bar Chart

# Set the width and height of the figure
plt.figure(figsize=(10,6))

# Add title
plt.title("Bar Chart")

# Plot Bar chart
sns.barplot(x=data.index, y=data.pp3)

# Add label for horizontal axis
plt.xlabel("Month")

Heatmap

# Set the width and height of the figure
plt.figure(figsize=(10, 6))

# Add title
plt.title("Heatmap")

# Plot heatmap
sns.heatmap(data=data, annot=True)

# Add label for vertical axis
plt.ylabel("Month")
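
Since the table mixes negative and positive values, a diverging colormap centered at zero can make the sign of each cell easier to read. A minimal sketch, assuming a random stand-in table (demo is hypothetical, not the real vtest.csv):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical stand-in data with the same shape and column names as the table above
rng = np.random.default_rng(seed=0)
demo = pd.DataFrame(rng.integers(-100, 101, size=(12, 6)),
                    columns=[f"pp{i}" for i in range(1, 7)])

plt.figure(figsize=(10, 6))
plt.title("Heatmap centered at zero")
# center=0 pins the middle of the colormap to zero; fmt="d" prints the annotations as integers
ax = sns.heatmap(data=demo, annot=True, fmt="d", center=0, cmap="coolwarm")
plt.ylabel("Month")
```

With cmap="coolwarm", negative cells come out blue and positive cells red.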

Scatter Plots

Normal scatter chart

Use scatter plots to display the relationship between two variables.

plt.figure(figsize=(10, 6))
sns.scatterplot(x=data.pp3, y=data.pp4)
plt.xlabel("pp3-value-range")
plt.ylabel("pp4-value-range")
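
To put a number on what the scatter plot shows, we can compute the Pearson correlation between the two columns. A small sketch on random stand-in data (demo is hypothetical), so the value itself means nothing here:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the pp3 and pp4 columns
rng = np.random.default_rng(seed=0)
demo = pd.DataFrame(rng.integers(-100, 101, size=(12, 2)), columns=["pp3", "pp4"])

# Pearson correlation: +1 is a perfect increasing line, -1 a perfect decreasing line,
# and values near 0 mean there is no linear trend
r = demo["pp3"].corr(demo["pp4"])
print(f"Pearson r between pp3 and pp4: {r:.2f}")
```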

Color-coded scatter plots

We can use scatter plots to display the relationships between (not two, but...) three variables! One way of doing this is by color-coding the points.

plt.figure(figsize=(10, 6))
sns.scatterplot(x=data.pp3, y=data.pp4, hue=np.random.randint(0, 2, size=data.shape[0]))

Regression line

sns.regplot draws a scatter plot together with the regression line that best fits the data.

plt.figure(figsize=(10, 6))
sns.regplot(x=data.index, y=data.pp3)
sns.regplot(x=data.index, y=data.pp4)
plt.xlabel("Month")
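
With its default settings, the line regplot draws should be an ordinary least-squares straight line, so the slope and intercept can be recovered with np.polyfit. A sketch on made-up data with a known trend (the numbers below are assumptions for illustration, not taken from vtest.csv):

```python
import numpy as np

# Hypothetical data: 12 months following y = 2x + 1 plus a little noise
rng = np.random.default_rng(seed=0)
month = np.arange(12)
values = 2 * month + 1 + rng.normal(scale=0.5, size=month.size)

# A degree-1 polynomial fit is a least-squares straight line
slope, intercept = np.polyfit(month, values, deg=1)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

The fitted slope should land close to 2 and the intercept close to 1.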

Compare two regression lines

We can use the sns.lmplot command to draw one regression line per group, which lets us compare how strong the relationship is in each group.

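# Random 0/1 label used to split the rows into two groups
data['YN'] = np.random.randint(0, 2, size=data.shape[0])
# lmplot creates its own figure, so set the size with height= and aspect= instead of plt.figure
sns.lmplot(x='pp3', y='pp4', hue='YN', data=data, height=6, aspect=10 / 6)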

Categorical scatter plot

We can adapt the design of the scatter plot to feature a categorical variable (like labels) on one of the main axes. We'll refer to this plot type as a categorical scatter plot, and we build it with the sns.swarmplot command.

plt.figure(figsize=(10, 6))
category = np.random.randint(0, 3, size=1000)
value = np.random.randint(0, 100, size=1000)
sns.swarmplot(x=category, y=value)
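
With 1000 points, a swarm plot can run out of room to place every marker. One alternative is sns.stripplot, which jitters the points sideways instead of arranging them one by one. A minimal sketch using the same kind of random data; the jitter and alpha values are arbitrary choices:

```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

category = np.random.randint(0, 3, size=1000)
value = np.random.randint(0, 100, size=1000)

plt.figure(figsize=(10, 6))
# jitter spreads the points horizontally; alpha makes overlapping points visible
ax = sns.stripplot(x=category, y=value, jitter=0.3, alpha=0.5)
```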

Distributions

Histograms

plt.figure(figsize=(10, 6))
value = np.random.randint(0, 10, size=100)
sns.distplot(a=value, kde=False)

kde=False is something we'll always provide when creating a histogram, as leaving it out will create a slightly different plot. With kde=True, a kernel density estimate (KDE) curve is drawn on top of the histogram.

plt.figure(figsize=(10, 6))
sns.distplot(a=value, kde=True)

Density plots

The next type of plot is a kernel density estimate (KDE) plot. In case you're not familiar with KDE plots, you can think of it as a smoothed histogram.

Setting shade=True colors the area below the curve. (In seaborn 0.11 and later this parameter is called fill, and shade is deprecated.)

plt.figure(figsize=(10, 6))
sns.kdeplot(data=value, shade=True)
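
To see why a KDE behaves like a smoothed histogram, we can build one by hand: put a small Gaussian bump on every sample and average the bumps. This is only a sketch with a fixed bandwidth of 1.0, which is an assumption; seaborn picks the bandwidth automatically. gaussian_kde_1d is a hypothetical helper name.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Average of Gaussian bumps centered on each sample, evaluated on grid."""
    samples = np.asarray(samples, dtype=float)
    # Distance from every grid point to every sample, in units of the bandwidth
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    return kernels.mean(axis=1)

value = np.random.randint(0, 10, size=100)
grid = np.linspace(-5, 15, 401)
density = gaussian_kde_1d(value, grid, bandwidth=1.0)

# A density should integrate to roughly 1 over a wide enough grid
area = density.sum() * (grid[1] - grid[0])
print(f"area under the curve: {area:.3f}")
```

A smaller bandwidth gives a bumpier curve that follows the individual samples, and a larger one gives a smoother curve.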

2D KDE plots

x_value = np.random.randint(0, 10, size=100)
y_value = np.random.randint(10, 20, size=100)
# jointplot creates its own figure, so set the size with height= instead of plt.figure
sns.jointplot(x=x_value, y=y_value, kind="kde", height=6)

Color-coded plots

We create a different histogram for each data set by using the sns.distplot command three times. We use label= to set how each histogram will appear in the legend, and plt.legend() to display it.

plt.figure(figsize=(10, 6))
sns.distplot(a=np.random.randint(0, 10, size=50), label="data1", kde=False)
sns.distplot(a=np.random.randint(5, 15, size=50), label="data2", kde=False)
sns.distplot(a=np.random.randint(15, 25, size=50), label="data3", kde=False)
plt.legend()
plt.figure(figsize=(10, 6))
sns.kdeplot(data=np.random.randint(0, 10, size=50), label="data1", shade=True)
sns.kdeplot(data=np.random.randint(5, 15, size=50), label="data2", shade=True)
sns.kdeplot(data=np.random.randint(15, 25, size=50), label="data3", shade=True)
plt.legend()

Changing styles with seaborn

Seaborn has five different themes: (1)"darkgrid", (2)"whitegrid", (3)"dark", (4)"white", and (5)"ticks"

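styles = ["darkgrid", "whitegrid", "dark", "white", "ticks"]
plt.figure(figsize=(30, 5))
for i, name in enumerate(styles):
    # The style is applied when the axes are created, so set it before adding the subplot
    sns.set_style(name)
    plt.subplot(1, 5, i + 1)
    plt.title(name)
    sns.lineplot(data=np.random.randint(10, size=20))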