Data extraction and analysis very often fall to people who will not necessarily make decisions based on the results. This typically means that you, the data analyst, need to present the data clearly and concisely to the people who will act on it. Unsurprisingly, visualising data in graphs is an essential skill here, no matter what you are trying to show. In this matplotlib data visualization tutorial, I want to explore creating a stacked bar chart with the help of the Pandas library, Matplotlib and the Jupyter notebook.
In a previous post, I looked at extracting World Bank data from a CSV file using Python and the Pandas library. Here, I will use that extracted information to create a stacked bar chart.
Stacked bar charts are useful for comparing totals with their parts, as Smashing Magazine so aptly put it. We will use one here to look at real-world data and visualise how the demographics of Germany have changed over time.
The Data Set
We will start this section with a Pandas dataframe called popln, which covers the demographic development of Germany from 1960 to 2010 in 10-year steps. The dataframe looks as follows:
```
           0-19      20-34      35-44      45-54      55-64        65+
Year
1960  26.159470  20.394139  12.062372  15.297906  13.230887  12.855226
1970  27.870625  19.589520  11.887648  11.065785  13.642741  15.943682
1980  25.142428  19.382537  14.004962  11.693726  10.517114  19.259232
1990  20.335919  23.155676  12.355902  13.876143  11.174396  19.101963
2000  19.994729  19.081005  15.803437  12.317716  13.187927  19.615186
2010  17.922409  17.387989  13.948844  15.783613  11.770580  23.186565
```
It shows the percentage distribution across the age groups in each year (every row adds up to 100) over the course of 50 years. A very easy way to visualise this development over time is to create a stacked bar chart, in which each bar represents one year.
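If you want to follow along without repeating the World Bank extraction, a minimal sketch is to rebuild popln directly from the values in the table above. The column labels and the Year index name are taken from the printout; storing the years as integers is my assumption. The last step checks that each row really is a percentage breakdown.

```python
import pandas as pd

# Values copied from the table: share (in %) of each age group, one row per year.
columns = ["0-19", "20-34", "35-44", "45-54", "55-64", "65+"]
rows = {
    1960: [26.159470, 20.394139, 12.062372, 15.297906, 13.230887, 12.855226],
    1970: [27.870625, 19.589520, 11.887648, 11.065785, 13.642741, 15.943682],
    1980: [25.142428, 19.382537, 14.004962, 11.693726, 10.517114, 19.259232],
    1990: [20.335919, 23.155676, 12.355902, 13.876143, 11.174396, 19.101963],
    2000: [19.994729, 19.081005, 15.803437, 12.317716, 13.187927, 19.615186],
    2010: [17.922409, 17.387989, 13.948844, 15.783613, 11.770580, 23.186565],
}

# orient="index" turns each dict key (a year) into a row label.
popln = pd.DataFrame.from_dict(rows, orient="index", columns=columns)
popln.index.name = "Year"

# Each row is a percentage breakdown, so it should add up to (almost exactly) 100.
row_totals = popln.sum(axis=1)
print(row_totals.round(4))
```

Because every row sums to 100, all bars in the stacked chart end up with the same height, and only the proportions inside each bar change.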
A Simple Stacked Bar Chart
To my surprise, I found that this is very easy to do with the versions of Pandas (0.20.3) and Matplotlib (2.0.2) that I use, even though I had come across several fairly complex code snippets for creating stacked bar charts. In fact, all you have to do is pass stacked=True to the bar plot and you are good to go. If you run the code below, you will see the chart that follows.
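```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Jupyter magic: render plots directly inside the notebook
%matplotlib inline

title = 'Development of Age Composition in the Female German Population from 1960 to 2010'

# stacked=True draws the age groups of each year on top of each other
popln.plot.bar(stacked=True, title=title)

# Pin the legend's upper-right corner at x=1.22, y=1 (axes coordinates),
# i.e. beyond the right edge of the plot area
plt.legend(loc='upper right', bbox_to_anchor=(1.22, 1))
```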
Please note that popln is the Pandas dataframe described above, and that the line starting with plt.legend positions the legend at the top right, just outside the plot area, for easier readability.
With this code, you should get a chart similar to this one.
As we can already see from this graph, the share of the 65-and-over age group has grown from about 12.9% in 1960 to about 23.2% in 2010, with only a slight dip in 1990. In 1970, the 0-19 group peaks at about 27.9%, the last remnant of the baby boomers, when the birth rate had been on the rise. After that, the share of the youngest group shrinks steadily, while the share of the oldest group keeps increasing.
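To back up this reading with numbers rather than by eye, a minimal sketch is to compare the first and the last year directly and to look up the year in which a group peaks. The helper names summarise_change and peak_year are hypothetical, and the sketch assumes that the years form the (integer) index of popln, as in the table.

```python
import pandas as pd


def summarise_change(df: pd.DataFrame, start: int, end: int) -> pd.DataFrame:
    """Compare each column (age group) of `df` between two index labels (years).

    Returns one row per age group with its share in `start`, its share in
    `end`, and the difference between the two in percentage points.
    """
    summary = pd.DataFrame({start: df.loc[start], end: df.loc[end]})
    summary["change"] = summary[end] - summary[start]
    return summary


def peak_year(df: pd.DataFrame, column: str):
    """Return the index label (year) at which `column` reaches its maximum."""
    return df[column].idxmax()
```

Called as summarise_change(popln, 1960, 2010), the change column should show roughly +10.3 percentage points for 65+ and -8.2 for 0-19, and peak_year(popln, "0-19") should return 1970.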
Colour Schemes and Styles
If you want to adjust the visuals a little, you can give the bars an edge colour with the edgecolor argument, or change the colour scheme with the cmap argument. Adding a black edge and the inferno colormap, for example, yields the following:
```python
popln.plot.bar(stacked=True, title=title, edgecolor='black', cmap='inferno')
```
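As far as I can tell from the pandas source, cmap picks one colour per column by sampling the colormap at evenly spaced points between its two ends. The sketch below does the same by hand, which is handy if you want to reorder or replace individual colours before plotting. The helper name colours_from_cmap is hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt


def colours_from_cmap(name: str, n: int) -> list:
    """Sample `n` evenly spaced RGBA colours from the Matplotlib colormap `name`,
    from its lowest end (0.0) to its highest end (1.0)."""
    cmap = plt.get_cmap(name)
    return [cmap(x) for x in np.linspace(0, 1, n)]
```

You could then pass the list on explicitly, e.g. colours = colours_from_cmap('inferno', len(popln.columns)) followed by popln.plot.bar(stacked=True, title=title, edgecolor='black', color=colours), or use colours[::-1] to flip the order of the colours.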
Error Bars and Other Tweakables
Last but certainly not least, if you need to add error bars, the yerr argument comes in handy. Among other forms, it accepts a list or tuple of raw error values which, according to the pandas documentation, must have the same length as the plotted DataFrame, so here each value belongs to one year (one bar position). In this case, error bars are not really necessary, but for the sake of argument, we will add them anyway. Going back to the original style and only adding the yerr argument, we get the following:
```python
popln.plot.bar(stacked=True, title=title, yerr=(1, 2, 3, 4, 5, 6))
```
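If you would rather attach one error value to each age group, pandas also accepts a DataFrame (or dict) of errors whose column names match those of the plotted DataFrame, which makes it explicit which group each error belongs to. The sketch below expands one error value per age group into such a DataFrame; the helper name plot_stacked_with_group_errors and any error values you pass to it are hypothetical, for illustration only.

```python
import pandas as pd


def plot_stacked_with_group_errors(df: pd.DataFrame, errors: dict, title: str = ""):
    """Draw a stacked bar chart of `df` with one error value per column.

    `errors` maps each column name to a single error value. It is broadcast
    into a DataFrame with the same index and columns as `df`, so every bar
    segment of a given age group gets the same error bar.
    """
    err = pd.DataFrame({col: errors[col] for col in df.columns}, index=df.index)
    ax = df.plot.bar(stacked=True, title=title, yerr=err)
    return ax, err
```

For example, plot_stacked_with_group_errors(popln, {'0-19': 1, '20-34': 1, '35-44': 0.5, '45-54': 0.5, '55-64': 0.5, '65+': 2}, title) would draw the same chart with made-up, group-specific error bars.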
In summary, stacked bar charts are a useful way to look at the different parts that make up the whole of a data set. Here, we used real-world data to build one step by step.
Naturally, these are not the only ways to modify a stacked bar chart. For more information on the possibilities, check out the matplotlib info page. Can you already guess what the hatch argument adjusts?
Did you find this post useful? Let me know what you think of this matplotlib data visualization tutorial by leaving a comment. Thanks for reading 🙂