How to create a stacked bar chart with Python and Matplotlib

Data is often extracted and analysed by people who will not themselves take decisions based on it. This typically means that you, the data analyst, need to present the data clearly and concisely to the people who will act on it. Unsurprisingly, visualizing data in graphs is an essential skill here, no matter what you are trying to show. In this Matplotlib data visualization tutorial, I want to explore the creation of a stacked bar chart with the help of the Pandas library, Matplotlib and the Jupyter notebook.

I previously looked at extracting data from a World Bank CSV file using Python and the Pandas library, and I will use that extracted information here to create a stacked bar chart.

Stacked bar charts are useful when comparing a total with its parts, as Smashing Magazine put it so aptly. We will use one here to look at some real-world data and visualise how the demographics of Germany have changed over time.

The Data Set

We will start this section off with a Pandas dataframe, called popln, that shows the demographic development in Germany from 1960 to 2010 in 10-year steps. The dataframe looks as follows:

      0-19       20-34      35-44      45-54      55-64      65+
1960  26.159470  20.394139  12.062372  15.297906  13.230887  12.855226
1970  27.870625  19.589520  11.887648  11.065785  13.642741  15.943682
1980  25.142428  19.382537  14.004962  11.693726  10.517114  19.259232
1990  20.335919  23.155676  12.355902  13.876143  11.174396  19.101963
2000  19.994729  19.081005  15.803437  12.317716  13.187927  19.615186
2010  17.922409  17.387989  13.948844  15.783613  11.770580  23.186565

It shows the percentage share of each age group in the population over the course of 50 years, so each row adds up to 100. A very easy way to visualise this development over time is to create a stacked bar chart, where each bar represents one year.
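If you do not have the extracted World Bank data at hand, the table can also be typed in directly. The following is a minimal sketch that rebuilds the dataframe from the values printed above. Only the name popln comes from this post; the helper names age_groups and rows are my own, and this is just one possible way to construct the dataframe.

```python
import pandas as pd

# Age-group shares in percent, copied from the table in this post.
age_groups = ["0-19", "20-34", "35-44", "45-54", "55-64", "65+"]
rows = {
    1960: [26.159470, 20.394139, 12.062372, 15.297906, 13.230887, 12.855226],
    1970: [27.870625, 19.589520, 11.887648, 11.065785, 13.642741, 15.943682],
    1980: [25.142428, 19.382537, 14.004962, 11.693726, 10.517114, 19.259232],
    1990: [20.335919, 23.155676, 12.355902, 13.876143, 11.174396, 19.101963],
    2000: [19.994729, 19.081005, 15.803437, 12.317716, 13.187927, 19.615186],
    2010: [17.922409, 17.387989, 13.948844, 15.783613, 11.770580, 23.186565],
}

# One row per year, one column per age group.
popln = pd.DataFrame.from_dict(rows, orient="index", columns=age_groups)

# Sanity check: the shares of each year should add up to roughly 100 percent.
row_totals = popln.sum(axis=1)
print(row_totals.round(2))
```

Checking the row totals is a cheap way to catch typing errors before plotting.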

A simple Stacked Bar Chart

To my surprise, this turned out to be very easy in the versions of Pandas (0.20.3) and Matplotlib (2.0.2) that I use. I had come across several different and fairly complex code snippets on how to create a stacked bar chart, but in fact all you have to do is set a flag in the bar chart options and you are good to go. If you type in the code below, you will see the chart that follows.

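import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Render plots inside the Jupyter notebook
%matplotlib inline

title = 'Development of Age Composition in the Female German Population from 1960 to 2010'
# stacked=True is the flag that turns the grouped bar chart into a stacked one
popln.plot.bar(stacked=True, title=title)
# Position the legend at the upper right of the chart
plt.legend(loc='upper right', bbox_to_anchor=(1.22, 1))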

Please note that popln is the Pandas dataframe described above, and that the line starting with plt.legend positions the legend at the top right-hand side of the chart for easier readability.

With this code, you should receive an image similar to this one.

Simple Stacked Bar Chart

As we can already see from this graph, the share of the 65 and above age group has grown from about 13% in 1960 to about 23% in 2010. In 1970, we can still see the tail end of the baby boom, when the birth rate had been on the rise: the 0-19 group reaches its peak share of about 28%. After that, the share of the youngest group shrinks steadily, reflecting falling birth rates, while the older age groups gradually gain.
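For comparison, here is roughly what the stacked flag saves you from writing by hand. This is a minimal sketch in plain Matplotlib, not the code Pandas runs internally: each age group is drawn as its own set of bars, and the bottom argument of ax.bar shifts every segment up by the running total of the segments below it. The names sample and stack_bars_manually are hypothetical, and the data is limited to two rows of the table to keep the example short.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Two rows taken from the table in this post.
sample = pd.DataFrame(
    {
        "0-19": [26.159470, 17.922409],
        "20-34": [20.394139, 17.387989],
        "35-44": [12.062372, 13.948844],
        "45-54": [15.297906, 15.783613],
        "55-64": [13.230887, 11.770580],
        "65+": [12.855226, 23.186565],
    },
    index=[1960, 2010],
)


def stack_bars_manually(df, ax):
    """Draw one segment per column, each starting where the previous one ended."""
    positions = np.arange(len(df))
    bottom = np.zeros(len(df))  # running height of each bar so far
    for column in df.columns:
        heights = df[column].to_numpy()
        ax.bar(positions, heights, bottom=bottom, label=column)
        bottom = bottom + heights
    ax.set_xticks(positions)
    ax.set_xticklabels(df.index)
    return bottom  # final height of each finished bar


fig, ax = plt.subplots()
totals = stack_bars_manually(sample, ax)
ax.legend(loc="upper right", bbox_to_anchor=(1.22, 1))
```

Since the columns are percentage shares, every finished bar should end at a height of about 100, which is what the returned totals let you verify.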

Color schemes and styles

Additionally, if you want to adjust the visuals a little, you can add an edge colour with the edgecolor flag and change the colourmap with the cmap flag. Adding a black edge and the inferno colormap, for example, yields the following:

popln.plot.bar(stacked=True, title=title, edgecolor='black', cmap='inferno')

Stacked Bar Chart with Inferno Colormap
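To try the styling options without the full data set, a self-contained sketch could look like the one below. It uses two rows of the table and the keyword colormap, which is the spelling used in the Pandas documentation. The name sample is hypothetical, and the Agg backend line is only there so the snippet also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Two rows taken from the table in this post.
sample = pd.DataFrame(
    {
        "0-19": [26.159470, 17.922409],
        "20-34": [20.394139, 17.387989],
        "35-44": [12.062372, 13.948844],
        "45-54": [15.297906, 15.783613],
        "55-64": [13.230887, 11.770580],
        "65+": [12.855226, 23.186565],
    },
    index=[1960, 2010],
)

# edgecolor outlines every segment; colormap picks one colour per column.
ax = sample.plot.bar(stacked=True, edgecolor="black", colormap="inferno")
ax.legend(loc="upper right", bbox_to_anchor=(1.22, 1))

# Each drawn segment is a rectangle patch whose colours can be inspected.
first_segment = ax.patches[0]
last_segment = ax.patches[-1]
print(first_segment.get_edgecolor(), last_segment.get_facecolor())
```

Swapping "inferno" for another registered colormap name, such as "viridis", is enough to try a different scheme.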

Error bars and other tweakables

Last but certainly not least, if you need to add error bars, the yerr argument comes in handy. It takes a list or tuple with the corresponding error ranges. In this case, it is not really necessary, but for the sake of argument, we will do it anyway. Going back to the original style and only adding the yerr argument, we get the following:

popln.plot.bar(stacked=True, title=title, yerr=(1,2,3,4,5,6))

Stacked Bar Chart with Error Bars
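If you want explicit control over which error belongs to which segment, Pandas also accepts a dataframe of errors whose column names match the plotted columns. The sketch below assumes an invented error of one percentage point for every value; these numbers are made up purely for illustration and are not part of the World Bank data. The names sample and errors are hypothetical as well.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.container import BarContainer

# Three age groups for two years, taken from the table in this post.
sample = pd.DataFrame(
    {
        "0-19": [26.159470, 17.922409],
        "20-34": [20.394139, 17.387989],
        "65+": [12.855226, 23.186565],
    },
    index=[1960, 2010],
)

# Invented errors: one percentage point for every value (illustration only).
errors = pd.DataFrame(1.0, index=sample.index, columns=sample.columns)

# Columns of `errors` are matched to columns of `sample` by name.
ax = sample.plot.bar(stacked=True, yerr=errors)
ax.legend(loc="upper right", bbox_to_anchor=(1.22, 1))

# Every group of bars should now carry its own set of error bars.
bar_groups = [c for c in ax.containers if isinstance(c, BarContainer)]
print(len(bar_groups))
```

In a stacked chart the error bars sit at the top edge of each segment, so they can overlap the neighbouring segment when the errors are large.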


In summary, stacked bar charts are a useful way to look at the different sections that make up the whole of a data set. Here we used real-world data to build and style one.

Naturally, these are not all the ways to modify a stacked bar chart. For more information on the possibilities, check out the matplotlib info page. Can you already guess what the hatch argument will adjust?

Did you find this post useful? Let me know what you think of this matplotlib data visualization tutorial by leaving a comment. Thanks for reading 🙂
