flynn.gg

Christopher Flynn

Machine Learning
Systems Architect,
PhD Mathematician



Creating better matplotlib charts.

2017-07-10

I regularly use matplotlib’s pyplot to generate plots for product analytics or business intelligence, whether the data comes from an internal data warehouse or an external API. The barebones matplotlib package is fine for quick and dirty plots, but a handful of simple features can make these figures much easier to comprehend (and easier on the eyes). This post runs through them in order.

Chart styles

Perhaps the simplest change you can make is switching the built-in style that pyplot uses to draw plots. There are a number of built-in styles to choose from, and the matplotlib documentation has a gallery showing most of them. Many of them are based on the seaborn package, a statistical visualization library built on matplotlib. To see a list of built-in styles:

import matplotlib.style

print(matplotlib.style.available)

The exact list depends on the matplotlib version you have installed.

Switching to one of these styles takes only one line of code.

import matplotlib.pyplot as plt

plt.style.use('ggplot')

You can also temporarily use one of these styles for a single plot using the style as a context manager.

with plt.style.context('bmh'):
    plt.plot(x, y)

Furthermore, you can design your own fully custom style using a matplotlibrc file. The matplotlib documentation includes a full sample file.
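Styles do not have to come from a file or a built-in name: a plain dictionary of rc parameters works too, which can be a handy way to prototype a custom look before committing it to a matplotlibrc file. Here is a minimal sketch; the parameter values are arbitrary illustrations, and the Agg backend is only selected so the snippet runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# Illustrative overrides only; any valid rcParams keys can go here.
custom_style = {
    'lines.linewidth': 3.0,
    'axes.grid': True,
}

default_width = plt.rcParams['lines.linewidth']

with plt.style.context(custom_style):
    # Inside the block the overrides are active.
    width_inside = plt.rcParams['lines.linewidth']
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [0, 1, 4])

# Outside the block the previous settings are restored.
width_after = plt.rcParams['lines.linewidth']
```

The same dictionary can be passed to plt.style.use() to apply it for the rest of the session.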

Plotting with datetimes

I do a lot of time series analysis, and more often than not the x-axis values arrive as string representations of dates or timestamps. Luckily, matplotlib handles this data gracefully as long as it consists of date or datetime objects.

If you have timestamps represented as strings, you can convert them with datetime.strptime() from Python’s built-in datetime module. If the strings are in a pandas dataframe, you can use the pandas function to_datetime() instead. Matplotlib also provides its own helper functions, date2num() and num2date(), to convert datetimes to its internal floating-point day numbers and back. Example:

import datetime

import pandas as pd

# Some dates
dates = ['2017-01-01', '2017-01-02', '2017-01-03']

# Convert using strptime()
datetimes = [datetime.datetime.strptime(d, '%Y-%m-%d') for d in dates]
# print(datetimes)
# [datetime.datetime(2017, 1, 1, 0, 0), datetime.datetime(2017, 1, 2, 0, 0), datetime.datetime(2017, 1, 3, 0, 0)]


# Create a dataframe with column 'date'
df = pd.DataFrame(dates, columns=['date'])

# Convert the column of strings to timestamps
# (plain column assignment, so the column dtype becomes datetime64)
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
# print(df['date'].values)
# array(['2017-01-01T00:00:00.000000000', '2017-01-02T00:00:00.000000000',
#        '2017-01-03T00:00:00.000000000'], dtype='datetime64[ns]')
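Matplotlib’s own conversion helpers live in matplotlib.dates. The sketch below round-trips a datetime through them. It deliberately does not print the numeric value, because the reference epoch that matplotlib counts days from has changed between releases; what should hold regardless is that consecutive days differ by exactly 1.0.

```python
import datetime

from matplotlib.dates import date2num, num2date

dt = datetime.datetime(2017, 1, 1)

# Float number of days since matplotlib's reference epoch
num = date2num(dt)
next_num = date2num(dt + datetime.timedelta(days=1))

# num2date returns a timezone-aware datetime (UTC by default),
# so drop the tzinfo before comparing with the naive original.
round_trip = num2date(num).replace(tzinfo=None)
```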

Once you’ve converted to datetime objects, you can plot just as you normally would.

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# Assumes df also has a numeric 'y_values' column
ax.plot(df['date'], df['y_values'])

Changing tick properties

One downside of plotting dates on the x-axis is that the tick labels tend to crowd and overlap. pyplot’s setp function sets properties on plot objects: pass the x-tick labels as the first argument, followed by the properties of the tick label text objects that you want to change. Here we rotate the labels 30 degrees (counter-clockwise) and set ha, the horizontal alignment, to right, which anchors the right end of each label at the tick it labels.

plt.setp(ax.get_xticklabels(), rotation=30, ha='right')

It turns out that matplotlib figures also have a method that does this automatically, autofmt_xdate(). By default it rotates the x-tick labels 30 degrees and right-aligns them. Here is how you use it.

fig.autofmt_xdate()
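To see the effect end to end, here is a small self-contained sketch with made-up data (the dates and values are placeholders, not real measurements). It passes the rotation and alignment explicitly, then reads the rotation back from the tick labels.

```python
import datetime

import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# Placeholder data: 30 consecutive days with arbitrary values
dates = [datetime.date(2017, 1, 1) + datetime.timedelta(days=i) for i in range(30)]
values = list(range(30))

fig, ax = plt.subplots()
ax.plot(dates, values)

# Rotate and right-align the date labels in one call
fig.autofmt_xdate(rotation=30, ha='right')
fig.canvas.draw()

# Collect the rotation actually applied to each x-tick label
rotations = {label.get_rotation() for label in ax.get_xticklabels()}
```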

Using tick locators

Tick locators can be used to programmatically set major and minor ticks and tick labels. The matplotlib ticker module provides a number of tick locators for placing ticks numerically or stripping them altogether. There is also the dates module, which is great for setting tick marks on specific dates, e.g. every Monday. This is something I like to use on plots that span several months.

from matplotlib.dates import WeekdayLocator, MO  # TU, WE, TH, FR, SA, SU

ax.xaxis.set_major_locator(WeekdayLocator(byweekday=MO))
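As a quick sanity check, the following sketch plots 90 days of placeholder data, applies the Monday locator, and converts the resulting tick positions back to dates to confirm which weekday they fall on.

```python
import datetime

import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
from matplotlib.dates import WeekdayLocator, MO, num2date

# Placeholder data spanning roughly three months
dates = [datetime.date(2017, 1, 1) + datetime.timedelta(days=i) for i in range(90)]
values = list(range(90))

fig, ax = plt.subplots()
ax.plot(dates, values)
ax.xaxis.set_major_locator(WeekdayLocator(byweekday=MO))
fig.canvas.draw()

# Convert tick positions back to datetimes and inspect their weekdays
tick_dates = [num2date(t) for t in ax.xaxis.get_majorticklocs()]
tick_weekdays = {d.weekday() for d in tick_dates}  # Monday is 0
```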

Using tick formatters

You might not need the full timestamp for tick labels, especially for data that you know is fairly recent. Sometimes the month and day are sufficient. For this there is a date formatter in the dates API. It takes a datetime.strftime() format string that specifies how the dates should appear.

from matplotlib.dates import DateFormatter

ax.xaxis.set_major_formatter(DateFormatter('%m-%d'))
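A formatter is just a callable that maps a tick position to a label, so you can try a format string without drawing anything. A minimal sketch, using this post’s date as the sample value:

```python
import datetime

from matplotlib.dates import DateFormatter, date2num

formatter = DateFormatter('%m-%d')

# Formatters take the numeric tick position, so convert the date first
label = formatter(date2num(datetime.date(2017, 7, 10)))
print(label)  # 07-10
```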

I also use the function formatter to control how large numbers are displayed. This formatter is part of the ticker API and accepts a function of two arguments (a tick value x and a position pos) that returns the formatted string. Here we format numbers on the y-axis with one decimal place and thousand-based order of magnitude suffixes (e.g. 4,200 becomes 4.2K and 5,000,000 becomes 5.0M) in lieu of using commas in the full value.

from matplotlib.ticker import FuncFormatter

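def number_formatter(number, pos=None):
    """Convert a number into a human readable format."""
    suffixes = ['', 'K', 'M', 'B', 'T', 'Q']
    magnitude = 0
    # Stop at the largest suffix so huge values cannot index past the list
    while abs(number) >= 1000 and magnitude < len(suffixes) - 1:
        magnitude += 1
        number /= 1000.0
    return '%.1f%s' % (number, suffixes[magnitude])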

ax.yaxis.set_major_formatter(FuncFormatter(number_formatter))

There are two other formatters, StrMethodFormatter and FormatStrFormatter, which accept new and old style format strings as arguments to define the format.
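A short sketch of both, called directly on sample values rather than attached to an axis. The format strings are just examples; note that StrMethodFormatter expects the tick value to be named x in the format string.

```python
from matplotlib.ticker import FormatStrFormatter, StrMethodFormatter

# New-style (str.format) string: thousands separators, no decimals
new_style = StrMethodFormatter('{x:,.0f}')

# Old-style (% operator) string: one decimal place
old_style = FormatStrFormatter('%.1f')

new_label = new_style(1234567)
old_label = old_style(3.14159)

print(new_label)  # 1,234,567
print(old_label)  # 3.1
```

Either one is attached the same way as the other formatters, e.g. ax.yaxis.set_major_formatter(new_style).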

Using \( \LaTeX \)

Sometimes it’s good to provide some simple formulas to explain what’s going on in the plots. You probably want to use \( \LaTeX \) to do it. This requires altering the matplotlib rc (runtime configuration) settings to enable the markup, and it relies on a working \( \LaTeX \) installation on your machine.

from matplotlib import rc

rc('text', usetex=True)

ax.set_title(r"Euler's formula: $e^{i\pi} + 1 = 0$")

Be sure to indicate to Python that you’re using a raw string with the r prefix. This means that Python treats backslashes in the string literally, so you can type pure \( \LaTeX \) markup without worrying about which characters to escape.
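If a full \( \LaTeX \) installation is not available, matplotlib’s built-in math renderer (mathtext) handles a subset of the same markup between dollar signs with usetex left off. The sketch below assumes the default rc settings and reuses the title from above.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# usetex=False is the default; set explicitly here to make the point
plt.rcParams['text.usetex'] = False

fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1])
ax.set_title(r"Euler's formula: $e^{i\pi} + 1 = 0$")

# Drawing exercises the math parser without calling an external LaTeX
fig.canvas.draw()
title_text = ax.get_title()
```

Mathtext does not cover everything \( \LaTeX \) can do, so more elaborate formulas may still need usetex.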

Unobtrusive legend

If you can’t find a spot for the legend that doesn’t obstruct some of your data, you can add some transparency.

ax.legend(loc='best', framealpha=0.5)

Conditional background color

For some plots it might be useful to show the viewer at a glance which days fall on the weekend. For this we can use the weekday() method of datetime objects, which returns a value from 0 to 6 corresponding to Monday through Sunday. Then use the built-in date2num() function from the matplotlib dates module to get the plot position. Finally, use the axvspan() axes method to set the start, end, and color of the background band drawn around the data point.

from matplotlib.dates import date2num

for d in df['date']:
    if d.weekday() in [5, 6]:
        pos = date2num(d)
        ax.axvspan(pos - 0.5, pos + 0.5, color='#DDDDDD')
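Here is the same idea as a self-contained sketch over two weeks of placeholder data starting on Sunday, 2017-01-01. It keeps the spans in a list (a name introduced only for this example) so they can be counted or restyled later.

```python
import datetime

import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
from matplotlib.dates import date2num

# Placeholder data: 14 consecutive days starting on a Sunday
dates = [datetime.date(2017, 1, 1) + datetime.timedelta(days=i) for i in range(14)]
values = list(range(14))

fig, ax = plt.subplots()
ax.plot(dates, values)

spans = []
for d in dates:
    if d.weekday() >= 5:  # 5 is Saturday, 6 is Sunday
        pos = date2num(d)
        # Shade half a day on either side of the data point
        spans.append(ax.axvspan(pos - 0.5, pos + 0.5, color='#DDDDDD'))
```

Because each band extends half a day beyond its data point, a weekend at the edge of the data can widen the x-axis. Fixing the limits with set_xlim() before adding the spans avoids that.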

Tightening the layout

At the end of most of my plots, just before saving to an image, I call tight_layout(). It automatically adjusts the subplots to fit tightly within the figure, making better use of the available space and reducing cropped labels and overlap between subplots. Be aware that it doesn’t always work.

plt.tight_layout()
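A minimal sketch of that workflow with two stacked subplots, where titles and axis labels would otherwise compete for space. The pad value is an arbitrary choice for illustration, and the image is written to an in-memory buffer rather than a file.

```python
import io

import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 1, figsize=(6, 4))
for i, ax in enumerate(axes):
    ax.plot([0, 1, 2], [0, 1, 4])
    ax.set_title('Panel %d' % (i + 1))
    ax.set_xlabel('x')

# pad is the padding around the figure edge, in multiples of the font size
fig.tight_layout(pad=1.0)

# Save after the layout has been adjusted
buffer = io.BytesIO()
fig.savefig(buffer, format='png')
png_bytes = buffer.getvalue()
```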

A complete example

Here’s a simple example using some generated data. It uses almost all of the techniques discussed in this post.

[Figure: the chart produced by the source below]

source:

import datetime
import random

from matplotlib.dates import DayLocator, DateFormatter, date2num
from matplotlib.ticker import FuncFormatter
from matplotlib import rc
import matplotlib.pyplot as plt
import pandas as pd


# Activate LaTeX rendering (requires a working LaTeX installation) and the bmh style
rc('text', usetex=True)
plt.style.use('bmh')

# Define the nice number formatter
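def number_formatter(number, pos=None):
    """Convert larger number into a human readable format."""
    suffixes = ['', 'K', 'M', 'B', 'T', 'Q']
    magnitude = 0
    # Stop at the largest suffix so huge values cannot index past the list
    while abs(number) >= 1000 and magnitude < len(suffixes) - 1:
        magnitude += 1
        number /= 1000.0
    return '%.1f%s' % (number, suffixes[magnitude])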

# Create some data and the dates
data = [400000 + 50 * t ** 3 + random.normalvariate(0, 50000) for t in range(30)]
dend = datetime.date.today()
delta = datetime.timedelta(days=1)
dates = [dend - (29 - t) * delta for t in range(30)]

# Set up the dataframe
df = pd.DataFrame(columns=['date', 'visitors'])
df['date'] = dates
df['visitors'] = data

# Create the plot
fig, ax = plt.subplots(figsize=(10,3))
ax.plot(df['date'], df['visitors'], label='Blog Visitors')

# X-axis
plt.setp(ax.get_xticklabels(), rotation=30, ha='right')
ax.xaxis.set_major_locator(DayLocator())
ax.xaxis.set_major_formatter(DateFormatter('%m-%d'))
ax.set_xlim(df['date'].values[0], df['date'].values[-1])

# Y-axis
ax.yaxis.set_major_formatter(FuncFormatter(number_formatter))
ax.set_ylim(bottom=0)

# Labels
ax.set_title(r"Daily Blog Views: $\sum_{i=0}^{\infty} v_i \to \infty$")
ax.set_xlabel('Date')
ax.set_ylabel('Visitors')
ax.legend(loc='best', framealpha=0.5)

# Weekend indicators
for d in df['date']:
    if d.weekday() in [5, 6]:
        pos = date2num(d)
        ax.axvspan(pos - 0.5, pos + 0.5, color='#DDDDDD')

# Use tight layout
fig.tight_layout()

plt.show()

Happy (and better) plotting!
