Notebooks

Introducing "Reporter"

03 Sep 2013 — Last updated: 04 Sep 2013, 04:26PM

As a programmer and journalist, I constantly grapple with a dual-audience problem. Fellow programmers (including my future self, circling back on earlier work) want to see the entire code and process behind any given conclusion, chart, or number. These details assist debugging, fact-checking, and knowledge-sharing.

But non-programmer collaborators and readers just as strongly and legitimately want the opposite: only the facts, with none of the confusing, distracting code.

So, in my spare time, I've been building Reporter, an attempt to resolve this tension. With Reporter, you don't need to choose between the comprehensive and the comprehensible; you can have both at once. This site, notebooks.jsvine.com, is built on Reporter, so you're experiencing a live demonstration. Take this chart, for example:

import pandas as pd
import matplotlib.pyplot as plt
import mplstyle, mplstyle.styles.simple
mplstyle.set(mplstyle.styles.simple)
mplstyle.set({"figure.figsize": (6, 4)})
# Data from the Social Security Administration
# Source data: http://www.ssa.gov/oact/babynames/limits.html
# Parsed data: https://github.com/jsvine/babynames
names = pd.read_csv("/Users/jsvine/Dropbox/babynames/data/name-counts.csv")
# Select only newborns named Barack
baracks = names[names["name"] == "Barack"].set_index("year")
# Plot this data as a bar chart
ax = baracks["count"].plot(kind="bar", color="teal", alpha=0.5)

# Give the chart a little breathing room at the top
ax.set_ylim(0, baracks["count"].max() * 1.1)

# Turn off the vertical grid lines
ax.xaxis.grid(False)

# Remove the pesky, mysterious stray line (only if this pandas version draws one)
if ax.lines:
    ax.lines[0].remove()

# Keep the year labels horizontal
plt.xticks(rotation=0)

# Label the chart
ax.set_title("American Newborns Named Barack, 2007-2012\n")
ax.set_xlabel("Year")
ax.set_ylabel("Newborns")

# Add numbers to the top of each bar
# (Series.iget was removed from pandas; .iloc does positional lookup)
for i, bar in enumerate(ax.patches):
    xpos = bar.get_x() + bar.get_width() / 2
    count = baracks["count"].iloc[i]
    ax.text(xpos, count + 1, str(count), ha="center")

You can take my word that this chart is accurate and not totally made up. Or, you can click the "Show Code" button at the bottom of this page, and see for yourself.
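A toggle like that one needs very little machinery. The sketch below is an illustration, not Reporter's actual code: it assumes each notebook cell has already been reduced to a `(code, output_html)` pair, and `render_page` is a hypothetical name. Code is HTML-escaped and hidden by default, outputs stay visible, and one button flips the `hidden` attribute on every code block.

```python
import html


def render_page(cells):
    """Render (code, output_html) pairs as HTML with code hidden by default.

    Hypothetical helper for illustration; the input structure is an assumption.
    """
    parts = []
    for code, output_html in cells:
        # Escape the code so characters like < and " display literally
        parts.append(f'<pre class="code" hidden>{html.escape(code)}</pre>')
        # Outputs (charts, tables) are always shown to every reader
        parts.append(f'<div class="output">{output_html}</div>')
    # A single button toggles the `hidden` attribute on all code blocks
    parts.append(
        '<button onclick="document.querySelectorAll(\'pre.code\')'
        '.forEach(el => el.hidden = !el.hidden)">Show/Hide Code</button>'
    )
    return "\n".join(parts)


page = render_page([('names[names["name"] == "Barack"]', "<p>(chart here)</p>")])
print(page)
```

Keeping the code in the page and merely hiding it means non-programmers see a clean report while anyone curious can reveal the full analysis without leaving the page.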

A basic bar chart is a trivial example. But as analyses grow in complexity, "showing your work" — in a way that doesn't alienate non-technical readers — grows in importance.

The show/hide-code toggle is Reporter's main public-facing feature. There are a few others, including easy-to-read and easy-to-download data tables, such as the one below:

baracks.reset_index()
   year    name  sex  count
0  2007  Barack    M      5
1  2008  Barack    M     52
2  2009  Barack    M     69
3  2010  Barack    M     28
4  2011  Barack    M     15
5  2012  Barack    M     16
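One way to make a table like this both readable and downloadable is to emit the HTML table alongside a link whose `data:` URI holds the same rows as CSV. This is a minimal stdlib-only sketch, not how Reporter necessarily does it; `table_with_download` is a hypothetical name. With pandas, `DataFrame.to_html()` and `DataFrame.to_csv()` (which returns a string when no path is given) would produce the two pieces directly.

```python
import csv
import html
import io
from urllib.parse import quote


def table_with_download(header, rows, filename="data.csv"):
    """Return an HTML table plus a link that downloads the same rows as CSV.

    Hypothetical helper for illustration only.
    """
    # Visible table: escape every header and cell value
    head = "".join(f"<th>{html.escape(str(h))}</th>" for h in header)
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(str(v))}</td>" for v in row) + "</tr>"
        for row in rows
    )
    table = f"<table><thead><tr>{head}</tr></thead><tbody>{body}</tbody></table>"

    # Download: serialize the same rows to CSV and percent-encode them into a data: URI
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    href = "data:text/csv;charset=utf-8," + quote(buf.getvalue())
    link = f'<a href="{href}" download="{html.escape(filename)}">Download CSV</a>'
    return table + "\n" + link


rows = [
    (2007, "Barack", "M", 5),
    (2008, "Barack", "M", 52),
    (2009, "Barack", "M", 69),
    (2010, "Barack", "M", 28),
    (2011, "Barack", "M", 15),
    (2012, "Barack", "M", 16),
]
print(table_with_download(["year", "name", "sex", "count"], rows, "baracks.csv"))
```

Embedding the CSV in the link avoids hosting a separate file, though very large tables would be better served as standalone downloads.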

But most of the work happens behind the scenes, parsing data-analysis files into their inputs and outputs. For now, Reporter accepts just one type of input, IPython notebooks. Soon, though, I'm hoping to support other formats, including those built on the R statistical programming language.
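A notebook file is JSON, so splitting it into inputs and outputs can start with the standard library. The sketch below assumes the current nbformat 4 layout (a top-level `cells` list; code cells carry `source` and `outputs`). Older nbformat 3 files, the format in use when this post was written, nest cells under `worksheets` and name some fields differently, so this is an illustration rather than Reporter's parser; `split_notebook` is a hypothetical name. For real use, the official `nbformat` package's `nbformat.reads(text, as_version=4)` can upgrade older files first.

```python
import json


def _join(value):
    # Notebook text fields may be stored as one string or as a list of lines
    return "".join(value) if isinstance(value, list) else value


def split_notebook(nb_json):
    """Split nbformat-4 JSON text into (source, outputs) pairs, one per code cell.

    Hypothetical helper for illustration; error outputs are ignored here.
    """
    nb = json.loads(nb_json)
    pairs = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells in this sketch
        outputs = []
        for out in cell.get("outputs", []):
            kind = out.get("output_type")
            if kind == "stream":
                # printed text (stdout/stderr)
                outputs.append(("text/plain", _join(out.get("text", ""))))
            elif kind in ("execute_result", "display_data"):
                # keep every MIME representation, e.g. text/plain or image/png
                for mime, data in out.get("data", {}).items():
                    outputs.append((mime, _join(data)))
        pairs.append((_join(cell.get("source", "")), outputs))
    return pairs


example = json.dumps({
    "nbformat": 4, "nbformat_minor": 5, "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Notes"]},
        {"cell_type": "code", "metadata": {}, "execution_count": 1,
         "source": ["baracks = 52\n", "baracks"],
         "outputs": [{"output_type": "execute_result", "execution_count": 1,
                      "metadata": {}, "data": {"text/plain": ["52"]}}]},
    ],
})
for source, outputs in split_notebook(example):
    print(source)
    print(outputs)
```

Keeping every MIME representation lets a renderer choose the richest one it supports, such as a PNG chart over its plain-text fallback.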

Reporter is free and open-source. You can publish your own set of notebooks anywhere that supports file hosting. You can find instructions for getting started on the project's GitHub page. Please don't hesitate to email me with feedback, or to submit a GitHub issue.

