Facebook Government Requests Per User

09 Sep 2013 — Last updated: 09 Sep 2013, 08:45AM

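import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot  # makes mpl.pyplot available for the scatter plot
import numpy as np
import requests
import lxml.html
import json
from io import StringIO  # replaces the Python 2-only StringIO module
from IPython.display import HTML
from markdown import markdown
# Render a markdown string as HTML in the notebook output
md = lambda x: HTML(markdown(x))
import mplstyle, mplstyle.styles.simple
mplstyle.set(mplstyle.styles.simple)
mplstyle.set({
    "figure.figsize": (10, 15),
})
# Remember the default styles so individual plots can restore them
base_styles = mplstyle.get()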

Facebook released its first Global Government Requests Report in late August. The report tallies the number of requests for user data that each national government made during the first six months of 2013.

FB_GOV_URL = "https://www.facebook.com/about/government_requests/download.php"
fb_gov_csv = requests.get(FB_GOV_URL).text
# header=0 discards the CSV's own header row in favor of the shorter names below
fb_gov = pd.read_csv(StringIO(fb_gov_csv),
    names=["country", "requests", "users_requested", "percent_complied"],
    header=0).set_index("country")
def parse_fb_counts(s):
    # Strip thousands separators; a range ("low - high") becomes its midpoint
    commaless = str(s).replace(",", "")
    split = commaless.split(" - ")
    mean = np.mean([float(x) for x in split])
    return mean
fb_gov["requests"] = fb_gov["requests"].apply(parse_fb_counts)
fb_gov["users_requested"] = fb_gov["users_requested"].apply(parse_fb_counts)

During that time, the U.S. government requested data on far more accounts than any other country's government did. (The U.S. government hasn't allowed Facebook to publish specific numbers for U.S. requests, only ranges. For this analysis, we'll use the midpoint of each range.) Here are the top 10 countries, by number of user accounts requested:

fb_gov.sort_values("users_requested", ascending=False)["users_requested"].head(10).reset_index()
    country         users_requested
0   United States             20500
1   India                      4144
2   United Kingdom             2337
3   Italy                      2306
4   Germany                    2068
5   France                     1598
6   Brazil                      857
7   Spain                       715
8   Australia                   601
9   Chile                       340
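
As a rough illustration of the range handling described above, here is a minimal standalone sketch in plain Python 3. The function name `parse_count` and the sample range string are assumptions chosen for illustration; the range is simply one whose midpoint matches the 20500 shown for the United States, not a value quoted from the report.

```python
def parse_count(raw):
    """Return a reported count as a float; a range becomes its midpoint."""
    # Drop thousands separators, then split a range written as "low - high".
    parts = str(raw).replace(",", "").split(" - ")
    values = [float(p) for p in parts]
    return sum(values) / len(values)

# Illustrative inputs (assumed formatting, not copied from the report)
print(parse_count("20,000 - 21,000"))  # 20500.0
print(parse_count("4,144"))            # 4144.0
```

A single number passes through unchanged, so the same function can be applied to every row of a column.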

But there's a crucial flaw in this metric. The countries at the top of the list tend to be large and relatively developed: the sorts of countries we'd expect to have the most Facebook users overall. It's not particularly interesting that they're also the countries requesting the most user data. So let's control for the number of Facebook users per country, scraped from Wikipedia's Facebook statistics page, which has end-of-2012 data.

FB_USERS_URL = "http://en.wikipedia.org/wiki/Facebook_statistics"
fb_users_text = requests.get(FB_USERS_URL).text
fb_users_dom = lxml.html.fromstring(fb_users_text)
fb_user_rows = fb_users_dom.cssselect("table.wikitable")[0].cssselect("tr")[1:]
def parse_user_row(row):
    cells = row.cssselect("td")
    return {
        "country": cells[1].cssselect("a")[0].text,
        "fb_users": int(cells[4].text.replace(",", ""))
    }
fb_users = pd.DataFrame([parse_user_row(row) for row in fb_user_rows]).set_index("country")
fb = fb_gov.join(fb_users, how="left")
missing_usage = fb[fb["fb_users"].isnull()]
md("The two datasets match up well; there are only %d countries in the government-requests dataset missing from the usage dataset:\n\n %s" % 
(len(missing_usage), "\n".join("- %s" % x for x in missing_usage.index)))

The two datasets match up well; there are only 2 countries in the government-requests dataset missing from the usage dataset:

  • Kosovo
  • Ivory Coast

To check our earlier hunch, we can plot the base-10 logarithm of each country's Facebook user count against the base-10 logarithm of the number of accounts that country requested.

def plot_fb_scatter():
    mplstyle.set({ "figure.figsize": (8, 8) })
    ax = mpl.pyplot.scatter(fb["fb_users"].apply(np.log10), fb["users_requested"].apply(np.log10),
        s=20, 
        alpha=0.7).axes
    ax.set_ylim(0, ax.get_ylim()[1] * 1.1)
    ax.set_title("Facebook Users vs. Accounts Requested, by Country\n")
    ax.set_xlabel("\nLog10 of Facebook Users, End of 2012")
    ax.set_ylabel("Log10 of Accounts Requested, First Half of 2013\n")
    mplstyle.reset(base_styles)
plot_fb_scatter()
pearsons_r = fb["fb_users"].apply(np.log10).corr(fb["users_requested"].apply(np.log10))
md("""
Indeed, it appears that countries with more Facebook users tend to request data on a greater number of user accounts.
(The *Pearson's r* is %.2f, which suggests a fairly strong, positive correlation.)
""" % round(pearsons_r, 2))

Indeed, it appears that countries with more Facebook users tend to request data on a greater number of user accounts. (The Pearson's r is 0.65, which suggests a fairly strong, positive correlation.)
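
For readers who want to see what that correlation measures, here is a minimal sketch of Pearson's r computed by hand on log10-transformed values, using only the standard library. The helper `pearson_r` is a hypothetical name, and the five (users, accounts requested) pairs are copied from the data table at the end of this post. A sample this small will not reproduce the 0.65 reported for the full dataset.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# (fb_users, users_requested) pairs taken from the table at the end of the post
sample = [
    (166029240, 20500),  # United States
    (62713680, 4144),    # India
    (217040, 97),        # Malta
    (227000, 1),         # Iceland
    (17196080, 1),       # Japan
]
log_users = [math.log10(users) for users, _ in sample]
log_requests = [math.log10(requested) for _, requested in sample]
r = pearson_r(log_users, log_requests)
print(round(r, 2))
```

Taking logarithms first keeps the largest countries from dominating the result, which is why both axes of the scatter plot are on a log scale.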

Let's adjust for this by creating a new metric: accounts requested per million users. The chart below plots this number for all countries that requested data on more than 10 users.

fb["per_m_users"] = fb["users_requested"] * 1e6 / fb["fb_users"]
def plot_request_rate():
    fb_sort_rate = fb[fb["fb_users"].notnull() & (fb["users_requested"] > 10)].sort_values("per_m_users")
    ax = fb_sort_rate["per_m_users"].plot(kind="barh", color="teal", alpha=0.5)
    ax.grid(axis="y")
    for i in range(len(fb_sort_rate)):
        row = fb_sort_rate.iloc[i]
        ax.text(row["per_m_users"], i+0.6, "% 0.1f" % row["per_m_users"], va="center")
    ax.set_xlim(0, ax.get_xlim()[1] * 1.1)
    ax.set_title("Facebook Accounts Requested Per Million Users\nDuring the First Six Months of 2013, by Country\n")
    ax.set_ylabel("")
    ax.set_xlabel("\nAccounts Requested Per Million Users")
plot_request_rate()
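
To make the metric concrete, here is a small worked example in plain Python. The function name `per_million_users` is an illustrative assumption; the inputs are the Malta and United States figures from the data table at the end of this post.

```python
def per_million_users(accounts_requested, fb_users):
    """Accounts requested, scaled to a population of one million Facebook users."""
    return accounts_requested * 1e6 / fb_users

malta_rate = per_million_users(97, 217040)
us_rate = per_million_users(20500, 166029240)

print("Malta: %.1f" % malta_rate)          # Malta: 446.9
print("United States: %.1f" % us_rate)     # United States: 123.5
```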

Even by this metric, the U.S. still out-requests every other major country, though by a smaller margin than the raw counts suggested. But the U.S. ranks only second overall, dwarfed by a tiny Mediterranean island nation.

commafy = lambda x: "{:,}".format(int(round(x)))
malta = fb.loc["Malta"]
next_highest = fb["per_m_users"].sort_values(ascending=False).iloc[1]
md("""
__Malta requested data on %d Facebook accounts per million users, more than %d times the next-highest rate.__
Flipping the denominator, that's 1 in every %s users.
""" % (
    round(malta["per_m_users"]),
    int(malta["per_m_users"] / next_highest),
    commafy(round(malta["fb_users"] / malta["users_requested"])))
)

Malta requested data on 447 Facebook accounts per million users, more than 3 times the next-highest rate. Flipping the denominator, that's 1 in every 2,238 users.
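
The two figures in that sentence can be checked by hand. The sketch below assumes Python 3 (where `/` is true division) and uses variable names invented for illustration; the numbers come from the Malta and United States rows of the data table at the end of this post.

```python
malta_requested = 97
malta_users = 217040
malta_rate = 446.922226          # accounts requested per million users
next_highest_rate = 123.472227   # United States, the second-highest rate

# "Flipping the denominator": users per requested account, rounded
one_in = int(round(malta_users / malta_requested))

# Whole number of times Malta's rate exceeds the runner-up's
times_next = int(malta_rate / next_highest_rate)

print("1 in every {:,} users".format(one_in))            # 1 in every 2,238 users
print("more than %d times the next-highest rate" % times_next)  # more than 3 times the next-highest rate
```

Truncating the ratio with `int` rather than rounding it is what justifies the phrase "more than 3 times": the unrounded ratio is a little over 3.6.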

It's possible that the usage statistics for Malta are wrong or outdated; if not, the country is a remarkable outlier. A cursory search of Maltese news doesn't reveal any obvious explanations. Have any hypotheses?

Here's the data behind the chart, if you're curious:

cols = ["country", "users_requested", "fb_users", "per_m_users"]
fb[fb["fb_users"].notnull()].sort_values("per_m_users", ascending=False).reset_index()[cols]
    country                 users_requested   fb_users  per_m_users
0   Malta                                97     217040   446.922226
1   United States                     20500  166029240   123.472227
2   Italy                              2306   23202640    99.385242
3   Germany                            2068   25332440    81.634458
4   United Kingdom                     2337   32950400    70.924784
5   India                              4144   62713680    66.078087
6   France                             1598   25624760    62.361560
7   New Zealand                         119    2256040    52.747292
8   Australia                           601   11680640    51.452660
9   Portugal                            213    4663060    45.678160
10  Spain                               715   17590500    40.646940
11  Singapore                           117    2915640    40.128411
12  Greece                              141    3845820    36.663182
13  Chile                               340    9687720    35.095977
14  Israel                              132    3792820    34.802601
15  Belgium                             169    4922260    34.333822
16  Taiwan                              329   13240660    24.847704
17  Barbados                              3     121620    24.666996
18  Botswana                              7     294000    23.809524
19  Ireland                              40    2183760    18.317031
20  Poland                              158    9863380    16.018850
21  Brazil                              857   58565700    14.633139
22  Malaysia                            197   13589520    14.496465
23  Austria                              41    2915240    14.064022
24  Sweden                               66    4950160    13.332902
25  Canada                              219   18090640    12.105708
26  Switzerland                          36    3055800    11.780876
27  Macedonia                            11     962780    11.425248
28  Slovenia                              8     730160    10.956503
29  Albania                              12    1097800    10.930953
30  Argentina                           218   20048100    10.873848
31  Bosnia and Herzegovina               11    1345020     8.178317
32  Cyprus                                4     582600     6.865774
33  Romania                              36    5374980     6.697699
34  Finland                              15    2287960     6.556059
35  Montenegro                            2     306260     6.530399
36  Lithuania                             7    1118500     6.258382
37  Pakistan                             47    7984880     5.886125
38  Norway                               16    2771480     5.773089
39  Hungary                              24    4265960     5.625932
40  Turkey                              170   32131260     5.290798
41  Qatar                                 3     671720     4.466147
42  Iceland                               1     227000     4.405286
43  Mongolia                              2     515080     3.882892
44  Denmark                              11    3037700     3.621161
45  Czech Republic                       13    3834620     3.390166
46  Mexico                              127   38463860     3.301801
47  Costa Rica                            6    1889620     3.175242
48  Colombia                             41   17322000     2.366932
49  Netherlands                          15    7554940     1.985456
50  Panama                                2    1014160     1.972075
51  Uganda                                1     562240     1.778600
52  Nepal                                 3    1940820     1.545738
53  South Korea                          15   10012400     1.498142
54  Peru                                 14    9351460     1.497092
55  South Africa                          9    6269600     1.435498
56  Cambodia                              1     742220     1.347309
57  El Salvador                           2    1491480     1.340950
58  Croatia                               2    1595760     1.253321
59  Egypt                                11   12173540     0.903599
60  Bangladesh                           12   14352680     0.836081
61  Ecuador                               3    4970680     0.603539
62  Bulgaria                              1    2522120     0.396492
63  Serbia                                1    3377340     0.296091
64  Thailand                              5   17721480     0.282143
65  Hong Kong                             1    4034560     0.247859
66  Philippines                           4   29890900     0.133820
67  Russia                                1    7963400     0.125575
68  Japan                                 1   17196080     0.058153
