Facebook’s ocean of names becomes a torrent

July 28th, 2010  |  by jz  |  Published in cybersecurity, Future of the Internet, privacy  |  3 Comments

Nick Bilton over at the NYT Bits Blog has the story of Internet security consultant Ronald Bowes’s recent Facebook caper.  Ron noticed that Facebook has a directory of its users, just like the old Bell Telephone White Pages.  I agree with Ron’s assessment that this is a very little-noticed feature: normally one searches on Facebook not by looking at a directory, but rather by typing a name into a search box.  It’s in plain sight, though, at http://www.facebook.com/directory.

There are two differences that jump out between this awe-inspiring alphabetical listing of all Facebook users and a dog-eared telephone directory.  First, Facebook’s directory has a staggering 171 million names in it.  Second, in good news for paper prices everywhere given the first difference, the directory is digital — it’s right there, online.  And if it’s online, it’s scrapable.  Ron, being of the inquisitive engineering sort who can’t help but push a button if he sees one, figured that supply creates demand, and went ahead and scraped the directory.

That means he produced a file on his own hard drive containing more or less the directory’s main contents: for each person listed, a name, the person’s Facebook URL (what one types in to go directly to his or her entry), and a unique Facebook ID (not a secret; this is part of a person’s Facebook URL).  The resulting file is only a few gigs; storage has become so cheap that all of this takes up roughly the size of an episode of House.  Ron then placed it online as a torrent, which means anyone can download the file, and voilà: a snapshot of Facebook’s membership as of July 2010.

So, is this a problem?  As I’m writing, news is only just breaking, so it’s like that moment when a toddler trips, falls, and then has to think about whether to cry or not.  “You’re OK!” is usually what the alert parent encouragingly says, and if the toddler buys it, it’s usually true.  In fact, even if the toddler doesn’t buy it, it’s still usually true.  In this case, I think I’m with the metaphorical parent.  The data that Ron grabbed is precisely what Facebook users have chosen (or perhaps more accurately, passively acquiesced) to share.  Those who have locked their privacy settings to avoid a public listing in Facebook search are not present here.  Those who have kept a public listing are, along with a click-through to their respective Facebook pages, however they’ve chosen to share them.

Ron appears a little disquieted by it because of the prospect that the snapshot can live forevermore.  If you remove your Facebook account or up your privacy settings, that will be reflected in real time in the Facebook directory and search (or at least it should be!).  But the torrent file exists forever, so one’s privacy choices are locked into that moment.  This is an artifact of having a service (Facebook) converted into a product (a Facebook database), the way that universities used to not just maintain online directories but also publish bound volumes of their alumni with addresses, for those who opted in.  (In fact, many universities still do this; someone should tell them about saving the trees.)

There’s some privacy hit there, but there are also benefits.  By making a public directory — and a scrapable one, no less — Facebook gets more inbound links and attention as its members become easier to find.  And we benefit by having Facebook’s subscribers’ public pages indexed by the likes of Google and Yahoo! search.  In fact, when searching on a person’s name in a regular search engine, quite commonly a Facebook entry is one of the top hits.  That seems to me a good thing, and once Google, Yahoo!, and Bing have it, why shouldn’t Ron and anyone else who wants it have it too?  Indeed, Ron already did some cool stuff with the data.  For example, he crunched it all and came up with a list of Facebook’s most commonly used first and last names, discovering “Michael” and “Smith” coming in at number 1 for each.  Congratulations, Michael Smith, you are hidden in plain sight, since a search for you turns up so many others at the same time!  (Not so much with “Jonathan Zittrain”…)
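Ron’s name tally is easy to picture in code.  What follows is a minimal sketch, not his actual method (which isn’t described here): it assumes the names are already in memory as plain “First Last” strings, and the function name most_common_names and the sample list are made up for illustration.

```python
from collections import Counter


def most_common_names(full_names, top=3):
    """Count first and last names in a list of 'First ... Last' strings.

    Hypothetical helper for illustration only. The first word is treated
    as the first name and the final word as the last name, which is a
    simplification that real-world names will often break.
    """
    first_counts = Counter()
    last_counts = Counter()
    for full_name in full_names:
        parts = full_name.split()
        if not parts:
            continue  # skip blank entries
        # Normalize capitalization so "michael" and "Michael" count together.
        first_counts[parts[0].title()] += 1
        if len(parts) > 1:
            last_counts[parts[-1].title()] += 1
    return first_counts.most_common(top), last_counts.most_common(top)


# Made-up sample data, not drawn from any real directory.
sample = ["Michael Smith", "michael jones", "Sarah Smith", "Michael Lee Brown"]
top_first, top_last = most_common_names(sample, top=2)
print("Most common first names:", top_first)
print("Most common last names:", top_last)
```

At the scale of 171 million names one would likely stream the file line by line rather than hold a list in memory, but the counting step would look much the same.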

Anyway, that’s generativity at work: Facebook makes available a directory on free and open terms, and people do stuff with it, some of which can surprise us.  There could be bad surprises, too — Ron and others hint at undesirable data mining — but I’m glad that the gates of Facebook’s gated community have some slats in them, rather than being a solid wall.  At most, it seems to highlight the desirability of getting the defaults right: Facebook shouldn’t have people automatically publicly sharing stuff they’d not normally share, without clear markers on what’s about to happen.  As Google would say, “Please read this carefully.   It’s not the usual yada yada.”

Indeed.  There have been so many Facebook privacy mini-scandals that we’re primed for the next, and the involvement of a torrent file adds an element of seeming subversiveness to the mix, given the association of p2p with contraband material.  But sometimes when the boy cries wolf it’s just a shadow.  I count 8 Yadas in the Facebook directory.  And I, along with my cool musician brother Jeff Zittrain, fall in between Aron Zittra and Austin Zittrauer.  Until now, who knew?  Interesting, but not pitchfork-worthy.  …JZ

Responses

  1. Roy says:

    July 29th, 2010 at 3:07 am

    At last someone who can strip away the hype whilst talking intelligently.

    Roy

  2. Conor says:

    July 30th, 2010 at 1:25 am

    Agreed. The purpose of keeping tabs on each new development, though, is to track the direction of privacy incursions over time. Keeping Facebook’s fraught privacy practices in the public spotlight decreases their public favor, trumpets the consequences of the company’s controversial decisions as they unfold bit by bit in real time, and prepares U.S. policymakers eventually to start thinking about where the line in the sand is drawn. If Senators are already writing letters to the company, Facebook might have to think twice and even three times (I can’t bring myself to write “thrice” and mean it) before taking the next big step that exposes its users.

    If Facebook doesn’t, Congress and the FTC have ready access to a time-stamped record of the company’s long history of privacy abuses just by plugging a few keywords into Google News Search. Admittedly, credit agencies are far worse than Facebook in violating our informational privacy. But they’re regulated. Maybe Facebook should be too.

  3. Aries says:

    July 30th, 2010 at 9:57 am

    You’re wrong on one very big thing: college students. Most colleges offer a database to anyone where you can search a person’s first and last name and find out their local and permanent addresses, cell phone number, and home phone number, even if these items are not listed on the student’s Facebook page. Furthermore, these students’ schools are automatically shown on Facebook, and it’s not a privacy setting you can change. While it was impossible to click 171 million people’s names to filter out everyone within a college, it’s easy to filter these things out once you OWN THE DATABASE FILE. This leaves many college students, who are some of the youngest users and many of whom probably haven’t used the site in years, open to predators in a way that other people simply are not.

About Jonathan Zittrain

Jonathan Zittrain is Professor of Law at Harvard Law School and co-founder of the Berkman Center for Internet and Society at Harvard Law School.


Creative Commons BY-NC-SA Jonathan Zittrain unless otherwise noted.