Facebook’s ocean of names becomes a torrent

July 28th, 2010  |  by jz  |  Published in cybersecurity, Future of the Internet, privacy  |  3 Comments

Nick Bilton over at the NYT Bits Blog has the story of Internet security consultant Ronald Bowes’s recent Facebook caper.  Ron noticed that Facebook has a directory of its users, just like the old Bell Telephone White Pages.  I agree with Ron’s assessment that this is a very little-noticed feature: normally one searches on Facebook not by looking at a directory, but rather by typing a name into a search box.  It’s in plain sight, though, at http://www.facebook.com/directory.

There are two differences that jump out between this awe-inspiring alphabetical listing of all Facebook users and a dog-eared telephone directory.  First, Facebook’s directory has a staggering 171 million names in it.  Second, in good news for paper prices everywhere given the first difference, the directory is digital — it’s right there, online.  And if it’s online, it’s scrapable.  Ron, being of the inquisitive engineering sort who can’t help but push a button if he sees one, figured that supply creates demand, and went ahead and scraped the directory.

That means he produced a file on his own hard drive containing more or less the directory’s main contents: for each person listed, a name, the person’s Facebook URL (what one types in to go directly to his or her entry), and unique Facebook ID (not a secret; this is part of a person’s Facebook URL).  The resulting file is only a few gigs; it’s amazing how cheap storage has become, that so much can be roughly the size of an episode of House.  Ron then placed it online as a torrent, which means anyone can download the file, and voila, a snapshot of Facebook’s membership as of July 2010.

So, is this a problem?  As I’m writing, news is only just breaking, so it’s like that moment when a toddler trips, falls, and then has to think about whether to cry or not.  “You’re OK!” is usually what the alert parent encouragingly says, and if the toddler buys it, it’s usually true.  In fact, even if the toddler doesn’t buy it, it’s still usually true.  In this case, I think I’m with the metaphorical parent.  The data that Ron grabbed is precisely what Facebook users have chosen (or perhaps more accurately, passively acquiesced) to share.  Those who have locked their privacy settings to avoid a public listing in Facebook search are not present here.  Those who haven’t are, along with a click through to their respective Facebook pages however they’ve chosen to share them.

Ron appears a little disquieted by it because of the prospect that the snapshot can live forever more.  If you remove your Facebook account or up your privacy settings, that will be reflected in real time in the Facebook directory and search (or at least it should be!).  But the torrent file exists forever — so one’s privacy choices are locked into that moment.  This is an artifact of having a service — Facebook — converted into a product — a Facebook database — the way that universities used to not just maintain online directories, but also publish bound volumes of their alumni with addresses, for those who opted in.  (In fact, many universities still do this; someone should tell them about saving the trees.)

There’s some privacy hit there, but there are also benefits.  By making a public directory — and a scrapable one, no less — Facebook gets more inbound links and attention as its members become easier to find.  And we benefit by having Facebook’s subscribers’ public pages indexed by the likes of Google and Yahoo! search.  In fact, when searching on a person’s name in a regular search engine, quite commonly a Facebook entry is one of the top hits.  That seems to me a good thing, and once Google, Yahoo!, and Bing have it, why shouldn’t Ron and anyone else who wants it have it too?  Indeed, Ron already did some cool stuff with the data.  For example, he crunched it all and came up with a list of Facebook’s most commonly used first and last names, discovering “Michael” and “Smith” coming in at number 1 for each.  Congratulations, Michael Smith, you are hidden in plain sight, since a search for you turns up so many others at the same time!  (Not so much with “Jonathan Zittrain”…)
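A tally like Ron’s is simple to sketch.  The Python below is not his code; it assumes the scraped file has already been reduced to one full name per entry, treats the first and last whitespace-separated words as the first and last names, and ignores capitalization.  The function name `top_names` and the sample list are made up for illustration.

```python
from collections import Counter


def top_names(full_names, n=1):
    """Return the n most common first names and the n most common last names."""
    first, last = Counter(), Counter()
    for full in full_names:
        parts = full.split()
        if len(parts) < 2:
            # Assumption for this sketch: skip entries with only one word.
            continue
        first[parts[0].lower()] += 1
        last[parts[-1].lower()] += 1
    return first.most_common(n), last.most_common(n)


# Tiny made-up sample standing in for the directory's name column.
sample = ["Michael Smith", "Michael Jones", "Anna Smith", "Jonathan Zittrain"]
top_first, top_last = top_names(sample)
print(top_first, top_last)
```

On the real file the same loop would run over roughly 171 million entries, so one would read the names line by line rather than holding them all in a list.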

Anyway, that’s generativity at work: Facebook makes available a directory on free and open terms, and people do stuff with it, some of which can surprise us.  There could be bad surprises, too — Ron and others hint at undesirable data mining — but I’m glad that the gates of Facebook’s gated community have some slats in them, rather than being a solid wall.  At most, it seems to highlight the desirability of getting the defaults right: Facebook shouldn’t have people automatically publicly sharing stuff they’d not normally share, without clear markers on what’s about to happen.  As Google would say, “Please read this carefully.   It’s not the usual yada yada.”

Indeed.  There have been so many Facebook privacy mini-scandals that we’re primed for the next, and the involvement of a torrent file adds an element of seeming subversiveness to the mix, given the association of p2p with contraband material.  But sometimes when the boy cries wolf it’s just a shadow.  I count 8 Yadas in the Facebook directory.  And I, along with my cool musician brother Jeff Zittrain, fall in between Aron Zittra and Austin Zittrauer.  Until now, who knew?  Interesting — but not pitchfork worthy.  …JZ

Responses

  1. Roy says:

    July 29th, 2010 at 3:07 am (#)

    At last someone who can strip away the hype whilst talking intelligently.

    Roy

  2. Conor says:

    July 30th, 2010 at 1:25 am (#)

    Agreed. The purpose of keeping tabs on each new development, though, is to track the direction of privacy incursions over time. Keeping Facebook’s fraught privacy practices in the public spotlight decreases their public favor, trumpets the consequences of the company’s controversial decisions as they unfold bit by bit in real-time, and prepares U.S. policymakers eventually to start thinking about where the line in the sand is drawn. If Senators are already writing letters to the company, Facebook might have to think twice and even three times (I can’t bring myself to write “thrice” and mean it) before taking the next big step that exposes its users.

    If Facebook doesn’t, Congress and the FTC have ready access to a time stamped record of the company’s long history of privacy abuses just by plugging a few keywords into Google News Search. Admittedly, credit agencies are far worse than Facebook in violating our informational privacy. But they’re regulated. Maybe Facebook should be too.

  3. Aries says:

    July 30th, 2010 at 9:57 am (#)

    You’re wrong on one very big thing: college students. Most colleges offer a database to anyone where you can search a person’s first and last name and find out their local and permanent addresses, cell phone number, and home phone number, even if these items are not listed on the student’s Facebook page. Furthermore, these students’ schools are automatically shown on Facebook, and it’s not a privacy setting you can change. While it was impossible to click 171 million people’s names to filter out everyone within a college, it’s easy to filter these things out once you OWN THE DATABASE FILE. This leaves many college students, who are some of the youngest users and among whom there are probably many who haven’t used the site in years, open to predators in a way that other people simply are not.

About Jonathan Zittrain

Jonathan Zittrain is Professor of Law at Harvard Law School and co-founder of the Berkman Center for Internet and Society at Harvard Law School

Creative Commons BY-NC-SA Jonathan Zittrain unless otherwise noted.