
Facebook’s ocean of names becomes a torrent

July 28th, 2010  |  by jz  |  Published in Future of the Internet, cybersecurity, privacy  |  3 Comments

Nick Bilton over at the NYT Bits Blog has the story of Internet security consultant Ronald Bowes’s recent Facebook caper.  Ron noticed that Facebook has a directory of its users, just like the old Bell Telephone White Pages.  I agree with Ron’s assessment that this is a very little-noticed feature: normally one searches on Facebook not by looking at a directory, but rather by typing a name into a search box.  It’s in plain sight, though, at http://www.facebook.com/directory.

There are two differences that jump out between this awe-inspiring alphabetical listing of all Facebook users and a dog-eared telephone directory.  First, Facebook’s directory has a staggering 171 million names in it.  Second, in good news for paper prices everywhere given the first difference, the directory is digital — it’s right there, online.  And if it’s online, it’s scrapable.  Ron, being of the inquisitive engineering sort who can’t help but push a button if he sees one, figured that supply creates demand, and went ahead and scraped the directory.

That means he produced a file on his own hard drive containing more or less the directory’s main contents: for each person listed, a name, the person’s Facebook URL (what one types in to go directly to his or her entry), and unique Facebook ID (not a secret; this is part of a person’s Facebook URL).  The resulting file is only a few gigs; it’s amazing how cheap storage has become, that so much can be roughly the size of an episode of House.  Ron then placed it online as a torrent, which means anyone can download the file, and voila, a snapshot of Facebook’s membership as of July 2010.

So, is this a problem?  As I’m writing, news is only just breaking, so it’s like that moment when a toddler trips, falls, and then has to think about whether to cry or not.  “You’re OK!” is usually what the alert parent encouragingly says, and if the toddler buys it, it’s usually true.  In fact, even if the toddler doesn’t buy it, it’s still usually true.  In this case, I think I’m with the metaphorical parent.  The data that Ron grabbed is precisely what Facebook users have chosen (or perhaps more accurately, passively acquiesced) to share.  Those who lock their privacy settings to avoid having a public listing in a Facebook search are not present here.  Those who have a public listing are, along with a click through to their respective Facebook pages however they’ve chosen to share them.

Ron appears a little disquieted by it because of the prospect that the snapshot can live forever more.  If you remove your Facebook account or up your privacy settings, that will be reflected in real time in the Facebook directory and search (or at least it should be!).  But the torrent file exists forever — so one’s privacy choices are locked into that moment.  This is an artifact of having a service — Facebook — converted into a product — a Facebook database — the way that universities used to not just maintain online directories, but also publish bound volumes of their alumni with addresses, for those who opted in.  (In fact, many universities still do this; someone should tell them about saving the trees.)

There’s some privacy hit there, but there are also benefits.  By making a public directory — and a scrapable one, no less — Facebook gets more inbound links and attention as its members become easier to find.  And we benefit by having Facebook’s subscribers’ public pages indexed by the likes of Google and Yahoo! search.  In fact, when searching on a person’s name in a regular search engine, quite commonly a Facebook entry is one of the top hits.  That seems to me a good thing, and once Google, Yahoo!, and Bing have it, why shouldn’t Ron and anyone else who wants it have it too?  Indeed, Ron already did some cool stuff with the data.  For example, he crunched it all and came up with a list of Facebook’s most commonly used first and last names, discovering “Michael” and “Smith” coming in at number 1 for each.  Congratulations, Michael Smith, you are hidden in plain sight, since a search for you turns up so many others at the same time!  (Not so much with “Jonathan Zittrain”…)
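The name tally described above can be sketched in a few lines of Python.  This is only an illustration under stated assumptions, not Ron’s actual method: the function name, the treatment of the first and last whitespace-separated tokens as first and last names, and the sample list are all hypothetical, and the real file’s layout isn’t described here.

```python
from collections import Counter


def most_common_names(full_names, top=3):
    """Tally first and last names from an iterable of full-name strings.

    Assumption: the first whitespace-separated token is the first name
    and the final token is the last name. Real names are messier.
    """
    first, last = Counter(), Counter()
    for full in full_names:
        parts = full.split()
        if not parts:
            continue  # skip blank entries
        first[parts[0]] += 1
        if len(parts) > 1:
            last[parts[-1]] += 1
    return first.most_common(top), last.most_common(top)


# Made-up sample data, standing in for the scraped directory.
sample = ["Michael Smith", "Michael Jones", "Anna Smith", "Jonathan Zittrain"]
firsts, lasts = most_common_names(sample, top=1)
print(firsts, lasts)
```

On a file of 171 million names the same loop would read line by line rather than from an in-memory list, but the counting idea is unchanged.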

Anyway, that’s generativity at work: Facebook makes available a directory on free and open terms, and people do stuff with it, some of which can surprise us.  There could be bad surprises, too — Ron and others hint at undesirable data mining — but I’m glad that the gates of Facebook’s gated community have some slats in them, rather than being a solid wall.  At most, it seems to highlight the desirability of getting the defaults right: Facebook shouldn’t have people automatically publicly sharing stuff they’d not normally share, without clear markers on what’s about to happen.  As Google would say, “Please read this carefully.   It’s not the usual yada yada.”

Indeed.  There have been so many Facebook privacy mini-scandals that we’re primed for the next, and the involvement of a torrent file adds an element of seeming subversiveness to the mix, given the association of p2p with contraband material.  But sometimes when the boy cries wolf it’s just a shadow.  I count 8 Yadas in the Facebook directory.  And I, along with my cool musician brother Jeff Zittrain, fall in between Aron Zittra and Austin Zittrauer.  Until now, who knew?  Interesting — but not pitchfork worthy.  …JZ

Responses

  1. Roy says:

    July 29th, 2010 at 3:07 am (#)

    At last someone who can strip away the hype whilst talking intelligently.

    Roy

  2. Conor says:

    July 30th, 2010 at 1:25 am (#)

    Agreed. The purpose of keeping tabs on each new development, though, is to track the direction of privacy incursions over time. Keeping Facebook’s fraught privacy practices in the public spotlight decreases their public favor, trumpets the consequences of the company’s controversial decisions as they unfold bit by bit in real-time, and prepares U.S. policymakers eventually to start thinking about where the line in the sand is drawn. If Senators are already writing letters to the company, Facebook might have to think twice and even three times (I can’t bring myself to write “thrice” and mean it) before taking the next big step that exposes its users.

    If Facebook doesn’t, Congress and the FTC have ready access to a time stamped record of the company’s long history of privacy abuses just by plugging a few keywords into Google News Search. Admittedly, credit agencies are far worse than Facebook in violating our informational privacy. But they’re regulated. Maybe Facebook should be too.

  3. Aries says:

    July 30th, 2010 at 9:57 am (#)

    You’re wrong on one very big thing. College students. Most colleges offer a database to anyone where you can search a person’s first and last name and find out their local and permanent addresses, cell phone number, and home phone number even if these items are not listed on the student’s Facebook page. Furthermore, these students’ schools are automatically shown on Facebook and it’s not a privacy setting you can change. While it was impossible to click 171 million people’s names to filter out everyone within a college, it’s easy to just filter these things out once you OWN THE DATABASE FILE. This exposes many college students, who are some of the youngest users and many of whom probably haven’t used the site in years, to predators in a way that other people simply are not.

Blog

  • FTC goes after astroturfing
  • Last week the U.S. Federal Trade Commission announced a settlement with Reverb Communications, a firm that describes its business as a:

    … full service videogame agency that provides public relations, marketing, and sales services through one integrated campaign to the interactive entertainment and music industry.  Using precise messaging and calculated marketing campaigns, we are able to drive consumer and industry demand for our clients’ products, resulting in increased product sales.

    According to the FTC’s complaint, some of the “precise messaging” involved the firm putting in fake positive user reviews of various video games on the iTunes store.

    I haven’t been able to track down Reverb’s answer to the charges except a statement repeated here, a blog entry that reports some additional details of how the FTC got onto Reverb’s trail.  Reverb is said to have said:

    During discussions with the FTC, it became apparent that we would never agree on the facts of the situation. Rather than continuing to spend time and money arguing, and laying off employees to fight what we believed was a frivolous matter, we settled this case and ended the discussion because as the FTC states: “The consent agreement is for settlement purposes only and does not constitute admission by the respondents of a law violation.”

    That sounds like a non-denial denial, and the FTC appears to be doing good work here.  In the fall of ’09 it announced that paid commercial endorsements had to be disclosed — even on Twitter, Facebook, and in blogs.  There was some handwringing over this — would the government be going after any blogger who says something good about something and might have a financial interest in it?  It is not particularly easy to predict, especially since the FTC, unlike other Federal agencies, does not do formal rulemakings — it can only announce guidelines and then bring one enforcement action at a time under its general charter to combat unfair or deceptive trade practices.

    The Reverb case provides a good example of how the FTC is thinking about applying its limited staff power: to professional organizations working to subvert ratings schemes.  That’s a good place to start; if nascent ratings schemes are to work, it’s helpful to know what the boundaries are — especially to PR and marketing firms that don’t want to have to race to the bottom.  Now they can tell their clients that they’re just not able to help out with fake reviews.  (In the meantime, the Reverb main home page is showing a generic parked message — odd.)

    I remain curious how effective sites like subvertandprofit.com are.  S&P says it:

    … runs social media campaigns across a variety of social media sites, via our 25,000 users who earn money by viewing, voting, fanning, rating, or posting assigned tasks. Since 2007, our user actions have effectively promoted our advertisers’ web content to popularity at significant cost savings. In 2010, Subvert and Profit merged with Crowdsource Corp. to extend the power of crowdsourcing to a variety of social and business applications.

    More directly, S&P tells advertisers that they can:

    Buy votes on social media sites.

    1. Sign up.
    2. Add funds to your account.
    3. Buy votes.
    4. Get visitors to your site for cheap.
    5. Repeat.

    And in turn, social media users can “get paid just for clicking buttons.”

    Perhaps they or other intermediaries that help to launder ratings could find themselves answering some questions from the FTC.  I see the domain for subvertandprofit is registered in Massachusetts, so I’ve sent an email to its owner — I’ll update this post if I hear anything.

  • Fried Androids?
  • In March, a panel of the Federal Circuit affirmed a Texas district court ruling requiring EchoStar to remotely disable the DVRs of innocent customers as part of its damages for infringing on TiVo’s DVR patents.  At the time, Elisabeth and JZ predicted that we would see an increasing number of similar cases as companies — and governments — figured out how to take advantage of additional control points that exist in tethered appliances.  Their Delphian suggestion came to pass in the mobile arena recently when Oracle filed suit against Google for patent and copyright infringement.  The lawsuit claims that Google’s Android OS (along with its software development kit and custom virtual machine) infringes Oracle’s IP rights in the Java programming language.

    Much of the online discussion has focused on the merits of the suit.  Oracle officially acquired Sun Microsystems early this year.  Sun originally developed Java and, over time, released most of the platform into the open source ecosystem.  Patents that were filed may have been a defense against litigation or even a joke.  And Google has licenses for those patents.  So the question here revolves around whether, by strict or loose interpretation, Google violated its licenses, but the vagueness and generality of Oracle’s complaint [pdf] (and press release) renders most of this analysis speculative pending additional clarification.  (More discussion on the open source backdrop is available here and here, and counterpoint here.)

    However, the remedy Oracle wants couldn’t be more clear.  It asks for monetary damages to compensate it for its financial losses and punitive damages because it alleges Google “knowingly,” i.e. intentionally, violated its IP rights.  In addition, Oracle requests “[a]n order permanently enjoining Google, its officers, agents, servants, employees, attorneys and affiliated companies, its assigns and successors in interest, and those persons in active concert or participation with it, from continued acts of infringement of the patents and copyrights at issue in this litigation” and “[a]n order that all copies made or used in violation of Oracle America’s copyrights, and all means by which such copies may be reproduced, be impounded and destroyed or otherwise reasonably disposed of.”  The last one is the kicker: just like TiVo’s demand of EchoStar, Oracle wants the court to tell Google to reach into Android owners’ handsets and rip out the offending material, leaving innocent consumers with a gutted shell — and the remainder of their two-year service contract.

    The destruction remedy applies only to the copyright claim.  If the case goes to trial a jury could conceivably find Google liable for patent infringement but not copyright violation.  And even if it did, the district judge has discretion over what relief to grant.  Plus, the appeals process could hack back overbearing damages.

    But as long as it is on the table, the availability of such a remedy is a very big stick.  Even if Google believes it should win the suit, betting on that outcome doesn’t make sense if it means risking having to destroy consumers’ phones or fighting a long and uncertain legal battle after the destruction provision is awarded, instead of paying conventional monetary damages.

    Google has seen how a similar fight has played out for EchoStar.  EchoStar attempted to comply with the court order by sending DVR boxes an update that replaced the infringing technology with noninfringing parts, leaving intact the DVRs’ functionality.  The Federal Circuit said “no dice,” the remedy was disablement of the DVRs, and that alone would suffice.  EchoStar continues to refuse to disable its customers’ DVRs and has been held in contempt and fined $200 million.

    The Federal Circuit has agreed to rehear EchoStar’s case en banc.  And in the interim, the U.S. Patent and Trademark Office has invalidated the very patents TiVo claimed EchoStar infringed. (TiVo is appealing the ruling; until its appeal is exhausted, the patents remain in force.)  And the FTC has stepped in to give the circuit court some guidance, filing an amicus brief urging it to consider how specific sanctions will impact innovation across the technology industry.

    The availability of destruction as a remedy smothers innovation.  If Oracle can’t strong-arm Google into settling but wins at trial and is awarded the destruction provision (and it survives appeal and Google eventually capitulates instead of balking and riding a series of contempt proceedings into a draconian post-litigation settlement or bankruptcy), (1) consumers would have their phones replaced with bricks and think twice before buying new tech again; (2) Android developers would see their platform and all their apps evaporate; and (3) in the future, companies would likely waste time reinventing the wheel to avoid Google’s court-ordered fate rather than developing new technologies.  There is a storm brewing, brought on by the rise of tethered appliances and the thicket of software patent regulation.

    —By Jennifer Halbleib

  • The Google/Verizon framework
  • I’ve been trying to figure out what the Google/Verizon announcement means.  It’s not easy to do, in large part because the announcement doesn’t precisely announce anything.  It’s titled a “legislative framework proposal.”  That is, on its own terms it’s not an agreement between two companies — neither is bound to do anything by it, which I guess is how they could deny last week’s New York Times report about a “deal on web pay tiers” — but it does represent a meeting of the minds between them about what ought to happen in the world, in particular what American (and presumably others’) law should become here.

    That kind of mental-but-not-legal agreement can get away with being far more vague than a typical contract.  It’s amenable to what Cass Sunstein calls “incompletely theorized agreements.”  Cass’s work points out that parties who disagree on basic things — such as a would-be polity that wants to produce a constitution for the first time — risk coming away empty handed if they insist on their own views.  But they don’t want to compromise, either.  So what they do is strategically punt: they come up with texts that are intentionally vague, leaving it for another day to figure out what they mean in practice, so they can move on with a joint endeavor of some kind.  There are lots of vague statements of that sort in the proposal, some of which are drawn from another likely-intentionally vague set of FCC principles about the Net.  So, for example, under the proposal, carriers can’t engage in undue discrimination.  They can do reasonable network management.  There’s to be transparency, but not neutrality, for wireless at this time. These definitions would have to be much more fleshed out to understand what the agreement means, and lawyers use terms like these so that the parties’ different ideas of “undue,” “reasonable,” and “now” can be parked in peace under the same roof.

    Here’s my own take so far — I figured it might be useful to share my own process in working this through rather than writing (yet) a firm advocacy piece for one view over another. Read more »

  • FOI Topics and Links of the Week
  • Game on. A featureless update released recently by TI blocks a hack that allowed owners to write their own programs for the company’s Nspire calculator. It’s not immediately obvious what rationale TI used to justify the block. It isn’t under pressure to protect the commercial interests of a partner service provider. And worst case, a buggy calculator isn’t exactly as calamitous as a compromised cell phone. In any event, the competition illustrates what may become an increasingly common arms race between hardware companies trying to lock down their products and consumers who want to load the software of their choice on a device they own.

    Disintegrating Droids. The Droid X comes pre-loaded with eFuse technology, which prevents it from booting with unapproved software. Motorola points out that triggering eFuse doesn’t permanently disable the phone — it can re-boot once approved software is reinstalled. Much better.

    Neighborhood watch for software vulnerabilities. At the Black Hat security conference last week, Microsoft advocated for cooperation between software companies, researchers, and security vendors to share information on flaws and patches in order to keep users safe. Perhaps cross-pollination at the meeting will spread the idea of mutual aid to website owners as well.

    Researcher remotely hacks ATMs. Also at Black Hat, a security researcher demonstrated that he could remotely order stand-alone ATMs to spew cash. While causing a remote ATM to dispense money at will is less appealing to the average thief than cracking open a proximate machine, an accomplice with a laptop in a van nearby could make it a profitable endeavor.

    Apple rejects iPad magazine subscription app. Apple has nixed an app from Time, Inc. that would have allowed iPad owners to purchase a digital subscription to Sports Illustrated. Peter Kafka of Media Memo hypothesizes that Apple doesn’t want to give magazine publishers the access to personal user information they would have with an app. But publishers are likely salivating over the targeted advertising potential of mining that data. Plus, single-issue sales through iTunes are cumbersome and inefficient. There may be a confrontation brewing, unless publishers are willing to be satisfied with whatever options Apple grants them.

    FBI challenges Wikipedia over logo. This week, the FBI accused Wikipedia of illegally displaying the agency’s official seal. Wikipedia has refused to remove the image from its FBI page. Wikipedians have a history of standing firm on controversial articles. It’s unclear whether a specific incident triggered agency action. The BBC notes that since the seal is published elsewhere on the Web, the FBI’s selective targeting of Wikipedia is also mysterious. And many reports on the story now include . . . images of the seal.

    Zombie cookie revenge. A lawsuit filed in federal court alleges that several prominent websites used Flash or “zombie” cookies to surreptitiously collect personal user information. Flash cookies can re-create browser cookies deleted by users. They function as extra storage for websites and maintain user preferences, but can also be exploited to track users online.
    —By Jennifer Halbleib
  • What matters in net neutrality
  • It’s hard to know what to make of the Google/Verizon deal since until earlier today both companies have denied that there is one. And it’s hard to argue about net neutrality because it means so many different things to different people. I’ve got lots of reading to do to catch up on the newly released set of principles from the companies, but in the meantime here are a few thoughts on the topic.

    The core question is this: when Internet Service Providers turn out to have captive audiences of subscribers — either because their customers have few if any alternatives for broadband, or because switching is complicated and cumbersome, or because ISP practices are obscure and thus hard for customers to adapt to — how far should they be allowed to leverage that captivity?

    That question arises in the midst of a very confused economy for the movement of bits over the Internet.  With telephones the baseline rule was simple: sender pays.  On the Internet, it’s more complicated: both sender and receiver pay their respective Internet Service Providers to move their data traffic.  Now, suppose these are large ISPs who are considering connecting to each other directly.  The ISP who hosts a sender of traffic like YouTube might say to the ISP with lots of individual users who watch YouTube videos: “We seem to have a lot of stuff that your users want, and they’re paying you to get it to them.  What will you pay us to pass this stuff efficiently over to you?”  The ISP with the individual users might reply with a different point of view: “You’ve got a lot of stuff you want to send to our users, and your corporate customer is making money through advertising or subscription fees when our users access it. What will you and your corporate subscriber pay us to be able to reach our captive audience?”  It’s an odd puzzle: both sides benefit from the transaction, so who should pay for it, given that there’s no baseline rule like “sender pays”?

    In the past this dilemma between large ISPs has been resolved through peering arrangements that have amounted to simple handshakes: I’ll carry your traffic aimed at my subscribers if you carry mine aimed for yours, and we’ll call it even.  Today those deals are more complicated, and their details are typically trade secrets.  But we know this much: Verizon, like other broadband providers, already says to its customers: pay us more and we’ll give you faster Internet access.  That’s not controversial.  So should Verizon also be able to make a similar offer in the other direction, to faraway upstream content providers?  Verizon could say to Google: regardless of what you pay your own ISP to get your bits launched on the Internet, pay us more and we’ll make sure your YouTube videos get to our subscribers all the more quickly as they come in for a landing.

    Google might well be able to pay, and then leave poorer content providers behind.  The next two guys who want to start, say, ShmouTube won’t be able to do it if they’ve got to negotiate business development deals with one ISP after another in order to reach those ISPs’ subscribers.  And that’s the real danger: when each ISP can, in effect, speak on behalf of its unwitting subscribers, serving as the troll under the bridge offering up different conditions for access to them, the economics of the Net will start to favor the consolidated, the well-connected, the well-heeled.  Verizon and Google each have reason to take the trouble to negotiate with one another to begin with: they’re both big, and each can offer uniquely desirable benefits to the other.  The generative power of the Internet is that it has offered a perch for anyone who wants to plant a flag in the ground.  Set up www.mynewamazingwebsite.com, and people the world over can beat a path to it or not as they please.  That represented a huge change from the proprietary consumer networks of the 1980s and 90s, where AOL or CompuServe got to say who could have a presence within their gated communities.

    It may turn out to be too simple to have a blanket rule against ISPs charging faraway providers for access.  There are even some outcomes that make that desirable for consumers — imagine if Internet access were free, with ISPs beating down your door to provide you with broadband, because if you choose them then they’ll get paid by Google et al. for the privilege of sending bits (and ads) to you.  That’s a dubious outcome for a number of reasons, but it’s theoretically possible.  But much more dangerous is if ISPs get to pick and choose: one deal for Google, another for the New York Times, a third for eBay, and no deal at all for mynewamazingwebsite.  In a medium in which so many of the giants were yesterday’s scrappy upstarts — eBay, Google, even the Web itself — it would be a travesty to freeze out the next round of innovation from odd corners by deploying an impenetrable web of contracts and fees.  That’s what I take to be at the core of Chairman Genachowski’s comment that “Any outcome, any deal that doesn’t preserve the freedom and openness of the Internet for consumers and entrepreneurs will be unacceptable.”

    Update: More thoughts here.

About Jonathan Zittrain


Jonathan Zittrain is Professor of Law at Harvard Law School and co-founder of the Berkman Center for Internet and Society at Harvard Law School.





Creative Commons BY-NC-SA Jonathan Zittrain unless otherwise noted.