MediaCloud — A new tool to analyze global media coverage
March 16th, 2009 | by jz | Published in Future of the Internet
The Berkman Center has just launched a very cool new project, MediaCloud, which you can see over at mediacloud.org. They’re gathering stories from thousands of newspapers, blogs, and other news sources around the web, and then extracting piles of data from the stories: source, topic, entities mentioned, and so on. Their idea is to figure out how to analyze all that data to answer some longstanding, hard-to-answer questions about media coverage. A few sample questions that Ethan Zuckerman suggests: What are the biggest differences between citizen media and mainstream media? Can we track the path of a news story that starts out poorly covered, but eventually explodes, and figure out what caused the shift? Is the blogosphere really an echo chamber?
Right now they’ve got a few neat tools hacked together to answer these questions. You can look at the top ten most covered topics in the news sources of your choice. You can map a source’s geographical coverage, so you could figure out which source to read if you’re especially interested in, say, Zimbabwe. You can also look at how terms show up together—for instance, to borrow another example from Ethan, you might also want to know what other terms show up in those Zimbabwe stories:
“[T]he BBC most closely associates Zimbabwe with cholera, followed by the United Kingdom, United Nations, United States. Over here, Fox News—Robert Mugabe as the first thing. So, which is an interesting example of sort of playing the man rather than playing the story. Daily Kos, this is sort of interesting: United States, Afghanistan, Iraq, Washington, China, Pakistan. My guess is that Daily Kos almost has no Zimbabwe-dedicated reporting. It’s sort of general commentary on Obama’s foreign policy.”
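For the curious, the term-association idea Ethan describes can be sketched in a few lines of Python. This is only a rough illustration, not MediaCloud's actual code: the function name `co_occurring_terms` and the sample stories are made up, and it assumes each story has already been reduced to a list of extracted terms.

```python
from collections import Counter


def co_occurring_terms(stories, target, top_n=3):
    """Return the terms that most often share a story with `target`.

    `stories` is a list of term lists, one per story (an assumed input
    format, chosen for illustration only).
    """
    counts = Counter()
    for terms in stories:
        unique = set(terms)  # count each term at most once per story
        if target in unique:
            counts.update(unique - {target})
    return counts.most_common(top_n)


# Made-up sample data, not real MediaCloud output.
stories = [
    ["Zimbabwe", "cholera", "United Nations"],
    ["Zimbabwe", "cholera", "Robert Mugabe"],
    ["Zimbabwe", "cholera"],
    ["Iraq", "United States"],
]

print(co_occurring_terms(stories, "Zimbabwe", top_n=1))
```

Run the same count separately over each source's stories and you get the kind of side-by-side comparison in the quote above.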
These tools are a lot of fun to play with, but the long-term goal is to distribute the work of building tools. The Berkman Center can provide the data, up to 15,000 sources or so, but its researchers can’t think of or implement all the potential creative ways of analyzing it. So they’re releasing all the data they’ve collected, and they want people to independently take up that data and do something unexpected or interesting with it. It’s not hard to think of several functions that it would be useful for someone to build (the ability to search by topic rather than by source, and language translation, immediately jump to mind), and the commenters on the site are busily suggesting more. You can head on over and check it out, suggest some functions you’d like to see, or start building tools yourself.

