Pete Warden, Facebook info harvester, explains why he deleted all his data
In recent days, media outlets nationwide have told the story of Boulder-based Pete Warden, who created a database of information about Facebook's approximately 215 million account holders. Facebook accused Warden of violating the rules of its site and threatened him with legal action -- a prospect he took seriously enough to destroy all the info he'd gleaned to date.
Turns out that Warden's project had raised eyebrows in the tech community for months: Check out this February post from MichaelZimmer.org, which challenges the ethics of the database. "Just because these Facebook users made their profiles publicly available does not mean they are fair game for scraping for research purposes," Zimmer argues, adding that the approach "poses a serious privacy threat to the subjects in the dataset, their friends, and perhaps unknown others."
Warden doesn't see it that way.
Here's how he explains his basic concept:
I scratched my head a bit and thought "well, how hard can it be to build my own search engine?". As it turned out, it was very easy. Checking Facebook's robot.txt, they welcome the web crawlers that search engines use to gather their data, so I wrote my own in PHP (very similar to this Google Profile crawler I open-sourced) and left it running for about 6 months. Initially all I wanted to gather was people's names and locations so I could search on those to find public profiles. Talking to a few other startups they also needed the same sort of service so I started looking into either exposing a search API or sharing that sort of 'phone book for the internet' information with them.
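For readers curious what the robots.txt check Warden mentions looks like in practice, here is a minimal sketch. It is not his code: he says his crawler was written in PHP, while this uses Python's standard library, and the sample rules, the crawler name and the example.com URLs are all invented for illustration. No network requests are made.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for illustration only.
# It is NOT a copy of any real site's rules.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so the rules can be checked offline.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleCrawler" is a made-up user agent name.
    for url in ("https://example.com/people/jane",
                "https://example.com/private/data"):
        print(url, is_allowed(SAMPLE_ROBOTS_TXT, "ExampleCrawler", url))
```

A polite crawler runs a check like this before each request. As the dispute described here suggests, passing it says nothing about whether a site's terms of service allow the data to be collected.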
Warden subsequently set up a website called FanPageAnalytics.com, which he saw as having commercial applications. But in early February, after putting together "How to Split Up the US," an article gleaned from some of his findings, he got a call from a Facebook attorney.
After contacting a lawyer of his own, Warden came to the conclusion that while he could fight Facebook's demand that he call a halt to his project, "the legal costs alone of being a test case would bankrupt me."
Hence, his decision to destroy his database. He concedes that he's "just glad that the whole process is over." However, he adds:
I'm bummed that Facebook are taking a legal position that would cripple the web if it was adopted (how many people would Google need to hire to write letters to every single website they crawled?), and a bit frustrated that people don't understand that the data I was planning to release is already in the hands of lots of commercial marketing firms, but mostly I'm just looking forward to leaving the massive distraction of a legal threat behind and getting on with building my startup. I really appreciate everyone's support, stay tuned for my next project!
No doubt plenty of folks will be watching his next moves closely.