By Joel Warner
That roll of pre-made cookie dough in the refrigerated aisle of the supermarket harbors a dirty secret: There's a good chance it will be eaten before it ever reaches the oven.
The numbers don't lie. Of roughly 400 people who bought refrigerated cookie dough in a recent two-month period and blogged about it, more than 60 percent consumed the product unbaked. The guilty were quick to confess. "I ate a roll of raw cookie dough -- again." "Bought another roll of cookie dough. Couldn't wait to get home to eat it. Spooned it into my mouth as I drove." "I hate myself. I've had 12 rolls of raw cookie dough this month."
"It was just amazing. I would almost venture to guess they had some eating issues," says David Howlett, vice president of client services for Umbria, the Boulder-based market-research company that uncovered the gluttonous trend. "It's like, you are going to keel over and die!"
Finding such obscure and potentially lucrative consumer trends is Umbria's specialty. The company uses the blogosphere and similar Internet phenomena -- the world of MySpace, Wikipedia, YouTube, Flickr and RSS feeds that information pundits label as "user-generated content," "consumer-generated media" or "social media," though many prefer the imprecise yet catchy "blogosphere" -- as a perpetual, globe-spanning focus group. As people blog about the new iPhone, what they thought of Borat and how they take their cookie dough, they provide a wealth of unsolicited opinions that can be mined for valuable information about how a target audience thinks -- and consumes.
There's only one problem: The blogosphere is a mess, full of colloquial, unorganized, factually questionable rants, rambles and rumors, and that mess is growing by the second. Sorting through it all to find reliable proof of, say, an untapped population of dough gobblers is anything but a piece of cake.
Dry-erase boards have a short lifespan in Ted Kremer's corner office. There are just too many complicated thoughts bubbling out of his energy-drink-fueled, spiky-haired and goateed head. The only way Umbria's chief technology officer can explain them intelligently is to continuously illustrate with scribbles of multi-colored flow charts, multigraphs and Venn diagrams. No amount of erasing will suffice; the faint remnants of countless circles, arrows and equations become permanently tattooed in the white surface like a wall-spanning watermark. "Every now and then I just throw it away and buy a new one," Kremer remarks in his intense, rapid voice, between scribbles. Markers don't fare much better; the one in his hand is just about dead. "My black pen is falling apart. We are going to switch colors, but there is no meaning to the change."
For Kremer, however, there's really meaning to everything, fundamental patterns and underlying significance beneath the unruly pandemonium of the world. "I find the chaotic aspects of human nature fascinating," he says. "That doesn't mean I am not going to try to find order in the chaos." The 35-year-old has made a career of doing just that. He spent his middle- and high-school vacations writing code for his father's East Texas accounting-software company before majoring in computer engineering at the University of Houston. He built software that analyzed the habits of cell-phone users to predict when they were likely to switch carriers. He developed computer systems that allow doctors to quickly make sense of digital mammogram X-rays compiled from hospitals across the nation. In the summer of 2003, he helped found Umbria with Howlett and Howard Kaushansky. Today he's in charge of the technology that allows the company to decipher and organize the huge amounts of information constantly uploaded onto the Internet through blogs, message boards, web forums and the like -- one of the most unusual, chaotic and rapidly expanding data sources imaginable.
The scale and complexity of Umbria's task are well beyond the scope of any human, and also beyond the capacity of most computers. What typical search engines do -- scan the blogosphere and find the most relevant mentions of a particular topic -- is hard enough. What Umbria's computers have to do -- find and categorize every mention of a topic in the blogosphere -- is far trickier. "Search engines are looking for one thing only," says Kremer. "We pick up where search leaves off."
If, for example, Budweiser wants to know what bloggers everywhere are saying about its product, Umbria can't just hand over the ten most relevant blog postings that contain the terms "Bud" and "beer." The company has to locate every posting that mentions "Bud" and "beer," remove false positives such as those describing "drinking beer with my bud," and then make sense of it all: who's drinking Bud, what they think of it, and how the beer can be better. To do that, a computer can't just search blogs; it has to process and understand them. In a sense, it has to read them.
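To see why plain keyword matching falls short, consider a minimal sketch in Python. It is an illustration only, not Umbria's method: the sample posts are invented, and the rule that a capitalized "Bud" signals the brand while a lowercase "bud" signals a friend is a crude assumption that real text would break quickly.

```python
import re

# Hypothetical sample posts, invented for illustration (not real blog data).
POSTS = [
    "Had a cold Bud after work. Best beer for the price.",
    "Drinking beer with my bud on the porch tonight.",
    "Bud Light is a beer I will never order again.",
    "My rose bud finally bloomed!",
]

BEER = re.compile(r"\bbeer\b", re.IGNORECASE)
BUD_ANY_CASE = re.compile(r"\bbud\b", re.IGNORECASE)
# Assumption: only a capitalized "Bud" refers to the brand.
BUD_BRAND = re.compile(r"\bBud\b")


def naive_matches(posts):
    """Keyword search: any post containing both 'bud' and 'beer'."""
    return [p for p in posts if BUD_ANY_CASE.search(p) and BEER.search(p)]


def filtered_matches(posts):
    """Same search, but drop posts where 'bud' is not capitalized."""
    return [p for p in posts if BUD_BRAND.search(p) and BEER.search(p)]


if __name__ == "__main__":
    print(len(naive_matches(POSTS)), "posts from keyword search")
    print(len(filtered_matches(POSTS)), "posts after removing false positives")
```

On these four posts, the keyword search sweeps in the "drinking beer with my bud" posting while the filtered version leaves it out. Neither version says anything about who is drinking the beer or what they think of it, which is the harder part of the problem.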
And getting computers to read is basically impossible. "It's almost a magical event how we learn language," says James Martin, an Umbria advisor and computer science professor at the Center for Spoken Language and Institute of Cognitive Science at the University of Colorado at Boulder. "When it comes to getting a computer to learn language the way a three-year-old does, with almost no instruction whatsoever, we are stumped." A properly programmed computer can easily find every mention of the word "red" in a thousand-page document and translate each one into every language on Earth. But because the computer cannot see or thoroughly understand how the world works, there's no way it can comprehend what "red" actually means.