There is no question that real-time social networks like Twitter have become an important forum for public conversation, whether the discussion is about chemical weapons in the Middle East or the dance moves of Miley Cyrus at the MTV Video Music Awards.
But good luck trying to search and analyze the fire hose of information flowing through the real-time social network.
Sure, Twitter offers a search function, but its algorithms favor recent tweets and what it considers the most important items over the ones that might be most relevant to the searcher. For example, a search on Tuesday night for Bill de Blasio, who is leading the polls in New York's Democratic mayoral primary, pulled up lots of tweets repeating his poll stats but virtually nothing on the night's debate among the Democratic candidates. And search results for older material are quite limited.
Twitter's weaknesses in search have left an opening for a start-up called Topsy, which has built a niche offering real-time indexing, search and analysis of the Twitter stream.
On Wednesday, the San Francisco company announced that it has now indexed every Twitter message since the first tweet was posted in 2006, about 425 billion pieces of content when you include photos, pages linked from Twitter, and other related material. (Previously, its complete archive only went back to 2010.)
And the database is free for the public to search at Topsy.com. Want to see what people are saying about President Obama and the Syria vote in Congress? A quick search pulls up what Topsy's algorithm thinks are the most relevant results, factoring in retweets and the past influence of the tweeter. You can narrow down results by time frame, search for tweets in 10 languages, and see a graph with the volume of tweets over time and an indicator of the general sentiment, positive or negative.
"How do you make sense of 400 billion pieces of content?" said Vipul Ved Prakash, Topsy's co-founder and chief technology officer. "One, by ranking it. We do that ranking by looking at how much a particular piece of content is being cited by other people."
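For readers curious how citation-based ranking might look in practice, here is a minimal Python sketch. It is not Topsy's algorithm, which the company has not published here. The `Tweet` fields, the `author_influence` table and the scoring formula are all assumptions made for illustration; the only ideas taken from the article are that retweets and the tweeter's past influence feed into relevance.

```python
import math
from dataclasses import dataclass


@dataclass
class Tweet:
    text: str
    author: str
    retweets: int  # how often other users cited (retweeted) this tweet


def relevance_score(tweet, author_influence):
    """Hypothetical score: dampened retweet count times author influence."""
    # Authors missing from the table get a neutral influence of 1.0 (assumption).
    influence = author_influence.get(tweet.author, 1.0)
    # log1p keeps a handful of viral tweets from swamping everything else.
    return math.log1p(tweet.retweets) * influence


def rank_tweets(tweets, author_influence):
    """Return tweets ordered from highest to lowest hypothetical score."""
    return sorted(
        tweets,
        key=lambda t: relevance_score(t, author_influence),
        reverse=True,
    )


# Illustrative data only; none of these tweets or accounts are real.
sample = [
    Tweet("poll numbers again", "user_a", 10),
    Tweet("widely shared debate recap", "user_b", 100),
    Tweet("analysis from a trusted reporter", "user_c", 10),
]
influence = {"user_c": 5.0}
ranked = rank_tweets(sample, influence)
```

In this toy data, the reporter's tweet outranks the more heavily retweeted recap because the assumed influence weight multiplies its score, which is one simple way that a tweeter's track record could be combined with how often a tweet is cited.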
The system is similar to what Google does for Web search. (For a brief time, Google had a deal with Twitter that allowed it to index tweets. But after that deal expired in 2011, Google became pretty much useless for searching the real-time social conversation.)
Topsy makes its money from more sophisticated tools (aimed at marketers, media companies, political operations, and hedge funds) that require a subscription fee that starts at $12,000 a year. Those allow searches that compare different terms, narrow down results by geography and surface the specific tweets with the most influence on the social conversation.
"When Sandy hit, I used them for tracking down information," said Danny Sullivan, founding editor of Search Engine Land, referring to the 2012 storm that devastated much of the East Coast. "I think they're a great resource."
But Mr. Sullivan, who has followed the development of search technology since its infancy, questioned whether Topsy's powerful tools are more than most Twitter users want.
"People aren't turning to Twitter search the way they are with Google," he said. "The people who are really into Twitter search are journalists."
Mr. Prakash acknowledged that most of Topsy's users work for businesses, such as Visa and USA Today, that need to mine Twitter for valuable information. Competitors like DataSift and Gnip also offer access to the Twitter archive, he said, although their ability to deliver real-time information is more limited.
But Topsy knows it's doing something right when Twitter itself uses the company's tools, including for its Twitter Political Index that tracked voter sentiment during the 2012 presidential election and for its Twitter Oscars Index, which tried to predict this year's Academy Award winners based on Twitter chatter.
"What we are doing is creating new products from the data," Mr. Prakash said. "It becomes very complementary to the products that Twitter is providing."