More Musings on People-Powered Search

Ever since I went to the SearchSIG event on social search earlier this year, I have been thinking more about the barriers that keep this “people-powered” approach from defeating the machine-powered approaches that dominate today. After I read this article in CNet about blog spam, I finally feel like my thinking in this space has some level of coherence. From my point of view, machine-based search has dominated (and will likely continue to dominate in most general search categories) because of four basic beliefs that people have about machines:

1. Machines scale. Just add another one and you have more capacity.
2. Machines are designed to handle repetitive tasks in high volume with a high level of precision.
3. Machines can learn (to a point) to detect anomalies and respond accordingly.
4. Machines can be coordinated by other machines.

Search is a pretty machine-friendly task at the end of the day. You have one basic task (collect information, or information about information as the case may be, and store it for future retrieval) that is fairly monotonous, repetitive, and well-suited to parallelization. If “people-powered” search is to really break through and make an impact on general search, it will have to deal with a few major issues.

Pollution – If you own and operate your own blog, you know that splog is a real problem. I have thousands of junk comments that need to be deleted every month, and I have a very small readership. There are some rudimentary tools (blacklists, registration systems, etc.) that can stem the tide, but the folks posting bogus comments seem to be able to circumvent these pretty quickly. Solving the spam problem in email has taken a really long time and a fair amount of expense. What happens when tag spam reaches the level of blog spam? Will all of that noise drown out the valuable signals out there? I am not sure, but tag-powered search will need to get better at housekeeping before it is ready to be a viable alternative in the general search world.

Breadth – How can people-powered search get breadth? I have not yet heard a really good explanation here. Right now, those domains where tagging is very popular (namely technology news and politics) have good services that allow you to discover tagged content. But how would you go about getting the best content about bass fishing, for example? Do bass fishers tag? Would you have to pay someone to go out and tag this content? Probably not, but you would need some way to organize efforts and get people focused on filling in the gaps.

Coordination of Effort – Let’s look at some of the most successful or popular people-powered applications with meaningful scale. Wikipedia and About.com are two good examples, in my opinion. Both of these have achieved meaningful scale, but both employ some concept of a moderator/editor who reviews content. Will this really scale to the web? Will it scale to billions of documents and millions of concepts? I have no way of knowing. My hunch is that the answer is “maybe, but it will take a decade or so.” The reason is simple. Coordinating effort is always time-consuming and hard.

One counter-argument is that services like Wikipedia don’t really suffer from coordination costs. You basically make infrastructure available and easy to use, and content contributors will discover it and begin to build the index. Sure, this happens sometimes. But I argue that the bulk of contributions for the next 5 years or so will come from those who are technologically savvy. As a result, the index and the topics covered will continue to reflect the interests and tastes of those contributing content. And until participation broadens to cover a wider range of interests, these people-powered projects will always suffer from some level of niche bias.

As an aside, that’s why I think some of the focused people-powered products, like Yelp and Digg, have been successful. One way to get over the breadth problem is to focus on more niche audiences. Focusing on tractable problems is a good goal and will likely be the way of life for these kinds of endeavors for the near future.

There is, however, a simple way to begin to address the breadth and coordination cost issues: set up a structure where people get compensated to cover domains of expertise. Compensation can be financial or in terms of some other good that people value. Those who are responsible for these domains can then sub-delegate to those beneath them. This model is not revolutionary. It is pretty similar to what About.com, one of the original people-powered companies of web 1.0, did.