Tuesday, November 3, 2009

The sorrow with search

Want to find all the comparisons of the Droid versus its competitors? Great, Google it. Want to find out when it will hit the market? Again, Google is your friend. But Xenu help you if you want to find any bits of data that are too esoteric (for lack of a better word).

For example, is the Droid going to be part of the Verizon BOGO (Buy One, Get One) sale? Well, let's go to the Google and type in “Droid Verizon BOGO” and see what happens. The problem is that many (most) sites have “Recent Posts” and “You might also like” links embedded in the HTML, and virtually any site that mentions the Storm BOGO sale will have a “You might also like” that includes the Droid. What this means is that all of my search results are articles about the new Storm2, with ads and “You might also like” pointers to Droid stories, which is not at all what I am looking for.

This is a nontrivial problem. When I was at Eluma I worked on a recommendation engine and this was a pretty big issue. When you recommend a page to a user you probably don’t want to include the advertisements on the page in your algorithm, and the comments should have a lower rating than the article itself. So you want to parse the RSS items, not the landing page, but of course many sites just put a synopsis in the RSS item, so for those you really need to parse the landing page after all.
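To make that concrete, here is a minimal sketch of the weighting idea. It is not the Eluma code; the region names, the weights, the 50-word threshold, and the function names are all assumptions I made up for illustration, and it presumes something upstream has already split the page into regions.

```python
import re
from collections import Counter

# Hypothetical weights (assumptions, not real tuned values):
# article body counts fully, comments count less, ads and
# "You might also like" blocks count for nothing.
REGION_WEIGHTS = {"body": 1.0, "comments": 0.3, "ads": 0.0, "related": 0.0}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def weighted_terms(regions):
    """Map each term to a score, given a dict of region name -> region text.

    Regions with weight 0 (or unknown regions) are skipped entirely.
    """
    scores = Counter()
    for name, text in regions.items():
        weight = REGION_WEIGHTS.get(name, 0.0)
        if weight == 0.0:
            continue
        for term in tokenize(text):
            scores[term] += weight
    return scores


def pick_source(rss_text, landing_body, min_words=50):
    """Prefer the RSS item text; fall back to the landing page body
    when the item looks like a mere synopsis (fewer than min_words terms)."""
    if len(tokenize(rss_text)) >= min_words:
        return rss_text
    return landing_body
```

With this, a Storm2 article whose only mention of the Droid is in a “You might also like” block gets no score for “droid” at all, while a mention in the comments still counts, just for less.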

But surely the Google has solved this problem? For example, what about their “Blog Search”? Yeah, good idea, but blog search indexes the landing pages, and the landing pages are full of all of the cruft that is making my search useless. Well, not totally useless: after switching to the blog search I *did* get better results, but still not good results.

Certainly Technorati can do a better search? In fact they can, and did. My search resulted in 2 (two, yes two) hits, but they were both pertinent. So did I only get 2 hits because Technorati doesn’t index ALL RSS feeds, or because there really are only 2 articles that are pertinent? Dunno, but it does make me wonder: Google is touching just about EVERY RSS feed in existence, so why not have a blog search that only indexes the RSS feed for more focused results?
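For what it's worth, the feed-only idea is easy to sketch. The toy below builds an index from nothing but each RSS item's title and description, so sidebars and ads never get a chance to pollute it. It assumes a plain RSS 2.0 feed; `index_feed` and `search` are names I invented, and this is in no way how Google or Technorati actually do it.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict


def terms_of(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def index_feed(rss_xml):
    """Build term -> set of item links, using only the <title> and
    <description> of each <item> in an RSS 2.0 document."""
    index = defaultdict(set)
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        link = item.findtext("link", default="")
        text = " ".join([
            item.findtext("title", default=""),
            item.findtext("description", default=""),
        ])
        for term in terms_of(text):
            index[term].add(link)
    return index


def search(index, query):
    """Return the links of items that contain every term in the query."""
    terms = terms_of(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results
```

The catch is the same one as before: if the feed only carries a synopsis, the index only knows about the synopsis. Real feeds also tend to put HTML inside the description, which this sketch does not strip.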

P.S. Lest you think “sure, but I don’t care about the Droid and its BOGO sale status”: this also cropped up when I was researching the article Best Sandwich Ever. Search for “tuna muffaletta” and be prepared for nonsense. On a side note, it does appear that the sandwich may be unique.