Wednesday, February 02, 2011

"delhipublicschool40 chdjob" = Bing Fail!


"delhipublicschool40 chdjob" was one of the "honeypot" phrases used by Google in its sting operation on its rival Bing. [See also Danny Sullivan's scoop and commentary for a very good description of how the sting was implemented.]

Looks like somebody at Google is an alumnus of Delhi Public School. Could it be Amit Singhal -- "a Google Fellow who oversees the search engine’s ranking algorithm" -- himself?

25 Comments:

  1. Dilip D'Souza said...

    Abi, the Danny Sullivan link points back to nanopolitan, may not be quite what you intended, interesting as your blog is...

  2. Dilip D'Souza said...

    This comment has been removed by the author.

  3. Dilip D'Souza said...

    Isn't it fascinating that "hiybbprqag" now returns >15K results on Google, as opposed to only 1 just over a month ago!

  4. Tabula Rasa said...

    wow. thanks!

  5. Navaneethan Santhanam said...

    Well, if the only signal available to Bing is from a user's click on Google, then it seems pretty likely that that signal is used to rank results. It's a synthetic query - where else is the information supposed to come from?

    Full Disclosure - I used to work at Bing. I don't any more, but this entire thing smacks of a PR exercise on Google's part.

  6. jbeck said...

    Interesting operation, and good to see Google and Microsoft, two companies with plenty of learned folks, have a knowledge-driven battle. But that is ignoring the dumb dragon in the room - the dumb masses who think "technology" means using Facebook 16 hours a day. So do the big two have any plan to counter the flood of trivialization thru the likes of FB that threatens to dumb down the 'net? Hope so...

  7. Ankur Kulkarni said...

    This experiment is not conclusive of copying - it only shows that Bing bases its search results on the behaviour of its users. It would be somewhat suspicious if the only behaviour sampled for this purpose were the clicks users make on Google search results; yet even that would not imply copying.

    To prove copying Google has to show that Bing and Google are identical in concept, not just that Bing.com uses some information from Google.com.

    The subtle issue with Google's complaint is its confusion about the domain of search. Any search engine presumably searches the "internet". Google's complaint stems from the assumption that Bing should search everything on the internet, except Google.com and the activity on it. Why this restriction?

  8. Rohan said...

    @Ankur: I'm not sure how user behaviour can generate the exact same irrelevant results for search queries. Also, if Bing is using search results from Google, it needs to do so with permission or, at the very least, with attribution.

  9. Abi said...

    @Ankur: You said, "To prove copying Google has to show that Bing and Google are identical in concept, not just that Bing.com uses some information from Google.com."

    Your definition of copying is pretty extreme! It's enough to show several unique instances where copying has happened. [Isn't it enough to catch a student copying just one answer -- rather than an entire exam?]

    Bing can claim using thousands of "signals" -- but Google just proved that the source of one such signal is Google itself.

    If that's not egg on Bing's face, I don't know what is ...

  10. Navaneethan Santhanam said...

    @Abi - Bing is admitting to using a signal from Google. It's not really egg if they're saying that this is what they do, right? They're not really being cloak and dagger about it. They are a search engine after all, and user behaviour all over the Web is interesting as a ranking signal, and Google is one source of information.

  11. Abi said...

    @Navaneethan: Bing's face acquired the egg when it stumbled on Google's honeypot!

    (a bit of mixed metaphors there, but still).

  12. Ankur Kulkarni said...

    @Abi You are assuming, much like Google is, that Bing ought to come up with its own search results which should necessarily be independent of any activity on Google.com. I said above
    "The subtle issue with Google's complaint is its confusion about the domain of search... [it] stems from the assumption that Bing should search everything on the internet, except Google.com and the activity on it."

    This is unfair, for even Google uses a user's behaviour on google.com to give results tailored to him/her! The job of search engines is to provide high-quality results, using every possible bit of information. It is not to indulge in an academic challenge of deducing results while ignoring some hints. Would we mind, e.g., if it used user behaviour on orbitz.com to deduce the fall and rise of prices for a particular flight?

    When every search engine can potentially learn from each other's behaviour, the only real grave copying on the part of an engine would occur if it is similar to another engine at a deeper conceptual level.

    What's really needed is a way to make sure that the learned information is transparently presented and attributed to the original source. I wrote about the need for plagiarism laws and a method of citations on my buzz: http://www.google.com/buzz/kulkarni.ankur/X51MtgUaRT3/Need-for-plagiarism-laws-and-citation-rules-for

  13. Abi said...

    @Ankur: Isn't there a problem when Bing produces a wrong answer to a search query, just because Google did so? Remember, for some of the honeypot phrases, Google had set up just one page, and it had nothing to do with the search query. Bing had no business dishing it out. Now, irrespective of my assumptions (or even yours), this is surely a problem?

  14. Navaneethan Santhanam said...

    @ Abi - well, if the only information available to Bing is the information from Google (and there is no other source), then I think it's fine. All the (search term, URL) pair is indicating is that there is some user interest in this particular link when they query for that search term. If a user wants to click on a link despite a nonsensical query, who is anyone to say that they are wrong? I think that search engines should reflect user interest, not JUST some top-down decision about what the company thinks is the relevant order to return results.

  15. Abi said...

    @Navaneethan: Shouldn't the search phrase be somewhere inside the pages that are generated? Here's what I said in my previous comment: Remember, for some of the honeypot phrases, Google had set up just one page, and it had nothing to do with the search query. Bing had no business dishing it out.

  16. Ankur Kulkarni said...

    Bing's result for the honeypot phrases is not a wrong answer any more! That is because Google's activity (deliberate or not) has added information that connects that result to the query. In the internet search regime, all observers (search engines) and their activities are a part of the system being observed.

  17. Ankur Kulkarni said...

    Abi: "Bing had no business dishing it out." I don't agree with this. There is no "objective" information on the internet that cannot be influenced by users or by other search engines. Bing would have had no business dishing the bogus result out if Google had planted the result and no user had ever seen it or clicked on it. Then Bing would really be stealing something from Google.

  18. Abi said...

    @Ankur: QED!

    We also seem to have entered a deep quantum regime here ... ;-)

  19. Navaneethan Santhanam said...

    @ Abi - that was perhaps the case with earlier generations of search engines. They generated matches based on text or links on the target page. But that was when you had much less information to make the decision. With the latest generation rankers, there's just so much more - your social graph, location, time, your history, etc. Even though the result might appear irrelevant, there has been some interest in it from someone - and there appears to be no other information to use. So why not use the only thing you have?

    I think that Bing will have a harder time answering questions about why they didn't correct the misspelling that Google corrected - torsorophy vs tarsorrhaphy, I believe it was - and still managed to return the same result.

    I agree with what Ankur says - there's no "right" way to decide whether a link is valid or not.

  20. Ankur Kulkarni said...

    @Abi: I was about to allude to some quantum stuff too, but I thought I should hold off, lest I sound stupid before someone who actually knows about it :-)

    What is hurting Google is possibly the revenue that Bing can make using second-hand information from Google. Google will have to figure out a way of either withholding information from Bing (e.g. "you can't access google.com with this Bing toolbar enabled. pl disable it and try again") or claiming credit for searches it has discovered.

  21. WebMiner said...

    My informed opinion is that Bing is legally well-covered in this matter. Doing something in (arguably) poor taste is not the same as doing something unethical. My guess is, for queries with enough real pages and click support, Bing and Google would go their correlated but non-identical ways, but when Bing's algorithm found no supporting evidence other than Google via toolbar, they defaulted to using that information alone. And yes, if someone downloaded my toolbar and clicked "I agree", I really do own all their click info. Definitely, if I used it for collective (ranking) decision making without revealing specific user identity, which is the current situation, there seems to be no strong legal charge against Bing. As has been pointed out here, "query word appears in anchor text of page" is as legitimate a signal as "Google returns page within top 10 for this query". Of course, this opens up the path to Google using the converse feature. There are monopoly/oligopoly-type reasons why such circularity can be a bad thing. But getting into a monopolistic position is not in itself a crime unless associated malpractice is demonstrated. Also, if anything, Google is the information monopoly here!

  22. jbeck said...

    "The subtle issue with Google's complaint is its confusion about the domain of search... [it] stems from the assumption that Bing should search everything on the internet, except Google.com and the activity on it."

    To whoever wrote this, Google isn't complaining or dragging Microsoft to court. They are just dissing MS and its Bing team strongly - declaring that the Bing team is intellectually bankrupt. And this isn't a court of law where the "battle" is being fought; it is the court of intellectual opinion, where MS has clearly shown itself to be a cheapskate imitator of Google. It is the nerd's put-down.

    I am not sure why every commentator here is ignoring the bigger battle underway: that of smartening up the webuser community, making it more desirous of the intellectually challenging benefits of the net, rather than the trivial, pointless, and dumb world of Facebook. Facebook's continued growth doesn't spell the decline of Google or whatever. It portends far worse - the trivialization of the Internet/Web and applied information theory itself. So before you guys comment again, make a resolution to delete or at least freeze your Facebook accounts.

  23. WebMiner said...

    @jbeck: I strongly doubt if this episode was intended to, or will succeed at "smartening up the webuser [sic.] community". In a basket of typical Web queries you may ask at either Bing or Google over a month or year, in how many cases will Bing's using Google as a ranking feature affect your prospects of getting to the best pages, as compared to using Google instead? If you are a Bing user, that is really the metric you should use before you switch to Google. However, the more likely scenario is that Bing will lose a large number of users because of these specific, highly-contrived queries that no one asks in a realistic workload, queries that were expressly designed to trip up Bing. I personally use Google much more than Bing, but that is a very informed decision, not based on gimmicks like these.

  24. Dilip D'Souza said...

    jbeck: For sure, a surprising turn to this discussion. While I haven't yet found much of value to me in FB and using FB (full disclosure, I have an account), I fail to understand why there is a particular need to "smarten up the webuser community" that I need to pay attention to. After all, the same need to smarten up could apply to the "community" of newspaper readers, or telephone users, or book readers, yet we haven't had people saying we need to make that effort. (I don't know, in the early days of phones, were there people anxious about dumbos using phones?)

    Please do explain.

    As for Bing vs Google: as much as the definition of search has moved on from the original idea of words in a document, I cannot agree that a document with absolutely no connection to the search term should be returned as a search result by Bing. If the sole connection is a few clicks (in this case by folks at Google doing their honeypot thing), then there should be filters to detect the tenuous nature of the connection.

  25. WebMiner said...

    "There should be filters" --- There will be, soon enough. There is no final stable point or grand unified theory for Web search (pity, but gives some of us a job). Forget Bing. Long back, anyone could host a Website of pizza recipes, and populate it using the Google API periodically. Google would then come back and crawl these sites, thinking it has always been correct about its top pizza recipe ratings. The real danger to one search engine feeding off another is such circular endorsement. In terms of progress of Web science or engineering, the Bing-Google tussle is of no consequence.