Searching the Web

From WhyNotWiki


Problem: Too much junk to filter out

Source: Clay Shirky: 2007-05-24: What are we going to say about "Cult of the Amateur"?


More importantly, talent is unevenly distributed, and everyone knows it. Indeed, one of the many great things about the net is that talent can now express itself outside traditional frameworks; this extends to blogging, of course, but also to music, as Clive Thompson described in his great NY Times piece, or to software, as with Linus’ talent as an OS developer, and so on. The price of this, however, is that the amount of poorly written or produced material has expanded a million-fold. Increased failure is an inevitable byproduct of increased experimentation, and finding new filtering methods for dealing with an astonishingly adverse signal-to-noise ratio is the great engineering challenge of our age (c.f. Google.)

Wikiasari vs. Google

Looks like I'm not the only one not completely satisfied with Google's search results. Way to go, Mr. Wales!

James Doran (2006-12-23). The Times: Founder of Wikipedia plans search engine to rival Google (http://technology.timesonline.co.uk/tol/news/tech_and_web/article1264117.ece). Retrieved on 2007-02-11 19:50.

Mr. Wales has begun working on a search engine that exploits the same user-based technology as his open-access encyclopaedia, which was launched in 2003.

The project has been dubbed Wikiasari — a combination of wiki, the Hawaiian word for quick, and asari, which is Japanese for “rummaging search”.

Mr. Wales told The Times that he was planning to develop a commercial version of the search engine through Wikia Inc, his for-profit company, with a provisional launch date in the first quarter of next year.

Earlier this year he secured multimillion-dollar funding from amazon.com and a separate cash injection from a group of Silicon Valley financiers to finance projects at Wikia.

However, it is understood that amazon has also collaborated with Mr. Wales on the search engine project and is expected to lend its support to the venture in the future.

Mr. Wales, a 40-year-old former options trader, believes that, as the popularity of Google has grown, obvious flaws in its search engine technology have become apparent.

“Google is very good at many types of search, but in many instances it produces nothing but spam and useless crap. Try searching for the term ‘Tampa hotels’, for example, and you will not get any useful results,” he said.

Spammers and commercial ventures are also learning how to manipulate Google’s computer-based search, he added.

Mr. Wales believes that Google’s computer-based algorithmic search program is no match for the editorial judgment of humans.

Google searches are conducted using an algorithm that calculates how many other websites are linked to a certain site, which in turn gives the material found by the search a ranking. Therefore, the first result in any Google search is the website that has the most links pointing to it.

Wikipedia is an encyclopaedia written by thousands of contributors from around the world, known as “Wikipedians”, using free open-source software.

Mr Wales aims to exploit the same network of followers and the same type of free software to create his search engine.

“Essentially, if you consider one of the basic tasks of a search engine, it is to make a decision: ‘this page is good, this page sucks’,” Mr Wales said. “Computers are notoriously bad at making such judgments, so algorithmic search has to go about it in a roundabout way.

“But we have a really great method for doing that ourselves,” he added. “We just look at the page. It usually only takes a second to figure out if the page is good, so the key here is building a community of trust that can do that.

Mr Wales believes that the reputation already fostered by his Wikipedia community and the transparency of his technology will build sufficient trust in his search engine to bring in advertising revenue and make the Wikiasari venture profitable.

“The revenue model of search is advertising. Transparency in search, therefore, is like transparency in news. If the quality is there people will come.
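The Times article above describes Google's ranking as simply counting how many other sites link to a page. Here is a minimal sketch of that simplified description only. It is not Google's actual algorithm, which (as far as I know) weights links rather than just counting them. The function name and the toy link data are made up for illustration.

```python
from collections import Counter


def rank_by_inlinks(links):
    """Rank pages by how many distinct other pages link to them.

    `links` is an iterable of (source, target) pairs (hypothetical toy data).
    """
    inlinks = Counter()
    for source, target in set(links):  # de-duplicate repeated links
        if source != target:           # ignore self-links
            inlinks[target] += 1
    # Most-linked first; ties broken alphabetically for a stable order.
    return sorted(inlinks, key=lambda page: (-inlinks[page], page))


links = [
    ("a.example", "hotels.example"),
    ("b.example", "hotels.example"),
    ("a.example", "spam.example"),
    ("b.example", "spam.example"),
    ("c.example", "spam.example"),
]
print(rank_by_inlinks(links))
```

In this toy data the page with the most incoming links comes first regardless of its quality, which is the kind of manipulation Mr. Wales complains about.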

List of example searches / difficulties

Problem: Searching for an error message produces mostly problem reports, not necessarily solutions

I don't want to see posts by people who have had the same problem as me ... unless they include or link to the solution! Then I definitely want to see them!

Example:

I was debugging an ssh connection that wasn't working and was getting debug2: "no key of type 0". I Googled it. Many of the hits were mailing-list posts of other people's ssh ___ -vvv output. Not helpful.
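What I'd like a search engine to do here could be roughly sketched as a post-filter over the hits: keep a page only if it mentions the error and also contains some phrase suggesting a solution. This is just a crude illustration. The marker phrases, the function names, and the sample hits are all assumptions of mine, and a keyword match like this would be fooled by text such as "not fixed".

```python
import re

# Hypothetical marker phrases; a real list would need tuning.
SOLUTION_MARKERS = re.compile(
    r"\b(solved|resolved|fixed|the fix|workaround|solution)\b",
    re.IGNORECASE,
)


def likely_has_solution(text, error_message):
    """True if the text mentions the error and a solution-like phrase."""
    mentions_error = error_message.lower() in text.lower()
    return mentions_error and bool(SOLUTION_MARKERS.search(text))


def with_solutions(hits, error_message):
    """Keep only the hits that look like they contain a solution."""
    return [h for h in hits if likely_has_solution(h, error_message)]


# Made-up page texts standing in for search hits.
hits = [
    "debug2: no key of type 0\ndebug2: no key of type 1\nAnyone else seeing this?",
    "I also saw 'no key of type 0'. Solved, see the linked thread.",
    "Unrelated page that happens to say solved.",
]
print(with_solutions(hits, "no key of type 0"))
```

Only the second sample hit survives the filter: the first mentions the error with no solution phrase, and the third never mentions the error.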


Problem: Too many hits for the wrong type of result: Movie title heavily skews results

For example, let's just say hypothetically that you're searching for the phrase "all the president's men" and you don't want to see hits for the movie of that name. That could be a problem. In my Google search (2007-11-27 11:59), the top 10 results were for the movie or the book of that name.

There should be some way to tell the search engine to exclude from the result set any hits that have to do with a movie. But for that to be possible, one of two things has to happen: either people include enough metadata on their pages that it is easy to determine whether a page is about a movie, or the search engine uses some clever artificial intelligence / heuristics to make the same determination in the absence of metadata.
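The two options above (explicit metadata, or heuristics when metadata is missing) could look something like the following. This is only a sketch under my own assumptions: the "type" metadata field, the keyword list, the threshold of two keywords, and the sample hits are all hypothetical, not anything a real search engine exposes.

```python
import re

# Hypothetical keywords suggesting a page is about a movie.
MOVIE_WORDS = {"movie", "film", "dvd", "cast", "director", "trailer"}


def is_about_movie(hit):
    """Prefer explicit metadata; fall back to a crude keyword heuristic."""
    kind = hit.get("type")  # hypothetical metadata field
    if kind is not None:
        return kind == "movie"
    words = set(re.findall(r"[a-z]+", hit.get("snippet", "").lower()))
    return len(words & MOVIE_WORDS) >= 2  # arbitrary threshold


def exclude_movies(hits):
    """Drop every hit that appears to be about a movie."""
    return [h for h in hits if not is_about_movie(h)]


# Made-up hits for the query "all the president's men".
hits = [
    {"title": "All the President's Men (1976)", "type": "movie"},
    {"title": "Review", "snippet": "The film, its cast, and its director."},
    {"title": "White House staff", "snippet": "A history of presidential aides."},
    {"title": "Essay", "type": "article", "snippet": "Mentions the movie and the film."},
]
print([h["title"] for h in exclude_movies(hits)])
```

Note that the last hit is kept even though its snippet mentions movie words, because its metadata says it is an article. That is the advantage of metadata over guessing from the text.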

"car key shapes"

I was searching for information about the various shapes / cut patterns for car keys employed by different automobile manufacturers. You know, a typical Toyota key has a certain characteristic shape / look to it that is different from a typical Ford key. But how would you search for such a comparison page, if one exists?

None of these searches were fruitful:

  • "car key shapes"
  • toyota ford honda key shape
  • toyota ford honda key shape comparison
  • toyota ford honda "key shape" comparison
  • car "key shape" comparison
  • "car key" "shape comparison"
  • toyota "car keys" -rfid -transponder -replacement interchangeable
  • toyota honda "car keys" characteristic
  • toyota honda "car key" characteristic shape
  • various shapes cut patterns automobile manufacturers typical Toyota key characteristic shape typical Ford key
  • "Toyota key" "Ford key" comparison

I ended up giving up.

Links

http://www.searchengineshowdown.com/
