Friday, March 7, 2014

The Secrets of Popularity


The Secrets of Popularity on the Internet

Once upon a time there were two guys at Stanford working on their PhDs. They were not satisfied with the options then available for searching online, so they set out to develop a better way.

Being long-time academics, they eventually decided to take the way academic papers were organized and apply that to webpages. 

A quick and fairly objective way to judge the quality of an academic paper is to see how many times other academic papers have cited it. 

This concept was easy to replicate online because one of the original purposes of the Internet was to share resources between universities and research institutions.

Once they went online, the citations took the form of hyperlinks. One of the guys came up with an algorithm for calculating these values on a global scale, and they both lived happily ever after.

Of course, these two guys were Larry Page and Sergey Brin, the founders of Google, and the algorithm that Larry invented that day was what eventually became PageRank.
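
To make the citation idea concrete, here is a minimal Python sketch of a PageRank-style calculation, using the simplified power-iteration form found in textbooks. It is an illustration only, not Google's production algorithm. The function name, the toy link graph, the damping factor of 0.85 (a commonly cited value), and the fixed iteration count are all assumptions made for this example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate a PageRank-style score for every page.

    links maps each page to the list of pages it links to.
    damping and iterations are illustrative defaults (assumptions).
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with the same score for every page.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share, independent of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its score evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: "c" is linked to by three other pages,
# while nothing links to "d".
toy_web = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy graph, "c" ends up with the highest score because the most pages link to it, and "d" ends up with the lowest because nothing links to it. Note that "a" scores well despite having only one inbound link, because that link comes from the highly ranked "c": a link counts for more when the page it comes from is itself well linked.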

Relevance, Speed, and Scalability

Hypothetically, the most relevant search engine would have a team of experts on every subject in the entire world, a staff large enough to read, study, and evaluate every document published on the web so they could return the most accurate results for each query submitted by users.

The fastest search engine, on the other hand, would crawl a new URL the very second it’s published and introduce it into the general index immediately, making it available to appear in query results only seconds after it goes live.

The challenge for Google and all other engines is to find the balance between those two scenarios: to combine rapid crawling and indexing with a relevance algorithm that can be instantly applied to new content.

In other words, they’re trying to build scalable relevance. With very few exceptions, Google is uninterested in hand-removing (or hand-promoting) specific content. Instead, its model is built around identifying characteristics in web content that indicate the content is especially relevant or irrelevant, so that content all across the web with those same characteristics can be similarly promoted or demoted.
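
As a rough illustration of what "scalable relevance" means in practice, the Python sketch below applies one scoring rule to every page instead of judging pages by hand. Everything in it is hypothetical: the `Page` fields, the thresholds, and the penalty values are invented for this example and are not signals or weights that Google is known to use.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    inbound_links: int    # hypothetical signal: pages linking here
    word_count: int       # hypothetical signal: amount of text
    keyword_repeats: int  # hypothetical signal: repeats of the query term


def relevance_score(page):
    """Score a page from its characteristics alone (illustrative weights)."""
    score = float(page.inbound_links)
    if page.word_count < 100:
        # Very thin content is demoted wherever it appears.
        score -= 5.0
    if page.keyword_repeats > 20:
        # Heavy keyword repetition is demoted wherever it appears.
        score -= 10.0
    return score


# Made-up pages on a placeholder domain.
pages = [
    Page("example.com/spam", inbound_links=15, word_count=800, keyword_repeats=60),
    Page("example.com/guide", inbound_links=12, word_count=1500, keyword_repeats=3),
    Page("example.com/stub", inbound_links=12, word_count=40, keyword_repeats=2),
]

# The same rule ranks every page; no URL is promoted or removed by hand.
ranked = sorted(pages, key=relevance_score, reverse=True)
for page in ranked:
    print(page.url, relevance_score(page))
```

The point of the design is that the rule never mentions a specific URL. A new page published a minute ago is scored by the same function as every other page, which is what lets the approach keep up with rapid crawling and indexing.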
