3.3 How Google search works

To fully understand the search problems that tabloids might suffer because of their writing style, it is important to know how searching for information on the Web works from a technological perspective. Briefly, it comes down to three things: crawling, PageRank and indexing.

A Web crawler (also commonly known as a web spider or a web robot) is a programme that browses the content of the World Wide Web in a fully automated way. Crawlers visit Web pages and create copies of them for later indexing. Google runs such spiders across the Net in order to store copies of those pages on its servers.
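The crawling step described above can be sketched in a few lines of Python. This is only a toy illustration, not how Google's crawler is actually built: the three-page `TOY_WEB` dictionary, the `LinkExtractor` class and the `crawl` function are all made up for this example, and the "Web" is kept in memory so that nothing is downloaded.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag found on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical three-page "Web", held in memory (assumption for the sketch).
TOY_WEB = {
    "a.html": '<a href="b.html">B</a> <a href="c.html">C</a>',
    "b.html": '<a href="c.html">C</a>',
    "c.html": '<a href="a.html">A</a>',
}


def crawl(start, fetch):
    """Visit pages breadth-first from `start` and return {url: stored copy}."""
    copies = {}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if url in copies:
            continue  # already visited
        html = fetch(url)
        if html is None:
            continue  # page does not exist
        copies[url] = html  # keep a copy for later indexing
        parser = LinkExtractor()
        parser.feed(html)
        queue.extend(parser.links)  # follow the links found on this page
    return copies


copies = crawl("a.html", TOY_WEB.get)
print(sorted(copies))
```

Starting from a single page, the crawler reaches every page that is linked, directly or indirectly, from it. A real crawler would replace `TOY_WEB.get` with a function that downloads the page over HTTP.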

Another technique Google uses to provide the fullest (and best organized) spectrum of search results is PageRank. The idea behind this tool is that the most important pages (roughly, the ones with the most links pointing to them) are displayed first. This is how Google explains the technology:

“Google interprets a link from page A to page B as a vote, by page A, for page B. But, Google looks at considerably more than the sheer volume of votes, or links a page receives; for example, it also analyzes the page that casts the vote. Votes cast by pages that are themselves “important” weigh more heavily and help to make other pages “important.” Using these and other factors, Google provides its views on pages’ relative importance” (Google, 2008)
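The "links as votes" idea from the quotation can be illustrated with a short Python sketch. It follows the commonly published textbook form of PageRank and is not Google's production system. The damping factor of 0.85, the number of iterations, the `pagerank` function and the four-page `links` graph are all assumptions made for this example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Repeatedly pass each page's rank along its outgoing links.

    `links` maps every page to the list of pages it links to.
    Every linked page must itself be a key of `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal ranks
    for _ in range(iterations):
        # Every page receives a small base share regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page splits its "vote" evenly among the pages it links to.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no links spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: C is linked to by A, B and D; nobody links to D.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

rank = pagerank(links)
for page in sorted(rank, key=rank.get, reverse=True):
    print(page, round(rank[page], 3))
```

In this toy graph C comes out on top because it collects the most votes, and D comes last because no page links to it. Page A receives only one link, but that link comes from the highly ranked C, so A still ends up well ahead of B and D. This is the effect the quotation describes: votes cast by "important" pages weigh more heavily.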

Indexing is the process in which a search engine collects, parses and stores the data provided by a crawler. The purpose is to make searching fast and accurate: e.g. an index of 10,000 documents is much quicker to search through than the 10,000 documents themselves (we’re talking milliseconds vs. hours!).
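Why an index is so much faster can be seen in a minimal Python sketch of an inverted index, a structure that maps each word to the documents containing it. The `build_index` and `search` functions and the three sample documents are made up for this illustration. A real search engine index stores far more (word positions, for instance), so this shows only the basic principle.

```python
from collections import defaultdict

PUNCTUATION = ".,!?\"'()"


def build_index(documents):
    """Map every word to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for raw_word in text.lower().split():
            word = raw_word.strip(PUNCTUATION)
            if word:
                index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of the documents that contain every word of the query."""
    words = [w.strip(PUNCTUATION) for w in query.lower().split()]
    words = [w for w in words if w]
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical documents standing in for the copies made by a crawler.
documents = {
    1: "Google crawls the Web.",
    2: "Tabloids write short headlines.",
    3: "Google indexes tabloids, too.",
}

index = build_index(documents)
print(search(index, "google"))
print(search(index, "google tabloids"))
```

Answering a query now takes one dictionary lookup per word instead of a scan through the full text of every document, which is where the speed difference comes from.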

In simplified form: first a spider fetches the article, and then all the words used in it are indexed on the engine’s server. The probability that an article will be displayed at the top depends on the relevance of the article to the search query (e.g. how many times the keyword is used on the site), the “importance” of the site (as judged by PageRank) or the amount paid for paid search (as suggested by Cooper).
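The first of these factors, keyword frequency, can be shown with a small Python sketch. It is a deliberately naive illustration and not Google's actual ranking formula, which combines many more signals. The `keyword_count` and `rank_by_keyword` functions and the sample articles are hypothetical.

```python
def keyword_count(text, keyword):
    """Count how many times `keyword` occurs as a whole word in `text`."""
    words = [w.strip(".,:;!?\"'()") for w in text.lower().split()]
    return words.count(keyword.lower())


def rank_by_keyword(articles, keyword):
    """Return titles of matching articles, most keyword occurrences first."""
    scored = [(keyword_count(text, keyword), title)
              for title, text in articles.items()]
    matching = [(count, title) for count, title in scored if count > 0]
    # Sort by descending count; ties are broken alphabetically by title.
    matching.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for count, title in matching]


# Hypothetical articles used only for this illustration.
articles = {
    "article_a": "Election results: the election was close.",
    "article_b": "Shock at the polls!",
    "article_c": "My election diary.",
}

print(rank_by_keyword(articles, "election"))
```

An article that never uses the searched keyword, like `article_b` here, does not appear in the results at all, however closely its subject matches the query.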
