The Google way of searching and indexing

October 30, 2010

Users who search for information on Google would attest to the fact that this search engine gets them impressive results time after time. Google performs the job fast, and performs it well. In conventional terms, Google’s inner workings can be compared to a crack librarian, but unlike its human counterpart Google does its job in a fraction of a second.

How Google Works

The work of Google as a search engine can be described as a cycle of web page crawling, indexing, database maintenance, and data retrieval on command.

Web Crawling

Search engines use software to crawl pages on the internet; Google’s crawler is known as Googlebot. Crawling is how content gets added to a search engine’s database: the process harvests new pages and updates old ones for indexing. Mainly, Googlebot follows the HREF and SRC links in the pages it has already crawled, which lead it to other pages to harvest.
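To make the link-following step concrete, here is a minimal sketch in Python of how a crawler might pull HREF and SRC targets out of a page. It uses only the standard library. The class name `LinkExtractor`, the helper `extract_links`, and the example URL are illustrative assumptions, not part of Googlebot.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the targets of href and src attributes found in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative links against the URL of the page itself.
                self.links.append(urljoin(self.base_url, value))


def extract_links(base_url, html):
    """Return every href/src target in `html` as an absolute URL."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical page used only for illustration.
page = '<a href="/about.html">About</a><img src="logo.png">'
print(extract_links("http://example.com/index.html", page))
```

Each URL returned this way would be queued as a candidate for the next round of crawling.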

As Googlebot crawls the internet, it may come across defunct links that were previously included in the index. These are removed. New sites are added and old ones are replaced with their newer versions. Although not all types of information can be crawled by Googlebot, the process continuously indexes and re-indexes the ever-growing database.
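The maintenance cycle described above (drop defunct pages, add new ones, replace old versions) can be sketched as a simple breadth-first re-crawl. This is a toy model under stated assumptions: the index is a plain dictionary of URL to page text, and `fetch` is a hypothetical stand-in for downloading a page, returning `None` when the link is dead.

```python
from collections import deque


def refresh_index(index, fetch, seeds=()):
    """Re-crawl known and seed URLs and return an updated copy of the index.

    `fetch(url)` is a stand-in for downloading a page: it returns
    (text, links), or None when the link is defunct.
    """
    updated = dict(index)
    queue = deque(dict.fromkeys(list(index) + list(seeds)))
    seen = set(queue)
    while queue:
        url = queue.popleft()
        result = fetch(url)
        if result is None:
            # Defunct link: remove it if it was previously indexed.
            updated.pop(url, None)
            continue
        text, links = result
        # New pages are added; old ones are replaced with the newer version.
        updated[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return updated


# A tiny in-memory "web" used only for illustration.
web = {"a": ("new text for a", ["b", "c"]), "b": ("text for b", [])}
old_index = {"a": "old text for a", "gone": "page that no longer exists"}
print(refresh_index(old_index, web.get))
```

Here page "a" is refreshed, the newly discovered page "b" is added, and the dead entries "gone" and "c" do not appear in the result.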

The ultimate goal of this function is to ensure that each search returns up-to-date information.

In addition to crawling, websites may be submitted manually using Google’s Add URL feature. This might look like an ideal gateway for spamming links, but the form used in this feature has built-in measures to exclude spam.

Items in the search database are indexed by the words they contain, much like the alphabetical index at the back of a book. Users access the database by entering keywords, which are used to locate the relevant content stored within.
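One common way to make keyword lookup fast is an inverted index, which maps each word to the pages that contain it. The article does not describe Google's actual data structure, so the following Python sketch is only an assumption-laden illustration; the function names and sample pages are invented for the example.

```python
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def lookup(index, query):
    """Return the URLs that contain every keyword in the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical pages used only for illustration.
pages = {
    "page1": "search engines crawl the web",
    "page2": "librarians index books",
    "page3": "search engines index the web",
}
index = build_index(pages)
print(sorted(lookup(index, "index web")))
```

Because the lookup only touches the entries for the query words, it never has to scan the full text of every stored page.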

How a Google Search Retrieves Specific Items from the Database

The goal of a Google search is to return the most relevant content in the database in as little time as possible. For this purpose, Google’s index is optimized for easy data retrieval using a variety of criteria, including PageRank, keyword density, and scores of other elements. With PageRank, a website is ranked according to its popularity.

PageRank is a point-score system used by Google to rank web pages. It scores a page by the number and quality of the links that point to it from other pages, rather than by the page’s own content or the traffic it receives. In effect, the score reflects the page’s popularity on the web. The higher the score, the higher the PageRank.

The more backlinks a website has, the stronger the implication that it holds relevant information, which Google finds desirable. Google treats each inward link as a sort of vote by another website for the site in question, and gives such links more weight than reciprocal links, which it considers inconclusive as to the quality of site content.
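The "links as votes" idea can be illustrated with a simplified version of the PageRank calculation. The sketch below is a textbook-style power iteration, not Google's production algorithm; the damping factor of 0.85, the iteration count, and the three-page link graph are assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank over a dict of page -> list of pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its vote over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph: "a" and "b" both link to "c"; "c" links back to "a".
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this graph, page "c" ends up with the highest score because it receives two inward links, while "b", which nobody links to, ends up with the lowest.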

To make searches more concise, common words such as the, why and who, known as stop words, are filtered out before Google processes a search. These words carry little meaning on their own and generally do not affect search results.
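Stop-word filtering is easy to picture in code. The following is a minimal sketch; the contents of `STOP_WORDS` are an illustrative assumption and not Google's actual list.

```python
# Illustrative stop-word list (an assumption, not Google's real list).
STOP_WORDS = {"the", "why", "who", "a", "an", "of", "and"}


def strip_stop_words(query):
    """Return the query's keywords in order, with stop words removed."""
    return [word for word in query.lower().split() if word not in STOP_WORDS]


print(strip_stop_words("Who invented the telephone"))
```

Only the words "invented" and "telephone" would be passed on to the index lookup in this example.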

As a further criterion for determining page relevance, Google disregards meta tags and concentrates its assessment on the actual page content.

Google’s Technology

The machinery behind Google that enables it to make fast, accurate searches is divided into several components. First, the user’s input, called a query, is entered in the browser. The query is forwarded to the Web Server, then to the Index Servers, and from there to the Doc Servers. The Doc Servers generate the search result descriptions called snippets, which are usually fragments of page content shown in the results. The results are then presented to the user. The whole process takes at most about one second.
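The flow from Web Server to Index Servers to Doc Servers can be mimicked with three small functions. This is a toy model that runs in a single process; the function names, the snippet width, and the sample data are all assumptions made for illustration and say nothing about how Google's servers are actually implemented.

```python
def index_server(index, query):
    """Return the URLs that the index lists for every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    return matches


def doc_server(documents, url, query, width=30):
    """Build a snippet: a fragment of the page around the first matching word."""
    text = documents[url]
    lowered = text.lower()
    for word in query.lower().split():
        position = lowered.find(word)
        if position != -1:
            start = max(0, position - width)
            end = min(len(text), position + len(word) + width)
            return text[start:end]
    # No query word found in the text: fall back to the start of the page.
    return text[: 2 * width]


def web_server(index, documents, query):
    """Receive the query, consult the index, then attach a snippet per result."""
    urls = index_server(index, query)
    return [(url, doc_server(documents, url, query)) for url in sorted(urls)]


# Hypothetical stored pages and a hand-built index over a few of their words.
documents = {
    "u1": "Googlebot crawls the web to find new pages",
    "u2": "PageRank counts links between pages",
}
index = {"googlebot": {"u1"}, "pagerank": {"u2"}, "pages": {"u1", "u2"}}
for url, snippet in web_server(index, documents, "pages"):
    print(url, "->", snippet)
```

The separation mirrors the description above: the index lookup decides which pages match, and only then are the stored documents consulted to produce the snippets shown to the user.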

Google’s corporate creed of delivering the best possible user experience shows in its continuous efforts to refine its search engine. New features are added regularly, and existing ones are constantly enhanced. Needless to say, Google is a top-notch search engine, as is evident from the number of users who depend on it for their web searches.

