Search Engine Crawling


Search engines are the organizing kings of the Internet. In the chaos of information, a search engine (whichever you choose) helps you find websites that match the terms you enter. Your end of the job is quite simple; the search engine's end is, by comparison, rather complex. In fact, here are some Google facts that may help put the task of a search engine in perspective.

  • Google employs more than 50 PhD-level software engineers.
  • It takes 10,000 servers to keep Google running.
  • Google indexes BILLIONS, yes with a b, of web pages.
  • The Google data center is said to require 103 megawatts of electricity (enough to power 82,000 homes, or a city the size of Tacoma, WA); this power requirement is offset by solar panels on Google’s buildings.


With all that said, what does the software actually do? Enter GoogleBot. Ben Rathbone, then in Google's Hardware Operations, said of GoogleBot in 2005:

I pondered the question: what does Google do? The grossly simplified answer that I came up with is Google connects the world with the Internet.

It all snapped into place: the idea of a robot, connecting a world with the Internet, with wires that connect to big cabinets of computers. It was not hard then to make the leap to representing the Internet as a world, or globe, made up of pages.

GoogleBot

GoogleBot is a program that visits your website, reads each page, and follows the links within your site. Every word on your pages can later be used to find them when any combination of those words appears in a search. This process is called “crawling” or “spidering,” after the way the so-called robot finds its way through your website.
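To make the crawl-and-follow-links idea concrete, here is a minimal sketch of a crawler in Python. This is an illustration of the technique, not GoogleBot's actual code; the start URL, the page limit, and the use of the requests and BeautifulSoup libraries are all assumptions for the example.

```python
# A minimal crawler sketch: fetch a page, record its words, follow its links.
# This illustrates "crawling"/"spidering"; it is not GoogleBot itself.
# Assumptions: the `requests` and `beautifulsoup4` packages are installed,
# and START_URL is a site you are allowed to crawl.
from collections import deque
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

START_URL = "https://example.com/"   # hypothetical starting point
MAX_PAGES = 50                       # keep the sketch small and polite

def crawl(start_url: str, max_pages: int = MAX_PAGES) -> dict[str, list[str]]:
    """Visit pages breadth-first, returning {url: [words on that page]}."""
    seen = {start_url}
    queue = deque([start_url])
    pages: dict[str, list[str]] = {}

    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            response = requests.get(url, timeout=10)
        except requests.RequestException:
            continue  # skip pages that fail to load

        soup = BeautifulSoup(response.text, "html.parser")
        # Record every word on the page; these are what searches later match.
        pages[url] = soup.get_text().split()

        # Follow links, staying within the same site, the way a bot
        # finds its way through *your* website.
        for anchor in soup.find_all("a", href=True):
            link = urljoin(url, anchor["href"])
            if urlparse(link).netloc == urlparse(start_url).netloc and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

A real crawler would also honor robots.txt and rate-limit its requests; the sketch omits both for brevity.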

After the bot reads the words on your pages, it sends the information to the analyzing component of the software, where words and phrases are evaluated and readied for indexing. Once you have submitted your site to Google, you won't need to have it re-crawled until the next Google Update.
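At its core, the analyzing-and-indexing step the paragraph describes amounts to building an inverted index: a map from each word to the pages that contain it, so any combination of search words can be matched against pages. Here is a minimal sketch, assuming a pages dictionary like the one the crawler above returns:

```python
# A minimal indexing sketch: turn {url: [words]} into {word: {urls}}.
# The `pages` input is assumed to come from a crawl like the one above.

def build_index(pages: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each lowercased word to the set of pages containing it."""
    index: dict[str, set[str]] = {}
    for url, words in pages.items():
        for word in words:
            index.setdefault(word.lower(), set()).add(url)
    return index

def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the pages that contain every word in the query."""
    word_sets = [index.get(word.lower(), set()) for word in query.split()]
    return set.intersection(*word_sets) if word_sets else set()

# Usage: index = build_index(crawl(START_URL)); search(index, "crawling robots")
```

Google's real pipeline is vastly more sophisticated (ranking, phrase handling, freshness), but the word-to-pages mapping is the kernel of what "readied for indexing" means.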