An Introduction to the Search Engines' Tools for Webmasters
To encourage webmasters to create sites and content in accessible ways, each of the major search engines has built support- and guidance-focused services. Each provides varying levels of value to search marketers, but all of them are worth understanding. These tools provide data points and opportunities for exchanging information with the engines that are not available anywhere else.
The sections below explain the common interactive elements that each of the major search engines supports and identify why they are useful. Each of these elements has enough detail to warrant its own blog post, but for the purposes of this guide, only the most crucial and valuable components will be discussed.
Common Search Engine Protocols
Sitemaps are formatted lists of all of the pages on a given website. They ensure that search engines can easily find the location of every page on a site and allow webmasters to assign each page a relative priority.
The sitemaps protocol (explained in detail at Sitemaps.org) is applicable to three different file formats: XML, RSS, and plain text (Txt).
Sitemaps can either be submitted directly to the major search engines or have their location specified in robots.txt.
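For illustration, here is a minimal sketch of an XML sitemap following the Sitemaps.org format; the domain, date, and values are hypothetical placeholders:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <!-- The page's full URL (the only required child element) -->
    <loc>http://www.example.com/</loc>
    <!-- Optional hints: last modification date, expected change frequency,
         and relative priority (0.0 to 1.0) within this site -->
    <lastmod>2008-10-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
```

Note that priority is relative to other pages on the same site; it does not influence how a page ranks against other websites.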
The robots.txt file (a product of the Robots Exclusion Protocol) should be stored at a website's root directory (e.g., www.google.com/robots.txt). The file serves as an access guide for automated visitors (web robots). By using robots.txt, webmasters can indicate which areas of a site they would like to disallow bots from crawling as well as indicate the locations of sitemap files (discussed above) and crawl-delay parameters. The following commands are available:

User-agent: identifies the robot(s) that the rules below it apply to (an asterisk matches all robots)
Disallow: blocks robots from crawling the specified path
Sitemap: indicates the location of a sitemap file (an extension to the original protocol)
Crawl-delay: asks crawlers to pause between requests (not supported by all engines)
Example of Robots.txt:

# Don't allow spambot to crawl any pages
User-agent: spambot
Disallow: /
Warning: It is very important to realize that not all web robots follow robots.txt. People with bad intentions (e.g., e-mail address scrapers) build bots that don't follow this protocol and in extreme cases can use it to identify the location of private information. For this reason, it is recommended that the locations of administration sections and other private sections of publicly accessible websites not be included in robots.txt. Instead, those pages can use the meta robots tag (discussed next) to keep the major search engines from indexing their high-risk content.
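Putting the pieces together, a robots.txt file that combines a crawl restriction, a crawl-delay, and a sitemap location might look like the following sketch; the domain and paths are hypothetical:

```
# Rules for all robots
User-agent: *
# Keep bots out of a printer-friendly duplicate of the site
Disallow: /print/
# Ask crawlers to wait 5 seconds between requests (not honored by every engine)
Crawl-delay: 5
# Point crawlers at the sitemap
Sitemap: http://www.example.com/sitemap.xml
```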
The meta robots tag creates page-level instructions for search engine bots that govern everything from page inclusion to snippet controls and more.
The meta robots tag should be included in the head section of the HTML document.
Example of Meta Robots:
<title>The Best Webpage on the Internet</title>
<meta name="ROBOT NAME" content="ARGUMENTS" />
In this example, "ROBOT NAME" is the user-agent of a specific web robot (e.g., Googlebot) or an asterisk to identify all robots, and "ARGUMENTS" is one of the meta/X-Robots-Tag arguments below.
index, follow: Allow access to your content (the default behavior)
noindex: Disallow access to your content
noimageindex: Disallow indexing of the images on the page
noarchive: Disallow the display of a cached version of your content in the SERP
nosnippet: Disallow the creation of a description for this content in the SERP
notranslate: Disallow the translation of your content into other languages
nofollow: Do not follow or give weight to links within this content
rel="nofollow" (a href attribute): Do not follow or give weight to an individual link
noodp: Do not use the Open Directory Project (ODP) to create descriptions for your content in the SERP
noydir: Do not use the Yahoo! Directory to create descriptions for your content in the SERP
class="robots-nocontent" (Yahoo!-specific): Do not index this specific element within an HTML page
unavailable_after: Stop indexing this content after a specific date

Related directives and techniques outside the meta robots tag:

Sitemap (robots.txt): Specify a sitemap file or a sitemap index file
Crawl-delay (robots.txt): Specify how frequently a crawler may access your website
Reverse DNS lookup: Authenticate the identity of the crawler
URL removal request (via each engine's webmaster tools): Request removal of your content from the engine's index
Source: Jane & Robot - Managing Robots' Access to Your Website
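For non-HTML files (PDFs, images, and the like) that cannot carry a meta tag, the same arguments can be delivered as an X-Robots-Tag HTTP header instead. As a sketch, assuming an Apache server with mod_headers enabled, a configuration fragment like this would apply noindex and noarchive to every PDF:

```
# Apache .htaccess sketch: send X-Robots-Tag for all PDF files
<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noindex, noarchive"
</FilesMatch>
```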
Nofollow is a common inline parameter that is adhered to by all of the major search engines. It is appended to links to prevent them from passing ranking power (or "link juice").
Example of nofollow:
<a href="http://www.example.com" title="Example" rel="nofollow">Example Link</a>
An excellent and more comprehensive resource on robots.txt can be found at Jane & Robot - Managing Robots' Access to Your Website. Additionally, a printer friendly version of this information is available on The Web Developer’s SEO Cheat Sheet.
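The crawler-authentication technique mentioned above, a reverse DNS lookup, can be sketched in Python. The googlebot.com/google.com suffix check follows Google's published guidance for verifying Googlebot, but treat this as an illustrative sketch rather than a complete implementation:

```python
import socket

def verify_googlebot(ip):
    """Verify a visitor claiming to be Googlebot via reverse-then-forward DNS."""
    try:
        # Reverse lookup: the PTR hostname should end in googlebot.com or google.com
        host, _, _ = socket.gethostbyaddr(ip)
        if not host.endswith((".googlebot.com", ".google.com")):
            return False
        # Forward lookup: the hostname must resolve back to the original IP,
        # which defeats spoofed PTR records
        _, _, addrs = socket.gethostbyname_ex(host)
        return ip in addrs
    except (socket.herror, socket.gaierror):
        # Lookup failed; treat the visitor as unverified
        return False
```

The forward-confirmation step matters because anyone who controls the reverse DNS for an IP can make it claim to be a Googlebot hostname; only Google can make that hostname resolve back to the same IP.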
Search Engine Tools
The following tools are provided free of charge by the major search engines and enable webmasters to have more control over how their content is indexed.
Google Webmaster Tools
Google Webmaster Tools Sign Up
These statistics are a window into how Google sees a given website. Specifically, they identify top search queries, crawl stats, subscriber stats, "What Googlebot sees," and index stats.
This section provides details on links. Specifically, it outlines external links, internal links, and sitelinks. Sitelinks are section links that sometimes appear under a website's listing when it is especially applicable to a given query.
This is the interface for submitting and managing sitemaps directly with Google.
Yahoo! Site Explorer
Yahoo! Site Explorer Sign Up
Live Webmaster Center
Search engines have only recently provided ways for webmasters to interact directly with crawlers. While this relationship is still not optimal, the engines have made great strides toward opening up their proprietary indices, which has been very helpful for webmasters who now rely so heavily on search-driven traffic.
As always, comments and constructive criticism are appreciated. You'll note that I'm trying to go back to making this more of a true "beginner's" guide, as I'm concerned that the previous guide may have gone a bit too in-depth. Hopefully between Rand and me, we can finish this mammoth undertaking :)