Rewriting the Beginner's Guide Part VIII: Search Engine Tools and Services

An Introduction to the Search Engines' Tools for Webmasters


To encourage webmasters to create sites and content in accessible ways, each of the major search engines has built support and guidance-focused services. Each provides varying levels of value to search marketers, but all of them are worth understanding. These tools provide data points and opportunities for exchanging information with the engines that are not available anywhere else.


The sections below explain the common interactive elements that each of the major search engines supports and identify why they are useful. Each of these elements has enough detail to warrant its own blog post, but for the purposes of this guide, only the most crucial and valuable components are discussed.


Common Search Engine Protocols


Sitemaps

A sitemap is a formatted list of all of the pages on a given website. Sitemaps are used to ensure that search engines can easily find the location of every page on a website and to assign each page a relative priority.


The sitemaps protocol (explained in detail at Sitemaps.org) is applicable to three different file formats:


XML

Pros

Cons


RSS

Pros

Cons


Txt

Pros

Cons


Sitemaps can either be submitted directly to the major search engines or have their location specified in robots.txt.
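As a rough illustration of the XML format, the sketch below builds a small sitemap with Python's standard library. The page URLs and priority values are made up for the example; the namespace is the one published in the Sitemaps.org protocol. Treat it as a minimal starting point rather than a complete generator.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return an XML sitemap string for a list of (url, priority) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, priority in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        # Priority is relative to other pages on the same site (0.0 to 1.0).
        ET.SubElement(entry, "priority").text = f"{priority:.1f}"
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages, used only for illustration.
pages = [
    ("http://www.example.com/", 1.0),
    ("http://www.example.com/about", 0.5),
]
sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

The resulting string could be saved as sitemap.xml at the site root and then submitted to the engines or referenced from robots.txt.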


Robots.txt

The robots.txt file (a product of the Robots Exclusion Protocol) should be stored in a website's root directory (e.g., www.google.com/robots.txt). The file serves as an access guide for automated visitors (web robots). By using robots.txt, webmasters can indicate which areas of a site they would like to disallow bots from crawling, as well as indicate the locations of sitemap files (discussed above) and crawl-delay parameters. The following commands are available:




Disallow

Sitemap

Crawl Delay

Example of Robots.txt:

# Robots.txt www.example.com/robots.txt
User-agent: *
Disallow:

# Don't allow spambot to crawl any pages
User-agent: spambot
Disallow: /

Sitemap: http://www.example.com/sitemap.xml
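To see how a well-behaved robot might interpret rules like the ones in the example above, here is a minimal sketch using Python's built-in urllib.robotparser module. The rules are inlined as a string (with the sitemap written as a full URL) so that nothing is fetched over the network, and the page URL being checked is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The example rules, inlined so the sketch runs without network access.
ROBOTS_TXT = """\
User-agent: *
Disallow:

User-agent: spambot
Disallow: /

Sitemap: http://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical page used only to test the rules.
page = "http://www.example.com/products/widget.html"

spambot_allowed = parser.can_fetch("spambot", page)
googlebot_allowed = parser.can_fetch("Googlebot", page)
sitemaps = parser.site_maps()  # available in Python 3.8 and later

print("spambot may crawl:", spambot_allowed)
print("Googlebot may crawl:", googlebot_allowed)
print("sitemaps listed:", sitemaps)
```

An empty Disallow line places no restriction on the matching robots, while "Disallow: /" blocks the named robot from the entire site.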

Warning: It is very important to realize that not all web robots follow robots.txt. People with bad intentions (e.g., e-mail address scrapers) build bots that don’t follow this protocol and, in extreme cases, can use the file to identify the location of private information. For this reason, it is recommended that the locations of administration sections and other private sections of publicly accessible websites not be included in robots.txt. Instead, these pages can use the meta robots tag (discussed next) to keep the major search engines from indexing their high-risk content.


Meta Robots

The meta robots tag creates page-level instructions for search engine bots that govern everything from page inclusion to snippet controls and more.


The meta robots tag should be included in the head section of the HTML document.


Example of Meta Robots:

<html>
<head>
<title>The Best Webpage on the Internet</title>
<meta name="ROBOT NAME" content="ARGUMENTS" />
</head>
<body>
<h1>Hello World</h1>
</body>
</html>



In this example, "ROBOT NAME" is the user-agent of a specific web robot (e.g., Googlebot) or the generic value "robots" to address all robots, and "ARGUMENTS" is one or more of the META/X-Robots-Tag values listed in the table below.

| Use Case | Robots.txt | META / X-Robots-Tag | Other | Supported By |
| --- | --- | --- | --- | --- |
| Allow access to your content | Allow | FOLLOW, INDEX | | Google, Yahoo, Microsoft |
| Disallow access to your content | Disallow | NOINDEX, NOFOLLOW | | Google, Yahoo, Microsoft |
| Disallow access to index images on the page | | NOIMAGEINDEX | | Google |
| Disallow the display of a cached version of your content in the SERP | | NOARCHIVE | | Google, Yahoo, Microsoft |
| Disallow the creation of a description for this content in the SERP | | NOSNIPPET | | Google, Yahoo, Microsoft |
| Disallow the translation of your content into other languages | | NOTRANSLATE | | Google |
| Do not follow or give weight to links within this content | | NOFOLLOW | a href attribute: rel=NOFOLLOW | Google, Yahoo, Microsoft |
| Do not use the Open Directory Project (ODP) to create descriptions for your content in the SERP | | NOODP | | Google, Yahoo, Microsoft |
| Do not use the Yahoo Directory to create descriptions for your content in the SERP | | NOYDIR | | Yahoo |
| Do not index this specific element within an HTML page | | | class=robots-nocontent | Yahoo |
| Stop indexing this content after a specific date | | UNAVAILABLE_AFTER | | Google |
| Specify a sitemap file or a sitemap index file | Sitemap | | | Google, Yahoo, Microsoft |
| Specify how frequently a crawler may access your website | Crawl-Delay | | Google WMT | Yahoo, Microsoft |
| Authenticate the identity of the crawler | | | Reverse DNS Lookup | Google, Yahoo, Microsoft |
| Request removal of your content from the engine's index | | | Google WMT, Yahoo SE, Microsoft WMT | Google, Yahoo, Microsoft |

Source: Jane & Robot - Managing Robots' Access to Your Website
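For a sense of how page-level directives might be read in practice, the sketch below pulls meta robots values out of an HTML document with Python's standard html.parser module. The sample page, the MetaRobotsParser class, the is_indexable helper, and the set of robot names it watches for are all assumptions made for illustration; this is not how any particular search engine implements it.

```python
from html.parser import HTMLParser

# Hypothetical page that asks all robots not to index it or follow its links.
SAMPLE_HTML = """
<html>
<head>
<title>The Best Webpage on the Internet</title>
<meta name="robots" content="NOINDEX, NOFOLLOW" />
</head>
<body><h1>Hello World</h1></body>
</html>
"""


class MetaRobotsParser(HTMLParser):
    """Collect directives from meta tags addressed to known robot names."""

    def __init__(self, robot_names=("robots", "googlebot")):
        super().__init__()
        self.robot_names = {name.lower() for name in robot_names}
        self.directives = {}  # robot name -> set of upper-cased directives

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        if name not in self.robot_names:
            return
        content = attr.get("content") or ""
        values = {v.strip().upper() for v in content.split(",") if v.strip()}
        self.directives.setdefault(name, set()).update(values)


def is_indexable(directives, robot="googlebot"):
    """A page is treated as indexable unless NOINDEX applies to this robot."""
    values = directives.get("robots", set()) | directives.get(robot, set())
    return "NOINDEX" not in values


meta_parser = MetaRobotsParser()
meta_parser.feed(SAMPLE_HTML)
print(meta_parser.directives)
print("indexable:", is_indexable(meta_parser.directives))
```

Directives addressed to the generic name "robots" are merged with any addressed to the specific robot, so the most restrictive instruction wins in this sketch.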


Rel="nofollow"

Nofollow is a common link attribute value that all of the major search engines honor. It is added to a link's rel attribute to prevent the link from passing ranking power (or "link juice").


Example of nofollow:

<a href="http://www.example.com" title="Example" rel="nofollow">Example Link</a>
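When auditing a page, it can help to list which links carry nofollow and which do not. The following is a minimal sketch using Python's standard html.parser module; the LinkCollector class and the sample markup are hypothetical and exist only for this example.

```python
from html.parser import HTMLParser

# Hypothetical markup: one nofollowed link and one ordinary link.
SAMPLE_HTML = """
<p>
<a href="http://www.example.com" title="Example" rel="nofollow">Example Link</a>
<a href="http://www.example.com/about">About</a>
</p>
"""


class LinkCollector(HTMLParser):
    """Sort the links on a page by whether their rel attribute has nofollow."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.nofollowed = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return
        # rel may hold several space-separated tokens, e.g. "external nofollow".
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollowed.append(href)
        else:
            self.followed.append(href)


collector = LinkCollector()
collector.feed(SAMPLE_HTML)
print("nofollowed:", collector.nofollowed)
print("followed:", collector.followed)
```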

An excellent and more comprehensive resource on robots.txt can be found at Jane & Robot - Managing Robots' Access to Your Website. Additionally, a printer-friendly version of this information is available on The Web Developer’s SEO Cheat Sheet.


Search Engine Tools

The following tools are provided free of charge by the major search engines and enable webmasters to have more control over how their content is indexed.


Google Webmaster Tools


Sign Up


Google Webmaster Tools Sign Up


Settings


Geographic target


Preferred Domain


Image Search

Crawl Rate


Diagnostics

Web Crawl


Mobile Crawl


Content Analysis


Statistics

These statistics are a window into how Google sees a given website. Specifically, they identify top search queries, crawl stats, subscriber stats, “What Googlebot sees” and index stats.


Link Data

This section provides details on links. Specifically, it outlines external links, internal links and sitelinks. Sitelinks are section links that sometimes appear under a website's listing when they are especially applicable to a given query.


Sitemaps

This is the interface for submitting and managing sitemaps directly with Google.


Yahoo! Site Explorer


Sign Up


Yahoo! Site Explorer Sign Up


Features


Statistics


Feeds


Actions

Live Webmaster Tools

Live Webmaster Center


Sign Up


Live Webmaster Center


Features


Profile

Crawl Issues


Backlinks

Outbound Links

Keywords


Sitemaps


Only relatively recently have search engines provided ways for webmasters to interact directly with crawlers. While this relationship is still not optimal, the search engines have made great strides toward opening their proprietary indices. This has been very helpful for webmasters, who now rely so heavily on search-driven traffic.

As always, comments and constructive criticism are appreciated. You'll note that I'm trying to go back to making this more of a true "beginner's" guide, as I'm concerned that the previous guide may have gone a bit too in-depth. Hopefully between Rand and me, we can finish this mammoth undertaking :)



