
Chapter 4. Nuts and Bolts of SEO

SEO—short for Search Engine Optimization—is the set of techniques used to drive web traffic to websites. These techniques include optimization of the code, content, and organization of a site. While some SEO strategies do involve paying for inbound links, generally SEO is considered to exclude advertising.

Note

Advertising on search engines is thought of as Search Engine Marketing (SEM). SEO campaigns should often be used in tandem with SEM as I explain in Chapter 2. Advertising programs, such as Google AdWords (see Part III), can very effectively generate traffic via ads placed on web pages as well as on search engine result pages.

Most web businesses need traffic to succeed. Some websites depend on broad, general traffic. These businesses need hundreds of thousands or millions of visitors per day to prosper and thrive.

Other web businesses are looking for high-quality, targeted traffic. These visitors are essentially prequalified sales prospects—they are already looking for, and interested in, your offering.

This chapter tells you what you need to know, and explains the tools you’ll use, to draw more traffic to your site. Google is by far the most important search engine, so most SEO targets Google (but what works for Google also works for the other search engines). You’ll learn how to effectively use PageRank and Google itself. Effective use of SEO means understanding how Google works, how to boost placement in Google search results, and how not to offend Google. You’ll also learn how to best organize your web pages and websites, apply SEO analysis tools, establish effective SEO best practices, and more.

When you approach SEO, make sure you’ve worked through the analysis I suggested in Chapter 2 to understand the characteristics of the traffic that you need. Then use the techniques explained in this chapter to boost your traffic—and your business.

SEO’s Evolution

Originally fairly narrowly conceived as a set of techniques for rising to the top of search engine listings, Search Engine Optimization (SEO) has conceptually expanded to include all possible ways of promoting web traffic.

Learning how to construct websites, web content, and pages to improve—and not harm—the search engine placement of those websites and web pages has become a key component in the evolution of SEO. This central goal of SEO is sometimes called core SEO, as opposed to broader, noncore, web traffic campaigns, which may include lobbying and paying for links.

Effective SEO requires understanding that there needs to be a balance between attracting search engines and interesting human visitors. One without the other does not succeed. There must be balance between the yin of the search bot and the yang of the real human site visitor.

The basics of SEO involve these steps:

  • Understanding how your pages are viewed by search engine software

  • Creating content that is attractive to search engine indexing software

  • Taking common-sense steps to make sure that the way your pages are coded is optimized from the viewpoint of these search engines

Note

Fortunately, this often means practicing good design, which makes your sites easy to use for human visitors as well. In addition, software is becoming available that automates the nuts and bolts of SEO page coding. For example, you can optimize the HTML tagging of a WordPress blog for SEO using the All-in-One SEO plug-in.

  • Organizing your site in a way that makes sense to the search engine index

  • Avoiding some overaggressive SEO practices that can get your site blacklisted by the search engines

Search engine placement means where a web page appears in an ordered list of search query results. It’s obviously better for pages to appear higher up and toward the beginning of the list returned by the search engine in response to a user’s query.

Not all queries are created equal, so part of effective SEO is to understand which queries matter to a specific website. It’s relatively easy to be the first search result returned for a query that nobody else cares about. On the other hand, it’s very tough to become a top-ranking response to a topical and popular search query that reflects the interest of millions of people (and has a corresponding number of search results all competing for visibility).

Clearly, driving traffic to a website can make the difference between commercial success and failure. SEO experts have come to look at search engine placement as only one of their tools—and to look at the broader context of web technologies and business mechanisms that help create and drive traffic.

SEO has become an advertising discipline that must be measured using the metrics of cost effectiveness that are applied to all advertising techniques.

The SEO Advantage

If you understand SEO, you have an edge. It pays to nurture this understanding—whether you are coding your web pages yourself, working with in-house developers, or outsourcing your web design and implementation.

There may be some sites that do just fine without consciously considering SEO. But if you intentionally develop a plan that incorporates SEO into your websites and web pages, your web properties will outrank comparable ones that do not.

Just as success begets success in the brick-and-mortar world, online traffic begets traffic. What you plan to do with the traffic, and how you plan to monetize it, are other issues. Making money with web content is considered in Chapters 5 and 6.

One way to look at this is that sites that use core SEO have an incrementally higher ranking in search results. These sites don’t make gauche mistakes that cost them points in search engine ranking. They use tried-and-true SEO techniques to gain points for each web page.

Page by page, these increments give you an edge.

This edge is your SEO advantage.

What SEO Can (and Cannot) Do

SEO can drive more traffic to your website. If you plan carefully, you can also affect the kind and quality of traffic driven to your site. This means that you need to consider SEO as part of your general market research and business plan, as explained in Chapter 2. Sure, most businesses want traffic—but not just any traffic. Just as the goal of a brick-and-mortar business is to have qualified customers—ones with money in their pocket who are ready to buy when they walk in the door—an online business wants qualified traffic.

Qualified traffic is not just any traffic. It is made up of people who are genuinely interested in your offering, who are ready to buy it, and who have the means to buy it. This implies that to successfully create an SEO campaign, you need to plan. You need to understand your ideal prospect and know her habits, and create a step-by-step scheme to lure her to your site, where she can be converted to a customer. It should be natural and easy to perform this kind of planning if you’ve first followed the marketing plan suggestions in Chapter 2.

SEO cannot spin gold from straw or make a purse out of a sow’s ear. Garbage sites—those that have come to be known as spam websites—will not draw huge amounts of traffic. Or if they do, these sites won’t draw traffic for long. Google and other search engines will pull the plug as soon as they detect what is going on.

Note

There’s a web content arms race going on, with spammers and malware creators on one side and Google and other content indexers and evaluators on the other. For Google and other search engines to stay in business, the results they deliver have to be meaningful to users. This means that there’s only so much content garbage that Google will put up with.

Just as email spammers keep trying to outwit email filters, and filters keep getting better in response, content spammers and Google are involved in an arms race.

Your goal with SEO should not involve content spam, sometimes referred to as black hat SEO. Instead, be one of the good guys, a white hat SEO practitioner. SEO skills are SEO skills whether the hat is black or white. But use your white hat SEO skills to draw even more traffic to sites that are already good and useful. In other words, SEO should be used in an ethical and legitimate fashion to add genuinely interesting and valuable content—and not as part of a scam to rip people off!

SEO needs to be regarded as an adjunct to the first law of the web: good content draws traffic. There is no substitute for content that people really want to find.

While SEO best practices should always be observed, there needs to be a sense of proportion in how SEO is used. It may not make sense to create a “Potemkin village” using SEO to draw traffic to a site if the site itself doesn’t yield high returns. In other words, SEO that is costly to implement is becoming regarded as one more aspect of advertising campaign management—and subject to the same discipline of cost-benefit analysis applied to all other well-managed advertising campaigns.

You’ll find a wide range of SEO analysis tools available to help you optimize your web pages and sites:

Free tools

Free tools generally tackle a single piece of the SEO puzzle, such as generating good keywords, understanding how Google operates on specific sites and keywords, checking who links to your sites, and displaying rankings in multiple search engines at once.

Google Webmaster Tools

The Google Webmaster Tools, which are free and were partially explained in Chapter 3, provide many of the analysis features of third-party, one-off free tools.

Commercial SEO analysis software

Commercial software costs money to license (obviously!) and is mainly intended for administration of multiple SEO-improved sites.

Free Tools

There are a myriad of free SEO tools available, and there are many sites that list these free tools. (The compendium sites are generally supported by advertising, and must therefore practice good SEO themselves to be successful!)

Some good sites that list (and provide links) to free SEO analysis tools are http://www.trugroovez.com/free-seo-tools.htm, http://www.webuildpages.com/tools/, and http://www.seocompany.ca/tool/11-link-popularity-tools.html.

Some of the most useful (and free!) single-purpose SEO tools are:

NicheBot

A keyword discovery tool that helps pinpoint the right keywords for optimization

Note

As I mentioned in Chapter 2, keyword lists such as those generated by NicheBot are useful for marketing purposes.

The SERPS Tool

The SERPS (or search engine positioning) tool helps you discover your ranking across Google and other major search engines in one fell swoop

Meta Tag Analyzer

Checks meta information for errors and relevance to page content

These tools can definitely be time-savers, particularly if you have a large amount of content you need to optimize. The price is certainly right!

Individual tools also can serve as a reality check—by running your pages through one of these tools, you can get a pretty good feeling for how well you have optimized a page. However, bear in mind that there is nothing one of these tools can do for you that cannot also be done by hand, given the knowledge you gain from this chapter.

Individually, the SEO analysis tools available on the web can help you with specific SEO tasks. To get the most from them, however, you need to understand the underlying SEO concepts explained in this chapter before you put the tools to work.

Over time, as you progress with SEO, you will probably accumulate your own favorite SEO analysis toolkit.

Google Webmaster Tools

The Google Webmaster Tools provide several features that help with your inspection and analysis of your websites and pages. For information about starting the tools and launching them in reference to a specific site, see Chapter 3.

Formerly, many of the tools needed for effective SEO work had to be found piecemeal. While some SEO practitioners still prefer their own toolkits assembled from the web resources mentioned in Free Tools, the truth is that most essential tasks can be performed using the Google Webmaster Tools.

The most important analysis tools are “Crawl errors” and “HTML suggestions” (both in the Diagnostics tab) and “Your site on the web.”

Crawl errors

As shown in Figure 4-1, the crawl errors page shows you errors in your URLs as listed in your site map (see Taking Advantage of Site Mapping). In addition, the report will show you various other kinds of problems with the links (URLs) in your site—and most important, links that lead to HTTP 404 errors because the file can’t be found.

Figure 4-1. The crawl errors tool tells you about problems with the URLs on your site

Note

The errors listed in the crawl errors report should be used as the basis for further inquiry, but not necessarily considered definitive. URLs that are listed as errors may simply be links that the Googlebot cannot follow for a variety of reasons.

HTML suggestions

The HTML suggestions page shows you potential problems with meta descriptions, title tags, and whether there is any content that is not indexable. As with the crawl errors page, any reported errors should be taken as the starting place for investigation rather than gospel.

Figure 4-2 shows an HTML suggestions report listing some meta description and title tag problems.

Figure 4-2. The HTML suggestions tool helps you pinpoint problems you can fix with meta descriptions, title tags, and content that cannot be indexed

Your site on the web

The “Your site on the web” page (Figure 4-3) helps give webmasters a window into how their sites appear to the Googlebot. This page shows top search queries, inbound links to your site, the anchor text of those links (i.e., the text that is visible in the hyperlink as opposed to the destination URL), and the most common words found on your site.

It’s important to be clear that the link text appears on the sites that link to you. Inbound linking plays a very important role in Google’s evaluation of your site, and the anchor text that accompanies links is what the bot uses to understand other sites’ ideas about your site. Anchor text that is not descriptive, for example, “Click here,” doesn’t help the bot. If you find sites that are linking to you with undescriptive anchor text, it’s good SEO practice to work with these sites to improve the quality of these links.

“Your site on the web” also provides information about important keywords found on your pages, as well as keywords used in external links to your site. This is a great place to see how well the content is optimized for your target keywords. If they’re not on the list, or not high on the list, it’s really easy to see where more work is needed.

Figure 4-3. “Your site on the web” gives you the chance to view your site as Google sees it

Note

Even if you’re not interested in SEO (but who isn’t?!), it’s worth running “Your site on the web” from time to time and glancing at the Keywords tab. If strange terms like hydrocodone and prescription have crept into the keyword table, it is likely that you’ve been hacked (assuming you don’t sell drugs on your site). Spam text of this sort may not be visible to human viewers of your site, so it is good to have a mechanism for making sure you haven’t been hacked.

Commercial SEO Analysis Software

Commercially licensed SEO analysis software is unlikely to prove worth its cost unless you are in the business of performing SEO for numerous websites and a great deal of content, or for a major enterprise.

If the advertising for this kind of software claims too much, beware!

If you do want to look into licensed SEO software, some of the better-known commercial SEO analysis products are:

Keyword Elite

Keyword analysis

SEO Elite

Automated SEO analysis

Lyris HQ

Provides a hosted suite of integrated web tools that includes marketing services, email, search marketing, web analytics, and web content management (CMS)

SEO Administrator

A suite of SEO analysis tools

Analysis tools are extremely important, not so much for the numbers they provide as for the insights that webmasters can get into improving the usability of their sites. You’ll find material in Chapter 13 about Google Analytics.

Note

If your need for analysis can be categorized as “heavy-duty enterprise,” you should look at software such as Omniture or WebTrends to see if your needs can be better served than by Google Analytics.

More About How Your Site Appears to a Bot

The Google Webmaster Tools, explained in the previous section, help give you some notion of Google’s evaluation of your website. But let’s step back for a second and look at some general issues:

  • Why this evaluation is very important

  • How you can get a better intuitive feeling for bot evaluation

To state the obvious, before your site can be indexed by a search engine, it has to be found by the search engine. Search engines find websites and web pages using software that follows links to crawl the Web. This kind of software is variously called a crawler, a spider, a search bot, or simply a bot (bot is a diminutive for “robot”).

Note

You may be able to short circuit the process of waiting to be found by the search engine’s bot by submitting your URL or site map directly to search engines, as explained in Chapter 3.

To be found quickly by a search engine bot, it helps to have inbound links to your site. More important, the links within your site should work properly. If a bot encounters a broken link, it cannot reach, or index, the page pointed to by the broken link.

Images

Pictures don’t mean anything to a search bot. The only information a bot can gather about pictures comes from the file name, from the alt attribute used within a picture’s <img> tag, from text surrounding the picture, and in some cases from the image meta data. Therefore, always take care to provide description information via alt along with your images and at least one link (outside of an image map) to all pages on your site.
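For instance, a content image might be marked up along these lines (the file name and alt text here are hypothetical), giving the bot a descriptive file name and alt attribute to work with:

<img src="golden-gate-sunset.jpg"
     alt="The Golden Gate Bridge at sunset, photographed from the Marin Headlands"
     width="600" height="400" />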

Note

While effective automated image analysis is still largely in the lab, or in use by the military, its day is coming to the Internet. It’s likely that by the time this book goes into its next edition, Google and others will have added at least rudimentary image recognition features to their bots and crawlers.

Some kinds of links to pages (and sites) simply cannot be traversed by a search engine bot. The most significant issue is that a bot cannot log in to your site. If a site or page requires a username and a password for access, then it probably will not be included in a search index.

Note

Don’t be fooled by seamless page navigation using such techniques as cookies or session identifiers. If an initial login is required, these pages probably cannot be accessed by a bot.

When I was writing the previous edition of this book, there were some issues with search engine navigation of dynamic URLs. Dynamic URLs are generated on the server side from a database, and can often be recognized by characters in the URL such as ?, &, and =. The current word from Google is that there are no problems with dynamic URLs, and it may not be worth creating server-side rules to rewrite dynamic URLs to make them look more like static URLs.
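For example, a database-driven product page might live at a dynamic URL like the first address below, while a server-side rewrite rule could present the same page at the static-looking second address (both URLs are, of course, hypothetical):

http://www.example.com/products.php?category=tripods&item=42
http://www.example.com/products/tripods/42/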

Note

Navigation accomplished using JavaScript code can also be a problem, and the best bet for implementing complex navigation is to use CSS menus.
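As a minimal sketch (the page names are made up), a menu built from an ordinary HTML list keeps the links as plain anchors that any bot can follow, while CSS handles the presentation:

<style>
  /* Plain list of links styled as a horizontal menu; the anchors remain
     ordinary text links that a search bot can follow. */
  ul.nav { list-style: none; margin: 0; padding: 0; }
  ul.nav li { display: inline; margin-right: 1em; }
</style>

<ul class="nav">
  <li><a href="index.html" title="Home page">Home</a></li>
  <li><a href="galleries.html" title="Photo galleries">Galleries</a></li>
  <li><a href="about.html" title="About this site">About</a></li>
</ul>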

File Formats

Most search engines and search engine bots are capable of parsing and indexing many different kinds of file formats. For example, Google states: “We are able to index most types of pages and files with very few exceptions. File types we are able to index include: pdf, asp, jsp, html, shtml, xml, cfm, doc, xls, ppt, rtf, wks, lwp, wri, swf.”

However, simple is often better. To get the best search engine placement, you are well advised to keep your web pages, as they are actually opened in a browser, to straight HTML. Note a couple of related issues:

  • A file with a suffix other than .htm or .html can contain straight HTML. For example, when they are opened in the browser, .asp, .aspx, .cfm, .php, and .shtml files often consist of straight HTML (it has, of course, been generated by server-side software).

  • Scripts (such as a PHP program) or include files (such as an .shtml page) running on your web server usually generate HTML pages that are returned to the browser. This architecture is shown in Figure 4-4. An important implication: to see what the search engine will index, check the source of the page as it arrives in the browser (using View Source) rather than the script file used to generate the dynamic page.

Figure 4-4. Server-side programs usually return “straight” HTML to the browser

Google puts the “simple is best” precept this way: “If fancy features such as JavaScript, cookies, session IDs, frames, DHTML, or Flash keep you from seeing all of your site in a text browser, then search engine spiders may have trouble crawling your site.” The only way to know for sure whether a bot will be unable to crawl your site is to check your site using an all-text browser.

So go ahead, find out for sure. View your site in an all-text browser. It’s easy. And fun.

Viewing Your Site with an All-Text Browser

Improvement implies a feedback loop: you can’t know how well you are doing without a mechanism for examining your current status. The feedback mechanism that helps you improve your site from an SEO perspective is to view it as the bot sees it. While information shown by the Google Webmaster Tools and other helpers is useful, nothing beats a text-only view of your site.

This means viewing the site using a text-only browser. A text-only browser, just like the search engine bot, will ignore images and graphics and only process the text on a page.

The best-known text-only web browser is Lynx. You can find more information about Lynx at http://lynx.isc.org/. Generally, the process of installing Lynx involves downloading source code and compiling it.

Note

The Lynx site also provides links to a variety of precompiled Lynx builds you can download.

Don’t want to get into compiled source code or figuring out which idiosyncratic Lynx build to download? There is a simple Lynx Viewer available on the Web at http://www.delorie.com/web/lynxview.html.

All you have to do is navigate to this page in your browser and submit the URL for your site. The text view of the page you’ll see (Figure 4-5) is starkly simpler than the comparable fully rendered view with images and graphics (Figure 4-6). The contrast may be a real eye opener. You should take the time to “crawl” the text links in your site to understand site navigation as well as page appearance from the bot’s viewpoint.
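If you do install Lynx locally, you can also capture this bot’s-eye view from the command line. Lynx’s -dump option writes the formatted text of a page, along with a numbered list of its links, to standard output (the URL here is just a placeholder):

lynx -dump http://www.example.com/ > page-as-text.txt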

Excluding the Bot

There are a number of reasons you might want to block bots from all, or part, of your site. For example, if your site is not complete, if you have broken links, or if you haven’t prepared your site for a search engine visit, you probably don’t want to be indexed yet. You may also want to protect parts of your site from being indexed if those parts contain sensitive information or pages that you know cannot be accurately traversed or parsed.

Note

Google requests that you block URLs that will give the bot hiccups—for example, dynamic URLs that include calendar information and therefore have the potential for infinite expansion. You can keep the bot from following an individual link by adding a rel="nofollow" attribute to the link’s anchor tag. For example:

<a rel="nofollow" href="botcantgohere">No follow me</a>
Figure 4-5. Lynx Viewer makes it easy to focus on text and links without the distraction of the image-rich version

If you need to, you can make sure that part of your site does not get indexed by any search engine.

Note

Following the no-robots protocol is voluntary and based on the honor system. So all you can really be sure of is that a legitimate search engine that follows the protocol will not index the parts of your site that your robots.txt file prohibits (if there are external links to excluded pages, those pages may still be reached regardless of your policy file). Don’t rely on search engine exclusion for security. Information that needs to be protected should be in password-protected locations, and protected by software hardened for security purposes.

Figure 4-6. Compared with the identical page in a text-only view (Figure 4-5), it’s hard to focus on just the text and links

The robots.txt File

To block bots from traversing your site, place a text file named robots.txt in your site’s web root directory (where the HTML files for your site are placed). The following syntax in the robots.txt file blocks all compliant bots from traversing your entire site:

User-agent: *
Disallow: /

You can exercise more granular control over which bots you ban and which parts of your site are off-limits as follows:

  • The User-agent line specifies the bot that is to be banished.

  • The Disallow line specifies a path relative to your root directory that is banned territory.

Note

A single robots.txt file can include multiple User-agent bot bannings, each disallowing different paths.
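For example, a robots.txt file along the following lines (the directory names are hypothetical) lets the Googlebot crawl everything except two directories while banning all other compliant bots entirely; a bot obeys the most specific User-agent section that matches it:

User-agent: googlebot
Disallow: /drafts/
Disallow: /test/

User-agent: *
Disallow: /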

For example, you would tell the Google search bot not to look in your cgi-bin directory (assuming the cgi-bin directory is right beneath your web root directory) by placing the following two lines in your robots.txt file:

User-agent: googlebot
Disallow: /cgi-bin

Warning

As I’ve mentioned, the robots.txt mechanism relies on the honor system. By definition, it is a text file that can be read by anyone with a browser. Don’t rely on every bot honoring the request within a robots.txt file, and don’t use robots.txt in an attempt to protect sensitive information from being uncovered on your site by humans (this is a different issue from using it to avoid publishing sensitive information in honest search engine indexes like Google). In fact, someone trying to hack your site might specifically read your robots.txt file in an attempt to uncover site areas that you deem sensitive.

For more information about working with the robots.txt file, see the Web Robots FAQ. You can also find tools for managing and generating custom robots.txt files and robot meta tags (explained later) at http://www.rietta.com/robogen/ (an evaluation version is available for free download).

Meta Robot Tags

The Googlebot and many other web robots can be instructed not to index specific pages (rather than entire directories), not to follow links on a specific page, and to index but not cache a specific page, all via the HTML meta tag placed inside of the head tag.

Note

Google maintains a cache of documents it has indexed. The Google search results provide a link to the cached version in addition to the version on the Web. The cached version can be useful when the Web version has changed and also because the cached version highlights the search terms (so you can easily find them).

The meta tag used to block a robot has two attributes: name and content. The name attribute is the name of the bot you are excluding. To exclude all robots, you’d include the attribute name="robots" in the meta tag.

To exclude a specific robot, the robot’s identifier is used. The Googlebot’s identifier is googlebot, and it is excluded by using the attribute name="googlebot". You can find the entire database of registered and excludable robots and their identifiers (currently about 300) at http://www.robotstxt.org/db.html.

Note

The more than 300 robots in the official database are the tip of the iceberg. There are at least 200,000 robots and crawlers “in the wild.” Some of these software programs have malicious intent; all of them eat up valuable web bandwidth. For more information about wild (and rogue) robots, visit Bots vs. Browsers.

The possible values of the content attribute are shown in Table 4-1. You can use multiple attribute values, separated by commas, but you should not use contradictory attribute values together (such as content="follow, nofollow").

Table 4-1. Content attribute values and their meanings

Attribute value   Meaning
follow            Bot can follow links on the page
index             Bot can index the page
noarchive         Only works with the Googlebot; tells the Googlebot not to cache the page
nofollow          Bot should not follow links on the page
noindex           Bot should not index the page

For example, you can block Google from indexing a page, following links on a page, and caching the page using this meta tag:

<meta name="googlebot" content="noindex, nofollow, noarchive">

More generally, the following tag tells legitimate bots (including the Googlebot) not to index a page or follow any of the links on the page:

<meta name="robots" content="noindex, nofollow">

For more information about Google’s page-specific tags that exclude bots, and about the Googlebot in general, see http://www.google.com/bot.html.

Meta Information

Meta information, sometimes called meta tags for short, is a mechanism you can use to provide information about a web page.

Note

The term derives from the Greek word meta, which means “after” or “beyond.” Meta refers to an aspect of something that is not immediately visible, perhaps because it is in the background, but that is there nonetheless and has an impact.

The most common meta tags provide a description and keywords for telling a search engine what your website and pages are all about. Each meta tag begins with a name attribute that says what the meta tag represents. The meta tag:

<meta name="description" ... />

means that this tag will provide descriptive information. The meta tag:

<meta name="keywords" ... />

means that the tag will provide keywords.

The description and keywords go within a content attribute in the meta tag. For example, here’s a meta description tag (often simply called the meta description):

<meta name="description" content="Quality information, articles about
a variety of topics ranging from Photoshop,
programming to business, and investing." />

Keywords are provided in a comma-delimited list. For example:

<meta name="keywords" content="Photoshop, Wi-Fi,
wireless networking, programming, C#, business, investing, writing,
digital photography, eBay, pregnancy, information" />

It’s easy for anyone to put any meta tag keywords and description they’d like in a page’s HTML code. This has led to abuse in which the meta tag information does not really reflect page content. As a result, meta tag keyword and description information is discounted by search engine indexing software and not as heavily relied on by search engines as it used to be. But it is still worth getting your meta tag keywords and descriptions right, as there is little or no cost to doing so.

In Chapter 2, I explained how to create a short (one- or two-sentence) elevator pitch for your website. The meta description is a perfect use for this elevator pitch. Be aware that your meta description may be what searchers see displayed for your site, particularly if your site doesn’t have much text on the page.

Note

Google will try to pick up page descriptions from text toward the beginning of a page, but if this is not available—for example, because the page consists of graphics only—it will look at the information in the content attribute of a meta description. Providing a good meta description from the viewpoint of human browsers is therefore important.

For example, the home page of Digital Photography: Digital Field Guide (shown in Figure 4-7) doesn’t have much text, but it does have a lot of images.

Figure 4-7. Meta description information is particularly important when your website or page doesn’t have much text (like this home page)

Meta keywords should be limited to a dozen or so terms. Don’t load up the proverbial kitchen sink. Think hard about the keywords that you’d like to lead to your site when visitors search (see Words and Keyword Density).

For the keywords that are really significant to your site, you should include both single and plural forms, as well as any variants. A site about photography might want to include both “photograph” and “photography” as meta tags.

Here’s the meta tag information included in the HTML source code for the home page of the Digital Photography: Digital Field Guide site:

<meta name="description" content="Showcasing digital photographs by Harold Davis:
photomacrography, flowers, landscapes and relevant Photoshop techniques"/>

<meta name="keywords" content="Digital, photography, photographs, photograms,
scans, night, field, guide, camera, tripod, filter, photo, processing, Harold
Davis, photomacrography, macro, landscapes, roadtrip, adventure, San Francisco,
Bayscapes, flowers, Yosemite, Wave, Photoshop"/>

Because this page is almost all images with no text, Google’s software can’t determine what the page is about except by reading the meta description, so the meta description is what shows up when the site is part of Google’s search results (see Figure 4-8). The moral: if there aren’t very many words on your page, pick your meta description and keywords with special care.

Figure 4-8. In this example, Google simply took the meta description verbatim since it couldn’t find a description on the page itself

Design for SEO

Designing a site to be SEO-friendly encompasses a wide range of topics that include the general structure of the site and its hierarchy, principles and content ratio of individual pages, and specifics of using HTML tags.

Keyword density means the ratio of keywords that you are trying to target for SEO purposes to the other text on your pages. Getting keyword density right—enough so that your SEO goals are achieved, not so much that the search engines are “offended”—is an important goal of core SEO practice. Search engines do look for keywords, but they take away points for excessive and inappropriate keyword “stuffing.”

Even from the point of view of your site visitors, you want a nice density of keywords in your pages—but you don’t want so many keywords that the content of your pages is diminished from the viewpoint of visitors.

It is certainly true that which keywords you use is more important than how many of them there are. Your main focus should be on finding the right keywords, not counting them.

Site Design Principles

The following are some design and information architecture guidelines you should apply to your site to optimize it for search engines.

Text is best

For most sites, the fancy graphics do not matter. If you are looking for search engine placement, it is the words that count. Always use text instead of—or in addition to—images to display important names, content, and links. Where it’s reasonable, add a title attribute to anchor tags to provide the bot with more information about the destination and purpose of hyperlinks.

Note

Make sure you provide accurate alt attribute text for any content-related images that are on your pages (such as logo and photos, but not purely formatting graphics such as stripes, etc.).

Pages within your site should be structured with a clear hierarchy. Several alternative site navigation mechanisms should be supplied, including at least one that is text-only. The major parts of your site should be easy to access using a site map. If your site map has more than 100 links, you should divide the site map into separate pages (as a matter of usability, not in relation to Google’s analysis).
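As a rough sketch, a human-readable site map page can be as simple as a few headed lists of static text links (the section and file names here are hypothetical):

<h1>Site Map</h1>
<h2>Galleries</h2>
<ul>
  <li><a href="galleries/flowers.html">Flowers</a></li>
  <li><a href="galleries/night.html">Night photography</a></li>
</ul>
<h2>Articles</h2>
<ul>
  <li><a href="articles/choosing-a-tripod.html">Choosing a tripod</a></li>
  <li><a href="articles/photoshop-basics.html">Photoshop basics</a></li>
</ul>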

Words and Keyword Density

By now, you probably understand that the most important thing you can do on the SEO front involves the words on your pages.

There are three issues you need to consider when placing keywords on a page:

  • How many words should be on a page?

  • Which words belong on what page?

  • Where should these be placed on the page?

Page size

Ideally, pages should be between 100 and 250 words. Shorter than 100 words, and Google and other search engines may tend to discount the page as unsubstantial. Personally, as a photographer I tend to resent this anti-image bias. But from an SEO standpoint, you should know the facts of life as they are. The Web started as a primarily text-based medium, and the underlying technology still tends to prefer words.

You do want to include as many keywords as you can without destroying the value and integrity of the site. Besides decreasing the value to humans, you don’t want the bot to think you have created a spam site. There’s a balance here. With fewer than 100 words, any significant inclusion of keywords is going to look like keyword stuffing—and get “points” taken off your pages.

Pages that are longer than 250 words are not terrible, but do tend to diminish traffic—both actual and measured as a per page statistic. From the viewpoint of advertising, lengthy pages waste content; 250 words is about as many as will fit on a single monitor screen, so your visitors will have to scroll down to finish reading the rest of the page if you publish longer pages. You might as well provide navigation to additional pages for the content beyond the 250 words—and gain the benefit of having extra pages to host advertising.

Note

Some sites prefer to run longer pages and rotate ads with the page still loaded. This should be considered a good option for any page with an average page view of a minute or longer.

The bottom line is that it’s best to create pages that have between 100 and 250 words. These words should include some keywords that are desirable, but don’t overdo it. If the sentences on the page appear unnatural because they are full of keywords, they have probably been “keyword stuffed”—which is counterproductive.

Choosing keywords

Beyond the mechanics of crafting sites and pages that are search-engine friendly lies another issue: what search queries does your site answer? You need to understand this to find the keywords to emphasize in your site construction—a very important part of Search Engine Optimization.

Keywords used in the body of a page can duplicate the keywords used in meta tags. However, it’s important to understand that the keywords within a page are far more important than the meta tag information.

Note

Keywords are emphasized by their placement within a page. For example, important keywords should go in a page’s HTML <title> and in <h1> headers. It’s best to craft titles and tags that reasonably and logically use the target keywords, but this can’t always be done, and you can always add a few keywords to the phrase used for the title attribute within tags such as the <h1> tag.

There’s no magic bullet for coming up with the right keywords to place in a page. A good starting place is the elevator pitch and related keywords, as explained in Chapter 2. You can also take a look at your competition to see if their optimization makes sense in terms of titles, keywords, and so on.

It’s fundamental to your success to vary keywords used in a page depending on the page content, rather than trying to stuff a one-size-fits-all approach across all the pages on your site. In fact, Google will definitely take points off if it finds that all the pages on your site emphasize the same keywords.

If the answer is X, for example, what is the question? This is the right way to consider keyword choice. X is your website or web page. What did someone type into Google to get there?

The Top Search Queries page on the Statistics tab of Google Webmaster Tools will, to some extent, answer this question for you. This page will tell you the top 20 search queries that returned your site. Bear in mind that you may need to know about more than the top 20 searches (this is the province of web logs and web analytics programs, as explained in Chapter 13). In addition, the Google Webmaster Tools information is at least a week old, which may not be fresh enough for quickly moving sites.

As you come up with keywords and phrases, try them out. Search Google based on the keywords and phrases. Ask yourself if the results returned by Google are where you would like to see your site. If not, tweak, modify, wait for Google to re-index your site (this won’t take too long once you’ve been initially indexed), and try your search again.

Ultimately, the best way to measure success is relative. It’s easy to see how changes impact your search results ranking—just keep searching (as often as once a day) for a standard set of half a dozen keywords or phrases that you’ve decided to target. If you are moving up in the search rankings, then you are doing the right thing. If your ranking doesn’t improve, then reverse the changes. If you get search results to where you want them (usually within the top 30 or even top 10 results returned), then start using these results to optimize additional pages.

You should also realize that the success that is possible for a given keyword search depends upon the keyword. It’s highly unlikely that you will be able to position a site into the top results for, say, “Google” or “Microsoft”—but trivial to get to the top for keyword phrases with no natural search results (such as “nigritude ultramarine” or “loquine glupe,” two nonsense phrases that became the fodder for SEO contests).

The trade-off here is that it is a great deal harder to place at the top of natural search listings with keywords that are valuable—so you need to find a sweet spot: keywords where you stand a chance, but that also will drive significant site-related traffic.

Note

Since keyword value is ultimately reflected in what advertisers are willing to pay, an interesting approach to keyword selection is to see which words cost the most to advertisers. If you are registered with Google AdWords, you can use the AdWords tools as explained in Part III to do just that—and get valid cost estimates for keywords and phrases.

Keyword placement

The text on your web page should include the most important keywords you have developed in as unforced a way as possible. Try to string keywords together to make coherent sentences.

Not all text on a page is equal in importance. First of all, order does count: keywords higher up in a given page get more recognition from search engines than the same keywords further down on a page.

Roughly speaking, besides the body of the page itself and the meta information, you should try to place your keywords in the following elements—presented roughly in order of descending importance (a short combined example follows this list):

Title

Putting relevant keywords in the HTML <title> tag for your page is probably the single most important thing you can do in terms of SEO.

Headers

Keyword placement within HTML header styles, particularly <h1> headers toward the top of a page, is extremely important.

Links

Use your keywords as much as possible in the text that is enclosed by <a href="">...</a> hyperlink tags in the outbound links and cross links on your site. Ask webmasters who provide inbound linking to your site to use your keywords whenever possible.

Images

Include your keywords in the alt attribute of your HTML <img> tags.

Text in bold

If there is any reasonable excuse for doing so, include your keywords within HTML bold (<b>...</b>) tags.
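To pull these placements together, a page targeting the hypothetical phrase “night photography” might weave that keyword into its title, top-level header, link text, alt text, and bold text roughly like this (the file names are made up):

<title>Night Photography Tips: Long Exposures After Dark</title>
...
<h1>Night Photography Tips</h1>
<p>Successful <b>night photography</b> starts with a sturdy tripod and a
long exposure. See the <a href="long-exposure-guide.html"
title="Long exposure night photography guide">night photography exposure
guide</a> for step-by-step camera settings.</p>
<img src="star-trails-night-photography.jpg"
     alt="Star trails over the Sierra Nevada, a long-exposure night photograph" />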

The saying, “Everything in moderation, even moderation” is a good principle to keep in mind when you tweak your website to achieve SEO. The moderation slogan has been aptly applied to many human activities, from the sexual to the gustatory and beyond. It fits very well with SEO.

For example, you want a nice density of keywords in your pages, but you don’t want so many keywords that the content of your pages is diminished from the viewpoint of visitors. Search engines look for keywords, but they take away points for excessive and inappropriate keyword “stuffing.”

Try to see the world from a search engine bot’s viewpoint (that’s the point of using a text-only browser as I explained in More About How Your Site Appears to a Bot). Create sites that appeal when looked at this way, but go easy. Don’t overdo it!

Site Design Principles

The following are some design and information architecture guidelines you should apply to your site to optimize it for search engines.

Eschew fancy graphics

Fancy graphics do not matter for most sites. When it comes to search engine placement, it is the words that count.

Use text wherever possible

Use text rather than images to display important names, content, and links.

Always provide alt attributes for images

Make sure you provide accurate alt attribute text for any images that are on your pages.

Navigability

Pages within your site should be structured with a clear hierarchy. Several alternative site-navigation mechanisms should be supplied, including at least one that is text-only.

Provide text links

Every page in your site should be accessible using a static text link.

Make a site map available to your users

The major parts of your site should be easy to access using a site map (Figure 4-9 shows an example of a useful site map). If your site map has more than 100 links, you should divide the site map into separate pages.

Note

Your human site map should not be confused with a site map prepared for search engines, as explained in Chapter 3.

Using PageRank

The PageRank algorithm is used in part by Google to order the results returned by specific search queries. As such, understanding PageRank is crucial to core SEO efforts to improve natural search results.

Depending on who you ask, PageRank is named after its inventor, Lawrence Page, Google’s cofounder—or because it is a mechanism for ranking pages.

When a user enters a query, also called a search, into Google, the result order of the returns is partially determined by the relative PageRank of the results.

Figure 4-9. A site map makes it easy for visitors to find what they need on your site and also helps optimize your site for search engines

Originally fairly simple in concept, PageRank now reportedly processes more than 100 variables. Since the exact nature of this “secret sauce” is, well, secret, the best thing you can do from an SEO perspective is more or less stick to the original concept.

The underlying idea behind PageRank is an old one that has been used by librarians in the pre-Web past to provide an objective method of scoring the relative importance of scholarly documents. The more citations other documents make to a particular document, the more “important” the document is, the higher its rank in the system, and the more likely it is to be retrieved first.

Let me break it down for you.

Each web page is assigned a number depending upon the number of other pages that link to the page.

The crucial element that makes PageRank work is the nature of the Web itself, which depends almost solely on the use of hyperlinking between pages and sites. In the system that makes Google’s PageRank algorithm work, links are a Web popularity contest: Webmaster A thinks Webmaster B’s site has good information (or is cool, or looks good, or is funny), so Webmaster A may decide to add a link to Webmaster B’s site. In turn, Webmaster B might return the favor.

Links from Website A to Website B are called outbound links (from A’s perspective) and inbound links (from B’s perspective).

The more inbound links a page has (references from other sites), the more likely it is to have a higher PageRank. However, not all inbound links are of equal weight when it comes to how they contribute to PageRank—nor should they be. A web page gets a higher PageRank if another significant source (by “significant” I mean a source that also receives a lot of inbound links, and thus has a higher PageRank) links to it than if a trivial site without traffic provides the inbound link.

Note

PageRank is essentially a recursive algorithm, meaning a process that invokes itself. A given page’s PageRank is the sum of the PageRanks of the pages that link to it, with each linking page’s contribution divided by the number of outbound links on that page. In this scheme, a link from a high PageRank page clearly counts for more than a link from a low-ranking page. See the sidebar Understanding PageRank.
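As a point of reference, the originally published form of the algorithm (which Google has since elaborated considerably) looks roughly like this, where T1 through Tn are the pages linking to page A, C(T) is the number of outbound links on page T, and d is a damping factor (about 0.85 in the original paper):

PR(A) = (1 - d) + d * ( PR(T1)/C(T1) + PR(T2)/C(T2) + ... + PR(Tn)/C(Tn) )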

The more sophisticated version of the PageRank algorithm currently used by Google involves more than simply crunching the number of links to a page and the PageRank of each page that provides an inbound link. While Google’s exact method of calculating PageRank is shrouded in proprietary mystery, PageRank does try to exclude links from so-called link farms, pages that contain only links, and mutual linking (which are individual two-way links put up for the sole purpose of boosting PageRanks).

Note

The easiest way to see the comparative PageRank for your web pages is to install the Google Toolbar. With a web page open, the PageRank is shown in the Toolbar on a scale of 0 to 10. These PageRanks are really between 0 and 1, so although the 0 to 10 scale is useful for comparison purposes, it does not represent an actual PageRank number.

Note that you may have to specifically turn on the feature that displays PageRank in the Google Toolbar; in some installations this feature is not enabled by default.

From the viewpoint of SEO, it’s easy to understand some of the implications of PageRank. If you want your site to have a high PageRank, then you need to get as many high-ranked sites as possible to link to you.

However, useful outbound links draw traffic to the linking site and encourage other sites to return the favor because they respect the quality of the links the original site provides. So for SEO, there’s a delicate balancing act with outbound linking—some quality outbound links add merit to a site, but too many outbound links decrease desirability. Trial and error is probably the only way to get this one right.

Linking

The links on your site constitute a very important part of how Google and other search engines will rank your pages.

Links can be categorized into inbound links, outbound links, and cross links (see Figure 4-10):

Inbound links

These links point to a page on your website from an external site somewhere else on the Web.

Outbound links

These links point from a page on your site to an external site somewhere else on the Web.

Cross links

These links point between the pages on your site.

Figure 4-10. It’s important to understand the distinctions among the three categories of links
Figure 4-11. An automated link checker goes through the hyperlinks on a page (or site) one by one

You want as many inbound links as possible, provided these links are not from link farms or link exchanges. With this caveat about inbound linking from “naughty neighborhoods” understood, you cannot have too many inbound links. The more popular, and the higher the ranking of, the sites providing the inbound links to your site, the better.

Note

For information about the best approaches for generating inbound links, see Chapter 3.

The best—meaning most likely to drive traffic—inbound links come from:

  • Sites that publish content that is complementary and related to the content on your site

  • Hub sites that are a central repository, discussion area, or community site for a particular interest group

Outbound Links

The “everything in moderation” slogan is really apt when it comes to outbound links. You could also say that the “outbound link giveth and the outbound link taketh.” Here’s why: you want some respectable outbound links to establish the credibility of your site and pages and to provide a useful service for visitors. After all, part of the point of the Web is that it is a mechanism for linking information, and it is truly useless to pretend that all good information is on your site. So on-topic outbound links are themselves valuable content.

However, every time your site provides an outbound link, there is a probability that visitors to your site will use it to surf off your site. As a matter of statistics, this probability diminishes the popularity of your site, and Google will subtract points from your ranking if you have too many outbound links. In particular, pages that are essentially lists of outbound links are penalized.

If you follow the words-per-page guideline I made in Words and Keyword Density—between 100 and 250 words per page—you’ll get the best results if you try to provide at least 2 or 3 outbound links on every page and, in any case, no more than 10 or 15 per page.

Cross Links

Cross links—links within your site—are important to visitors as a way to find useful, related content. For example, if you have a page explaining the concept of class inheritance in an object-oriented programming language, a cross link to an explanation of the related concept of the class interface might help some visitors. From a navigability viewpoint, the idea is that it should be easy to move through all information that is topically related.

From an SEO perspective, your site should provide as many cross links as possible (without stretching the relevance of the links to the breaking point). There’s no downside to providing reasonable cross links, and several reasons for providing them. For example, effective cross linking keeps visitors on your site longer (as opposed to heading offsite because they can’t find what they need on your site).

Avoiding Overly Aggressive SEO

Google, like other major search engines, urges you to avoid overly aggressive SEO practices when you build your site.

Here’s why you should avoid being overly aggressive with SEO (besides wanting to avoid Google’s disapproval): building sites that get highly ranked is simply a matter of common sense; just build a site that will be useful or interesting to people, and it will naturally get indexed correctly, although this may take some time.

With this viewpoint, you shouldn’t concern yourself with search order ranking or SEO when you construct your site. Just create worthwhile content that is genuinely useful, interesting, or entertaining. However, at the same time you needn’t be naïve. It makes sense to deploy sites and pages in the most SEO-compliant way possible that doesn’t cross the line into deceptive behavior—or one of the constructions frowned upon by Google.

Note

SEO experts tend to disagree with this “build it with quality and they will come” theory of site creation. They point to the incredible competition for premium SEO results, and the money that is at stake, and suggest planning in advance for effective SEO with a knowledgeable expert.

Google’s Prohibitions

Following is a list of the techniques that Google considers bad behavior. Google prohibits these things because it considers them overaggressive and deceptive, but note that Google does not consider this list exhaustive and may penalize your site for anything new that you come up with if it is considered deceptive to either humans or the Googlebot, assuming it is discovered.

According to Google, good search engine citizen websites do not:

Employ hidden text or links

For example, users cannot read white text on a white background (and will never even know it is there), but this text will be parsed by the search engine. This rule comes down to making sure that the search engine sees the same thing that users view.

Cloak pages

Also called stealth, this is a technique that involves serving different pages to the search engine than to the user.

Use redirects in a deceptive way

It’s easy to redirect the user’s browser to another page. If this is done for deceptive purposes—for example, to make users think they are on a page associated with a well-known brand when in fact they are on a web spammer’s page—it’s frowned upon.

Attempt to improve your PageRank with dubious schemes

Linking to web spammers or bad neighborhoods on the Web may actually hurt your own PageRank (or search ranking), even if doing so provides inbound links to your site. (For information about how to legitimately encourage inbound site linking, and therefore improve your PageRank, see Chapter 3.)

Note

Bad neighborhoods are primarily link farms or link exchanges—sites that exist solely for the purpose of boosting a site’s inbound links without other content. Web spammers are sites that disguise themselves with pseudo-descriptions and fake keywords—the descriptions and keywords do not truly represent what the site contains.

Bombard Google with automated queries

This wastes Google’s bandwidth, so it doesn’t like it.

Practice keyword loading

This is the practice, beloved by some purported SEO “experts,” of adding irrelevant words to pages. (The page can then be served as the search result based on a query for the irrelevant words that actually don’t have anything to do with the page content.)

Create multiple similar pages

Google frowns on the creation of pages, domains, and subdomains that duplicate content.

Use cloaking or redirection

These techniques send a user to a page that has nothing to do with the one in the search engine results. (A variety of techniques may be used to substitute one page for another—either by redirection or actual substitution of pages on the web server—when the first page is optimized for specific keyword searches and the page to which the user is actually sent has little or nothing to do with that search.)

Create pages that lack content

Google frowns on pages that lack original content, such as a page that exists simply to present affiliate links.

Create domains with the intention of confusing users

It’s likely you’ve landed on a site with a domain name that’s confusing because it shares a name with a different domain suffix (for example, http://www.php.org, which combines a redirection with the deception, rather than the legitimate PHP language site, http://www.php.net) or because of a slight spelling variation (http://www.yahho.com rather than http://www.yahoo.com). Google frowns on deceptive domain naming if the domain name was selected for the purpose of taking advantage of the confusion.

Publish advertising that is not clearly denoted as such

Paid advertisements and links are not in and of themselves evil, but graphically it should be clear to viewers what they are looking at. In addition, the Googlebot should be warned that it is looking at paid advertising through the use of the nofollow attribute in links within the ad.

Note

Paid advertising links should be marked with the nofollow attribute, but not all links marked in this way are advertisements.

The following code snippet contains a way to make the ad distinctive for humans (the adalt class formatting) and a nofollow attribute to make the paid nature of the link clear to bots as well:

<p class="ad adalt"><small>Please buy widgets from <a rel="nofollow"
href="http://www.advertiser.com/">this advertiser</a>. Thanks!</small></p>

As Google puts it, spending your energy creating a good user experience will let you “enjoy better ranking than those who spend their time looking for loopholes they can exploit.”

Warning

If you are a webmaster, you’ve likely been approached to pay for SEO services. While there are many reputable and effective SEO consultants, there are also a fair number of scam artists. Remember: if something seems too good to be true, it likely is.

Why Not to Be Overly Aggressive

If you draw Google’s attention for practicing dirty tricks, you can get expelled from Google’s index altogether. Worse, there’s effectively no way to appeal a Google decision to expel a site from its index. Nor is there a set of procedural safeguards for webmasters who feel they have been wrongfully accused of deceitful SEO practices. It’s therefore safest to avoid the wrath of Google by avoiding anything that even smacks of deceit.

Note

You can appeal for reconsideration from Google at the form found at https://www.google.com/webmasters/tools/reconsideration, but there’s no guarantee that Google will respond to your request, or when it will do so.

Most dirty SEO tricks are also simply bad web design. If you put sites together using bad practices that are intended solely to optimize your sites, most often you’ll just irritate visitors—and get less traffic.

Google and other major search engines urge you to avoid overly aggressive SEO practices when you build your site. Google has actually taken the trouble to spell out SEO practices it regards as naughty (the list can be found at http://www.google.com/support/webmasters/bin/answer.py?answer=35769). You should pay attention to this list as if it speaks the mind of every major search engine, not just Google. Google’s position that building sites that get highly ranked is simply a matter of providing useful content isn’t totally off-the-wall, although it assumes a world where everything always works.

Action Items

Here are some action items for you to optimize your website and pages for SEO:

  • Understand what SEO can (and can’t) do

  • Get to know the relevant Google Webmaster Tools

  • Learn about available SEO tools

  • View your site in text-only mode

  • Determine if you need to exclude search engine bots from portions of your site (or, if you already do exclude bots partially or completely, review the exclusion and change it as needed so your site can be indexed)

  • Tweak the meta information for the site’s major content areas or for individual pages

  • Create a text-heavy and easily navigable site

  • Check for, and fix, any broken links

  • Understand how PageRank works and the implications for your site

  • Work to add appropriate inbound, outbound, and cross links

  • Choose keywords that make sense for your content and the traffic you are seeking, and add them to the important elements of your page content

  • Endeavor to reach a keyword density balance

  • Make sure your site avoids overly aggressive SEO practices
