More Searchable Content and Content Types

The emphasis throughout this book has been on providing the crawlers with textual content semantically marked up using HTML. However, less accessible document types (such as multimedia, content behind forms, and scanned historical documents) are increasingly being integrated into the search engine results pages (SERPs) as search algorithms evolve in how they collect, parse, and interpret data. Greater demand, availability, and usage also fuel the trend.

Engines Will Make Crawling Improvements

The search engines are breaking down some of the traditional limitations on crawling and are beginning to address content types that they could not previously crawl or interpret. For example, in mid-2008 reports began to surface that Google was finding links within JavaScript (http://www.seomoz.org/ugc/new-reality-google-follows-links-in-javascript-4930). It is also possible that the search engines could begin to execute JavaScript to find content that may be embedded within it.
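
To make the idea of finding links within JavaScript more concrete, here is a minimal sketch of one way a crawler might discover URLs in script code without executing it: collect the text of each script element and scan it for quoted absolute URLs. This is an illustration only, not a description of how Google or any other search engine actually works. The names ScriptCollector, URL_IN_STRING, and links_in_javascript, and the sample page, are hypothetical and chosen for this example.

```python
import re
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collects the raw text found inside <script> elements."""

    def __init__(self):
        super().__init__()
        self._in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Only keep text that appears between <script> and </script>.
        if self._in_script:
            self.scripts.append(data)


# Assumption: a "link" is any quoted absolute http(s) URL in the script text.
# Relative paths and URLs assembled at runtime are deliberately ignored.
URL_IN_STRING = re.compile(r"""["'](https?://[^"'\s]+)["']""")


def links_in_javascript(html):
    """Return the unique URLs found in inline scripts, in order of appearance."""
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()
    found = []
    for script in collector.scripts:
        for url in URL_IN_STRING.findall(script):
            if url not in found:
                found.append(url)
    return found


if __name__ == "__main__":
    # Hypothetical page: one link lives only in JavaScript, one is a normal anchor.
    page = (
        "<html><body>"
        '<script>function go(){ window.location = "http://example.com/products"; }</script>'
        '<a href="http://example.com/about">About</a>'
        "</body></html>"
    )
    print(links_in_javascript(page))  # ['http://example.com/products']
```

A static scan like this finds only URLs that appear literally in the script source. URLs that are built by concatenating strings or fetched at runtime would require actually executing the JavaScript, which is the further step the paragraph above raises as a possibility.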

In June 2008, Google announced that it was crawling and indexing Flash content (http://googlewebmastercentral.blogspot.com/2008/06/improved-flash-indexing.html). In particular, this announcement indicated that Google was finding text and links within the content. However, there were still major limitations in Google’s ability to deal with Flash-based content. For example, it applied only to Flash implementations that do not rely on external ...
