Remove whitespace, hidden characters, and other unnecessary tags from your code using simple regular expression searches, or a full-fledged code optimization utility.
Even though high-speed Internet access has a firm foothold in U.S. homes and offices, everyone still likes a fast-loading web page. Unnecessarily large files also consume disk space and bandwidth resources on your web server, which can cost you if your web site starts to exceed the limits of your account quotas.
Tip
Web site file optimization usually calls to mind the compression and color management techniques used to strike an acceptable balance between fidelity to a high-resolution original image and the smallest acceptable file size for the web version. See Recipe 5.1.
There's some slack in your HTML code, too, and the good news is that getting rid of some or all of it won't affect how the page looks in a browser. Depending on the coding techniques you used in creating the original file, and the extent of the optimization techniques you use, the size of an optimized web page can be 5 to 25 percent less than the original. The one downside: Fully optimized HTML code is noticeably not user-friendly to the hand-coder, since all line feeds, extraneous spaces and tabs, and even comments are stripped away. Scanning over a dense, unformatted block of HTML code looking for the place to make a change can be maddening.
To make a modest impact on the file sizes of your web pages, you can use regular expressions in the find-and-replace dialog of your web page editor to remove extra spaces between tags, after tag attributes or punctuation, or at the beginnings of lines. Using an HTML editor capable of performing regular expression, or grep, searches (such as BBEdit, HomeSite, or Dreamweaver), type >\s+< into the search field and >< into the replace field to push all your tags up close together. Using just this technique on what I thought was a well-coded page of my own reduced its file size more than 5 percent. For a full list of special characters and wildcards that can be used in a regular expression search, see the tutorial site in the "See Also" section of this Recipe.
You can also use Perl to execute regular expression searches directly on a batch of files on your web server, or combine a bunch of Perl find and replace commands in a shell script:
perl -pi -e 's/>\s+</></g' /full/path/to/your/files/*.html
Tip
The asterisk in this command tells Perl to perform the search on all files in the directory you specify whose names end in .html.
Combine more than one Perl command, each on its own line, and save them in a file on your server called optimize_files.sh:
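#!/bin/bash
# Close up whitespace between adjacent tags.
perl -pi -e 's/>\s+</></g' /full/path/to/your/files/*.html
# Collapse two or more spaces after a period into a single space.
perl -pi -e 's/\. {2,}/. /g' /full/path/to/your/files/*.html
# Collapse runs of tabs into a single tab.
perl -pi -e 's/\t+/\t/g' /full/path/to/your/files/*.html
# Collapse runs of carriage returns into a single carriage return.
perl -pi -e 's/\r+/\r/g' /full/path/to/your/files/*.html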
Then run the script from the command-line prompt on your web server:
sh optimize_files.sh
Tip
The first line of your shell script, as well as the command to execute it, varies depending on the default shell for the account on the machine on which you plan to run the script. To find out the shell your account uses, type env at the command prompt and look for the SHELL entry in the output.
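To check how much a pass like this actually saves, you can compare byte counts before and after. Here is a minimal sketch; percent_saved is a hypothetical helper name of my own, the two scratch files stand in for an original page and its optimized copy, and the helper assumes the original file is not empty.

```shell
#!/bin/sh
# percent_saved ORIGINAL OPTIMIZED
# Prints the whole-number percentage by which OPTIMIZED is smaller than ORIGINAL.
# Hypothetical helper for illustration; assumes ORIGINAL is not an empty file.
percent_saved() {
    before=$(wc -c < "$1" | tr -d ' ')
    after=$(wc -c < "$2" | tr -d ' ')
    echo $(( (before - after) * 100 / before ))
}

# Demo with two scratch files: 200 bytes before, 170 bytes after.
tmp=$(mktemp -d)
printf '%200s' '' > "$tmp/before.html"
printf '%170s' '' > "$tmp/after.html"
percent_saved "$tmp/before.html" "$tmp/after.html"   # prints 15
```

Keeping an untouched copy of a page next to its optimized version lets you run the same comparison on your own files.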
To squeeze every last byte out of your HTML files, there are numerous PC applications and online tools that will cut the fat out of your web pages. I tried one of each on the original file mentioned above and got an overall file size reduction of about 12 percent with each of them. But in both cases, the code bore only a scant resemblance to its former self (see Figures 4-6 and 4-7).
Both procedures approached file optimization more or less the same way: remove everything that's not absolutely necessary. The online tool (links are in the "See Also" section of this Recipe) offers no way to tweak its routine. Just enter the URL of the page in the form, and it returns the optimized code. The PC application (also mentioned in the "See Also" section of this Recipe) will optimize one file, a batch of files, or an entire site, and offers a long list of settings that let the user dictate what stays and what goes.
Heavy-duty optimization complements the model of web sites as software. By that, I mean you as the designer work on a version of the site with easy-to-read formatting and comments, and then deliver an optimized version to your customers, which in this case are your site's visitors.
Tip
Makers of proprietary software often call file optimization compiling the code. They do it not just to optimize the software's performance, but to prevent end users from reverse engineering their products. Despite the many source code protection and encryption tricks available to web designers, there's not much you can do to protect your HTML code once it's published on the Web.
Web page optimization is all about speed and visitor satisfaction. After all, the comments and neatly aligned tags are for your benefit, not the web surfer's. If you want to go as far as you can with optimization and keep a version that's easy to edit, you could maintain two versions of your site—an offline version that's easy to edit by hand and an optimized "live" version that is uploaded to the web server. (Software mentioned in the "See Also" section of this Recipe can help you set this up.)
Figure 4-6. My original, pre-optimization file; maybe a few too many line feeds and tab indents, but easy to read for a hand-coder
The amount of optimization you'll want to do depends on your work habits and web site needs. If you prefer to edit HTML code by hand (and you have to do it frequently), then you'll probably want to pick and choose how and what to optimize. For example, you might want to get rid of unneeded spaces, tabs, and tag attributes that specify a default setting, but leave your comments and line feeds so the files remain more manageable. Or, you could get the best of both worlds by structuring your pages as optimized shell files, while leaving the content you edit most often in a more readable, lightly optimized include file. You can optimize to the fullest extent possible if you don't edit the pages very often or you do most of your site editing in the WYSIWYG or design view of your web page editor, rather than in code view.
For a good tutorial on using regular expressions see http://www.anybrowser.org/bbedit/grep.shtml. The two heavy-duty optimization tools I used for this Recipe are HTML Code Cleaner (online form at http://www.yook.de/webmaster/clean/) and HTML-Optimizer Pro (download from http://www.tonbrand.nl/products.htm). Port80 Software also offers a full-featured web page optimization application called w3compiler (http://www.w3compiler.com/).