More on HTML and URL Escapes

Perhaps the most subtle change in the last section’s rewrite is that, for robustness, this version also calls cgi.escape on the language name, not just on the language’s code snippet. It’s unlikely but not impossible that someone could pass the script a language name containing a character that is special in HTML, such as < or &. For example, a URL like:

http://starship.python.net/~lutz/Basics/languages2reply.cgi?language=a<b

embeds a < in the language name parameter (the name is a<b). When submitted, this version uses cgi.escape to properly translate the < for use in the reply HTML, according to the standard HTML escape conventions discussed earlier:

<TITLE>Languages</TITLE>
<H1>Syntax</H1><HR>

<H3>a&lt;b</H3><P><PRE>
Sorry--I don't know that language
</PRE></P><BR>
<HR>

The original version doesn’t escape the language name, so the embedded <b is interpreted as an HTML tag (which may make the rest of the page render in bold font!). As you can probably tell by now, text escapes are pervasive in CGI scripting: even text that you may think is safe must generally be escaped before being inserted into the HTML code in the reply stream.
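
The escaping step described above can be sketched in a few lines. This is only an illustration, not the book's script: reply_html is a hypothetical helper, and because cgi.escape was removed from the standard library in Python 3.8, the sketch assumes html.escape with quote=False as a stand-in that leaves quote characters alone, as cgi.escape did by default.

```python
import html

def reply_html(language, snippet):
    """Hypothetical helper: build the reply fragment, escaping both inputs."""
    # quote=False escapes only &, <, and >, leaving quote characters as-is.
    safe_language = html.escape(language, quote=False)
    safe_snippet = html.escape(snippet, quote=False)
    return "<H3>%s</H3><P><PRE>\n%s\n</PRE></P><BR>" % (safe_language, safe_snippet)

# The language name from the example URL, with its embedded <.
page = reply_html("a<b", "Sorry--I don't know that language")
print(page)
```

With the a<b name, the heading comes out as `<H3>a&lt;b</H3>`, so the browser displays the text instead of opening a bold tag.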

URL Escape Code Conventions

Notice, though, that while it’s wrong to embed an unescaped < in the HTML code reply, it’s perfectly okay to include it literally in the earlier URL string used to trigger the reply. In fact, HTML and URLs define completely different characters as special. For instance, although & must be escaped as &amp; inside HTML ...
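
To see that the two escape schemes really are different, the same string can be run through both. This is a rough sketch using the Python 3 standard library (html.escape and urllib.parse.quote) rather than the modules available when this text was written; the sample string a<b&c is made up for illustration.

```python
import html
from urllib.parse import quote, unquote

name = "a<b&c"  # made-up sample containing two troublesome characters

# HTML escaping replaces special characters with named entities.
as_html = html.escape(name, quote=False)

# URL escaping replaces them with %XX hexadecimal codes instead.
# safe="" asks quote to leave no reserved characters unescaped.
as_url = quote(name, safe="")

print(as_html)          # a&lt;b&amp;c
print(as_url)           # a%3Cb%26c
print(unquote(as_url))  # a<b&c
```

Note that quote is conservative: it percent-encodes everything except letters, digits, and a few punctuation characters, so it encodes < as well.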
