More on HTML and URL Escapes
Perhaps the most subtle change in the last section’s rewrite is that, for robustness, this version also calls cgi.escape for the language name, not just for the language’s code snippet. It’s unlikely but not impossible that someone could pass the script a language name with an embedded HTML character. For example, a URL like:
http://starship.python.net/~lutz/Basics/languages2reply.cgi?language=a<b
embeds a < in the language name parameter (the name is a<b). When submitted, this version uses cgi.escape to properly translate the < for use in the reply HTML, according to the standard HTML escape conventions discussed earlier:
<TITLE>Languages</TITLE>
<H1>Syntax</H1><HR>
<H3>a&lt;b</H3><P><PRE>
Sorry--I don't know that language
</PRE></P><BR>
<HR>
The original version doesn’t escape the language name, so the embedded <b is interpreted as an HTML tag (which may make the rest of the page render in bold font!). As you can probably tell by now, text escapes are pervasive in CGI scripting: even text that you may think is safe must generally be escaped before being inserted into the HTML code in the reply stream.
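The effect of escaping the language name can be sketched in a few lines. This is only an illustration, not the script from the earlier section: the reply_heading helper is hypothetical, and it assumes Python 3, where html.escape takes the place of the cgi.escape call used in the text (cgi.escape was removed in Python 3.8).

```python
from html import escape  # assumption: Python 3 stand-in for cgi.escape


def reply_heading(language):
    """Hypothetical helper: build the <H3> line of the reply page,
    escaping the user-supplied language name first."""
    return "<H3>%s</H3>" % escape(language)


name = "a<b"  # language name taken from the example URL

# Escaped: the browser displays a literal "<" in the heading.
print(reply_heading(name))      # <H3>a&lt;b</H3>

# Unescaped: "<b" is inserted as is and may be parsed as a tag.
print("<H3>%s</H3>" % name)     # <H3>a<b</H3>
```

Note that html.escape also translates quote characters by default, which cgi.escape did only on request; for text placed between tags, as here, the difference does not matter.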
URL Escape Code Conventions
Notice, though, that while it’s wrong to embed an unescaped < in the HTML code reply, it’s perfectly okay to include it literally in the earlier URL string used to trigger the reply. In fact, HTML and URLs define completely different characters as special. For instance, although & must be escaped as &amp; inside HTML ...
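As a rough illustration of how differently the two conventions treat the same text, the sketch below runs one string through both. It assumes Python 3 and uses html.escape and urllib.parse.quote; the sample string is made up for the example. Note that quote is conservative: with safe="" it percent-encodes every character outside a small unreserved set, including the < that was typed literally in the earlier URL.

```python
from html import escape
from urllib.parse import quote, unquote

text = "a<b&c"  # made-up sample containing characters special to HTML

# HTML convention: special characters become named entities.
html_form = escape(text)
print(html_form)            # a&lt;b&amp;c

# URL convention: special characters become %XX hex escapes.
# safe="" asks quote to encode everything outside the unreserved set.
url_form = quote(text, safe="")
print(url_form)             # a%3Cb%26c

# URL escapes are reversible with unquote.
print(unquote(url_form))    # a<b&c
```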