Running Caja from a Web Application

We’ve seen how to take a document that mixes HTML and JavaScript and cajole it into two files: one containing the sanitized markup and one containing the cajoled JavaScript. Building on that, we’ll now explore how to cajole content from a web source.

The Caja source that we checked out from SVN includes a JavaScript sanitization file that lets us run a cajoling function against web content that we supply. The file is located at src/com/google/caja/plugin/html-sanitizer.js within the caja directory.

The other file we will need is a whitelist of all of the available HTML tags, which the sanitizer will use to determine which tags should be left alone, which should be sanitized, and which should be removed completely. A sample file (html4-defs.js) with this type of structure is available at https://github.com/jcleblanc/programming-social-applications/tree/master/caja/web_sanitizer_simple/ and provides an aggressive parsing whitelist that we will use in our example.
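To make the three whitelist outcomes concrete (leave alone, sanitize, remove), here is a minimal, self-contained sketch of a whitelist-driven filter. It is an illustration only: TAG_POLICY, filterAttributes, and sanitizeHtml are hypothetical names, the policy table is not the data structure used by html4-defs.js, and the regex tokenizer is far simpler than the parser in html-sanitizer.js. Tags missing from the table are dropped while their text is kept, which is an assumption of this sketch rather than documented Caja behavior.

```javascript
// Hypothetical whitelist for illustration; NOT the format of html4-defs.js.
const TAG_POLICY = new Map([
  ["p", { action: "keep" }],                            // tag passes through
  ["b", { action: "keep" }],
  ["a", { action: "sanitize", allowedAttrs: ["href"] }], // attributes filtered
  ["script", { action: "remove" }],                     // tag and contents dropped
]);

// Rebuild an opening tag using only whitelisted, double-quoted attributes.
function filterAttributes(rawTag, tagName, allowedAttrs) {
  const kept = [];
  const attrPattern = /([a-zA-Z-]+)\s*=\s*"([^"]*)"/g;
  let match;
  while ((match = attrPattern.exec(rawTag)) !== null) {
    const name = match[1].toLowerCase();
    const value = match[2];
    // Only http(s) URLs are accepted for href; anything else is discarded.
    const isSafeUrl = name !== "href" || /^https?:\/\//i.test(value);
    if (allowedAttrs.includes(name) && isSafeUrl) {
      kept.push(`${name}="${value}"`);
    }
  }
  return kept.length ? `<${tagName} ${kept.join(" ")}>` : `<${tagName}>`;
}

function sanitizeHtml(input) {
  // Matches a tag, a run of text, or a stray "<" that does not start a tag.
  const tokenPattern = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>|[^<]+|</g;
  let output = "";
  let removing = null; // name of the tag whose contents are being skipped
  let match;
  while ((match = tokenPattern.exec(input)) !== null) {
    const [token, slash, rawName] = match;
    if (rawName === undefined) {
      // Plain text: emitted unless we are inside a removed element.
      if (removing === null) output += token === "<" ? "&lt;" : token;
      continue;
    }
    const tagName = rawName.toLowerCase();
    const isClosing = slash === "/";
    if (removing !== null) {
      if (isClosing && tagName === removing) removing = null;
      continue;
    }
    const policy = TAG_POLICY.get(tagName);
    if (!policy) continue; // unknown tag: drop the tag, keep its text
    if (policy.action === "remove") {
      if (!isClosing) removing = tagName;
      continue;
    }
    if (isClosing) {
      output += `</${tagName}>`;
    } else {
      // "keep" tags have no whitelisted attributes in this sketch,
      // so they are emitted bare; "sanitize" tags keep allowed ones.
      output += filterAttributes(token, tagName, policy.allowedAttrs || []);
    }
  }
  return output;
}
```

Calling sanitizeHtml on a fragment that contains an inline event handler, a script element, and an unlisted tag returns markup with only the whitelisted tags and attributes left in place. A production whitelist such as the one in html4-defs.js covers far more tags and attributes than this table does.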

With these two files in hand, we can begin building out the markup and JavaScript to create a simple parsing mechanism:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
                      "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>Simple Web Application Cajoler</title>
</head>
<body>
<script src="html4-defs.js"></script> ...
