Chapter 4. River of Content

The previous chapter talked about dynamically updating your home page to show the latest updates long after the page had been loaded. The examples used many of the same technologies most web developers have used for years. Although this works well for some things, it has limitations that quickly become clear. If you want to give your users a truly realtime experience in the web browser, you need to push content to them. In this chapter, I’ll show you how to build a simple river of content feed. The most obvious use for this would be for a truly realtime live blog application.

During big events, many blogs will provide a link to a separate page where they will “liveblog” the whole thing. They’ll post quick text updates along the lines of “this new product could save the company, too bad it doesn’t support Bluetooth.” They’ll also post images as quickly as they can take them. However, the pages serving these “live” blogs tend to be nothing more than regular web pages that automatically refresh every 30 seconds. Users will often refresh their browsers by hand to ensure they’re seeing the latest content. Getting your content to users faster, even if it’s just by a couple of seconds, can mean the difference between users staying on your site all day and leaving as soon as they feel they’re getting old news.

Using a liveblog as an example, I’ll show you how to build a river of content that pushes out updates as soon as they are available. This will help keep users from clicking away, save wear and tear on your server, and most importantly, it’s not that hard to build.

A Crash Course in Server Push

Server push is not a new idea; it has existed in several different forms over the years. These days, however, when people talk about server push technologies, they tend to be referring to a technique called long polling.

Long Polling

Long polling is a method of server push technology that cleverly uses traditional HTTP requests to create and maintain a connection to the server, allowing the server to push data as it becomes available. In a standard HTTP request, when the browser requests data, the server will respond immediately, regardless of whether any new data is available (see Figure 4-1). Using long polling, the browser makes a request to the server and if no data is available, the server keeps the connection open, waiting until new data is available. If the connection breaks, the browser reconnects and keeps waiting. When data does become available, the server responds, closes the connection, and the whole process is repeated (see Figure 4-2).

Technically, there is no difference between this kind of request and a standard pull request. The difference, and the advantage, is in the implementation. Without long polling, the client connects and checks for data; if there is none, it disconnects and sleeps for, say, 10 seconds before reconnecting. With long polling, when the client connects and there is no data available, the connection simply stays open until data arrives. So if data arrives five seconds into the request, the client accepts the data and shows it to the user immediately. The normal request wouldn’t see the new data until its timer was up and it reconnected several seconds later.
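
To make the difference concrete, here is a minimal sketch of the client side of a long polling loop in plain JavaScript. The /updates URL and the showUpdate function are hypothetical stand-ins; in practice, a library such as Cometd handles this loop for you, along with error handling and backoff.

function poll() {
    var xhr = new XMLHttpRequest();
    xhr.open("GET", "/updates", true);
    xhr.onreadystatechange = function() {
        if (xhr.readyState == 4) {
            // The server held this request open until data arrived
            // (or its timeout expired), so render whatever came back...
            if (xhr.status == 200 && xhr.responseText) {
                showUpdate(xhr.responseText);
            }
            // ...and immediately reconnect to wait for the next update.
            poll();
        }
    };
    xhr.send(null);
}
poll();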

This method of serving requests opens up a lot of doors to what is possible in a web application, but it also complicates matters immensely.

For example, on an application where users can send messages to one another, checking for new messages has always been a rather painless affair. When the browser requests new messages for Peter, the server checks and has no messages. The same transaction is made again a few seconds later, and the server has a message for Peter.

Figure 4-1. Standard HTTP message delivery

However, in long polling, when Peter connects that first time, he never disconnects. So when Andrew sends him a new message, that message must be routed to Peter’s existing connection. Where previously the message would be stored in a database and retrieved on Peter’s next connection, now it must be routed immediately.

Routing and delivering messages to clients that are already connected is a complicated problem. Thankfully, it’s already been solved by a number of groups. Amongst others, the Dojo Foundation (http://www.dojofoundation.org) has developed a solution in the form of the Cometd server and the Bayeux protocol.

Figure 4-2. Cometd HTTP message delivery

The Bayeux Protocol

At its heart, Bayeux is a messaging protocol. Messages (or events, as they’re sometimes called) can be sent from the server to the client (and vice versa) as well as from one client to another after a trip through the server. It’s a complicated protocol solving a complicated problem, but for both the scope of this book and most use cases, the details are not that important.

Aside from the handshakes and housekeeping involved, the protocol describes a system that actually is quite simple for day-to-day uses. A client subscribes to a channel by name, which tends to be something like /foo, /foo/bar, or /chat. Channel globbing is also supported, so a user can subscribe to /foo/**, which would include channels such as /foo, /foo/bar, and /foo/zab.

Messages are then sent to the different named channels. For example, a server will send a message to /foo/bar and any client that has subscribed to that channel will receive the message. Clients can also send messages to specific channels and, assuming the server passes them along, these messages will be published to any other clients subscribed to that channel.
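
To give a rough sense of what travels over the wire, a subscription request and a published message look something like the following. This is simplified; real Bayeux messages are sent in batches and carry additional fields, and the clientId shown here is just a placeholder for the value the server assigns during the initial handshake.

{ "channel": "/meta/subscribe", "subscription": "/foo/**", "clientId": "abc123" }

{ "channel": "/foo/bar", "data": { "author": "Ted", "content": "hello" }, "clientId": "abc123" }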

Channel names that start with /meta are reserved for protocol use. These channels allow the client and server to handle tasks such as figuring out which client is which and protocol actions such as connecting and disconnecting.

One of the fields often sent along with these meta requests is an advice statement. This is a message from the server telling the client how it should act. It allows the server to specify the interval at which clients should reconnect after disconnecting, as well as which operation to perform when reconnecting. The server commonly tells the client to retry the same connection after the standard server timeout, but it may request that the client retry the handshake process altogether, or it may tell the client not to reconnect at all.

"advice": {
    "reconnect": "retry",
    "interval": 0, 
    "timeout": 120000
}

The protocol specifies a number of other interesting things that are outside the scope of this book. I encourage you to find out more about the protocol and how to leverage it for more advanced applications, but you don’t actually need to worry about how it works underneath the hood during your day-to-day coding. Most client and server libraries, including the ones listed in this text, handle the vast majority of these details.

Cometd

The Dojo Foundation started this project in order to provide implementations of the Bayeux protocol in several different languages. At the time of this writing, only the JavaScript and Java implementations are designated as stable. There are also implementations in Python, Perl, and several other languages that are in varying stages of beta.

The Java version includes both a client and a server in the form of the org.cometd package. This package has already been bundled with the Jetty web server and, no doubt, other Java servers will implement this as well.

Setting Up Your Cometd Environment

The Java implementation of Cometd comes as a completely self-contained web server and its associated bundled libraries. The web serving is done by an embedded version of Jetty, an incredibly scalable Java-based web server. Get the latest version from http://cometdproject.dojotoolkit.org/ and unzip it anywhere on your computer. It comes ready to run and, for development purposes, there is nothing else to install.

Tip

There are two different versions you can download, a source-only version and another with the Java code fully compiled. Both downloads provide implementations in all of the supported programming languages.

The compiled version is a much bigger download than the source-only version, but it provides everything you need in an easy bundle that is ready to run.

Once Cometd has been downloaded and unzipped, you’ll notice that there are a lot of files in there for the different language implementations. For now, we’re mainly interested in the cometd-java and cometd-demo directories.

Depending on your system, you may need to install some supporting libraries before running the server. Thankfully, installing these libraries is dead simple, and so is handling the build process. To handle all of these project management tasks, we’re going to use a build tool called Maven (http://maven.apache.org/). Maven is a project management tool that handles everything from building an application and creating documentation, to downloading dependencies. To use it, open your terminal window and navigate to your recently unzipped directory. Run the following command, which will take some time and print out a lot of information:

~ cometd $ mvn
[INFO] Scanning for projects...
[INFO] Reactor build order: 
[INFO]   Cometd :: Java
...

With this command, Maven will look at the pom.xml file and do a number of different things. The pom.xml file is the most basic unit of work in Maven, sort of like a Makefile. The main reason we’re using Maven here is its ability to compile our software project. However, in addition to compiling any updated source files, it will run some unit tests, generate documentation, and download any additional Java libraries needed by the project.

Included in the distribution are a couple of examples of what this package can do. To run them, we’re going to use Maven again, this time giving it specific instructions to run the Jetty server. From the cometd-demo directory mentioned earlier, run the following command:

~ cometd-demo $ mvn jetty:run

This will start up the Jetty server and start serving some example programs. Point your browser to http://127.0.0.1:8080/ and have a look at the examples (see Figure 4-3).

Figure 4-3. The Cometd Demo page

Putting Everything in Its Place

After looking around at the examples and glancing at the directory structure of Cometd, you’ve probably noticed a couple of things. There are a lot of files in that package, and there is no clear place to put your code. The entire package is geared toward showing off a couple of examples, not building real applications. So the first thing I recommend is getting some of those files out of the way and putting everything we need into one self-contained directory. Amongst other things, this will make it much easier to build the distributable WAR file later on.

What we’re really after is a cleaned-up version of the cometd-java directory. That’s going to require a couple of very easy tasks. First of all, we need to tell the project to use the Jetty plug-in. By default, this plug-in is configured a little further up the project hierarchy, but since we’re getting rid of the extra folders, we need to do it here.

Tip

This may look like a lot of work for what amounts to a bit of server-side configuration, but it’s really just housekeeping. I’ve taken the liberty of repackaging the files to ease the development process. Feel free to save some time and download that version from http://therealtimebook.com. Once you do, you can skip ahead to the next section.

First, we’re going to create a directory to house our realtime code. We’ll make two different directory trees following standard Java servlet conventions. From inside the cometd-java directory, run the following commands to create the directory structure we’ll use:

~ cometd-java $ mkdir -p apps/src/main/java/com/tedroden/realtime
~ cometd-java $ cp examples/pom.xml apps/pom.xml
~ cometd-java $ mkdir -p apps/src/main/webapp/WEB-INF

Tip

The path com/tedroden/realtime will end up as the namespace in the Java source code. It will also be used in various forms throughout the codebase. You’re encouraged to use your own namespace instead of one with my name on it; just be sure to keep it consistent in the code.

The next thing we want to do is create the apps/src/main/webapp/WEB-INF/web.xml file. This is the file that provides both configuration and deployment information for web applications. If you’re familiar with Apache’s httpd software, this file serves a similar purpose to the httpd.conf file. This is fairly straightforward. Create the file listed above and add the following:

<?xml version="1.0" encoding="ISO-8859-1"?>
<web-app xmlns="http://java.sun.com/xml/ns/javaee"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://java.sun.com/xml/ns/javaee 
							 http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd"
         version="2.5">

    <display-name>Realtime User Experience - LiveBlog</display-name>

    <context-param>
        <param-name>org.mortbay.jetty.servlet.ManagedAttributes</param-name>
        <param-value>org.cometd.bayeux</param-value>
    </context-param>

    <servlet>
        <servlet-name>cometd</servlet-name>
        <servlet-class>
		  org.cometd.server.continuation.ContinuationCometdServlet
		</servlet-class>
        <init-param>
            <param-name>timeout</param-name>
            <param-value>120000</param-value>
        </init-param>
        <init-param>
            <param-name>interval</param-name>
            <param-value>0</param-value>
        </init-param>
        <init-param>
            <param-name>maxInterval</param-name>
            <param-value>10000</param-value>
        </init-param>
        <init-param>
            <param-name>multiFrameInterval</param-name>
            <param-value>2000</param-value>
        </init-param>
        <init-param>
            <param-name>logLevel</param-name>
            <param-value>0</param-value>
        </init-param>
        <init-param>
            <param-name>refsThreshold</param-name>
            <param-value>10</param-value>
        </init-param>
        <load-on-startup>1</load-on-startup>
    </servlet>

    <filter>
        <filter-name>Continuation</filter-name>
        <filter-class>org.eclipse.jetty.continuation.ContinuationFilter</filter-class>
    </filter>
    
    <filter-mapping>
        <filter-name>Continuation</filter-name>
        <url-pattern>/cometd/*</url-pattern>
    </filter-mapping>

    <servlet-mapping>
        <servlet-name>cometd</servlet-name>
        <url-pattern>/cometd/*</url-pattern>
    </servlet-mapping>

</web-app>

This file describes simple things like the name of the app, along with parameters configuring basic variables needed to serve the content. There are tons of options for this file, and we’re using only a small subset. Via the servlet-mapping tag, we’ve told the server where to find the cometd code (appropriately enough, at the path /cometd/). The following are some of the more important parameters of interest to most cometd applications, including this liveblog application:

timeout

The length of time (in milliseconds) that the server will maintain a connection with the client while waiting for new data. This is the heart of long polling.

interval

If the client disconnects for any reason, this is the amount of time to wait before connecting again. We’ve set this to zero because the client should essentially always be connected.

maxInterval

This is the maximum amount of time the server will wait for a client to reconnect. If the client hasn’t reconnected within this many milliseconds, the server considers it disconnected.

multiFrameInterval

If the server detects that a client has connected more than once, meaning the user has more than one browser window or tab open, the server will instruct the client to back off. This value tells the client how long to wait before reconnecting in that situation.

logLevel

Configures how verbose the logging should be: 0 is none, 1 logs informational messages, and 2 enables debug logging.

refsThreshold

This is used internally by the server to decide whether a message being sent to multiple clients should be generated once and reused or generated on the fly for each client.

You’ll also notice that there is a filter-mapping and a servlet-mapping, which look suspiciously similar. They both have identical url-pattern values. These are to handle two different types of servlet containers. Having both of them in there will ensure your code is more portable when you are getting ready to actually deploy it.

The next thing we need to do is make some minor modifications to that pom.xml file we copied from the examples directory. We want to instruct Maven where to find the files and ensure that everything is running from the right spot once we start the server.

Open up the apps/pom.xml file and change the artifactId setting from cometd-examples-java to tedroden-realtime-apps and the name setting to TedRoden :: Realtime :: Apps.

Since we’re slimming things down and leaving out many of the other directories, we need to update another file that would normally be included upstream. Inside that same cometd-java directory, open up pom.xml. Be sure to note that this is not the same file as the pom.xml listed earlier. Search the file for the opening XML tag called <plugins> and insert the following inside:

<plugin>
  <groupId>org.mortbay.jetty</groupId>
  <artifactId>maven-jetty-plugin</artifactId>
  <configuration>
	<scanIntervalSeconds>1</scanIntervalSeconds>
	<webAppConfig>
	  <contextPath>/</contextPath>
	</webAppConfig>
  </configuration>
</plugin>

Adding this tells Maven to load up the Jetty server, downloading and installing the proper JAR files if needed. Jetty serves the files for our project and runs as a library inside the project itself, making our app a totally self-contained realtime web application.

We still need to make one more change to that file, so with pom.xml still open, update the modules section and rename the examples module to apps. The modules section should end up looking like this:

<modules>
    <module>api</module>
    <module>server</module>
    <module>client</module>
    <module>oort</module>
    <module>apps</module>
</modules>

The modules section of the pom.xml file tells Maven where to find each module that is needed for the project. We don’t actually need all of these modules for our purposes, and I’ve removed them in the packages I mentioned earlier, but they’re not hurting anything, so we’ll leave them alone for now. Some of these changes may seem like cosmetic differences, but they’ll make it much easier when it comes time to publish your code to production environments.

A Realtime Live Blog

Now that everything is in its place, it’s time to actually write some code. At this point, the only programming skills required are HTML and JavaScript. We’re going to build the base of this liveblog application without any server-side code at all.

To get started, use your favorite text editor and create a file in apps/src/main/webapp/river.html. Add the following code to your file:

<!DOCTYPE HTML>
<html>
  <head>
    <script type="text/javascript" src="http://www.google.com/jsapi"></script> 
	<script type="text/javascript">
	  google.load("dojo", "1.3.2");
	</script>
    <script type="text/javascript" src="river.js"></script> 
  </head>
  <body>
	<h3>Live Feed</h3>
	<div id="stream">
	</div>
  </body>
</html>

As you can see, we’re borrowing some bandwidth from Google by using their Ajax libraries API (http://code.google.com/apis/ajaxlibs/) to host all of the Dojo JavaScript that we’re going to use. After including their main API script (http://www.google.com/jsapi), we load the latest version of the base Dojo framework.

The rest of the file is pretty straightforward. Aside from the last JavaScript include, which we will be creating in just a bit, this is just all standard HTML. The most important bit is the DIV tag with the stream id. This is the place on the page where we’re going to be posting our live updates.

Now that we have a page that will display the feed, we need to be able to post content to it. Create a file called apps/src/main/webapp/river-post.html and add the following code:

<html>
  <head>
    <script type="text/javascript" src="http://www.google.com/jsapi"></script> 
	<script type="text/javascript">
	  google.load("dojo", "1.3.2");
	</script>
    <script type="text/javascript" src="river.js"></script> 
  </head>
  <body>
	<div>
	  <p>
		<label for="author">Author</label> <br />
		<input type="text" id="author" value="" placeholder="Your Name" />
	  </p>
	  <p>
		<textarea rows="10" cols="50" id="content"></textarea>
	  </p>
	  <p>
		<input type="button" id="river-post-submit" value="Post" />
	  </p>
	</div>
  </body>
</html>

This file has all of the same JavaScript as the river.html file, but it also has some HTML form elements. Although this is where we’ll post the content to the realtime feed, we’re not actually POSTing or submitting the form. We’ll be sending all the data through JavaScript using cometd.

While the Cometd software and the Bayeux protocol handle all of the complicated parts of this process, a small bit of JavaScript is needed to get everything running. Create apps/src/main/webapp/river.js and add the needed code piece by piece.

function submitPost(e) {
	dojox.cometd.publish('/river/flow', {
		'content': dojo.byId('content').value,
		'author': (dojo.byId('author').value ?
				   dojo.byId('author').value :
				   'Anonymous')
	} );
	dojo.byId('content').value ='';
}

The first function to add to the file is submitPost. This is what is used to send the content to the server, much like submitting a standard web form. However, rather than POSTing the data, we grab the values of the form fields created in river-post.html and publish them via dojox.cometd.publish.

The function dojox.cometd.publish is what is used to publish (send) data to a named channel. The channel is the first parameter and is always a string, in this case /river/flow. The second parameter is for the JSON data that gets sent to the server.

function setupRiver() {
	dojox.cometd.init('cometd');
	var catcher = {
		handler: function(msg) {
			if (msg.data.content) {
				var p = dojo.create("p", {style: 'opacity: 0' } );
				dojo.create('strong', { innerHTML: msg.data.author }, p);
				dojo.create('p', { innerHTML: msg.data.content }, p);
				dojo.place(p, "stream", 'first');
				dojo.fadeIn({ node: p, duration: 300 }).play();
			}
		}
	};

	if(dojo.byId('river-post-submit'))
		dojo.connect(dojo.byId('river-post-submit'), "onclick", "submitPost");
	else
		dojox.cometd.subscribe("/river/flow", catcher, "handler");
}

This simple little function is the most complicated part of the JavaScript that we need to create for this application. This one function handles the main setup for both the liveblog viewer and the content creation process.

The first thing that happens is the call to dojox.cometd.init, which initializes the connection to the cometd server. This handles the handshake required by the Bayeux protocol along with details such as determining the best transport method for your browser, reconnecting if something goes wrong, and everything else that we’d rather not worry about. The lone parameter is the path to the cometd server itself. This is the same path we set up when we put the servlet-mapping tag into web.xml.

Next, we create a small object called catcher, which is what receives any messages sent from the server. These messages are passed to the handler function as JSON objects. The full Bayeux message response is sent from the server, but the only part we’re concerned with is the data. This data object is the very same JSON object published previously in the submitPost function. You’ll remember there were two members in that object: author and content.

In this function, we use the base dojo framework to create some DOM elements to display the posted content. After creating a couple of P elements and a STRONG tag to show the author name, we use dojo animation to make the HTML fade in. It’s just an extensible way of printing the HTML content to the page.

Since this file gets included by both river.html and river-post.html, this function may execute two different actions depending on which page is loaded. When a user is looking at the river-post.html file, we’ll be able to access the “Post” form button via JavaScript. We simply connect that button’s onclick event to the submitPost function we created earlier. When we don’t have that form element, we assume that we’re going to view the liveblog feed and subscribe to the /river/flow channel.

Finally, we need to add the code that gets this whole process up and running. Add the following code to the bottom of your river.js file:

google.setOnLoadCallback(
	function() {
		dojo.require("dojox.cometd");
		dojo.addOnLoad(setupRiver);
		dojo.addOnUnload(dojox.cometd, "disconnect");
	}
);

Because we’re loading the dojo files from remote servers, we need to wait to run our setup functions until all of the code has been loaded. Luckily, the Google Ajax libraries API provides the callback function google.setOnLoadCallback to let us know when that has happened. In that callback function, we tell Dojo which features we’re going to need, which in this case is only the cometd extension. Then, we instruct dojo to call our setupRiver function when it is ready to continue. The very last step is to instruct the cometd library to disconnect when we unload (or navigate away from) this page.

At this point, we have all the code in place for a fully functional liveblog, so let’s try it out. Running the app is simple. From a terminal window, navigate to the apps directory and enter the following command:

~ cometd-java/apps $ mvn jetty:run

Now the server should be up and running. To fully test this out, you’ll want to open one browser, say Safari or Internet Explorer, and point it to http://127.0.0.1:8080/river.html. Then, open another browser, say Firefox, and point it to http://127.0.0.1:8080/river-post.html. Figure 4-4 shows a liveblog session in action.

Figure 4-4. Realtime updates from one browser to another

Once you’ve loaded up the browsers, you should see the two web pages we created. As you start posting content, notice how quickly it shows up in the other browser window. Keep in mind that this involves a trip through the server; the content isn’t just being drawn from one window to the other dynamically. Everything is posted to the server and pushed back down to the browser in realtime. If you opened up another browser, say Opera or Chrome, and pointed it at http://127.0.0.1:8080/river.html, you’d see the content refresh in both browsers just as quickly as in one.

So what is happening here?

In our JavaScript file, when we called dojox.cometd.init, we made a connect request to the cometd server. This request handles all of the dirty work of Bayeux’s fairly complicated handshake process. At this point the server instructs the client (our JavaScript file) on how to interact with the server. This is where the server declares the timeout, interval, and all the other variables we set up in our server configuration. As programmers, we can safely ignore all of those things now because the dojox.cometd framework takes care of everything.

The next thing we did was call dojox.cometd.subscribe and subscribe to the /river/flow channel. This function starts making requests to the server using long polling at the intervals described earlier. If the server ever tells us to back off, the framework will handle that appropriately. Once again, we can focus on building our application and not the housekeeping required by the protocol.

The handshake and subscribe processes are detailed in Figure 4-5.

Figure 4-5. Handshake and subscribe with Bayeux

At this point, our browser-based client is maintaining a long-polling-based connection to the server. When data is available, it will be sent through the existing connections.

In the demo, we use a second browser to send messages to the server, which then routes them to the subscribed clients. To send a message to the server, we pass the data, encoded as a JSON object, to dojox.cometd.publish. This function just requires the JSON object and the name of the channel to which the data should be delivered. Then, we clear out the content field to allow the blogger to quickly post more content.

When that message makes it to the server, the server then routes it back to all of the subscribed clients. In this case, it’s sent back to the handler function of the catcher object, which we specified when we subscribed to the channel. This simple function just draws the HTML to the screen and returns.

The whole process, from publishing a message to routing it back to the clients, is illustrated in Figure 4-6.

Figure 4-6. Publishing message to clients

The Two-Connection Limit

There is a big reason you need to open these files in separate browsers, and not just different tabs of the same browser, when testing. Most modern browsers limit the number of concurrent connections to just two per server. This means that if you have two connections open to the server and try to open another connection in the same browser, even if it’s in a new tab, the browser will wait until one of the original connections disconnects. When we’re doing long polling, that means we’ll be waiting a long time between connections.

Cometd actually helps with this issue by using the advice part of the protocol to instruct the client to fall back to regular polling at standard intervals. Although this helps keep connections alive, it means we’re not getting truly realtime content because of the length of time between requests. In practice for normal users, this isn’t as much of an issue, but when building sites, it poses a bit of a problem. The solution is simple: use totally different browsers.
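
The advice sent in that situation might look something like this, a sketch assuming the multiFrameInterval of 2000 milliseconds we configured in web.xml (the exact fields and values depend on the server and its settings):

"advice": {
    "reconnect": "retry",
    "interval": 2000
}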

You can easily check to see if the cometd libraries have fallen back to standard polling at long intervals by examining the transfers with the Firefox extension Firebug (http://getfirebug.com). Firebug has many features that make debugging web applications much easier, such as the ability to examine the network activity, including connections that are still active. When you load up and enable Firebug, the Console tab will show you the different POST requests currently active (see Figure 4-7). If one of them is constantly active, long polling is working. If the connection returns immediately, it’s fallen back. To fix this, navigate away from the page for a minute; it should go right back to long polling when you return.

Figure 4-7. Firebug during a long polling session

While the browser is looking at a page in a long polling session, you may notice the status bar say something to the effect of “completed 5 of 6 items,” as if the browser is still loading an asset. It is! It’s waiting on your long polling operation to finish, and it will keep waiting until it does. Although the status bar may claim the page load is incomplete, for all intents and purposes, everything is ready to go.

Server-Side Filters (with Java)

Our liveblog application will allow any number of clients to connect and view the content in realtime, but it also will allow any number of people to publish content. We don’t want to rely on the hope that the users will never find our content creation URL, so we’re going to have to lock it down. This is a great opportunity for us to test out some Java code. If you’re not familiar with Java, don’t worry. It won’t hurt a bit.

To limit posting access to authorized users, we’re going to require a password to be sent along with each publish request. If the password is correct, we’ll publish the content; if it’s wrong, we’ll silently ignore the request. On the server side we can check for this password in a number of places, but we’re going to do it from within the server-side content filter.

Server-side filters are Java-based classes that work very simply. As messages get sent to specific channels on the server, the server checks to see whether those channels have any filters set up. If a filter is configured, the request is sent to the filter before doing any other processing on the request. Inside the filter class, the data may be modified and returned to be operated on by the server or passed back to the clients. But if the filter returns null, the message is dropped and not delivered any further up the chain, and certainly not back to clients who have subscribed to the channel. This makes it a perfect place for us to check for the correct password.

The first thing we need to do is set up the filter configuration file. These files are just JSON data structures linking channels to Java classes, and they live alongside web.xml in the servlet’s WEB-INF folder. Create the file apps/src/main/webapp/WEB-INF/filters.json and add the following data structure:

[
    {
        "channels": "/river/**",
        "filter"  : "com.tedroden.realtime.FilterPasswordCheck",
        "init"    : { "required_password": "12345" }
    }
]

This file is pretty straightforward. While this example has only one filter in it, the data structure is set up as an array, so as you add more filters, simply append them as JSON objects into the existing array. The fields in the object are as follows:

channels

The channel name (or names) that this filter applies to. The channel names follow the same conventions as everywhere else, and we can use a channel glob as we’ve done here. This will allow us to filter any content sent to any /river channel.

filter

This is the actual Java classname used as the filter. When a request matches a channel listed in the channels field, it is sent through this class as soon as it is received.

init

This JSON object gets sent to the filter class listed in the previous field upon initialization. You can use this space to pass any variables to the class that apply to this specific instance of the filter.

Next, we need to create the filter class that is called when a publish request is made to /river/flow. Create the file apps/src/main/java/com/tedroden/realtime/FilterPasswordCheck.java and add this:

package com.tedroden.realtime;

import java.util.Map;
import org.cometd.Client;
import org.cometd.Channel;
import org.cometd.server.filter.JSONDataFilter;

public class FilterPasswordCheck extends JSONDataFilter
{

	String required_password;
    @Override
	public void init(Object init)
    {
        super.init(init);
		required_password = (String) ((Map)init).get("required_password");
    }

    @Override
	public Object filter(Client from, Channel to, Object data)
    {
		try {
			if(((Map)data).get("password").equals(required_password)) 
				return data;
			else  
				return null;
		}
		catch (NullPointerException e) {
			return null;
		}
	}
}

The top of this file just declares that it’s part of the com.tedroden.realtime package, or whichever namespace you’ve decided to use. After that, we import a few Java libraries that this file uses. Then, we just create our filter class, which extends the JSONDataFilter class provided by the cometd distribution.

When we set up the filters.json file, we specified certain data that got passed to the init function of the filter. As you can see, it’s the first and only parameter this init function accepts. We pass it along to the parent class and then grab the required_password, which will be used for the lifetime of this filter.

The only other function in this file is the actual filter function. On top of the data parameter, which is the JSON data passed in from the client, there are also two other parameters. The first parameter is the Client object, otherwise known as the sender of the message. The second parameter is an object representation of the destination channel.

The data parameter is the JSON object we pass in from JavaScript when we publish to the server. All we do is check to see that the provided password matches the required_password that we set up in the init function. If it does, we return the data; otherwise, we return null. This is a very basic filter that does one of two things. It either blocks the data from getting through to the client or passes it along unchanged.
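
Filters aren’t limited to blocking a message or passing it along unchanged; because the data parameter is just a Map, a filter can also modify the message before it is routed onward. As a sketch of my own (not part of the Cometd distribution), the filter method above could strip the password out of the message so that it is never echoed back to the subscribed viewers:

    @Override
	public Object filter(Client from, Channel to, Object data)
    {
		try {
			Map message = (Map) data;
			if (required_password.equals(message.get("password"))) {
				// Hypothetical tweak: drop the password before the message
				// is routed back out to the /river/flow subscribers.
				message.remove("password");
				return message;
			}
			return null;
		}
		catch (NullPointerException e) {
			return null;
		}
	}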

Before this filter is picked up by the server, we need to tell the apps/src/main/webapp/WEB-INF/web.xml file about it. Open up that file and add the filters parameter to the <servlet> section:

    <servlet>
        <servlet-name>cometd</servlet-name>
        <servlet-class>
		  org.cometd.server.continuation.ContinuationCometdServlet
		</servlet-class>
        <init-param>
            <param-name>filters</param-name>
            <param-value>/WEB-INF/filters.json</param-value>
        </init-param>
...

Finally, we need to collect the password from the author and pass it along with the Bayeux publish request. We need to make two minor changes to get this to work. First, let’s add the password field to apps/src/main/webapp/river-post.html:

<p>
	<label for="author">Author</label> <br />
	<input type="text" id="author" value="" placeholder="Your Name" />
</p>
<p>
	<label for="password">Password</label> <br />
	<input type="text" id="password" value="" placeholder="Password (12345)" />
</p>

Then, inside apps/src/main/webapp/river.js, we add one additional line to send the password to the server:

function submitPost(e) {
	dojox.cometd.publish('/river/flow', {
		'content': dojo.byId('content').value,
		'password': dojo.byId('password').value,
		'author': (dojo.byId('author').value ?
				   dojo.byId('author').value :
				   'Anonymous')
...

We’ve added all of the code needed to secure this form with a password, so now we need to start the server and test it out. From the apps directory, instruct Maven to start the server:

~ cometd-java/apps $ mvn jetty:run
[INFO] Scanning for projects...
[INFO] Searching repository for plugin with prefix: 'jetty'.
...

This will compile the newly created filter and then start the server. When you point your web browser to the same river-post.html, you’ll notice the newly added password field (see Figure 4-8). The password we set in the filters.json file was “12345”. Try submitting the form both with and without the correct password. When the correct password is supplied, the form submits like normal. If the password is wrong, the publish request is silently ignored.

Figure 4-8. The form with its password field

Integrating Cometd into Your Infrastructure

So far, we’ve seen a fairly interesting example of how Bayeux and Cometd change things for both users and developers. This is great stuff, but unless you’re starting a project from scratch, there is a good chance that these examples have used different technologies than your existing infrastructure. Moving over all of your existing code doesn’t make a lot of sense. The good news is that incorporating this into a standard web environment is remarkably easy.

For the sake of simplicity, I’ll demonstrate how to set this up with Apache’s httpd server, which is extremely popular and runs on just about every platform. This idea should work on most other server platforms as well; you’ll just need to use their configuration methods.

On Apache, you’ll need to enable the proxy module if it isn’t enabled already. If you installed Apache through your operating system’s package manager, mod_proxy is most likely already included; on Ubuntu, enabling it should be as simple as sudo a2enmod proxy proxy_http. If you compiled Apache from source, this is as simple as reconfiguring with --enable-proxy=shared and recompiling.

You’ll also need to update your httpd.conf file to load the proxy module. To do that, add the following lines near any other LoadModule statements:

LoadModule proxy_module           modules/mod_proxy.so
LoadModule proxy_http_module      modules/mod_proxy_http.so
LoadModule proxy_connect_module   modules/mod_proxy_connect.so

Now, before you restart your Apache server, you want to add the proxy directives to your (virtual) host configuration. Open up your host’s configuration file and add the following:

    <Location /cometd>
       ProxyPass http://localhost:8080/cometd/
    </Location>

This Location directive tells Apache to take any requests it receives for /cometd and pass them along to http://localhost:8080/cometd/. Any response from http://localhost:8080/cometd/ is proxied back to the web client through Apache. The web browser never knows anything about the proxy or port 8080; all of the content appears to come from the same server.

This configuration sets up Apache to use the exact same relative path as our Java-based Cometd code. This means we can actually drop in the client-side code, unchanged, and it will just work. To test this out, copy river.html and river.js to your standard document root and bring up river.html in your web browser. When posting content from http://127.0.0.1:8080/river-post.html, it should work exactly the same as if it were being served by the Jetty server. This allows you to easily integrate this functionality into new and existing pages on your site.
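
For example, assuming an Apache document root of /var/www/html (adjust the path for your setup), the copy step is a one-liner:

~ cometd-java/apps $ cp src/main/webapp/river.html src/main/webapp/river.js /var/www/html/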
