Open source, command-line Java programs that process XML are abundant. This hack shows you how to use them.
The Java programming language (http://java.sun.com) has been a popular object-oriented language since it was unveiled by Sun in the mid-1990s. One key idea behind Java was that it made it possible to write and compile a program once, and then run it on any machine that supports a Java interpreter (“write once, run anywhere”). Note that it’s not a perfect programming language—I’ve heard Ted Ts’o (http://thunk.org/tytso/) say of Java, “Write once, run screaming.”
Nonetheless, Java is widespread and generally well liked, and you’ll find many command-line Java programs that can process XML in one way or another. A number of these programs appear in this book, so this hack walks you through how to use them.
This hack assumes that you know little to nothing about Java. If you are entirely new to Java, the information at http://java.sun.com/learning/new2java/ will also help you get up to speed quickly.
To get a Java program to run on your system, you need a Java virtual machine (VM), part of the Java runtime environment (JRE). One may already be on your system, but to get the latest JRE anyway, go to http://java.sun.com and find the link for the Java VM download. (There are alternatives to Sun’s VM, such as one offered on http://www.kaffe.org/, but I’m only going to talk about the Sun VM here.) In a few clicks, the new VM will be downloaded to your machine. You should then be able to go to a command prompt and type:
java -version
and get a response that looks something like the following:
java version "1.4.2_03"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_03-b02)
Java HotSpot(TM) Client VM (build 1.4.2_03-b02, mixed mode)
A more recent version may be available, but if you get a reply similar to this, you’re in business. If not, consult the installation instructions for Windows (http://java.sun.com/j2se/1.4.2/install-windows.html), the Mac (http://developer.apple.com/java/download.html), general Unix (http://java.sun.com/j2se/1.4.2/install-linux.html), or Solaris (http://java.sun.com/j2se/1.4.2/install-solaris.html).
In the file archive for this book (mentioned at the beginning of this chapter) is the Java archive, or JAR file, wf.jar. This JAR contains all the compiled Java classes from the XML Object Model, or XOM (http://www.cafeconleche.org/XOM/). XOM is a simple, open source, tree-based application programming interface (API) for XML, written in Java. wf.jar also contains a little program called Wf.class that does a well-formedness check on an XML document. Type in this line:
java -jar wf.jar
The program echoes back usage information, letting you know that it expects a URL as an argument:
Usage: java -jar wf.jar URL
Try it with a file:
java -jar wf.jar time.xml
Because it is well-formed, time.xml is written to standard output. If it were not well-formed, Wf.class would display an error. Try this program with bad.xml, which contains a fatal well-formedness error:
java -jar wf.jar bad.xml
You should get an error like:
nu.xom.ParsingException: Expected "</hour>" to terminate element starting on line 5. at line 5, column -1.
Once again, try it with a web resource:
java -jar wf.jar http://www.wyeast.net/time.xml
If it finds no errors, the program will echo the file to standard output (the console).
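The source for Wf is not shown in this hack, so the following is only a minimal sketch of the same kind of check. It assumes the JDK's built-in JAXP parser rather than XOM, and the class name WellFormedSketch is made up for illustration. Unlike Wf, it prints a verdict instead of echoing the document.

```java
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical stand-in for a well-formedness checker; this is not the book's Wf program.
class WellFormedSketch {

    // Returns true if the parser reads the whole document without a fatal error.
    static boolean isWellFormed(InputSource source) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.parse(source);
            return true;
        } catch (SAXException e) {
            // The parser rejected the document; show its message on standard error.
            System.err.println(e.getMessage());
            return false;
        } catch (IOException | ParserConfigurationException e) {
            // The document could not be read or no parser could be created.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("Usage: java WellFormedSketch URL");
            return;
        }
        boolean ok = isWellFormed(new InputSource(args[0]));
        System.out.println(ok ? "well-formed" : "not well-formed");
    }
}
```

Because it uses only classes that ship with the JDK, this sketch needs nothing extra in the classpath beyond the directory that holds its own class file.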
Class files contain compiled bytecode that can be executed by the Java interpreter. The interpreter has to be able to “see” where the class files are in order to execute them. That’s why there’s such a thing as a classpath. You have to place the needed Java classes in the classpath so that the interpreter can see them.
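If you are curious what the interpreter actually sees, a short program can print it. This is a minimal sketch, not something from the book's file archive: the class name ShowClasspath is invented, while java.class.path is the standard system property that holds the classpath of the running VM.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper that lists the entries of a classpath string.
class ShowClasspath {

    // Splits a classpath into its entries, skipping empty ones.
    static List<String> entries(String classpath, String separator) {
        List<String> result = new ArrayList<>();
        for (String entry : classpath.split(Pattern.quote(separator))) {
            if (!entry.isEmpty()) {
                result.add(entry);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The classpath the running VM was started with.
        String classpath = System.getProperty("java.class.path");
        // File.pathSeparator is the platform's classpath separator.
        for (String entry : entries(classpath, File.pathSeparator)) {
            System.out.println(entry);
        }
    }
}
```

Running it with different -cp values, or after changing the CLASSPATH environment variable, should show how each setting changes the list.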
The file Wf.class comes with the book’s file archive and should have been extracted into your working directory. Even when a class file is in the same directory where you are running the Java interpreter, you can’t execute it unless it’s in the classpath. In addition, the class file Wf.class also needs the XOM JAR to run.
Assuming that you have downloaded and stored the XOM JAR (renamed to xom.jar from a version available at writing time, xom-1.0d24.jar) in the working directory, place it directly in the classpath on the command line by using the -cp switch. On Windows, you do it like this:
java -cp .;xom.jar Wf worksheet.xml
Or on Unix, you do it like this:
java -cp .:xom.jar Wf worksheet.xml
The difference between the Windows and Unix commands is the separator between classpath entries: a semicolon (;) on Windows versus a colon (:) on Unix. The current directory is represented by a period (.).
If a directory contains the actual classes, all you have to do is place the directory in the classpath; if the classes are contained in a JAR file, you have to place the path to the JAR file, including the JAR filename, in the classpath.
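One way to confirm that a JAR really made it into the classpath is to ask the VM whether it can find a class from that JAR. The following is a rough sketch under a few assumptions: the class name ClasspathProbe is invented for illustration, and nu.xom.Builder (XOM's parser class) is used only as a convenient default to look for.

```java
// Hypothetical probe that reports whether a class is visible to the running program.
class ClasspathProbe {

    // Returns true if the named class can be found by this program's class loader.
    static boolean isVisible(String className) {
        try {
            // 'false' means: locate the class but do not run its static initializers.
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to a class from XOM; pass any other fully qualified name as an argument.
        String name = args.length > 0 ? args[0] : "nu.xom.Builder";
        System.out.println(name + (isVisible(name) ? " is visible" : " is NOT visible"));
    }
}
```

Running the probe once with xom.jar in the -cp value and once without it should give different answers, which makes a mistyped JAR path easy to spot.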
There are several other solutions for placing class files in the classpath. On Windows, you could place the JAR file in the classpath using this line at a command prompt or in autoexec.bat:
set CLASSPATH=%CLASSPATH%;.;C:\Hacks\examples\xom.jar
This puts the current directory (.) and C:\Hacks\examples\xom.jar in the CLASSPATH environment variable. %CLASSPATH% prepends the current classpath to the new value of CLASSPATH.
The following command works on Unix (this line could be added to a shell setup file, such as .profile or .cshrc):
$CLASSPATH adds the current classpath to the new classpath. Another way to put classes in the classpath is to place a copy of the JAR file in the jre/lib/ext directory where your JRE is installed. For example, wherever the JRE is installed, it will have the subdirectory jre/lib/ext, such as C:\Program Files\Java\j2sdk1.4.2_03\jre\lib\ext on Windows.
If you are using Windows XP, you can also set the CLASSPATH environment variable by choosing Start → Control Panel → System, clicking the Advanced tab, and then clicking the Environment Variables button (Figure 1-20). Select the existing CLASSPATH variable and add the classpath information to it. If the classpath variable does not already exist, you can create it by clicking the New button (Figure 1-21). You can select or add a CLASSPATH variable either for an individual user or, if you have administrator privileges, for the whole system.
With a little setup, you can use an executable JAR file (one that has a Main-Class: field in its manifest file and so can be run with java -jar) like a normal executable file (.exe) on a Windows 2000 or XP command line. James Clark explained this technique on the RELAX NG mailing list a few years ago (http://lists.oasis-open.org/archives/relax-ng/200203/msg00037.html). This is how you do it.
In a command prompt window, go to the working directory where you extracted the file archive for the book, then type:
assoc .jar
This tells you what name, if any, is associated with the .jar extension (if it is already associated with a name, write it down so that you can backtrack later). Now type this in:
assoc .jar=jarfile
This command associates the extension .jar with the name jarfile. Then enter:
ftype jarfile=C:\Program Files\Java\j2sdk1.4.2_03\bin\java -jar %1 %*
ftype displays or modifies the file types that are used with file extension associations. This command associates the name jarfile with java.exe, using the replaceable parameters %1 and %* for the JAR filename and for the input files, respectively.
Next, set the path extension like this, which prepends the .jar extension to the current path extensions:
set PATHEXT=.jar;%PATHEXT%
Also make sure that the current directory is in the path by using this command:
set path=.;%path%
This prepends the path of the current directory (.) to the current path (%path%). Now, enter the following:
wf
This will execute Wf.class, which Main-Class: Wf points to in the manifest file. You will see this response:
Usage: java -jar wf.jar URL
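To see which class a JAR will launch before you wire it up this way, you can read the Main-Class entry from its manifest. This is a minimal sketch using the JDK's java.util.jar classes; the class name MainClassSketch and its helper methods are invented for illustration and are not part of the book's file archive.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical helper that reports the Main-Class entry of a JAR manifest.
class MainClassSketch {

    // Parses manifest text; the text should end with a newline.
    static Manifest parse(String text) {
        try {
            return new Manifest(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the Main-Class value, or null if the manifest has none.
    static String mainClassOf(Manifest manifest) {
        return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: java MainClassSketch JARFILE");
            return;
        }
        try (JarFile jar = new JarFile(args[0])) {
            Manifest manifest = jar.getManifest(); // null if the JAR has no manifest
            System.out.println(manifest == null
                    ? "no manifest"
                    : "Main-Class: " + mainClassOf(manifest));
        }
    }
}
```

A JAR without a Main-Class entry cannot be started with java -jar, so it will not work with the file association described here either.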
Try this command with other JARs, such as jing.jar or trang.jar, to see what kind of response you get. To turn this feature off, just type:
assoc .jar=
This disassociates the .jar extension from the name jarfile, or any other name. If .jar was associated with another name (determined in the first step when you typed assoc .jar), you can reenter that name now.