[2.0] The unparsed-text() and unparsed-text-available() Functions

The last new function for combining documents is the unparsed-text() function. It reads the resource at a given URL and returns its contents as a string. The text is not parsed as XML, so you can read in plain-text documents, comma-separated values, or even HTML documents that aren’t well-formed XML. What’s more, you can combine unparsed-text() with other new features such as the tokenize() function or the <xsl:analyze-string> element to process that text and transform it in a useful way.
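The companion function named in this section’s title, unparsed-text-available(), returns true if a call to unparsed-text() with the same arguments would succeed. As a minimal sketch, assuming a text file named addresses.csv sits next to the stylesheet (the file name, the file parameter, and the fallback message are our placeholders, not part of the example that follows), it can guard the read so that a missing file doesn’t stop the transformation:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Hypothetical parameter: the text file to read -->
  <xsl:param name="file" select="'addresses.csv'"/>

  <xsl:template match="/">
    <xsl:choose>
      <!-- Call unparsed-text() only if the resource can actually be read -->
      <xsl:when test="unparsed-text-available($file)">
        <xsl:value-of select="unparsed-text($file)"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:text>Could not read </xsl:text>
        <xsl:value-of select="$file"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

A relative URI such as the one used here is resolved against the base URI of the stylesheet, not against the source document.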

As an example, we’ll read in a file of comma-separated values and output them as an HTML table of addresses. Here’s the comma-separated file, unparsed-text.csv:

Mr.,Chester Hasbrouck,Frisby,1234 Main Street,Sheboygan,WI,48392
Ms.,Natalie,Attired,707 Breitling Way,Winter Harbor,ME,00218
Ms.,Amanda,Reckonwith,930-A Chestnut Street,Lynn,MA,02930
Mrs.,Mary,Backstayge,283 First Avenue,Skunk Haven,MA,02718

We’ll go through three simple steps to process this data. First, we’ll use the tokenize() function to split the file into lines. Next, we’ll use tokenize() again to split each line into its comma-separated values. Finally, we’ll take each value and transform it appropriately. In the comma-separated file we’ve listed here, the third value in each line is the customer’s last name, the seventh value is the zip code, and so forth.

To process the file one line at a time, we’ll use this technique, courtesy of the XSLT 2.0 spec:

<xsl:for-each select="tokenize(unparsed-text('addresses.csv'), '\r?\n')"> ...
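One way the three steps might fit together is sketched below. This is our own illustration rather than the book’s listing: it keeps the addresses.csv file name from the line above, and the table layout, the fields variable, and the blank-line filter are assumptions.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <xsl:template match="/">
    <html>
      <head><title>Addresses</title></head>
      <body>
        <table border="1">
          <!-- Step 1: split the file into lines. The predicate skips blank
               lines, such as the empty token after a trailing newline. -->
          <xsl:for-each
              select="tokenize(unparsed-text('addresses.csv'), '\r?\n')
                      [normalize-space(.)]">
            <!-- Step 2: split the current line into its values -->
            <xsl:variable name="fields" select="tokenize(., ',')"/>
            <tr>
              <!-- Step 3: pick values by position.
                   1 = title, 2 = first name, 3 = last name -->
              <td>
                <xsl:value-of select="$fields[position() le 3]"
                              separator=" "/>
              </td>
              <!-- 4 = street address -->
              <td><xsl:value-of select="$fields[4]"/></td>
              <!-- 5 = city, 6 = state, 7 = zip code -->
              <td>
                <xsl:value-of
                    select="concat($fields[5], ', ', $fields[6],
                                   ' ', $fields[7])"/>
              </td>
            </tr>
          </xsl:for-each>
        </table>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```

Splitting on a bare comma works for this file because none of the values contains a comma. Data with quoted values that include commas would need something more careful, such as <xsl:analyze-string>.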
