JavaScript & DHTML Cookbook, 2nd Edition

Chapter 1. Strings

Introduction

A string is one of the fundamental building blocks of data that JavaScript works with. Any script that touches URLs or user entries in form text boxes works with strings. Most document object model properties are string values. Data that you read or write to a browser cookie is a string. Strings are everywhere!

The core JavaScript language has a repertoire of the common string manipulation properties and methods that you find in most programming languages. You can tear apart a string character by character if you like, change the case of all letters in the string, or work with subsections of a string. Most scriptable browsers now in circulation also benefit from the power of regular expressions, which greatly simplify numerous string manipulation tasks—once you surmount a fairly steep learning curve of regular expression syntax.

Your scripts will commonly be handed values that are already string data types. For instance, if you need to inspect the text that a user has entered into a form’s text box, the value property of that text box object returns a value already typed as a string. All properties and methods of any string object are immediately available for your scripts to operate on that text box value.

Creating a String

If you need to create a string, you have a couple of ways to accomplish it. The simplest way is to assign a quoted string of characters (known as a string literal) to a variable (or object property):

	var myString = "Fluffy is a pretty cat.";

Quotes around a JavaScript string can be either single or double quotes, but each pair must be of the same type. Therefore, both of the following statements are acceptable:

	var myString = "Fluffy is a pretty cat.";
	var myString = 'Fluffy is a pretty cat.';

But the following mismatched pair is illegal and throws a script error:

	var myString = "Fluffy is a pretty cat.';

Having the two sets of quote symbols is handy when you need to embed one string within another. The following document.write() statement, which would execute while a page loads into the browser, has one outer string (the entire string being written by the method) and nested sets of quotes that surround a string value for an HTML element attribute:

	document.write("<img src='img/logo.jpg' height='30' width='100' alt='Logo' />");

You are also free to reverse the order of double and single quotes as your style demands. Thus, the above statement would be interpreted the same way if it were written as follows:

	document.write('<img src="img/logo.jpg" height="30" width="100" alt="Logo" />');

Two more levels of nesting are also possible if you use escape characters with the quote symbols. See Using Special and Escaped Characters for examples of escaped character usage in JavaScript strings.

If you need to include the other kind of quote symbol within a string (e.g., the apostrophes in "Welcome to Joanne's and Joe's Diner."), you can do so without special characters, no matter how many times the symbol appears. This is because upon encountering the start of a string, JavaScript treats ensuing characters (up to the next occurrence of the same quote symbol that starts the string) as part of the string. Trouble arises only when the string contains the same quote symbol that delimits it (e.g., 'Welcome to Joanne's and Joe's Diner.'), because the first embedded apostrophe ends the string prematurely. In such cases, you must escape the embedded quotes to keep the string together ('Welcome to Joanne\'s and Joe\'s Diner.'). Or, you can always use escaped quotes (even just one) inside a string, and then you won’t have to worry about the balancing act.

Technically speaking, the strings described so far aren’t precisely string objects in the purest sense of JavaScript. They are string values, which, as it turns out, lets the strings use all of the properties and methods of the global String object which inhabits every scriptable browser window. Use string values for all of your JavaScript text manipulation. In a few rare instances, however, a JavaScript string value isn’t quite good enough. You may encounter this situation if you are using JavaScript to communicate with a Java applet, and one of the applet’s public methods requires an argument as a string data type. In this case, you might need to create a full-fledged instance of a String object and pass that object as the method argument. To create such an object, use the constructor function of the String object:

	var myString = new String("Fluffy is a pretty cat.");

The data type of the myString variable after this statement executes is object rather than string. But this object inherits all of the same String object properties and methods that a string value has, and works fine with a Java applet.
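The difference between a string value and a String object can be checked with the typeof operator. The following is a minimal sketch; the variable names are made up for illustration:

```javascript
var strValue = "Fluffy is a pretty cat.";              // string value
var strObject = new String("Fluffy is a pretty cat."); // String object

var valueType = typeof strValue;    // "string"
var objectType = typeof strObject;  // "object"

// Both forms can use the same String properties and methods.
var valueUpper = strValue.toUpperCase();
var objectUpper = strObject.toUpperCase();

// valueOf() hands back a plain string value from the object.
var recoveredType = typeof strObject.valueOf();  // "string"
```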

Regular Expressions

For the uninitiated, regular expressions can be cryptic and confusing. This isn’t the forum to teach you regular expressions from scratch, but perhaps the recipes in this chapter that demonstrate them will pique your interest enough to pursue their study.

The purpose of a regular expression is to define a pattern of characters that you can then use to compare against an existing string. If the string contains characters that match the pattern, the regular expression tells you what the text is that matches the pattern and where the match occurs within the string, facilitating further manipulation (perhaps a search-and-replace operation). Regular expression patterns are powerful entities because they let you go much further than simply defining a pattern of fixed characters. For example, you can define a pattern to be a sequence of five numerals bounded on each side by whitespace. Another pattern can define the format for a typical email address, regardless of the length of the username or domain, but the full domain must include at least one period.

The cryptic part of regular expressions is the notation they use to specify the various conditions within the pattern. JavaScript regular expression notation is nearly identical to the notation found in languages such as Perl; the syntax is the same for all but some of the more esoteric uses. One definite difference is the way you create a regular expression object from a pattern. You can use either the formal constructor function or the more compact literal syntax. The following two syntax examples create the same regular expression object:

	var re = /pattern/ [g | i | m];                        // Literal syntax
	var re = new RegExp(["pattern", ["g"| "i" | "m"]]);    // Formal constructor

The optional trailing characters (g, i, and m) indicate whether:

g The pattern should be applied globally (i.e., to every instance of the pattern in a string)

i The pattern is case-insensitive

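m Each physical line of the target string is treated as the start of a string (i.e., ^ and $ match at the start and end of every line, not only at the start and end of the whole string)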
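The effect of the modifiers can be seen by running the same pattern with different combinations. This is a small sketch using an invented three-line test string; the variable names are illustrative only:

```javascript
var text = "Cat\ncat\nCAT";   // three lines separated by newlines

// Literal syntax and the constructor create equivalent expressions.
var literalRe = /cat/gi;
var constructedRe = new RegExp("cat", "gi");
var literalMatches = text.match(literalRe);          // all three words
var constructedMatches = text.match(constructedRe);  // same three words

// Without i, only the exact-case "cat" matches.
var caseSensitive = text.match(/cat/g);

// Without m, ^ anchors only to the start of the whole string.
var stringStartOnly = text.match(/^cat/gi);

// With m, ^ anchors to the start of every line.
var everyLineStart = text.match(/^cat/gim);
```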

If you have been exposed to regular expressions in the past, Table 1-1 lists the regular expression pattern notation available in today’s browsers.

Table 1-1. Regular expression notation

Character	Matches	Example
`\b`	Word boundary	`/\bto/` matches “tomorrow”
		`/to\b/` matches “Soweto”
		`/\bto\b/` matches “to”
`\B`	Word nonboundary	`/\Bto/` matches “stool” and “Soweto”
		`/to\B/` matches “stool” and “tomorrow”
		`/\Bto\B/` matches “stool”
`\d`	Numeral 0 through 9	`/\d\d/` matches “42”
`\D`	Nonnumeral	`/\D\D/` matches “to”
`\s`	Single whitespace	`/under\sdog/` matches “under dog”
`\S`	Single nonwhitespace	`/under\Sdog/` matches “under-dog”
`\w`	Letter, numeral, or underscore	`/1\w/` matches “1A”
`\W`	Not a letter, numeral, or underscore	`/1\W/` matches “1%”
`.`	Any character except a newline	`/../` matches “Z3”
`[...]`	Any one of the character set in brackets	`/J[aeiou]y/` matches “Joy”
`[^...]`	Negated character set	`/J[^eiou]y/` matches “Jay”
`*`	Zero or more times	`/\d*/` matches “”, “5”, or “444”
`?`	Zero or one time	`/\d?/` matches “” or “5”
`+`	One or more times	`/\d+/` matches “5” or “444”
`{n}`	Exactly n times	`/\d{2}/` matches “55”
`{n,}`	n or more times	`/\d{2,}/` matches “555”
`{n,m}`	At least n, at most m times	`/\d{2,4}/` matches “5555”
`^`	At beginning of a string or line	`/^Sally/` matches “Sally says…”
`$`	At end of a string or line	`/Sally.$/` matches “Hi, Sally.”
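A few of the notations in Table 1-1 can be tried with the test() method of a regular expression object, which returns true or false. The sketch below includes the "five numerals bounded by whitespace" pattern described earlier; the sample strings and variable names are invented for illustration:

```javascript
// Five numerals bounded on each side by whitespace.
var fiveDigits = /\s\d{5}\s/;
var zipFound = fiveDigits.test("Ship to 90210 today");    // true
var tooLong = fiveDigits.test("Order 1234567 shipped");   // false

// Word boundaries on both sides match only the standalone word.
var wholeWord = /\bto\b/;
var standalone = wholeWord.test("go to town");   // true
var embedded = wholeWord.test("tomorrow");       // false

// Character sets and negated character sets.
var vowelSet = /J[aeiou]y/.test("Joy");          // true
var negatedJoy = /J[^eiou]y/.test("Joy");        // false ("o" is excluded)
var negatedJay = /J[^eiou]y/.test("Jay");        // true
```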

See Testing String Containment Without Regular Expressions, Testing String Containment with Regular Expressions through Searching and Replacing Substrings, as well as Performing Common Text Field Validations, to see how regular expressions can empower a variety of string examination operations with less overhead than more traditional string manipulations. For in-depth coverage of regular expressions, see Mastering Regular Expressions by Jeffrey E. F. Friedl (O’Reilly).

Concatenating (Joining) Strings

Problem

You want to join together two strings or accumulate one long string from numerous sequential pieces.

Solution

Within a single statement, use the plus (+) operator to concatenate multiple string values:

	var longString = "One piece " + "plus one more piece.";

To accumulate a string value across multiple statements, use the add-by-value (+=) operator:

	var result = "";
	result += "My name is " + document.myForm.myName.value;
	result += " and my age is " + document.myForm.myAge.value;

The add-by-value operator is fully backward-compatible and is more compact than the less elegant approach:

	result = result + "My name is " + document.myForm.myName.value;

Discussion

You can use multiple concatenation operators within a single statement as needed to assemble your larger string, but you must be cautious about word wrapping of your source code. Because a quoted string literal cannot contain a raw line break, and because JavaScript interpreters have a built-in feature that automatically inserts semicolons at the logical ends of source code lines, you cannot simply break a string with a carriage return character in the source code without putting the syntactically correct breaks in the code to indicate the continuation of a string value. For example, the following statement and format triggers a syntax error as the page loads:

	var longString = "One piece " + "plus one
	more piece.";

The interpreter treats the first line as if it were:

	var longString = "One piece " + "plus one;

To the interpreter, this statement contains an unterminated string and invalidates both this statement and anything coming after it. To break the line correctly, you must terminate the trailing string, and place a plus operator as the final character of the physical source code line (do not put a semicolon there because the statement isn’t finished yet). Also, be sure to start the next line with a quote symbol:

	var longString = "One piece " + "plus one " +
	"more piece.";

Additionally, whitespace outside of the quoted string is ignored. Thus, if you wish to format the source code for improved readability, you can even indent the second line without affecting the content of the string value:

	var longString = "One piece " + "plus one " +
	    "more piece.";

Source code carriage returns do not influence string text. If you want to include a line break in a string, you need to include one of the special escaped characters (e.g., \n) in the string. For example, to format a string for a confirm dialog box so that it creates the illusion of two paragraphs, include a pair of the special newline characters in the string:

	var confirmString = "You did not enter a response to the last " +
	    "question.\n\nSubmit form anyway?";

Note that this kind of newline character is for string text that appears in dialog boxes or other string-only containers. It is not a newline character for text that is to be rendered as HTML content. For that kind of newline, you must explicitly include a <br> tag in the string:

	var htmlString = "First line of string.<br />Second line of string.";
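The pieces of this recipe can be combined in one place. The following sketch assumes a hypothetical buildSummary() function that receives the name and age as arguments (standing in for the form control values used in the Solution):

```javascript
// Accumulate a sentence across statements with the += operator.
function buildSummary(name, age) {
    var result = "";
    result += "My name is " + name;
    result += " and my age is " + age + ".";
    return result;
}

var summary = buildSummary("Fluffy", 7);

// For a dialog box, separate paragraphs with escaped newlines.
var dialogText = summary + "\n\n" +
    "Submit form anyway?";

// For HTML content, use a <br /> tag instead.
var htmlText = summary + "<br />" +
    "Submit form anyway?";
```

Note that the number 7 is converted to a string automatically when it is concatenated with the surrounding text.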

Improving String Handling Performance

Problem

You wish to improve the execution speed of routines manipulating large amounts of text.

Solution

Use a JavaScript array as a temporary storage device when accumulating large chunks of text. The push() method of an array object allows you to assemble individual text blocks in the desired order—the method appends to the end of the array. When it comes time to use the full text (e.g., to assign a large string of HTML code to the innerHTML property of an element object), use the join() method of the array object, specifying an empty string as the delimiter character.

Although the technique is intended for large text blocks, the following example uses small strings to demonstrate the sequence:

	var txtArray = new Array();
	txtArray.push("<tr>");
	txtArray.push("<td>Boston</td><td>24</td><td>10</td><td>Partly Cloudy</td>");
	txtArray.push("</tr>");
	txtArray.push("<tr>");
	txtArray.push("<td>New York</td><td>21</td><td>14</td><td>Snow</td>");
	txtArray.push("</tr>");
	document.getElementById("weatherTBody").innerHTML = txtArray.join("");
	txtArray = null;

The sequence ends by assigning null to the array variable so that the browser can free up the memory occupied by the array.

Discussion

String concatenation, especially when it involves either large amounts of text or an inordinate number of pieces being stitched together via the add-by-value (+=) operator, can be a performance hog in browsers. You may never notice the problem if your strings are not very large, but the signs start to appear when you use standard string concatenation in repeat loops that assemble huge strings. These situations are excellent candidates for using an array as the temporary string data holder. Scripts typically execute array manipulation with much better performance than string manipulation.

Note that, just as with strings, your code is responsible for handling details, such as spaces between words in joined text. If spaces are needed, they should go in the text being pushed onto the end of the array. Alternatively, if a space is needed between absolutely every string stored in the array, you can specify a space character as the parameter to the join() method:

	var finalString = txtArray.join(" ");

The character you specify as the parameter (if any) is inserted between array items as they are output as a single string.

Invoking the join() method does not alter the contents of the array. To minimize the impact on browser memory once the array’s contents are no longer needed, you should assign null to the array, thus allowing the browser’s garbage collector to do its job.
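In practice, the pieces are usually pushed onto the array inside a loop. Here is a minimal sketch, assuming a hypothetical buildRows() function and weather data supplied as an array of arrays; it returns the joined string rather than assigning it to innerHTML so that it can run outside a browser:

```javascript
// Assemble table rows from an array of records (each record is an
// array of cell values), using an array as the temporary holder.
function buildRows(records) {
    var parts = [];
    for (var i = 0; i < records.length; i++) {
        parts.push("<tr>");
        for (var j = 0; j < records[i].length; j++) {
            parts.push("<td>" + records[i][j] + "</td>");
        }
        parts.push("</tr>");
    }
    // An empty-string delimiter joins the pieces with nothing between.
    return parts.join("");
}

var weather = [
    ["Boston", 24, 10, "Partly Cloudy"],
    ["New York", 21, 14, "Snow"]
];
var rowsHTML = buildRows(weather);

// A space delimiter is inserted between every pair of items.
var sentence = ["Every", "good", "boy"].join(" ");
```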

Accessing Substrings

Problem

You want to obtain a copy of a portion of a string.

Solution

Use the substring() method (in all scriptable browsers) to copy a segment starting at a particular location and ending either at the end of the string (omitting the second parameter does that) or at a fixed position within the string, counting from the start of the string:

	var myString = "Every good boy does fine.";
	var section = myString.substring(0, 10);   // section is now "Every good"

Use the slice() method (in IE 4 or later and all modern scriptable browsers) to set the end position at a point measured from the end of the string, using a negative value as the second parameter:

	var myString = "Every good boy does fine.";
	var section = myString.slice(11, -6);      // section is now "boy does"

Use the nonstandard, but widely supported, variant called substr() to copy a segment starting at a particular location for a string length (the second parameter is an integer representing the length of the substring):

	var myString = "Every good boy does fine.";
	var section = myString.substr(6, 4);       // section is now "good"

If the sum of the two arguments exceeds the length of the string, the method returns a string from the start point to the end of the string.

Discussion

Parameters for the ECMA-compatible slice() and substring() methods are numbers that indicate the zero-based start and end positions within the string from which the extract comes. The first parameter, indicating the start position, is required. When you use two positive integer values for the slice() method arguments (and the first argument is smaller than the second), you receive the same string value as the substring() method with the same arguments.

Note that the integer values for substring() and slice() act as though they point to spaces between characters. Therefore, when a substring() method’s arguments are set to 0 and 4, it means that the substring starts to the right of the “zeroeth” position and ends to the left of the fourth position; the length of the string value returned is four characters, as shown in Figure 1-1.

Figure 1-1. How substring end points are calculated

If you should supply argument values for the substring() method in an order that causes the first argument to be larger than the second, the JavaScript interpreter automatically reverses the order of arguments so that the end pointer value is always larger than the start pointer. The slice() method isn’t as forgiving and returns an empty string. (The second argument of substr() is a length rather than a position, so no reordering applies to it.)

None of the substring methods modifies the original string object or value in any way. This is why you must capture the returned value in a variable, or apply the returned value as an argument to some other function or method.
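The behaviors of substring() and slice() described above can be compared side by side. This is a short sketch using the same sample string as the Solution; the variable names are illustrative:

```javascript
var myString = "Every good boy does fine.";

// With two positive, ordered arguments the methods agree.
var viaSubstring = myString.substring(6, 10);   // "good"
var viaSlice = myString.slice(6, 10);           // "good"

// Reversed arguments: substring() swaps them, slice() does not.
var swapped = myString.substring(10, 6);        // "good"
var notSwapped = myString.slice(10, 6);         // ""

// Omitting the second argument copies through the end of the string.
var toEnd = myString.substring(20);             // "fine."

// A negative argument to slice() counts from the end of the string.
var fromEnd = myString.slice(-5);               // "fine."

// The original string is untouched by all of the above.
var originalLength = myString.length;           // 25
```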

Changing String Case

Problem

You want to convert a string to all upper- or lowercase letters.

Solution

Use the two dedicated String object methods, toLowerCase() and toUpperCase(), for case changes:

	var myString = "New York";
	var lcString = myString.toLowerCase();
	var ucString = myString.toUpperCase();

Both methods return modified copies of the original string, leaving the original intact. If you want to replace the value of a variable with a case-converted version of the original string (and thus eliminate the original string), reassign the results of the method to the same variable:

	myString = myString.toLowerCase();

Do not, however, redeclare the variable with a var keyword.

Discussion

Because JavaScript strings (like just about everything else in the language) are case-sensitive, it is common to use case conversion for tasks such as testing the equivalency of a string entered into a text box by a user against a known string in your code. Because the user might include a variety of case variations in the entry, you need to guard against unorthodox entries by converting the input text to all uppercase or all lowercase letters for comparison (see Testing Equality of Two Strings).

Another common need for case conversion is preparing user entries for submission to a database that prefers or requires all uppercase (or all lowercase) letters. You can accomplish this for a user either at the time of entry or during batch validation prior to submission. For example, an onchange event handler in a text box can convert the text to all uppercase letters as follows:

	<input type="text" name="firstName" id="firstName" size="20" maxlength="25"
	   onchange="this.value=this.value.toUpperCase()" />

Simply reassign a converted version of the element’s value to itself.

Testing Equality of Two Strings

Problem

You want to compare a user’s text entry against a known string value.

Solution

Convert the user input to either all uppercase or all lowercase characters, and then use the JavaScript equality operator to make the comparison:

	if (document.myForm.myTextBox.value.toLowerCase() == "new york") {
	    // process correct entry
	}

By using the results of the case conversion method as one of the operands of the equality expression, you do not modify the original contents of the text box. (See Changing String Case if you want to convert the text in the text box to all of one case.)

Discussion

JavaScript has two types of equality operators. The fully backward-compatible, standard equality operator (==) employs data type conversion in some cases when the operands on either side are not of the same data type. Consider the following variable assignments:

	var stringA = "My dog has fleas.";
	var stringB = new String("My dog has fleas.");

These two variables might contain the same series of characters but are different data types. The first is a string value, while the second is an instance of a String object. If you place these two values on either side of an equality (==) operator, JavaScript tries various evaluations of the values to see if there is a coincidence somewhere. In this case, the two variable values would show to be equal, and the following expression:

	stringA == stringB

returns true.

But the other type of equality operator, the strict equality operator (===), performs no data type conversions. Given the variable definitions above, the following expression evaluates to false because the two object types differ, even though their payloads are the same:

	stringA === stringB

If the logic of your code requires you to test for the inequality of two strings, you can use the inequality (!=) and strict inequality (!==) operators. For example, if you want to process an incorrect entry, the branching flow of your function would be like the following:

	if (document.getElementById("myTextBox").value.toLowerCase() != "new york") {
	    // process incorrect entry
	}

The same data type conversion issues apply to the inequality and strict inequality operators as to their opposite partners.

Although the equality and inequality operators go to great lengths to find value matches, you may prefer to assist the process by performing obvious data type conversions in advance of the operators. For instance, if you want to see if an entry to a numeric text box (a string value) is a particular number, you could let the equality operator perform the conversion for you, as in:

	if (document.getElementById("myTextBox").value == someNumericVar) { ... }

Or you could act in advance by converting one of the operands so that both are the same data type:

	if (parseInt(document.getElementById("myTextBox").value, 10) == someNumericVar) { ... }

If you are accustomed to more strongly typed programming languages, you can continue the practice in JavaScript without penalty, while perhaps boosting your script’s readability.
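The comparisons discussed in this recipe can be verified with a few assignments. The sketch below uses a hypothetical isSameCity() helper and plain string variables in place of text box values:

```javascript
var stringA = "My dog has fleas.";
var stringB = new String("My dog has fleas.");

// == converts the String object to a string value before comparing.
var looseResult = (stringA == stringB);                 // true

// === performs no conversion, so a value never equals an object.
var strictResult = (stringA === stringB);               // false

// Converting in advance lets the strict operator succeed.
var convertedResult = (stringA === stringB.valueOf());  // true

// Case-insensitive comparison of a user entry against a known string.
function isSameCity(entry, knownCity) {
    return entry.toLowerCase() === knownCity.toLowerCase();
}
var cityMatch = isSameCity("NEW York", "new york");     // true

// Converting a numeric entry before comparing it with a number.
var numericEntry = "42";
var numericMatch = (parseInt(numericEntry, 10) === 42); // true
```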

Testing String Containment Without Regular Expressions

Problem

You want to know if one string contains another, without using regular expressions.

Solution

Use the JavaScript indexOf() string method on the longer string section, passing the shorter string as an argument. If the shorter string is inside the larger string, the method returns a zero-based index integer of the start position of the smaller string within the larger string. If the shorter string is not in the larger string, the method returns -1.

For logic that needs to branch if the smaller string is not contained by the larger string, use the following construction:

	if (largeString.indexOf(shortString) == -1) {
	    // process due to missing shortString
	}

For logic that needs to branch if the smaller string is contained somewhere within the larger string, use the following construction:

	if (largeString.indexOf(shortString) != -1) {
	    // process due to found shortString
	}

In either case, you are not interested in the precise position of the short string but simply whether it is anywhere within the large string.

Discussion

You may also find the integer returned by the indexOf() method to be useful in a variety of situations. For example, an event handler function that gets invoked by all kinds of elements in the event-propagation (bubbling) chain wants to process events that come only from elements whose IDs begin with a particular sequence of characters. This is an excellent spot to look for the returned value of zero, pointing to the start of the larger string:

	function handleClick(evt) {
	    evt = (evt) ? evt : ((window.event) ? window.event : null);
	    if (evt) {
	        var elem = (evt.target) ? evt.target : ((evt.srcElement) ?
	            evt.srcElement : null);
	        if (elem && elem.id.indexOf("menuImg") == 0) {
	            // process events from elements whose IDs begin with "menuImg"
	        }
	    }
	}

Be aware that if the larger string contains multiple instances of the shorter string, the indexOf() method returns a pointer only to the first instance. If you’re looking to count the number of instances, you can take advantage of the indexOf() method’s optional second parameter, which specifies the starting position for the search. A compact repeat loop can count up the instances quickly:

	function countInstances(mainStr, srchStr) {
	    var count = 0;
	    var offset = 0;
	    do {
	        offset = mainStr.indexOf(srchStr, offset);
	        count += (offset != -1) ? 1 : 0;
	    } while (offset++ != -1);
	    return count;
	}

Counting instances is much easier, however, using regular expressions (see Testing String Containment with Regular Expressions). Although many factors can influence performance, for the task of testing only for string containment, the indexOf() approach is typically faster than using a regular expression.
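If you perform these tests often, the indexOf() comparisons can be wrapped in small helper functions. The following is a sketch only; containsText() and beginsWith() are hypothetical names, not part of the language:

```javascript
// True if shortString occurs anywhere within largeString.
function containsText(largeString, shortString) {
    return largeString.indexOf(shortString) != -1;
}

// True only if largeString starts with prefix (match at index zero).
function beginsWith(largeString, prefix) {
    return largeString.indexOf(prefix) == 0;
}

var hasBoy = containsText("Every good boy does fine.", "boy");  // true
var hasCat = containsText("Every good boy does fine.", "cat");  // false
var isMenuImg = beginsWith("menuImg3", "menuImg");              // true
var notMenuImg = beginsWith("topmenuImg3", "menuImg");          // false
```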

Testing String Containment with Regular Expressions

Problem

You want to use regular expressions to know whether one string contains another.

Solution

Create a regular expression with the short string (or pattern) and the global (g) modifier. Then pass that regular expression as a parameter to the match() method of a string value or object:

	var re = /a string literal/g;
	var result = longString.match(re);

When a global modifier is attached to the regular expression pattern, the match() method returns an array if one or more matches are found in the longer string. If there are no matches, the method returns null.

Discussion

To work this regular expression mechanism into a practical function, you need some helpful surrounding code. If the string you are looking for is in the form of a string variable, you can’t use the literal syntax for creating a regular expression as just shown. Instead, use the constructor function:

	var shortStr = "Framistan 2000";
	var re = new RegExp(shortStr, "g");
	var result = longString.match(re);

After you have called the match() method, you can inspect the contents of the array value returned by the method:

	if (result) {
	    alert("Found " + result.length + " instances of the text: " + result[0]);
	} else {
	    alert("Sorry, no matches.");
	}

When matches exist, the array returned by match() contains the found strings. When you use a fixed string as the regular expression pattern, these returned values are redundant. That’s why it’s safe in the previous example to pull the first returned value from the array for display in the alert dialog box. But if you use a regular expression pattern involving the symbols of the regular expression language, each of the returned strings could be quite different, but equally valid because they adhere to the pattern.

As long as you specify the g modifier for the regular expression, you may get multiple matches (instead of just the first). The length of the array indicates the number of matches found in the longer string. For a simple containment test, you can omit the g modifier; as long as there is a match, the returned value will be an array whose first item is the matched text (for a pattern without capturing parentheses, an array of length 1).
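One caution when building a regular expression from a string variable: any characters in the string that are meaningful in the pattern language (such as a period or parentheses) are treated as pattern symbols unless they are escaped. The sketch below wraps the technique in two hypothetical helper functions, escapeForRegExp() and countMatches(); neither is built into the language:

```javascript
// Put a backslash before every character that has a special meaning
// in a regular expression pattern ("$&" stands for the matched text).
function escapeForRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Count occurrences of shortStr (taken literally) within longString.
function countMatches(longString, shortStr) {
    var re = new RegExp(escapeForRegExp(shortStr), "g");
    var result = longString.match(re);   // array of matches, or null
    return result ? result.length : 0;
}

var found = countMatches("Framistan 2000 beats Framistan 2000", "Framistan 2000");
var missing = countMatches("Nothing to see here", "Framistan 2000");

// Without escaping, the period in "a.b" would also match "axb".
var literalDots = countMatches("a.b axb a.b", "a.b");
```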

Searching and Replacing Substrings

Problem

You want to perform a global search-and-replace operation on a text string.

Solution

The most efficient way is to use a regular expression with the replace() method of the String object:

	var re = /a string literal/g;
	var result = mainString.replace(re, replacementString);

Invoking the replace() method on a string does not change the source (original) string. Capture the changed string returned by the method, and apply the result where needed in your scripts or page. If no replacements are made, the original string is returned by the method. Be sure to specify the g modifier for the regular expression to force the replace() method to operate globally on the original string; otherwise, only the first instance is replaced.

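Discussion

When the search and replacement strings are held in variables, you cannot use the literal syntax; create the regular expression with the constructor function instead: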

	var searchStr = "F2K";
	var replaceStr = "Framistan 2000";
	var re = new RegExp(searchStr , "g");
	var result = longString.replace(re, replaceStr);

In working with a text-based form control or an element’s text node, you can perform the replace() operation on the value of the existing text, and immediately assign the results back to the original container. For example, if a div element contains one text node with scattered place holders in the form of (ph), and the job of the replace() method is to insert a user’s entry from a text box (called myName), the sequence is as follows:

	var searchStr = "\\(ph\\)";
	var re = new RegExp(searchStr, "g");
	var replaceStr = document.getElementById("myName").value;
	var div = document.getElementById("boilerplate");
	div.firstChild.nodeValue = div.firstChild.nodeValue.replace(re, replaceStr);

The double backslashes are needed to escape the escape character before the parentheses characters, which are otherwise meaningful symbols in the regular expression pattern language.

It is also possible to implement a search-and-replace feature without regular expressions, but it’s a cumbersome exercise. The technique involves substantial text parsing using the indexOf() method to find the starting location of text to be replaced. You need to copy preceding text into a variable and strip away that text from the original string, and then keep repeating this find-strip-accumulate tactic until the entire string is accounted for and you have inserted the replacement string in place of each found search string. This was necessary in the early browsers, but the far more convenient and efficient regular expressions are implemented in almost all scriptable browsers that are now in use.
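The placeholder replacement can also be written as a function that works on plain strings, which makes it easy to try outside a page. This is a sketch under stated assumptions: replaceAllText() is a hypothetical helper, and it escapes pattern symbols in the search string itself so that the caller can pass "(ph)" without backslashes:

```javascript
// Replace every occurrence of searchStr (taken literally) in mainString.
function replaceAllText(mainString, searchStr, replaceStr) {
    // Escape characters that are meaningful in a pattern, such as ( and ).
    var escaped = searchStr.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    var re = new RegExp(escaped, "g");
    return mainString.replace(re, replaceStr);
}

var template = "Dear (ph), your order is ready, (ph).";
var filled = replaceAllText(template, "(ph)", "Ann");

// Without the g modifier, only the first instance is replaced.
var firstOnly = template.replace(/\(ph\)/, "Ann");
```

Be aware that a dollar sign in the replacement string has a special meaning to the replace() method, so replacement text that comes from a user may need additional handling.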

Using Special and Escaped Characters

Problem

You want to add low-order ASCII characters (tab, carriage return, etc.) to a string.

Solution

Use the escape sequences shown in Table 1-2 to represent the desired character. For example, to include a quotation mark inside a literal string, use \", as in:

	var msg = "Today's secret word is \"thesaurus.\"";

Discussion

The core JavaScript language includes a feature common to most programming languages that lets you designate special characters. A special character is not one of the plain alphanumeric characters or punctuation symbols, but has a particular meaning with respect to whitespace in text. Common characters used these days include the tab, newline, and carriage return.

A special character begins with a backslash, followed by the character representing the code, such as \t for tab and \n for newline. The backslash is called an escape character, instructing the interpreter to treat the next character as a special character. To include these characters in a string, include the backslash and special character inside the quoted string:

	var confirmString = "You did not enter a response to the last " +
	    "question.\n\nSubmit form anyway?";

If you want to use one of these symbols between variables that contain string values, be sure the special character is quoted in the concatenation statement:

	var myStr = lineText1 + "\n" + lineText2;

Special characters can be used to influence formatting of text in basic dialog boxes (from the alert(), confirm(), and prompt() methods of the window object) and textarea form controls. Table 1-2 shows the recognized escaped characters and their meanings.

Table 1-2. String escape sequences

Escape sequence	Description
`\b`	Backspace
`\t`	Horizontal tab (ASCII 9)
`\n`	Line feed (newline, ASCII 10)
`\v`	Vertical tab
`\f`	Form feed
`\r`	Carriage return (ASCII 13)
`\"`	Double quote
`\'`	Single quote
`\\`	Backslash

Note that to include a visible backslash character in a string, you must use a double backslash because a single one is treated as the invisible escape character. Use the escaped quote symbols to include single or double quotes inside a string.
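Each escape sequence occupies a single character in the resulting string, which can be confirmed with the length property and the charCodeAt() method. A brief sketch with illustrative variable names:

```javascript
// Escaped double quotes inside a double-quoted string.
var msg = "Today's secret word is \"thesaurus.\"";
var firstQuoteAt = msg.indexOf("\"");   // position of the first "

// Each double backslash produces one visible backslash.
var path = "C:\\temp\\logs";            // C:\temp\logs
var pathLength = path.length;           // 12

// Character codes of the common whitespace escapes.
var tabCode = "\t".charCodeAt(0);       // 9
var lineFeedCode = "\n".charCodeAt(0);  // 10
var returnCode = "\r".charCodeAt(0);    // 13
```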

While you can use an escaped character in tests for the existence of, say, line feed characters in a string, you have to exercise some care when doing so with the content of a textarea element. The problem arises from a variety of implementations of how user-entered carriage returns are coded in the textarea’s content. IE for Windows and Opera (all platforms) insert two escaped characters (\r\n in that sequence) whenever a user presses the Enter key to make a newline in a textarea. Other browsers, including Mozilla and Safari, have settled on a single \n character. This variety in character combinations makes searches for user-typed line breaks difficult to perform accurately across browsers and operating systems.

Going the other way—creating a string for script insertion into a textarea value—is easier because modern browsers accommodate all symbols. Therefore, if you assign just \n or the combination \r\n, all browsers interpret any one of them as a carriage return, and convert the escape character(s) to match their internal handling.

Reading and Writing Strings for Cookies

Problem

You want to use cookies to preserve string data from one page visit to the next.

Solution

Use the cookies.js library shown in the Discussion as a utility for saving and retrieving cookies in modern browsers. To set a cookie via the library, invoke the setCookie() function, passing, at a minimum, the cookie’s name and string value as arguments:

	setCookie("userID", document.entryForm.username.value);

To retrieve a cookie’s value, invoke the library’s getCookie() function, as in:

	var user = getCookie("userID");

Discussion

Example 1-1 shows the code for the entire cookies.js library.

Example 1-1. cookies.js library

// utility function to retrieve an expiration date in proper
// format; pass three integer parameters for the number of days, hours,
// and minutes from now you want the cookie to expire (or negative
// values for a past date); all three parameters are required,
// so use zeros where appropriate
function getExpDate(days, hours, minutes) {
    var expDate = new Date();
    if (typeof days == "number" && typeof hours == "number" &&
        typeof minutes == "number") {
        expDate.setDate(expDate.getDate() + parseInt(days));
        expDate.setHours(expDate.getHours() + parseInt(hours));
        expDate.setMinutes(expDate.getMinutes() + parseInt(minutes));
        return expDate.toUTCString();
    }
}

// utility function called by getCookie()
function getCookieVal(offset) {
    var endstr = document.cookie.indexOf (";", offset);
    if (endstr == -1) {
        endstr = document.cookie.length;
    }
    return decodeURI(document.cookie.substring(offset, endstr));
}

// primary function to retrieve cookie by name
function getCookie(name) {
    var arg = name + "=";
    var alen = arg.length;
    var clen = document.cookie.length;
    var i = 0;
    while (i < clen) {
        var j = i + alen;
        if (document.cookie.substring(i, j) == arg) {
            return getCookieVal(j);
        }
        i = document.cookie.indexOf(" ", i) + 1;
        if (i == 0) break;
    }
    return "";
}

// store cookie value with optional details as needed
function setCookie(name, value, expires, path, domain, secure) {
    document.cookie = name + "=" + encodeURI(value) +
        ((expires) ? "; expires=" + expires : "") +
        ((path) ? "; path=" + path : "") +
        ((domain) ? "; domain=" + domain : "") +
        ((secure) ? "; secure" : "");
}

// remove the cookie by setting ancient expiration date
function deleteCookie(name,path,domain) {
    if (getCookie(name)) {
        document.cookie = name + "=" +
            ((path) ? "; path=" + path : "") +
            ((domain) ? "; domain=" + domain : "") +
            "; expires=Thu, 01-Jan-70 00:00:01 GMT";
    }
}

The library begins with a utility function (getExpDate()) that your scripts use to assist in setting an expiration date for the cookie. A second utility function (getCookieVal()) is invoked internally during the reading of a cookie.

Use the getCookie() function in your scripts to read the value of a named cookie previously saved. The name you pass to the function is a string. If no cookie by that name exists in the browser’s cookie filing system, the function returns an empty string.

To save a cookie, invoke the setCookie() function. The first two parameters (the cookie’s name and the value to be preserved on the client) are required. If you intend the cookie to last beyond the user quitting the browser, be sure to set an expiration date as the third parameter. Filter the expiration time period through the getExpDate() function shown earlier so that the third parameter of setCookie() is in the correct format.

One last function, deleteCookie(), lets you delete an existing cookie before its expiration date. The function is hardwired to set the expiration date to the start of the JavaScript date epoch.

Load the library into your page in the head portion of the document:

	<script src="cookies.js"></script>

All cookie values you save must be string values; all cookie values you retrieve are string values. Strings, however, can contain characters that upset their storage and proper retrieval later on. To compensate for this issue, the cookies.js library uses the global encodeURI() and decodeURI() methods to handle conversions. These methods improve on (and supersede) the old escape() and unescape() methods.
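One caveat worth checking in your own tests: encodeURI() leaves the semicolon and equals sign unencoded, and the semicolon is the character that separates entries in document.cookie. The following sketch (the sample value is invented for illustration) compares the two global encoding methods on such a value.

```javascript
var rawValue = "size=large; theme=dark blue";

// encodeURI() converts the spaces but leaves ; and = untouched
var viaEncodeURI = encodeURI(rawValue);
// "size=large;%20theme=dark%20blue"

// encodeURIComponent() also converts ; and =
var viaComponent = encodeURIComponent(rawValue);
// "size%3Dlarge%3B%20theme%3Ddark%20blue"
```

If your cookie values can contain semicolons, pairing encodeURIComponent() with decodeURIComponent() in your copy of the library may be the safer choice. That substitution is a suggestion to evaluate, not part of the library as listed.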

A browser cookie is the only way to preserve a string value on the client between visits to your web site. Scripts on your page may read only cookies that were saved from your domain and server. If you have multiple servers in your domain, you can set the fifth parameter of setCookie() to share cookies between servers at the same domain.

Browsers typically limit capacity to 20 name/value pairs of cookies per server; a cookie should be no more than 4,000 characters, but more practically, the value of an individual named cookie should be less than 2,000 characters. In other words, cookies are not meant to act as high-volume data storage facilities on the client. Also, browsers automatically send domain-specific cookie data to the server as part of each page request. Keep the amount of data small to limit the impact on dial-up users.

When you save a cookie, the name/value pair resides in the browser’s memory. The data, if set to expire sometime in the future, is written to the cookie filesystem only when the browser quits. Therefore, don’t be alarmed if you don’t see your latest entry in the cookie file while the browser is still running. Different browsers save their cookies differently (and in different places in each operating system). IE stores each domain’s cookies in its own text file, whereas Mozilla gangs all cookies together in a single text file.

All of this cookie action is made possible through the document.cookie property. The purpose of the cookies.js library is to act as a friendlier interface between your scripts and the document.cookie property, which isn’t as helpful as it could be in extracting cookie information. Although you can save a cookie with several parameters, only the value of a cookie is available for reading—not the expiration date, path, or domain details.
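To make the shape of that property concrete, here is a minimal sketch that splits a document.cookie-style string into name/value pairs. The function name parseCookieString() is hypothetical, and the function takes the string as a parameter (rather than reading document.cookie itself) so that you can try it with sample data. It assumes values were stored with encodeURI(), as in the library.

```javascript
// Hypothetical helper: convert "name1=value1; name2=value2" into an object
function parseCookieString(cookieStr) {
    var result = {};
    if (!cookieStr) {
        return result;
    }
    var pairs = cookieStr.split("; ");
    for (var i = 0; i < pairs.length; i++) {
        var eq = pairs[i].indexOf("=");
        if (eq == -1) {
            continue;   // skip entries without a name=value form
        }
        var name = pairs[i].substring(0, eq);
        result[name] = decodeURI(pairs[i].substring(eq + 1));
    }
    return result;
}

var sample = parseCookieString("userID=fred; fontSize=large%20type");
// sample.userID is "fred"; sample.fontSize is "large type"
```

Notice that nothing in the string carries expiration, path, or domain information; only names and values come back.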

Cookies are commonly used to preserve user preference settings between visits. A script near the top of the page reads the cookie to see if it exists, and, if so, applies settings to various content or layout attributes while the rest of the page loads. Offering Body Text Size Choices to Users shows how this can work to let users select a relative font size and preserve the settings between visits. For example, the function that preserves the user’s font size choice saves the value to a cookie named fontSize, which is set to expire in 180 days if not updated before then:

	setCookie("fontSize", styleID, getExpDate(180, 0, 0));

The next time the user visits, the cookie is read while the page loads:

	var styleCookie = getCookie("fontSize");

With the information from the cookie, the script applies the previously selected style sheet to the page. If the cookie was not previously set, the script assigns a default style sheet to use in the interim.

Just because cookies can store only strings, don’t let that get in the way of preserving information normally stored in arrays or custom objects. See Customizing an Object’s Prototype and Changing select Element Content for ways to convert more complex data types to strings for preservation, and then restore their original form after retrieval from the cookie on the next visit.
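As a rough sketch of the idea for the simplest case, an array of short strings can be flattened with a delimiter before saving and split apart after reading. The helper names below are hypothetical, and the approach assumes that no array item contains the "|" delimiter.

```javascript
// Hypothetical helper: flatten an array into one string for cookie storage
function arrayToCookieString(arr) {
    return arr.join("|");
}

// Hypothetical helper: rebuild the array from the stored string
function cookieStringToArray(str) {
    return (str == "") ? [] : str.split("|");
}

var stored = arrayToCookieString(["red", "green", "blue"]);
// stored is "red|green|blue"
var restored = cookieStringToArray(stored);
```

Every restored item is a string, so numeric items need to be converted back to numbers after retrieval.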

Converting Between Unicode Values and String Characters

Problem

You want to obtain the Unicode code number for an alphanumeric character or vice versa.

Solution

To obtain the Unicode value of a character of a string, use the charCodeAt() method of the string value. A single parameter is an integer pointing to the zero-based position of the character within the string:

	var code = myString.charCodeAt(3);

If the string consists of only one character, use the 0 argument to get the code for that one character:

	var oneChar = myString.substring(12, 13);
	var code = oneChar.charCodeAt(0);

The returned value is an integer.

To convert a Unicode code number to a character, use the fromCharCode() method of the static String object:

	var char = String.fromCharCode(66);

Unlike most string methods, this one must be invoked only from the String object and not from a string value.
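The two methods can be combined for a round trip, as in this short sketch. The helper name codesOf() is made up for the example.

```javascript
// Hypothetical helper: return an array of code values, one per character
function codesOf(str) {
    var codes = [];
    for (var i = 0; i < str.length; i++) {
        codes[codes.length] = str.charCodeAt(i);
    }
    return codes;
}

var codes = codesOf("Cat");   // 67, 97, 116

// fromCharCode() accepts multiple code numbers and returns one string
var rebuilt = String.fromCharCode.apply(null, codes);   // "Cat"
```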

Discussion

ASCII values and Unicode values are the same for the basic Latin alphanumeric (low-ASCII) values. But even though Unicode encompasses characters from many written languages around the world, do not expect to see characters from other writing systems displayed in alert boxes, text boxes, or rendered pages simply because you know the Unicode values for those characters; the browser and operating system must be equipped for the language encompassed by the characters. If the character sets are not available, the characters generated by such codes will be question marks or other symbols. A typical North American computer won’t know how to produce a Chinese character on the screen unless the target writing system and font sets are installed for the OS and browser.

Encoding and Decoding URL Strings

Problem

You want to convert a string of plain text to a format suitable for use as a URL or URL search string, or vice versa.

Solution

To convert a string consisting of an entire URL to a URL-encoded form, use the encodeURI() global method, passing the string needing conversion as an argument. For example:

	document.myForm.action = encodeURI(myString);

If you are assembling content for values of search string name/value pairs, apply the encodeURIComponent() global method:

	var srchString = "?name=" + encodeURIComponent(myString);

Both methods have complementary partners that perform conversions in the opposite direction:

	decodeURI(encodedURIString)
	decodeURIComponent(encodedURIComponentString)

In all cases, the original string is not altered when passed as an argument to these methods. Capture the results from the value returned by the methods.

Discussion

Although the escape() and unescape() methods have been available since the first scriptable browsers, they have been deprecated in the formal language specification (ECMA-262) in favor of a set of new methods. The new methods are available in IE 5.5 or later and other modern browsers.

These new encoding methods work by slightly different rules than the old escape() and unescape() methods. As a result, you must encode and decode using the same pairs of methods at all times. In other words, if a URL is encoded with encodeURI(), the resulting string can be decoded only with decodeURI().
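A quick sketch shows what a mismatch looks like in practice. The sample string is invented; the behavior shown is that decodeURI() deliberately leaves encoded reserved characters (such as & and ?) alone.

```javascript
var original = "Tom & Jerry?";
var encoded = encodeURIComponent(original);   // "Tom%20%26%20Jerry%3F"

// The matching decoder restores the original text
var matched = decodeURIComponent(encoded);    // "Tom & Jerry?"

// The mismatched decoder converts %20 but leaves %26 and %3F encoded
var mismatched = decodeURI(encoded);          // "Tom %26 Jerry%3F"
```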

The method names use “URI” (Uniform Resource Identifier). A URI is an all-encompassing reference to obtain any network-accessible item (document, object, etc.). A URL (Uniform Resource Locator) is a type of URI that includes both a network location for the item, as well as an indication of the access mechanism (e.g., http:). That the method names adopt the more general URI nomenclature is not unusual. For most client-side web authoring in HTML, CSS, and JavaScript, the terms URI and URL are interchangeable.

The differences between encodeURI() and encodeURIComponent() are defined by the range of characters that the methods convert to the URI-friendly form of a percent sign (%) followed by the hexadecimal value of the symbol (e.g., a space becomes %20; a character outside the ASCII range becomes one such group for each byte of its UTF-8 encoding). Regular alphanumeric characters are not converted, but when it comes to punctuation and special characters, the two methods diverge in their coverage. The encodeURI() method converts the following symbols from the characters in the ASCII range of 32 through 126:

	space "  %  <  >  [  \  ]  ^  `  {  |  }

For example, if you are assembling a URL with a simple search string on the end, pass the URL through encodeURI() before navigating to the URL to make sure the URL is well formed:

	var newURL = "http://www.megacorp.com?prod=Gizmo Deluxe";
	location.href = encodeURI(newURL);
	// encoded URL is: http://www.megacorp.com?prod=Gizmo%20Deluxe

In contrast, the encodeURIComponent() method encodes far more characters that might find their way into value strings of forms or script-generated search strings. The full set of encodable characters for encodeURIComponent() is:

	space  "  #  $  %  &  +  ,  /  :  ;  <  =  >  ?  @  [  \  ]  ^  `  {  |  }

Of these, the following are encoded by encodeURIComponent() but not by encodeURI():

	#  $  &  +  ,  /  :  ;  =  ?  @

You may recognize some of the encodeURIComponent() values as those frequently appearing within complex URLs, especially the ?, &, and = symbols. For this reason, apply encodeURIComponent() only to the values of name/value pairs before those values are inserted into or appended to a URL. It then becomes dangerous to pass the composite URL through encodeURI() again because the % symbols of the encoded characters will, themselves, be encoded, probably causing problems on the server end when parsing the input from the client.
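The following sketch assembles a search string from name/value pairs and then shows the double-encoding hazard. The helper name buildSearchString() and the sample data are hypothetical; the sketch also encodes the names, which is an assumption beyond the advice above and is harmless for plain alphanumeric names.

```javascript
// Hypothetical helper: pairs is an array of [name, value] arrays
function buildSearchString(pairs) {
    var parts = [];
    for (var i = 0; i < pairs.length; i++) {
        parts[parts.length] = encodeURIComponent(pairs[i][0]) + "=" +
            encodeURIComponent(pairs[i][1]);
    }
    return (parts.length > 0) ? "?" + parts.join("&") : "";
}

var search = buildSearchString([["prod", "Gizmo & Co"], ["note", "a=b"]]);
// "?prod=Gizmo%20%26%20Co&note=a%3Db"

// Encoding the assembled string again converts every % to %25
var doubled = encodeURI(search);
// "?prod=Gizmo%2520%2526%2520Co&note=a%253Db"
```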

If, for backward-compatibility reasons, you need to use the escape() method, be aware that this method uses a heavy hand in choosing characters to encode. Encodable characters for the escape() method are as follows:

	space  !  "  #  $  %  &  '  (  )  ,  :  ;  <  =  >  ?  @  [  \  ]  ^  `  {  |  }  ~

The @ symbol, however, is not converted in Internet Explorer browsers via the escape() method.

You can see now why it is important to use the matching decoding method if you need to return one of your encoded strings back into plain language. If the encoded string you are trying to decode comes from an external source (e.g., part of a URL search string returned by the server), try to use the decodeURIComponent() method on only those parts of the search string that are the value portion of a name/value pair. That’s typically where the heart of your passed information is, as well as where you want to obtain the most correct conversion.
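A minimal sketch of that advice follows: split the search string into pairs first, then decode only each value. The function name parseSearchString() is hypothetical. The sketch does not convert + symbols to spaces, and decodeURIComponent() throws an error on malformed input, so production code would likely need more care.

```javascript
// Hypothetical helper: convert "?a=1&b=two%20words" into an object
function parseSearchString(search) {
    var result = {};
    if (search.charAt(0) == "?") {
        search = search.substring(1);
    }
    if (search == "") {
        return result;
    }
    var pairs = search.split("&");
    for (var i = 0; i < pairs.length; i++) {
        var eq = pairs[i].indexOf("=");
        var name = (eq == -1) ? pairs[i] : pairs[i].substring(0, eq);
        var value = (eq == -1) ? "" : pairs[i].substring(eq + 1);
        // decode only the value portion of each pair
        result[name] = decodeURIComponent(value);
    }
    return result;
}

var params = parseSearchString("?prod=Gizmo%20%26%20Co&id=42");
// params.prod is "Gizmo & Co"; params.id is "42"
```

Splitting before decoding matters: an encoded & inside a value (%26) would otherwise be mistaken for a pair separator.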

Encoding and Decoding Base64 Strings

Problem

You want to convert a string to or from Base64 encoding.

Solution

Use the functions of the base64.js library shown in the Discussion. Syntax for invoking the two functions is straightforward. To encode a string, invoke:

	var encodedString = base64Encode("stringToEncode");

To decode a string, invoke:

	var plainString = base64Decode("encodedString");

Discussion

Example 1-2 shows the entire base64.js library.

Example 1-2. base64.js library

// Global lookup arrays for base64 conversions
var enc64List, dec64List;

// Load the lookup arrays once
function initBase64() {
    enc64List = new Array();
    dec64List = new Array();
    var i;
    for (i = 0; i < 26; i++) {
        enc64List[enc64List.length] = String.fromCharCode(65 + i);
    }
    for (i = 0; i < 26; i++) {
        enc64List[enc64List.length] = String.fromCharCode(97 + i);
    }
    for (i = 0; i < 10; i++) {
        enc64List[enc64List.length] = String.fromCharCode(48 + i);
    }
    enc64List[enc64List.length] = "+";
    enc64List[enc64List.length] = "/";
    for (i = 0; i < 128; i++) {
        dec64List[dec64List.length] = -1;
    }
    for (i = 0; i < 64; i++) {
        dec64List[enc64List[i].charCodeAt(0)] = i;
    }
}

// Encode a string
function base64Encode(str) {
    var c, d, e, end = 0;
    var u, v, w, x;
    var ptr = -1;
    var input = str.split("");
    var output = "";
    while(end == 0) {
        c = (typeof input[++ptr] != "undefined") ? input[ptr].charCodeAt(0) :
            ((end = 1) ? 0 : 0);
        d = (typeof input[++ptr] != "undefined") ? input[ptr].charCodeAt(0) :
            ((end += 1) ? 0 : 0);
        e = (typeof input[++ptr] != "undefined") ? input[ptr].charCodeAt(0) :
            ((end += 1) ? 0 : 0);
        u = enc64List[c >> 2];
        v = enc64List[(0x00000003 & c) << 4 | d >> 4];
        w = enc64List[(0x0000000F & d) << 2 | e >> 6];
        x = enc64List[e & 0x0000003F];

        // handle padding to even out unevenly divisible string lengths
        if (end >= 1) {x = "=";}
        if (end == 2) {w = "=";}
        if (end < 3) {output += u + v + w + x;}
    }
    // format for 76-character line lengths per RFC
    var formattedOutput = "";
    var lineLength = 76;
    while (output.length > lineLength) {
        formattedOutput += output.substring(0, lineLength) + "\n";
        output = output.substring(lineLength);
    }
    formattedOutput += output;
    return formattedOutput;
}

// Decode a string
function base64Decode(str) {
    var c=0, d=0, e=0, f=0, i=0, n=0;
    var input = str.split("");
    var output = "";
    var ptr = 0;
    do {
        f = input[ptr++].charCodeAt(0);
        i = dec64List[f];
        if ( f >= 0 && f < 128 && i != -1 ) {
            if ( n % 4 == 0 ) {
                c = i << 2;
            } else if ( n % 4 == 1 ) {
                c = c | ( i >> 4 );
                d = ( i & 0x0000000F ) << 4;
            } else if ( n % 4 == 2 ) {
                d = d | ( i >> 2 );
                e = ( i & 0x00000003 ) << 6;
            } else {
                e = e | i;
            }
            n++;
            if ( n % 4 == 0 ) {
                output += String.fromCharCode(c) +
                          String.fromCharCode(d) +
                          String.fromCharCode(e);
            }
        }
    } while (typeof input[ptr] != "undefined");
    output += (n % 4 == 3) ? String.fromCharCode(c) + String.fromCharCode(d) :
              ((n % 4 == 2) ? String.fromCharCode(c) : "");
    return output;
}

// Self-initialize the global variables
initBase64();

The library begins with two global declarations and an initialization function that creates lookup tables for the character conversions. At the end of the library is a statement that invokes the initialization function.

Scripts may call the base64Encode() function directly to convert a standard string to a Base64-encoded string. The value of the original string is not changed, but the function returns an encoded copy. To convert an encoded string to a standard string, use the base64Decode() function, passing the encoded string as an argument.
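To see the bit arithmetic at the heart of the encoder in isolation, the sketch below converts exactly three 8-bit characters into four Base64 symbols, using the same shifts and masks as base64Encode(). The helper name encodeGroup() and the alphabet variable are introduced here for illustration; padding and line wrapping are left out.

```javascript
// Standard Base64 alphabet, in the order that initBase64() builds enc64List
var alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ" +
               "abcdefghijklmnopqrstuvwxyz" +
               "0123456789+/";

// Hypothetical helper: encode exactly three 8-bit characters as four symbols
function encodeGroup(three) {
    var c = three.charCodeAt(0);
    var d = three.charCodeAt(1);
    var e = three.charCodeAt(2);
    return alphabet.charAt(c >> 2) +                     // top 6 bits of c
           alphabet.charAt((0x03 & c) << 4 | d >> 4) +   // low 2 of c, top 4 of d
           alphabet.charAt((0x0F & d) << 2 | e >> 6) +   // low 4 of d, top 2 of e
           alphabet.charAt(e & 0x3F);                    // low 6 bits of e
}

var result = encodeGroup("Man");   // "TWFu"
```

For "Man", the character codes 77, 97, and 110 yield the indexes 19, 22, 5, and 46, which map to T, W, F, and u.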

All Mozilla-based browsers include global methods that perform the same conversions shown at length in Example 1-2. The atob() method converts a Base64-encoded string to a plain string; the btoa() method converts a plain string to a Base64-encoded string. These methods are not part of the ECMAScript standard used as the foundation for these browser versions, so it’s unclear when or if they will find their way into other browsers.
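Because availability varies, a script can test for the native methods before relying on them. The wrapper names below are hypothetical; each returns null where the environment lacks the native method, leaving the choice of fallback (such as the base64.js functions) to the caller.

```javascript
// Hypothetical wrapper: use btoa() only where the environment provides it
function nativeBase64Encode(str) {
    return (typeof btoa == "function") ? btoa(str) : null;
}

// Hypothetical wrapper: use atob() only where the environment provides it
function nativeBase64Decode(str) {
    return (typeof atob == "function") ? atob(str) : null;
}

var enc = nativeBase64Encode("Hello");   // "SGVsbG8=" where btoa() exists
```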

Frankly, there hasn’t been a big need for Base64 encoding in most scripted web pages, but that’s perhaps because the facilities weren’t readily available. A Base64-encoded string contains a very small character set: a–z, A–Z, 0–9, +, /, and =. This low common denominator scheme allows data of any type to be conveyed by virtually any Internet protocol. Binary attachments to your email are encoded as Base64 strings for their journey en route. Your email client decodes the simple string and generates the image, document, or executable file that arrives with the message. You may find additional ways to apply Base64-encoded data in your pages and scripts. To learn more about Base64 encoding, visit http://www.ietf.org/rfc/rfc2045.txt.
