Strings

We’ll start by taking a closer look at the Java String class (or, more specifically, java.lang.String). Because working with Strings is so fundamental, it’s important to understand how they are implemented and what you can do with them. A String object encapsulates a sequence of Unicode characters. Internally, these characters are stored in a regular Java array, but the String object guards this array jealously and gives you access to it only through its own API. This is to support the idea that Strings are immutable; once you create a String object, you can’t change its value. Lots of operations on a String object appear to change the characters or length of a string, but what they really do is return a new String object that copies or internally references the needed characters of the original. Java implementations make an effort to consolidate identical strings used in the same class into a shared-string pool and to share parts of Strings where possible.

The original motivation for all of this was performance. Immutable Strings can save memory and be optimized for speed by the Java VM. The flip side is that a programmer should have a basic understanding of the String class in order to avoid creating an excessive number of String objects in places where performance is an issue. That was especially true in the past, when VMs were slow and handled memory poorly. Nowadays, string usage is not usually an issue in the overall performance of a real application.[29]

Constructing Strings

Literal strings, defined in your source code, are declared with double quotes and can be assigned to a String variable:

    String quote = "To be or not to be";

Java automatically converts the literal string into a String object and assigns it to the variable.

Strings keep track of their own length, so String objects in Java don’t require special terminators. You can get the length of a String with the length() method. You can also test for a zero length string by using isEmpty():

    int length = quote.length();
    boolean empty = quote.isEmpty();

Strings can take advantage of the only overloaded operator in Java, the + operator, for string concatenation. The following code produces equivalent strings:

    String name = "John " + "Smith";
    String name = "John ".concat("Smith");

Literal strings can’t span lines in Java source files, but we can concatenate lines to produce the same effect:

    String poem =
        "'Twas brillig, and the slithy toves\n" +
        "   Did gyre and gimble in the wabe:\n" +
        "All mimsy were the borogoves,\n" +
        "   And the mome raths outgrabe.\n";

Embedding lengthy text in source code is not normally something you want to do. In this and the following chapter, we’ll talk about ways to load Strings from files, special packages called resource bundles, and URLs. Technologies like Java Server Pages and template engines also provide a way to factor out large amounts of text from your code. For example, in Chapter 14, we’ll see how to load our poem from a web server by opening a URL like this:

    InputStream poem = new URL(
        "http://myserver/~dodgson/jabberwocky.txt").openStream();

In addition to making strings from literal expressions, you can construct a String directly from an array of characters:

    char [] data = new char [] { 'L', 'e', 'm', 'm', 'i', 'n', 'g' };
    String lemming = new String( data );

You can also construct a String from an array of bytes:

    byte [] data = new byte [] { (byte)97, (byte)98, (byte)99 };
    String abc = new String(data, "ISO8859_1");

In this case, the second argument to the String constructor is the name of a character-encoding scheme. The String constructor uses it to convert the raw bytes in the specified encoding to the internally used standard 2-byte Unicode characters. If you don’t specify a character encoding, the default encoding scheme on your system is used. We’ll discuss character encodings more when we talk about the Charset class, IO, in Chapter 12.[30]

Conversely, the charAt() method of the String class lets you access the characters of a String in an array-like fashion:

    String s = "Newton";
    for ( int i = 0; i < s.length(); i++ )
        System.out.println( s.charAt( i ) );

This code prints the characters of the string one at a time. Alternately, we can get the characters all at once with toCharArray(). Here’s a way to save typing a bunch of single quotes and get an array holding the alphabet:

    char [] abcs = "abcdefghijklmnopqrstuvwxyz".toCharArray();

The notion that a String is a sequence of characters is also codified by the String class implementing the interface java.lang.CharSequence, which prescribes the methods length() and charAt() as well as a way to get a subset of the characters.

Strings from Things

Objects and primitive types in Java can be turned into a default textual representation as a String. For primitive types like numbers, the string should be fairly obvious; for object types, it is under the control of the object itself. We can get the string representation of an item with the static String.valueOf() method. Various overloaded versions of this method accept each of the primitive types:

    String one = String.valueOf( 1 ); // integer, "1"
    String two = String.valueOf( 2.384f );  // float, "2.384"
    String notTrue = String.valueOf( false ); // boolean, "false"

All objects in Java have a toString() method that is inherited from the Object class. For many objects, this method returns a useful result that displays the contents of the object. For example, a java.util.Date object’s toString() method returns the date it represents formatted as a string. For objects that do not provide a representation, the string result is just a unique identifier that can be used for debugging. The String.valueOf() method, when called for an object, invokes the object’s toString() method and returns the result. The only real difference in using this method is that if you pass it a null object reference, it returns the String “null” for you, instead of producing a NullPointerException:

    Date date = new Date();
    // Equivalent, e.g., "Fri Dec 19 05:45:34 CST 1969"
    String d1 = String.valueOf( date );
    String d2 = date.toString();

    date = null;
    d1 = String.valueOf( date );  // "null"
    d2 = date.toString();  // NullPointerException!

String concatenation uses the valueOf() method internally, so if you “add” an object or primitive using the plus operator (+), you get a String:

    String today = "Today's date is :" + date;

You’ll sometimes see people use the empty string and the plus operator (+) as shorthand to get the string value of an object. For example:

    String two = "" + 2.384f;
    String today = "" + new Date();

Comparing Strings

The standard equals() method can compare strings for equality; they contain exactly the same characters in the same order. You can use a different method, equalsIgnoreCase(), to check the equivalence of strings in a case-insensitive way:

    String one = "FOO";
    String two = "foo";

    one.equals( two );             // false
    one.equalsIgnoreCase( two );   // true

A common mistake for novice programmers in Java is to compare strings with the == operator when they intend to use the equals() method. Remember that strings are objects in Java, and == tests for object identity; that is, whether the two arguments being tested are the same object. In Java, it’s easy to make two strings that have the same characters but are not the same string object. For example:

    String foo1 = "foo";
    String foo2 = String.valueOf( new char [] { 'f', 'o', 'o' }  );

    foo1 == foo2         // false!
    foo1.equals( foo2 )  // true

This mistake is particularly dangerous because it often works for the common case in which you are comparing literal strings (strings declared with double quotes right in the code). The reason for this is that Java tries to manage strings efficiently by combining them. At compile time, Java finds all the identical strings within a given class and makes only one object for them. This is safe because strings are immutable and cannot change. You can coalesce strings yourself in this way at runtime using the String intern() method. Interning a string returns an equivalent string reference that is unique across the VM.

The compareTo() method compares the lexical value of the String to another String, determining whether it sorts alphabetically earlier than, the same as, or later than the target string. It returns an integer that is less than, equal to, or greater than zero:

    String abc = "abc";
    String def = "def";
    String num = "123";

    if ( abc.compareTo( def ) < 0 )         // true
    if ( abc.compareTo( abc ) == 0 )        // true
    if ( abc.compareTo( num ) > 0 )         // true

The compareTo() method compares strings strictly by their characters’ positions in the Unicode specification. This works for simple text but does not handle all language variations well. The Collator class, discussed next, can be used for more sophisticated comparisons.

The Collator class

The java.text package provides a sophisticated set of classes for comparing strings in specific languages. German, for example, has vowels with umlauts and another character that resembles the Greek letter beta and represents a double “s.” How should we sort these? Although the rules for sorting such characters are precisely defined, you can’t assume that the lexical comparison we used earlier has the correct meaning for languages other than English. Fortunately, the Collator class takes care of these complex sorting problems.

In the following example, we use a Collator designed to compare German strings. You can obtain a default Collator by calling the Collator.getInstance() method with no arguments. Once you have an appropriate Collator instance, you can use its compare() method, which returns values just like String’s compareTo() method. The following code creates two strings for the German translations of “fun” and “later,” using Unicode constants for these two special characters. It then compares them, using a Collator for the German locale. (Locales help you deal with issues relevant to particular languages and cultures; we’ll talk about them in detail later in this chapter.) The result in this case is that “fun” (Spaß) sorts before “later” (später):

    String fun = "Spa\u00df";
    String later = "sp\u00e4ter";

    Collator german = Collator.getInstance(Locale.GERMAN);
    if (german.compare(fun, later) < 0) // true

Using collators is essential if you’re working with languages other than English. In Spanish, for example, “ll” and “ch” are treated as unique characters and alphabetized separately. A collator handles cases like these automatically.

Searching

The String class provides several simple methods for finding fixed substrings within a string. The startsWith() and endsWith() methods compare an argument string with the beginning and end of the String, respectively:

    String url = "http://foo.bar.com/";
    if ( url.startsWith("http:") )  // true

The indexOf() method searches for the first occurrence of a character or substring and returns the starting character position, or -1 if the substring is not found:

    String abcs = "abcdefghijklmnopqrstuvwxyz";
    int i = abcs.indexOf( 'p' );     // 15
    int i = abcs.indexOf( "def" );   // 3
    int I = abcs.indexOf( "Fang" );  // -1

Similarly, lastIndexOf() searches backward through the string for the last occurrence of a character or substring.

The contains() method handles the very common task of checking to see whether a given substring is contained in the target string:

    String log = "There is an emergency in sector 7!";
    if  ( log.contains("emergency") ) pageSomeone();

    // equivalent to
    if ( log.indexOf("emergency") != -1 ) ...

For more complex searching, you can use the Regular Expression API, which allows you to look for and parse complex patterns. We’ll talk about regular expressions later in this chapter.

Editing

A number of methods operate on the String and return a new String as a result. While this is useful, you should be aware that creating lots of strings in this manner can affect performance. If you need to modify a string often or build a complex string from components, you should use the StringBuilder class, as we’ll discuss shortly.

trim() is a useful method that removes leading and trailing whitespace (i.e., carriage return, newline, and tab) from the String:

    String str = "   abc   ";
    str = str.trim();  // "abc"

In this example, we threw away the original String (with excess whitespace), and it will be garbage-collected.

The toUpperCase() and toLowerCase() methods return a new String of the appropriate case:

    String down = "FOO".toLowerCase();      // "foo"
    String up   = down.toUpperCase();       // "FOO"

substring() returns a specified range of characters. The starting index is inclusive; the ending is exclusive:

    String abcs = "abcdefghijklmnopqrstuvwxyz";
    String cde = abcs.substring( 2, 5 ); // "cde"

The replace() method provides simple, literal string substitution. One or more occurrences of the target string are replaced with the replacement string, moving from beginning to end. For example:

    String message = "Hello NAME, how are you?".replace( "NAME", "Penny" );
    // "Hello Penny, how are you?"
    String xy = "xxooxxxoo".replace( "xx", "X" );
    // "XooXxoo"

The String class also has two methods that allow you to do more complex pattern substitution: replaceAll() and replaceFirst(). Unlike the simple replace() method, these methods use regular expressions (a special syntax) to describe the replacement pattern, which we’ll cover later in this chapter.

String Method Summary

Table 10-2 summarizes the methods provided by the String class.

Table 10-2. String methods

Method

Functionality

charAt()

Gets a particular character in the string

compareTo()

Compares the string with another string

concat()

Concatenates the string with another string

contains()

Checks whether the string contains another string

copyValueOf()

Returns a string equivalent to the specified character array

endsWith()

Checks whether the string ends with a specified suffix

equals()

Compares the string with another string

equalsIgnoreCase()

Compares the string with another string, ignoring case

getBytes()

Copies characters from the string into a byte array

getChars()

Copies characters from the string into a character array

hashCode()

Returns a hashcode for the string

indexOf()

Searches for the first occurrence of a character or substring in the string

intern()

Fetches a unique instance of the string from a global shared-string pool

isEmpty()

Returns true if the string is zero length

lastIndexOf()

Searches for the last occurrence of a character or substring in a string

length()

Returns the length of the string

matches()

Determines if the whole string matches a regular expression pattern

regionMatches()

Checks whether a region of the string matches the specified region of another string

replace()

Replaces all occurrences of a character in the string with another character

replaceAll()

Replaces all occurrences of a regular expression pattern with a pattern

replaceFirst()

Replaces the first occurrence of a regular expression pattern with a pattern

split()

Splits the string into an array of strings using a regular expression pattern as a delimiter

startsWith()

Checks whether the string starts with a specified prefix

substring()

Returns a substring from the string

toCharArray()

Returns the array of characters from the string

toLowerCase()

Converts the string to lowercase

toString()

Returns the string value of an object

toUpperCase()

Converts the string to uppercase

trim()

Removes leading and trailing whitespace from the string

valueOf()

Returns a string representation of a value

StringBuilder and StringBuffer

In contrast to the immutable string, the java.lang.StringBuilder class is a modifiable and expandable buffer for characters. You can use it to create a big string efficiently. StringBuilder and StringBuffer are twins; they have exactly the same API. StringBuilder was added in Java 5.0 as a drop-in, unsynchronized replacement for StringBuffer. We’ll come back to that in a bit.

First, let’s look at some examples of String construction:

    // Could be better
    String ball = "Hello";
    ball = ball + " there.";
    ball = ball + " How are you?";

This example creates an unnecessary String object each time we use the concatenation operator (+). Whether this is significant depends on how often this code is run and how big the string actually gets. Here’s a more extreme example:

    // Bad use of + ...
    while( (line = readLine()) != EOF )
        text += line;

This example repeatedly produces new String objects. The character array must be copied over and over, which can adversely affect performance. The solution is to use a StringBuilder object and its append() method:

    StringBuilder sb = new StringBuilder("Hello");
    sb.append(" there.");
    sb.append(" How are you?");

    StringBuilder text = new StringBuilder();
    while( (line = readline()) != EOF )
        text.append( line );

Here, the StringBuilder efficiently handles expanding the array as necessary. We can get a String back from the StringBuilder with its toString() method:

    String message = sb.toString();

You can also retrieve part of a StringBuilder as a String by using one of the substring() methods.

You might be interested to know that when you write a long expression using string concatenation, the compiler generates code that uses a StringBuilder behind the scenes:

    String foo = "To " + "be " + "or";

It is really equivalent to:

    String foo = new
      StringBuilder().append("To ").append("be ").append("or").toString();

In this case, the compiler knows what you are trying to do and takes care of it for you.

The StringBuilder class provides a number of overloaded append() methods for adding any type of data to the buffer. StringBuilder also provides a number of overloaded insert() methods for inserting various types of data at a particular location in the string buffer. Furthermore, you can remove a single character or a range of characters with the deleteCharAt() and delete() methods. Finally, you can replace part of the StringBuilder with the contents of a String using the replace() method. The String and StringBuilder classes cooperate so that, in some cases, no copy of the data has to be made; the string data is shared between the objects.

You should use a StringBuilder instead of a String any time you need to keep adding characters to a string; it’s designed to handle such modifications efficiently. You can convert the StringBuilder to a String when you need it, or simply concatenate or print it anywhere you’d use a String.

As we said earlier, StringBuilder was added in Java 5.0 as a replacement for StringBuffer. The only real difference between the two is that the methods of StringBuffer are synchronized and the methods of StringBuilder are not. This means that if you wish to use StringBuilder from multiple threads concurrently, you must synchronize the access yourself (which is easily accomplished). The reason for the change is that most simple usage does not require any synchronization and shouldn’t have to pay the associated penalty (slight as it is).



[29] When in doubt, measure it! If your String-manipulating code is clean and easy to understand, don’t rewrite it until someone proves to you that it is too slow. Chances are that they will be wrong. And don’t be fooled by relative comparisons. A millisecond is 1,000 times slower than a microsecond, but it still may be negligible to your application’s overall performance.

[30] On Mac OS X, the default encoding is MacRoman. In Windows, it is CP1252. On some Unix platforms it is ISO8859_1.

Get Learning Java, 4th Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.