We’ll start by taking a closer look at the Java String class (or, more specifically, java.lang.String). Because working with Strings is so fundamental, it’s important to understand how they are implemented and what you can do with them. A String object encapsulates a sequence of Unicode characters. Internally, these characters are stored in a regular Java array, but the String object guards this array jealously and gives you access to it only through its own API. This is to support the idea that Strings are immutable; once you create a String object, you can’t change its value. Lots of operations on a String object appear to change the characters or length of a string, but what they really do is return a new String object that copies or internally references the needed characters of the original. Java implementations make an effort to consolidate identical literal strings into a shared-string pool and to share parts of Strings where possible.
The original motivation for all of this was performance. Immutable Strings can save memory and be optimized for speed by the Java VM. The flip side is that a programmer should have a basic understanding of the String class in order to avoid creating an excessive number of String objects in places where performance is an issue. That was especially true in the past, when VMs were slow and handled memory poorly. Nowadays, string usage is not usually an issue in the overall performance of a real application.[29]
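To see immutability in action, call one of these apparently modifying methods and then look at the original. In the following minimal sketch, toUpperCase() and concat() each return a new String, and the original keeps its value. The class name ImmutabilityDemo and the helper shout() are only illustrative.

```java
class ImmutabilityDemo {
    // Returns the result of a "modifying" call; the argument itself is never changed.
    static String shout(String text) {
        return text.toUpperCase(); // builds and returns a new String
    }

    public static void main(String[] args) {
        String greeting = "hello";
        String loud = shout(greeting);
        String longer = greeting.concat(", world"); // also a new String

        System.out.println(greeting);         // hello (unchanged)
        System.out.println(loud);             // HELLO
        System.out.println(longer);           // hello, world
        System.out.println(greeting == loud); // false: two distinct objects
    }
}
```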
Literal strings, defined in your source code, are declared with double quotes and can be assigned to a String variable:

```java
String quote = "To be or not to be";
```

Java automatically converts the literal string into a String object and assigns it to the variable.
Strings keep track of their own length, so String objects in Java don’t require special terminators. You can get the length of a String with the length() method. You can also test for a zero-length string by using isEmpty():

```java
int length = quote.length();
boolean empty = quote.isEmpty();
```
Strings can take advantage of the only overloaded operator in Java, the + operator, for string concatenation. The following code produces equivalent strings:

```java
String name1 = "John " + "Smith";
String name2 = "John ".concat("Smith");
```
Literal strings can’t span lines in Java source files (at least not before the text blocks added in Java 15), but we can concatenate lines to produce the same effect:
```java
String poem =
    "'Twas brillig, and the slithy toves\n" +
    " Did gyre and gimble in the wabe:\n" +
    "All mimsy were the borogoves,\n" +
    " And the mome raths outgrabe.\n";
```
Embedding lengthy text in source code is not normally something you want to do. In this and the following chapter, we’ll talk about ways to load Strings from files, special packages called resource bundles, and URLs. Technologies like JavaServer Pages and template engines also provide a way to factor out large amounts of text from your code. For example, in Chapter 14, we’ll see how to load our poem from a web server by opening a URL like this:

```java
import java.io.InputStream;
import java.net.URL;

InputStream poem = new URL("http://myserver/~dodgson/jabberwocky.txt").openStream();
```
In addition to making strings from literal expressions, you can construct a String directly from an array of characters:

```java
char[] data = new char[] { 'L', 'e', 'm', 'm', 'i', 'n', 'g' };
String lemming = new String(data);
```
You can also construct a String from an array of bytes:

```java
byte[] data = new byte[] { (byte)97, (byte)98, (byte)99 };
String abc = new String(data, "ISO8859_1");
```
In this case, the second argument to the String constructor is the name of a character-encoding scheme. The String constructor uses it to convert the raw bytes in the specified encoding to Java’s internal Unicode representation (16-bit char values). If you don’t specify a character encoding, the default encoding scheme on your system is used. We’ll discuss character encodings more when we talk about the Charset class and I/O in Chapter 12.[30]
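The following sketch shows why the encoding argument matters by encoding the same text two ways. It uses the java.nio.charset.StandardCharsets constants (available since Java 7), which getBytes() and the String constructor accept in place of an encoding name. The character é needs two bytes in UTF-8 but only one in ISO-8859-1, and decoding with the wrong charset garbles the text. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class EncodingDemo {
    // Number of bytes needed to encode text as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes needed to encode text as ISO-8859-1 (Latin-1).
    static int latin1Length(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    public static void main(String[] args) {
        String cafe = "caf\u00e9"; // "café"
        System.out.println(utf8Length(cafe));   // 5: 'é' takes two bytes in UTF-8
        System.out.println(latin1Length(cafe)); // 4: one byte per character

        byte[] raw = cafe.getBytes(StandardCharsets.UTF_8);
        // Decoding with the same charset restores the original text...
        System.out.println(new String(raw, StandardCharsets.UTF_8).equals(cafe)); // true
        // ...but decoding with a different one produces garbage ("cafÃ©").
        System.out.println(new String(raw, StandardCharsets.ISO_8859_1));
    }
}
```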
Conversely, the charAt() method of the String class lets you access the characters of a String in an array-like fashion:

```java
String s = "Newton";
for (int i = 0; i < s.length(); i++)
    System.out.println(s.charAt(i));
```
This code prints the characters of the string one at a time. Alternatively, we can get the characters all at once with toCharArray(). Here’s a way to save typing a bunch of single quotes and get an array holding the alphabet:

```java
char[] abcs = "abcdefghijklmnopqrstuvwxyz".toCharArray();
```
The notion that a String is a sequence of characters is also codified by the String class implementing the interface java.lang.CharSequence, which prescribes the methods length() and charAt() as well as a way to get a subset of the characters (subSequence()).
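Because both String and StringBuilder implement CharSequence, a method written against the interface works with either. In this sketch, countUppercase() is a hypothetical helper that uses only length() and charAt(), and the last line slices with subSequence().

```java
class CharSequenceDemo {
    // Works with any CharSequence (String, StringBuilder, ...) using only
    // the interface methods length() and charAt().
    static int countUppercase(CharSequence text) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (Character.isUpperCase(text.charAt(i))) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String s = "Hello World";
        StringBuilder sb = new StringBuilder("ABC def");
        System.out.println(countUppercase(s));   // 2
        System.out.println(countUppercase(sb));  // 3
        // subSequence() is the interface's way to get a slice of characters.
        System.out.println(s.subSequence(0, 5)); // Hello
    }
}
```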
Objects and primitive types in Java can be turned into a default textual representation as a String. For primitive types like numbers, the string should be fairly obvious; for object types, it is under the control of the object itself. We can get the string representation of an item with the static String.valueOf() method. Various overloaded versions of this method accept each of the primitive types:

```java
String one = String.valueOf(1);         // integer, "1"
String two = String.valueOf(2.384f);    // float, "2.384"
String notTrue = String.valueOf(false); // boolean, "false"
```
All objects in Java have a toString() method that is inherited from the Object class. For many objects, this method returns a useful result that displays the contents of the object. For example, a java.util.Date object’s toString() method returns the date it represents formatted as a string. For objects that do not provide a representation, the result is just an identifier built from the class name and hash code, which can be used for debugging. The String.valueOf() method, when called for an object, invokes the object’s toString() method and returns the result. The only real difference in using this method is that if you pass it a null object reference, it returns the string “null” for you, instead of producing a NullPointerException:

```java
import java.util.Date;

Date date = new Date();
// Equivalent, e.g., "Fri Dec 19 05:45:34 CST 1969"
String d1 = String.valueOf(date);
String d2 = date.toString();

date = null;
d1 = String.valueOf(date); // "null"
d2 = date.toString();      // NullPointerException!
```
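The string form of an object is under the object’s own control, so a class can override toString() to make String.valueOf() and concatenation produce something readable. The GridPoint class here is a hypothetical example, and the last lines show the null handling described above.

```java
class GridPoint {
    private final int x;
    private final int y;

    GridPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Overriding Object.toString() controls how a GridPoint appears as text.
    @Override
    public String toString() {
        return "GridPoint(" + x + ", " + y + ")";
    }
}

class ToStringDemo {
    public static void main(String[] args) {
        GridPoint p = new GridPoint(3, 4);
        System.out.println(String.valueOf(p)); // GridPoint(3, 4)
        System.out.println("at " + p);         // at GridPoint(3, 4)

        GridPoint missing = null;
        // The variable's static type selects valueOf(Object), which returns "null".
        System.out.println(String.valueOf(missing)); // null
    }
}
```

Passing a typed variable matters here. A bare String.valueOf(null) resolves to the more specific char[] overload and throws a NullPointerException.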
String concatenation uses the valueOf() method internally, so if you “add” an object or primitive using the plus operator (+), you get a String:

```java
String today = "Today's date is :" + date;
```
You’ll sometimes see people use the empty string and the plus operator (+) as shorthand to get the string value of an object. For example:

```java
String two = "" + 2.384f;
String today = "" + new Date();
```
The standard equals() method can compare strings for equality; two strings are equal if they contain exactly the same characters in the same order. You can use a different method, equalsIgnoreCase(), to check the equivalence of strings in a case-insensitive way:

```java
String one = "FOO";
String two = "foo";

one.equals(two);           // false
one.equalsIgnoreCase(two); // true
```
A common mistake for novice programmers in Java is to compare strings with the == operator when they intend to use the equals() method. Remember that strings are objects in Java, and == tests for object identity; that is, whether the two arguments being tested are the same object. In Java, it’s easy to make two strings that have the same characters but are not the same string object. For example:

```java
String foo1 = "foo";
String foo2 = String.valueOf(new char[] { 'f', 'o', 'o' });

foo1 == foo2      // false!
foo1.equals(foo2) // true
```
This mistake is particularly dangerous because it often works for the common case in which you are comparing literal strings (strings declared with double quotes right in the code). The reason for this is that Java tries to manage strings efficiently by combining them. Identical literal strings are stored only once: the compiler collects them in each class, and the VM makes them all refer to a single shared String object, even across classes. This is safe because strings are immutable and cannot change. You can coalesce strings yourself in this way at runtime using the String class’s intern() method. Interning a string returns an equivalent string reference that is unique across the VM.
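The sketch below shows intern() at work. A String built at runtime is a different object from the equal literal until you intern it. After that, == finds the single pooled instance. It is meant only to illustrate identity versus equality, and equals() remains the right way to compare string contents. The class and method names are illustrative.

```java
class InternDemo {
    // True when a and b intern to the same pooled object, i.e., they are equal strings.
    static boolean sameAfterIntern(String a, String b) {
        return a.intern() == b.intern();
    }

    public static void main(String[] args) {
        String literal = "foo";                                  // literals are already pooled
        String built = new String(new char[] { 'f', 'o', 'o' }); // a separate object

        System.out.println(literal == built);              // false: different objects
        System.out.println(literal.equals(built));         // true: same characters
        System.out.println(literal == built.intern());     // true: intern() returns the pooled "foo"
        System.out.println(sameAfterIntern("foo", built)); // true
    }
}
```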
The compareTo() method compares the lexical value of the String to another String, determining whether it sorts alphabetically earlier than, the same as, or later than the target string. It returns an integer that is less than, equal to, or greater than zero:

```java
String abc = "abc";
String def = "def";
String num = "123";

if (abc.compareTo(def) < 0)  // true
if (abc.compareTo(abc) == 0) // true
if (abc.compareTo(num) > 0)  // true
```
The compareTo() method compares strings strictly by the numeric values of their characters (their Unicode char codes). This works for simple text but does not handle all language variations well. The Collator class, discussed next, can be used for more sophisticated comparisons.

The java.text package provides a sophisticated set of classes for comparing strings in specific languages. German, for example, has vowels with umlauts and another character that resembles the Greek letter beta and represents a double “s.” How should we sort these? Although the rules for sorting such characters are precisely defined, you can’t assume that the lexical comparison we used earlier has the correct meaning for languages other than English. Fortunately, the Collator class takes care of these complex sorting problems.
In the following example, we use a Collator designed to compare German strings. You can obtain a default Collator by calling the Collator.getInstance() method with no arguments, or one for a particular language by passing it a Locale. Once you have an appropriate Collator instance, you can use its compare() method, which returns values just like String’s compareTo() method. The following code creates two strings for the German translations of “fun” and “later,” using Unicode constants for these two special characters. It then compares them, using a Collator for the German locale. (Locales help you deal with issues relevant to particular languages and cultures; we’ll talk about them in detail later in this chapter.) The result in this case is that “fun” (Spaß) sorts before “later” (später):

```java
import java.text.Collator;
import java.util.Locale;

String fun = "Spa\u00df";
String later = "sp\u00e4ter";

Collator german = Collator.getInstance(Locale.GERMAN);
if (german.compare(fun, later) < 0) // true
```
Using collators is essential if you’re working with languages other than English. In traditional Spanish sorting, for example, “ll” and “ch” are treated as distinct letters and alphabetized separately. A collator for the appropriate locale is meant to encapsulate rules like these so that your own code doesn’t have to.
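Collator implements java.util.Comparator, so you can pass it straight to a sorting method. The sketch below sorts the same words twice. The first sort uses String’s own compareTo(), which puts every uppercase letter ahead of every lowercase one. The second uses a German Collator. The word list and class name are illustrative.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

class CollatorSortDemo {
    // Sorts a copy of words using the collation rules for the given locale.
    static List<String> sortFor(Locale locale, List<String> words) {
        List<String> sorted = new ArrayList<>(words);
        Collections.sort(sorted, Collator.getInstance(locale));
        return sorted;
    }

    // Sorts a copy of words using String.compareTo() (raw character codes).
    static List<String> sortByUnicode(List<String> words) {
        List<String> sorted = new ArrayList<>(words);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Zebra", "apfel", "sp\u00e4ter", "Spa\u00df");
        System.out.println(sortByUnicode(words));          // [Spaß, Zebra, apfel, später]
        System.out.println(sortFor(Locale.GERMAN, words)); // apfel first, Zebra last
    }
}
```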
The String class provides several simple methods for finding fixed substrings within a string. The startsWith() and endsWith() methods compare an argument string with the beginning and end of the String, respectively:

```java
String url = "http://foo.bar.com/";
if (url.startsWith("http:")) // true
```
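The next sketch puts both methods to work, along with the startsWith(prefix, offset) overload that checks for a prefix at a given index. Both helpers are hypothetical. Because endsWith() is case-sensitive, isHtmlFile() lowercases the name before testing it.

```java
class PrefixSuffixDemo {
    // Hypothetical helper: a rough check for an HTTP or HTTPS address.
    static boolean looksLikeWebUrl(String url) {
        return url.startsWith("http://") || url.startsWith("https://");
    }

    // Hypothetical helper: endsWith() is case-sensitive, so lowercase the name first.
    static boolean isHtmlFile(String fileName) {
        return fileName.toLowerCase().endsWith(".html");
    }

    public static void main(String[] args) {
        String url = "http://foo.bar.com/";
        System.out.println(looksLikeWebUrl(url));     // true
        System.out.println(isHtmlFile("INDEX.HTML")); // true
        // startsWith(prefix, offset) tests for the prefix beginning at a given index.
        System.out.println(url.startsWith("foo", 7)); // true
    }
}
```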
The indexOf() method searches for the first occurrence of a character or substring and returns the starting character position, or -1 if the substring is not found:

```java
String abcs = "abcdefghijklmnopqrstuvwxyz";
int i = abcs.indexOf('p'); // 15
i = abcs.indexOf("def");   // 3
i = abcs.indexOf("Fang");  // -1
```
Similarly, lastIndexOf()
searches backward through the string for the last occurrence of a
character or substring.
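Both methods also have overloads that take a starting index, such as indexOf(String, int), which makes it easy to step through every match. In the sketch below, countOccurrences() counts non-overlapping matches by restarting indexOf() just past each hit. The second helper, extension(), uses lastIndexOf() with substring() to pull out a file extension. Both helpers are hypothetical examples.

```java
class SearchDemo {
    // Counts non-overlapping occurrences of target in text using indexOf(str, fromIndex).
    static int countOccurrences(String text, String target) {
        if (target.isEmpty()) {
            throw new IllegalArgumentException("target must not be empty");
        }
        int count = 0;
        int index = text.indexOf(target);
        while (index != -1) {
            count++;
            index = text.indexOf(target, index + target.length());
        }
        return count;
    }

    // Returns the text after the last '.', or "" if there is no '.'.
    static String extension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return (dot == -1) ? "" : fileName.substring(dot + 1);
    }

    public static void main(String[] args) {
        System.out.println(countOccurrences("the cat and the hat", "the")); // 2
        System.out.println(extension("archive.tar.gz"));                    // gz
        System.out.println(extension("README").isEmpty());                  // true
    }
}
```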
The contains() method handles the very common task of checking to see whether a given substring is contained in the target string:

```java
String log = "There is an emergency in sector 7!";
if (log.contains("emergency")) pageSomeone();

// equivalent to
if (log.indexOf("emergency") != -1) ...
```
For more complex searching, you can use the Regular Expression API, which allows you to look for and parse complex patterns. We’ll talk about regular expressions later in this chapter.
A number of methods operate on the String
and return a new String
as a result. While this is useful, you
should be aware that creating lots of strings in this manner can affect
performance. If you need to modify a string often or build a complex
string from components, you should use the StringBuilder
class, as we’ll discuss
shortly.
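trim() is a useful method that removes leading and trailing whitespace from the String. Here, whitespace means any character whose code is at or below U+0020, which covers spaces, tabs, carriage returns, and newlines:

```java
String str = " abc ";
str = str.trim(); // "abc"
```

In this example, we threw away our reference to the original String (with excess whitespace). Once nothing else refers to a discarded String, it becomes eligible for garbage collection.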
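Here is a slightly messier input, as a minimal sketch. The helper name showTrimmed() is just for illustration. trim() strips the tab, spaces, carriage return, and newline at the ends but leaves the space between the words in place.

```java
class TrimDemo {
    // Wraps the trimmed text in brackets so the removed whitespace is visible.
    static String showTrimmed(String text) {
        return "[" + text.trim() + "]";
    }

    public static void main(String[] args) {
        String messy = "\t  padded text \r\n";
        System.out.println(showTrimmed(messy));    // [padded text]
        // Whitespace between words is left alone; only the ends are stripped.
        System.out.println(messy.trim().length()); // 11
    }
}
```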
The toUpperCase() and toLowerCase() methods return a new String of the appropriate case:

```java
String down = "FOO".toLowerCase(); // "foo"
String up = down.toUpperCase();    // "FOO"
```
substring() returns a specified range of characters. The starting index is inclusive; the ending is exclusive:

```java
String abcs = "abcdefghijklmnopqrstuvwxyz";
String cde = abcs.substring(2, 5); // "cde"
```
The replace() method provides simple, literal string substitution. Every occurrence of the target string is replaced with the replacement string. The scan moves from beginning to end, so matches never overlap. For example:

```java
String message = "Hello NAME, how are you?".replace("NAME", "Penny");
// "Hello Penny, how are you?"
String xy = "xxooxxxoo".replace("xx", "X");
// "XooXxoo"
```
The String class also has two methods that allow you to do more complex pattern substitution: replaceAll() and replaceFirst(). Unlike the simple replace() method, these methods use regular expressions (a special syntax we’ll cover later in this chapter) to describe the pattern to be replaced.
Table 10-2 summarizes
the methods provided by the String
class.
Table 10-2. String methods
Method | Functionality |
---|---|
charAt() | Gets a particular character in the string |
compareTo() | Compares the string with another string |
concat() | Concatenates the string with another string |
contains() | Checks whether the string contains another string |
copyValueOf() | Returns a string equivalent to the specified character array |
endsWith() | Checks whether the string ends with a specified suffix |
equals() | Compares the string with another string |
equalsIgnoreCase() | Compares the string with another string, ignoring case |
getBytes() | Encodes the characters of the string into a byte array |
getChars() | Copies characters from the string into a character array |
hashCode() | Returns a hashcode for the string |
indexOf() | Searches for the first occurrence of a character or substring in the string |
intern() | Fetches a unique instance of the string from a global shared-string pool |
isEmpty() | Returns true if the string is zero length |
lastIndexOf() | Searches for the last occurrence of a character or substring in a string |
length() | Returns the length of the string |
matches() | Determines if the whole string matches a regular expression pattern |
regionMatches() | Checks whether a region of the string matches the specified region of another string |
replace() | Replaces all occurrences of a character or literal substring with another |
replaceAll() | Replaces all matches of a regular expression pattern with a replacement string |
replaceFirst() | Replaces the first match of a regular expression pattern with a replacement string |
split() | Splits the string into an array of strings using a regular expression pattern as a delimiter |
startsWith() | Checks whether the string starts with a specified prefix |
substring() | Returns a substring from the string |
toCharArray() | Returns the array of characters from the string |
toLowerCase() | Converts the string to lowercase |
toString() | Returns the string value of an object |
toUpperCase() | Converts the string to uppercase |
trim() | Removes leading and trailing whitespace from the string |
In contrast to the immutable string, the java.lang.StringBuilder class is a modifiable and expandable buffer for characters. You can use it to create a big string efficiently. StringBuilder and StringBuffer are twins; they have exactly the same API. StringBuilder was added in Java 5.0 as a drop-in, unsynchronized replacement for StringBuffer. We’ll come back to that in a bit.
First, let’s look at some examples of String construction:

```java
// Could be better
String ball = "Hello";
ball = ball + " there.";
ball = ball + " How are you?";
```
This example creates an unnecessary String object each time we use the concatenation operator (+). Whether this is significant depends on how often this code is run and how big the string actually gets. Here’s a more extreme example:

```java
// Bad use of + ...
while ((line = readLine()) != EOF)
    text += line;
```
This example repeatedly produces new String objects. The character array must be copied over and over, which can adversely affect performance. The solution is to use a StringBuilder object and its append() method:

```java
StringBuilder sb = new StringBuilder("Hello");
sb.append(" there.");
sb.append(" How are you?");

StringBuilder text = new StringBuilder();
while ((line = readLine()) != EOF)
    text.append(line);
```
Here, the StringBuilder efficiently handles expanding the array as necessary. We can get a String back from the StringBuilder with its toString() method:

```java
String message = sb.toString();
```
You can also retrieve part of a StringBuilder
as a String
by using one of the substring()
methods.
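The readLine() and EOF names in the loops above are placeholders. The sketch below is a runnable version. It assumes the input comes from a java.io.BufferedReader, whose readLine() returns null (rather than an EOF constant) at the end of input. A StringReader stands in for a real file, and the class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

class JoinLinesDemo {
    // Appends every line from reader to a single StringBuilder.
    static String readAll(BufferedReader reader) throws IOException {
        StringBuilder text = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            // readLine() strips the line terminator, so add one back.
            text.append(line).append('\n');
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        BufferedReader reader = new BufferedReader(new StringReader("one\ntwo\nthree"));
        System.out.print(readAll(reader));
    }
}
```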
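You might be interested to know that when you write a long expression using string concatenation, the compiler generates efficient buffering code for you behind the scenes. If every operand is a literal, as in "To " + "be " + "or", there is no work left for runtime: the compiler folds the expression into the single constant "To be or". When at least one operand isn’t a constant:

```java
String be = "be ";
String foo = "To " + be + "or";
```

the code the compiler generates (in Java 5 through Java 8) is really equivalent to:

```java
String foo = new StringBuilder().append("To ").append(be).append("or").toString();
```

Since Java 9, javac instead emits an invokedynamic call to java.lang.invoke.StringConcatFactory, which lets the runtime choose the concatenation strategy. Either way, the compiler knows what you are trying to do and takes care of it for you.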
The StringBuilder class provides a number of overloaded append() methods for adding any type of data to the buffer. StringBuilder also provides a number of overloaded insert() methods for inserting various types of data at a particular location in the string buffer. Furthermore, you can remove a single character or a range of characters with the deleteCharAt() and delete() methods. Finally, you can replace part of the StringBuilder with the contents of a String using the replace() method. In early Java releases, the String and StringBuffer classes cooperated so that, in some cases, no copy of the data had to be made and the string data was shared between the objects. Current StringBuilder implementations copy the characters into a new String when you call toString().
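The following sketch walks one StringBuilder through insert(), append(), replace(), deleteCharAt(), and delete(). The comments track the buffer’s contents after each call. Ranges work as they do for substring(): the start index is inclusive and the end index is exclusive.

```java
class BuilderEditDemo {
    static String edit() {
        StringBuilder sb = new StringBuilder("Hello World");
        sb.insert(5, ',');         // "Hello, World"
        sb.append('!');            // "Hello, World!"
        sb.replace(7, 12, "Java"); // "Hello, Java!"
        sb.deleteCharAt(5);        // "Hello Java!"
        sb.delete(0, 6);           // "Java!"
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(edit()); // Java!
    }
}
```

Each of these methods returns the same StringBuilder, so the calls could also be chained into a single expression.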
You should use a StringBuilder instead of a String any time you need to keep adding characters to a string; it’s designed to handle such modifications efficiently. You can convert the StringBuilder to a String when you need it, or simply concatenate or print it anywhere you’d use a String.
As we said earlier, StringBuilder was added in Java 5.0 as a replacement for StringBuffer. The only real difference between the two is that the methods of StringBuffer are synchronized and the methods of StringBuilder are not. This means that if you wish to use StringBuilder from multiple threads concurrently, you must synchronize the access yourself (which is easily accomplished). The reason for the change is that most simple usage does not require any synchronization and shouldn’t have to pay the associated penalty (slight as it is).
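If you do share a StringBuilder between threads, one straightforward approach is to wrap every access in a synchronized block on the builder itself. In the sketch below, two threads each append count characters. With the locking in place, no appends are lost. The sketch uses a Java 8 lambda, and the class and method names are hypothetical. A StringBuffer would give the same result without the explicit synchronized blocks.

```java
class SharedBuilderDemo {
    // Two threads append `count` characters each to one shared StringBuilder.
    static int appendConcurrently(int count) throws InterruptedException {
        StringBuilder shared = new StringBuilder();
        Runnable task = () -> {
            for (int i = 0; i < count; i++) {
                // StringBuilder is unsynchronized, so each append holds the builder's lock.
                synchronized (shared) {
                    shared.append('x');
                }
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join(); // join() also makes the threads' writes visible here
        t2.join();
        return shared.length();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(appendConcurrently(10_000)); // 20000
    }
}
```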
[29] When in doubt, measure it! If your String-manipulating code is clean and easy to understand, don’t rewrite it until someone proves to you that it is too slow. Chances are that they will be wrong. And don’t be fooled by relative comparisons. An operation that takes a millisecond is 1,000 times slower than one that takes a microsecond, but it still may be negligible to your application’s overall performance.
[30] Traditionally, the default encoding was MacRoman on Mac OS X, CP1252 in Windows, and ISO8859_1 on some Unix platforms. Since Java 18, the default is UTF-8 on all platforms.
Get Learning Java, 4th Edition now with the O’Reilly learning platform.