Chapter 4. Strings
In the simplest terms, a string in a programming language is a sequence of zero or more characters and usually represents some human language, whether written or spoken. You are probably more likely to use methods from the String class than from any other class in Ruby. Manipulating strings is one of the biggest chores a programmer has to manage. Fortunately, Ruby offers a lot of convenience in this department.
For more information on string methods, go to http://www.ruby-doc.org/core/classes/String.html. You can also use the command line to get information on a method. For example, to get information on the String instance method chop, type:
ri String#chop [or]
ri String.chop
You can use # or . between the class and method names when looking up a method with ri. This, of course, assumes that you have the Ruby documentation package installed and that it is in the path (see “Installing Ruby,” in Chapter 1).
Creating Strings
You can create strings with the new method. For example, this line creates a new, empty string called title:
title = String.new # => ""
Now you have a new string, but it is only filled with virtual air. You can test a string to see if it is empty with empty?:
title.empty? # => true
You might want to test a string to see if it is empty before you process it, or to end processing when you run into an empty string. You can also test its length or size:
title.length [or]
title.size # => 0
The length and size methods do the same thing: they both return an integer indicating how many characters a string holds.
The new method can take a string argument:
title = String.new( "Much Ado about Nothing" )
Now check title:
title.empty? # => false
title.length # => 22
There we go. Not quite so vacuous as before.
Another way to create a string is with Kernel’s String method:
title = String( "Much Ado about Nothing" )
puts title # => Much Ado about Nothing
But there is an even easier way. You don’t have to use the new or String methods to generate a new string. Just an assignment operator and a pair of double quotes will do fine:
sad_love_story = "Romeo and Juliet"
You can also use single quotes:
sad_love_story = 'Romeo and Juliet'
The difference between using double quotes versus single quotes is that double quotes interpret escaped characters and single quotes preserve them. I’ll show you what that means. Here’s what you get with double quotes (interprets \n as a newline):
lear = "King Lear\nA Tragedy\nby William Shakespeare"
puts lear # => King Lear
          #    A Tragedy
          #    by William Shakespeare
And here’s what you get with single quotes (preserves \n in context):
lear = 'King Lear\nA Tragedy\nby William Shakespeare'
puts lear # => King Lear\nA Tragedy\nby William Shakespeare
For a complete list of escape characters, see Table A-1 in Appendix A.
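The quoting difference is easy to check for yourself. Here is a minimal sketch, assuming Ruby 1.9 or later; line_count is a hypothetical helper written for this illustration, not a built-in method.

```ruby
# Count the lines in a string by splitting on real newline characters.
# (line_count is an illustrative helper, not part of Ruby.)
def line_count(text)
  text.split("\n").length
end

double = "King Lear\nA Tragedy"   # \n becomes one newline character
single = 'King Lear\nA Tragedy'   # backslash and n stay as two characters

puts line_count(double)   # => 2
puts line_count(single)   # => 1
puts double.length        # => 19
puts single.length        # => 20
```

The one-character difference in length is the backslash that the single-quoted version keeps.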
General Delimited Strings
Another way to create strings is with general delimited strings, which are all preceded by a % and then followed by a matched pair of delimiter characters, such as !, {, or [ (must be nonalphanumeric). The string is embedded between the delimiters. All of the following examples are delimited by different characters (you can even use quote characters):
comedy = %!As You Like It!
history = %[Henry V]
tragedy = %(Julius Caesar)
You can also use %Q, which is the equivalent of a double-quoted string; %q, which is equivalent to a single-quoted string; or %x for a back-quoted string (`) for command output.
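To see that %Q and %q really do behave like the two quote styles, here is a small sketch. The helper name delimited_samples is made up for the illustration.

```ruby
# Build the same text with three delimiter styles and return all three.
# (delimited_samples is a hypothetical helper for this sketch.)
def delimited_samples
  [
    %Q(Act I\tScene 1),   # like double quotes: \t becomes a tab
    %q(Act I\tScene 1),   # like single quotes: backslash and t are kept
    %!Act I\tScene 1!     # a bare % behaves like %Q
  ]
end

delimited_samples.each { |s| puts s.length }
# prints 13, then 14, then 13
```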
Here Documents
A here document allows you to build strings from multiple lines on the fly, while preserving newlines. A here document is formed with a << and a delimiting character or string of your choice. I’ll save Shakespeare’s 29th sonnet as a here document, with 29 as the delimiter:
sonnet = <<29
When in disgrace with fortune and men's eyes
I all alone beweep my outcast state,
And trouble deaf heaven with my bootless cries,
And look upon myself, and curse my fate,
Wishing me like to one more rich in hope,
Featured like him, like him with friends possessed,
Desiring this man's art, and that man's scope,
With what I most enjoy contented least;
Yet in these thoughts my self almost despising,
Haply I think on thee, and then my state,
Like to the lark at break of day arising
From sullen earth, sings hymns at heaven's gate;
For thy sweet love remembered such wealth brings
That then I scorn to change my state with kings.
29
This document is stored in the string sonnet, but you can create a here document without placing it in a string. Wherever the line breaks, a record separator (such as \n) is inserted at that place. Now use:
puts sonnet
You’ll see for yourself how the lines break.
You can also “delimit the delimiter” for various effects:
sonnet = <<hamlet # same as double-quoted string
O my prophetic soul! My uncle!
hamlet

sonnet = <<"hamlet" # again as double-quoted string
O my prophetic soul! My uncle!
hamlet

sonnet = <<'ghost' # same as single-quoted string
Pity me not, but lend thy serious hearing
To what I shall unfold.
ghost

my_dir = <<`dir` # same as back ticks
ls -l
dir

ind = <<-hello # for indentation
    Hello, Matz!
    hello
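Here is a short sketch showing that a here document keeps one newline per source line. The method name opening_lines and the delimiter QUOTE are my own choices for the illustration.

```ruby
# Return a two-line here document; QUOTE is the delimiter chosen for this sketch.
def opening_lines
  <<QUOTE
O my prophetic soul!
My uncle!
QUOTE
end

text = opening_lines
puts text.count("\n")      # => 2
p text.split("\n").first   # => "O my prophetic soul!"
```

Note that the closing delimiter sits at the start of its own line; with the <<- form it may be indented.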
Concatenating Strings
In Ruby, you can add on to an existing string with various concatenation techniques. You don’t have to jump through the hoops that you might if you were using a language with immutable strings.
Adjacent strings can be concatenated simply because they are next to each other:
"Hello," " " "Matz" "!" # => "Hello, Matz!"
You can also use the + method:
"Hello," + " " + "Matz" + "!" # => "Hello, Matz!"
You can even mix double and single quotes, as long as they are properly paired.
Another way to do this is with the << method. You can add a single string:
"Hello, " << "Matz!" # => "Hello, Matz!"
Or you can chain them together with multiple calls to <<:
"Hello," << " " << "Matz" << "!" # => "Hello, Matz!"
An alternative to << is the concat method (to chain calls to concat, wrap each argument in parentheses):
"Hello, ".concat "Matz!" # => "Hello, Matz!"
Or you can do it this way:
h = "Hello, "
m = "Matz!"
h.concat(m) # => "Hello, Matz!"
You can make a string immutable with Object’s freeze method:
greet = "Hello, Matz!"
greet.freeze
# try to append something
greet.concat("!") # => TypeError: can't modify frozen string
# is the object frozen?
greet.frozen? # => true
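The sketch below contrasts the two styles of concatenation: << changes the receiver, while + builds a new string. It also shows what a frozen string does when you try to append. The helper names are hypothetical, and the exact error class depends on your Ruby version (TypeError in 1.8, other classes in later releases), so the sketch rescues StandardError and just reports the class name.

```ruby
# Append in place and report whether the object stayed the same.
# (append_in_place is a hypothetical helper for this sketch.)
def append_in_place(base, extra)
  before = base.object_id
  base << extra                 # modifies base itself
  base.object_id == before
end

# Try to append to a frozen copy; return the name of the rescued error class.
def append_to_frozen(text)
  frozen = text.dup.freeze
  frozen << "!"
  "no error"
rescue StandardError => e
  e.class.name
end

greeting = String.new("Hello, ")
puts append_in_place(greeting, "Matz!")        # => true
puts greeting                                  # => Hello, Matz!
puts((greeting + "!").equal?(greeting))        # => false, + builds a new string
puts append_to_frozen(greeting)                # error class name varies by version
```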
Accessing Strings
You can extract and manipulate segments of a string using the String method []. It’s an alias of the slice method: any place you use [], you can use slice, with the same arguments. slice! takes the same arguments, but it deletes the selected portion from the string in place and returns what it removed.
We’ll access several strings in the examples that follow:
line = "A horse! a horse! my kingdom for a horse!"
cite = "Act V, Scene IV"
speaker = "King Richard III"
If you enter a string as the argument to [], it will return that string, if found:
speaker['King'] # => "King"
Otherwise, it will return nil. In other words, it’s trying to break the news to you: “I didn’t find the string you were looking for.” If you specify a Fixnum (integer) as an index, it returns the decimal character code for the character found at the index location:
line[7] # => 33
At the location 7, [] found the character 33 (!). (That is Ruby 1.8 behavior; from Ruby 1.9 on, line[7] returns the one-character string "!" itself.) If you add the chr method (from the Integer class), you’ll get the actual character:
line[7].chr # => "!"
You can use an offset and length (two Fixnums) to tell [] the index location where you want to start, and then how many characters you want to retrieve:
line[18, 23] # => "my kingdom for a horse!"
You started at index location 18, and then scooped up 23 characters from there, inclusive. You can capitalize the result with the capitalize method, if you want:
line[18, 23].capitalize # => "My kingdom for a horse!"
(More on capitalize and other similar methods later in the chapter.)
Enter a range to grab a range of characters. Two dots (..) means include the last character:
cite[0..4] # => "Act V"
Three dots (...) means exclude the last value:
cite[0...4] # => "Act "
You can also use regular expressions (see the end of the chapter), as shown here:
line[/horse!$/] # => "horse!"
The regular expression /horse!$/ asks, “Does the word horse, followed by !, come at the end of the line ($)?” If this is true, this call returns horse!; nil if not. Adding another argument, a Fixnum, returns that portion of the matched data: 0 is the entire match, and 1 and up select parenthesized groups. This instance asks for portion 0:
line[/^A horse/, 0] # => "A horse"
The index method returns the index location of a matching substring. So if you use index like this:
line.index("k") # => 21
21 refers to the index location where the letter k first occurs in line.
See if you get what is going on in the following examples:
line[line.index("k")] # => 107
line[line.index("k")].chr # => "k"
If you figured out these statements, you are starting to catch on! It doesn’t take long, does it? If you didn’t understand what happened, here it is: when line.index("k") was called, it returned the value 21, which was fed as a numeric argument to []; this, in effect, called line[21].
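If you are running Ruby 1.9 or later, an integer index hands back a one-character string rather than a character code, and ord gives you the code when you want it. The following is a sketch under that assumption; char_and_code is a hypothetical helper, and LINE simply holds the same text as line above.

```ruby
LINE = "A horse! a horse! my kingdom for a horse!"

# Return the character at an index together with its character code.
# (char_and_code is an illustrative helper, not part of Ruby.)
def char_and_code(text, index)
  char = text[index]          # a one-character string on Ruby 1.9+
  [char, char.ord]
end

p char_and_code(LINE, 7)               # => ["!", 33]
p LINE[18, 23]                         # => "my kingdom for a horse!"
p LINE.slice(18, 23) == LINE[18, 23]   # => true
p LINE[LINE.index("k")]                # => "k"
p LINE["unicorn"]                      # => nil
```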
Comparing Strings
Sometimes you need to test two strings to see if they are the same or not. You can do that with the == method. For example, you might want to test a string before printing something:
print "What was the question again?" if question == ""
Also, here are two versions of the opening paragraph of Abraham Lincoln’s Gettysburg Address, one from the so-called Hay manuscript, the other from the Nicolay (see http://www.loc.gov/exhibits/gadd/gadrft.html):
hay = "Four score and seven years ago our fathers brought forth, upon this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal."
nicolay = "Four score and seven years ago our fathers brought forth, upon this continent, a new nation, conceived in liberty, and dedicated to the proposition that \"all men are created equal\""
The strings are only slightly different (for example, Liberty is capitalized in the Hay version). Let’s compare these strings:
hay == nicolay # => false
The result is false, because they must match exactly. (We’ll let the historians figure out how to match them up.) You could also apply the eql? method and get the same results, though eql? and == are slightly different:
== returns true if two strings have the same length and content. If the other object is not a String but responds to to_str, == hands the comparison to that object’s own == method.
eql? returns true only if the other object is itself a String with the same length and content.
Here eql? returns false:
hay.eql? nicolay # => false
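Here is a cut-down sketch of the same comparison, using only the phrase where the two manuscripts differ. The helper content_match is hypothetical, and the downcase call is just one possible way to ignore the capital letter.

```ruby
# Report how two strings compare by content.
# (content_match is a hypothetical helper for this sketch.)
def content_match(a, b)
  {
    double_equals:      a == b,
    eql:                a.eql?(b),
    same_ignoring_case: a.downcase == b.downcase
  }
end

hay     = "conceived in Liberty"
nicolay = "conceived in liberty"

result = content_match(hay, nicolay)
p result[:double_equals]        # => false
p result[:eql]                  # => false
p result[:same_ignoring_case]   # => true
p "liberty" == "liberty"        # => true, two objects with equal content
```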
Yet another way to compare strings is with the <=> method, commonly called the spaceship operator. It compares the character code values of the strings, returning -1 (less than), 0 (equals), or 1 (greater than), depending on the comparison, which is case-sensitive:
"a" <=> "a" # => 0
"a" <=> 97.chr # => 0
"a" <=> "b" # => -1
"a" <=> "`" # => 1
A case-insensitive comparison is possible with casecmp, which has the same possible results as <=> (-1, 0, 1) but doesn’t care about case:
"a" <=> "A" # => 1
"a".casecmp "A" # => 0
"ferlin husky".casecmp "Ferlin Husky" # => 0
"Ferlin Husky".casecmp "Lefty Frizzell" # => -1
Manipulating Strings
Here’s a fun one to get started with. The * method repeats a string by an integer factor:
"A horse! " * 2 # => "A horse! A horse! "
You can concatenate a string to the result:
taf = "That's ".downcase * 3 + "all folks!" # => "that's that's that's all folks!"
taf.capitalize # => "That's that's that's all folks!"
Inserting a String in a String
The insert method lets you insert another string at a given index in a string. For example, you can correct spelling:
"Be carful.".insert 6, "e" # => "Be careful."
or add a word (plus a space):
"Be careful!".insert 3, "very " # => "Be very careful!"
or even throw the * method in just to prove that you can:
"Be careful!".insert 3, "very " * 5 # => "Be very very very very very careful!"
Changing All or Part of a String
You can alter all or part of a string, in place, with the []= method. ([]= accepts the same kinds of index arguments as [] and slice, but it is not an alias of slice!: slice! removes the selected text and returns it, whereas []= replaces the selected text with a new value.)
Given the following strings (some scoundrel has been editing our Shakespeare text):
line = "A Porsche! a Porsche! my kingdom for a Porsche!"
cite = "Act V, Scene V"
speaker = "King Richard, 2007"
enter a string as the argument to []=, and it will replace the first occurrence of that string and return the replacement; if the string is not found, Ruby raises an IndexError.
speaker[", 2007"]= " III" # => " III"
p speaker # => "King Richard III"
That’s looking better.
If you specify a Fixnum (integer) as an index, it returns the corrected string you placed at the index location. (String lengths are automatically adjusted by Ruby if the replacement string is a different length than the original.)
cite[13]= "IV" # => "IV"
p cite # => "Act V, Scene IV"
At the index 13, []= found the substring V and replaced it with IV.
You can use an offset and length (two Fixnums) to tell []= the index of the substring where you want to start, and then how many characters you want to replace:
line[39,8]= "Porsche 911 Turbo!" # => "Porsche 911 Turbo!"
p line # => "A Porsche! a Porsche! my kingdom for a Porsche 911 Turbo!"
You started at index 39, and went 8 characters from there (inclusive).
You can also enter a range to indicate a range of characters you want to change. Include the last character with two dots (..):
speaker[13..15]= "the Third" # => "the Third"
p speaker # => "King Richard the Third"
You can also use regular expressions (see “Regular Expressions,” later in this chapter), as shown here. (This example starts again from the original value of line.)
line[/Porsche!$/]= "Targa!" # => "Targa!"
p line # => "A Porsche! a Porsche! my kingdom for a Targa!"
The regular expression /Porsche!$/ matches if Porsche! appears at the end of the line ($). If this is true, the call to []= exchanges Porsche! with Targa!.
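Because []= with a string argument raises an error when the text is not found, you may want to check first. The sketch below does that and then shows slice! removing text rather than replacing it. safe_replace is a hypothetical helper, not a String method.

```ruby
# Replace the first occurrence of old_text, leaving text alone when
# old_text is absent. (safe_replace is a hypothetical helper.)
def safe_replace(text, old_text, replacement)
  text[old_text] = replacement if text.include?(old_text)
  text
end

speaker = String.new("King Richard, 2007")
p safe_replace(speaker, ", 2007", " III")   # => "King Richard III"
p safe_replace(speaker, "Henry", "Hal")     # => "King Richard III"

removed = speaker.slice!(" III")            # deletes the match and returns it
p removed                                   # => " III"
p speaker                                   # => "King Richard"
```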
The chomp and chop Methods
The chop (or chop!) method chops off the last character of a string, and the chomp (chomp!) method chomps off the record separator ($/), usually just a newline, from a string. Consider the string joe, a limerick created as a here document:
joe = <<limerick
There once was a fellow named Joe
quite fond of Edgar Allen Poe
 He read with delight
 Nearly half the night
When his wife said "Get up!" he said "No."
limerick
# => "There once was a fellow named Joe\nquite fond of Edgar Allen Poe\n He read with delight\n Nearly half the night\nWhen his wife said \"Get up!\" he said \"No.\"\n"
Apply chomp! to remove the last record separator (\n):
joe.chomp!
# => "There once was a fellow named Joe\nquite fond of Edgar Allen Poe\n He read with delight\n Nearly half the night\nWhen his wife said \"Get up!\" he said \"No.\""
Now apply it again, and chomp! returns nil without altering the string because there is no record separator at the end of the string:
joe.chomp! # => nil
chop, chomp’s greedy twin, shows no mercy on the string, removing the last character (a quote) with abandon:
joe.chop!
# => "There once was a fellow named Joe\nquite fond of Edgar Allen Poe\n He read with delight\n Nearly half the night\nWhen his wife said \"Get up!\" he said \"No."
The delete Method
With delete or delete!, you can delete characters from a string:
"That's call folks!".delete "c" # => "That's all folks!"
That looks easy, because there is only one occurrence of the letter c in the string, so you don’t see any interesting side effects, as you would in the next example. Let’s say you want to get rid of that extra l in alll:
"That's alll folks".delete "l"
# => "That's a foks"
Oh, boy. It cleaned me out of all ls. I can’t use delete the way I want, so how do I fix alll? What if I use two ls instead of one?
"That's alll folks".delete "ll" # => "That's a foks"
I got the same thing. (I knew I would.) That’s because delete treats its argument as a set of characters rather than as a substring, and when you pass more than one argument it uses the intersection (what intersects or is the same in all) of those sets to decide what part of the string to take out. The nifty thing about this, though, is you can also negate all or part of an argument with the caret (^), similar to its use in regular expressions:
"That's all folks".delete "abcdefghijklmnopqrstuvwxyz", "^ha" # => "Tha' a "
The caret negates both the characters in the argument, not just the first one (you can do "^h^a", too, and get the same answer). Every lowercase letter except h and a is removed; the capital T, the apostrophe, and the spaces are not in the first set, so they survive as well.
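Here is a small sketch of the set behavior. strip_letters is a hypothetical wrapper that just passes its arguments through to delete; the "a-z" range is shorthand for the whole lowercase alphabet.

```ruby
# Delete every character that appears in all of the given sets.
# (strip_letters is a hypothetical wrapper around String#delete.)
def strip_letters(text, *sets)
  text.delete(*sets)
end

p strip_letters("That's alll folks", "l")           # => "That's a foks"
p strip_letters("That's alll folks", "ll")          # => "That's a foks"
p strip_letters("That's all folks", "a-z", "^ha")   # => "Tha' a "
```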
Substitute the Substring
Try gsub (or gsub!). This method replaces every occurrence of a substring (first argument) with a replacement string (second argument):
"That's alll folks".gsub "alll", "all" # => "That's all folks"
Or you might do it this way:
"That's alll folks".gsub "lll", "ll" # => "That's all folks"
The replace method replaces a string wholesale. Not just a substring, the whole thing.
call = "All hands on deck!"
call.replace "All feet on deck!" # => "All feet on deck!"
So why wouldn’t you just do it this way?
call = "All hands on deck!"
call = "All feet on deck!"
Wouldn’t you get the same result? Not exactly. When you use replace, call remains the same object, with the same object ID, but when you assign the string to call twice, the object and its ID will change. Just a subtlety you ought to know.
# same object
call = "All hands on deck!" # => "All hands on deck!"
call.object_id # => 1624370
call.replace "All feet on deck!" # => "All feet on deck!"
call.object_id # => 1624370
# different object
call = "All hands on deck!" # => "All hands on deck!"
call.object_id # => 1600420
call = "All feet on deck!" # => "All feet on deck!"
call.object_id # => 1009410
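Object IDs differ from run to run, so another way to watch this subtlety is to keep a second reference to the string. The method name replace_vs_assign is made up for this sketch.

```ruby
# Show that replace keeps the object while assignment rebinds the name.
# (replace_vs_assign is a hypothetical helper for this sketch.)
def replace_vs_assign
  call  = String.new("All hands on deck!")
  other = call                           # a second name for the same object

  call.replace("All feet on deck!")      # same object, new content
  shared_after_replace = other.equal?(call)

  call = "All hands on deck!"            # call now names a brand-new object
  shared_after_assign = other.equal?(call)

  [other, shared_after_replace, shared_after_assign]
end

p replace_vs_assign   # => ["All feet on deck!", true, false]
```

The second reference sees the change made by replace, but it is left behind when call is assigned again.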
Turn It Around
To reverse the characters means to alter the characters so they read in the opposite direction. You can do this with the reverse method (or reverse! for permanent damage). Say you want to reverse the order of the English alphabet:
"abcdefghijklmnopqrstuvwxyz".reverse # => "zyxwvutsrqponmlkjihgfedcba"
Or, maybe you’d like to reverse a palindrome:
palindrome = "dennis sinned"
palindrome.reverse! # => "dennis sinned"
p palindrome # => "dennis sinned"
Not much harm done, even though reverse! changed the string in place. Think about that one for a while.
From a String to an Array
Conveniently, split converts a string to an array. The first call to split is without an argument, which splits on whitespace; this string contains none, so it comes back as a single element:
"0123456789".split # => ["0123456789"]
That was easy, but what about splitting up all the individual values and converting them into elements? Do that with an empty regular expression (//) that cuts up the original string at the junction of characters.
"0123456789".split( // ) # => ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"]
In the next example, the regular expression matches a comma and a space (/, /):
c_w = "George Jones, Conway Twitty, Lefty Frizzell, Ferlin Husky"
c_w.split(/, /) # => ["George Jones", "Conway Twitty", "Lefty Frizzell", "Ferlin Husky"]
Case Conversion
You can capitalize a word, sentence, or phrase with capitalize or capitalize!. (By now you should know the difference between the two.) Here is a pair of sentences that are under the influence of capitalize:
"Ruby finally has a killer app. It's Ruby on Rails.".capitalize # => "Ruby finally has a killer app. it's ruby on rails."
Notice that the second sentence is not capitalized, which doesn’t look so good. Now you can see that capitalize only capitalizes the first letter of the string and converts every other letter to lowercase; it does not capitalize the beginning of succeeding sentences. Plan accordingly.
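One way to plan accordingly is to split the text into sentences and uppercase only the first letter of each. This is a rough sketch, assuming Ruby 1.9 or later (it relies on a lookbehind in the pattern) and assuming that sentences end in ., !, or ? followed by whitespace. capitalize_sentences is a hypothetical helper.

```ruby
# Uppercase the first letter of each sentence without touching the rest.
# (capitalize_sentences is a hypothetical helper; the sentence rule is naive.)
def capitalize_sentences(text)
  sentences = text.split(/(?<=[.!?])\s+/)   # split after ., !, or ?
  sentences.map { |s| s[0, 1].upcase + s[1..-1] }.join(" ")
end

p capitalize_sentences("ruby finally has a killer app. it's Ruby on Rails.")
# => "Ruby finally has a killer app. It's Ruby on Rails."
```

Unlike capitalize, this leaves the capital letters in Ruby on Rails alone.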
Iterating Over a String
To get the effect you want, you may have to split strings up. Here is a list of menu items, stored in a string. They are separated by \n. The each_line method (Ruby 1.8 also offers each as a synonym) iterates over each separate item, not just the first word in the overall string, and capitalizes it:
"new\nopen\nclose\nprint".each_line { |item| puts item.capitalize }
# => New
#    Open
#    Close
#    Print
By the way, there is one other each method: each_byte. It takes a string apart byte by byte, returning the decimal value for the character at each index location. Print each character as a decimal, separated by /:
"matz".each_byte { |b| print b, "/" } # => 109/97/116/122/
Tip
This example assumes that a character is represented by a single byte, which is not always the case. The default character set for Ruby is ASCII, whose characters may be represented by bytes. However, if you use UTF-8, characters may be represented in one to four bytes. You can change your character set from ASCII to UTF-8 by specifying $KCODE = 'u' at the beginning of your program.
Convert each decimal to its character equivalent with Integer’s chr method:
"matz".each_byte { |b| print b.chr, "/" } # => m/a/t/z/
Or append the output to an array named out:
out = [] # create an empty array
"matz".each_byte { |b| p out << b }
# => [109]
#    [109, 97]
#    [109, 97, 116]
#    [109, 97, 116, 122]
p out # => [109, 97, 116, 122]
You’ll learn more about arrays in Chapter 6.
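The iteration methods can also collect their results instead of printing them. The sketch below assumes Ruby 1.9 or later, where each_line and each_byte return enumerators when called without a block; the two helper names are hypothetical.

```ruby
# Capitalize each line of a newline-separated menu.
# (capitalized_items and byte_values are hypothetical helpers.)
def capitalized_items(menu)
  menu.each_line.map { |item| item.chomp.capitalize }
end

# Collect the byte values of a string into an array.
def byte_values(text)
  text.each_byte.to_a
end

p capitalized_items("new\nopen\nclose\nprint")   # => ["New", "Open", "Close", "Print"]
p byte_values("matz")                             # => [109, 97, 116, 122]

# One character is not always one byte: e with an acute accent in UTF-8.
p "\u00e9".length         # => 1
p "\u00e9".bytes.length   # => 2
```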
downcase, upcase, and swapcase
YOU KNOW IT CAN BE ANNOYING TO READ SOMETHING THAT IS ALL IN UPPERCASE LETTERS! It’s distracting to read. That’s one reason it’s nice that Ruby has the downcase and downcase! methods.
"YOU KNOW IT CAN BE ANNOYING TO READ SOMETHING THAT IS ALL IN UPPERCASE LETTERS!".downcase
# => "you know it can be annoying to read something that is all in uppercase letters!"
There, that’s better. But now the first letter is lowercase, too. The grammar police will be on our case. Fix this by adding a call to capitalize onto the statement.
"YOU KNOW IT CAN BE ANNOYING TO READ SOMETHING THAT IS ALL IN UPPERCASE LETTERS!".downcase.capitalize
# => "You know it can be annoying to read something that is all in uppercase letters!"
Good. That took care of it.
What if you want to go the other way and change lowercase letters to uppercase? For example, you may want to get someone’s attention by turning warning text to all uppercase. You can do that with upcase or upcase!.
"warning! keyboard may be hot!".upcase # => "WARNING! KEYBOARD MAY BE HOT!"
Sometimes you may want to swap uppercase letters with lowercase. Use swapcase or swapcase!. For example, you can switch an English alphabet list that starts with lowercase first to a string that starts with uppercase first:
"aAbBcCdDeEfFgGhHiI".swapcase # => "AaBbCcDdEeFfGgHhIi"
Managing Whitespace, etc.
You can adjust whitespace (or other characters) on the left or right of a string, center a string in whitespace (or other characters), and strip whitespace away using the following methods. First, create a string—the title of a Shakespeare play:
title = "Love's Labours Lost"
How long is the string? This will be important to you (length and size are synonyms).
title.size # => 19
The string title is 19 characters long. With that information in tow, we can start making some changes. The ljust and rjust methods pad a string with whitespace or, if specified, some other character, until it reaches the width you ask for. ljust left-justifies the string by padding on the right; rjust right-justifies it by padding on the left. The width must be greater than the length of the string for any padding to be added. Let’s go over an example or two.
Let’s call these two methods with an argument (an integer) that is less than or equal to the length of the string.
title.ljust 10 # => "Love's Labours Lost"
title.rjust 19 # => "Love's Labours Lost"
What happened? Nothing! That’s because the argument must be greater than the length of the string in order to do anything. The amount of padding is the value of the argument minus the length of the string. Watch:
title.ljust 20 # => "Love's Labours Lost "
title.rjust 25 # => "      Love's Labours Lost"
See how it works now? In the call to ljust, one space character is added on the right (20 − 19 = 1), and the call to rjust adds six characters to the left (25 − 19 = 6). If it seems backward, just remember that the method name says where the text ends up, so the padding goes on the opposite side. You can use another character besides the default space character if you’d like:
title.rjust( 21, "-" ) # => "--Love's Labours Lost"
or use more than one character (the sequence will be repeated):
title.rjust 25, "->" # => "->->->Love's Labours Lost"
OK, now let’s really mess with your head:
title.rjust(20, "-").ljust(21, "-") # => "-Love's Labours Lost-"
You might want to do something like that someday.
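The padding arithmetic can be checked directly. In this sketch, padding_added is a hypothetical helper and TITLE holds the same text as title above.

```ruby
TITLE = "Love's Labours Lost"   # 19 characters

# How many characters does ljust add for a given width?
# (padding_added is a hypothetical helper for this sketch.)
def padding_added(text, width)
  text.ljust(width).length - text.length
end

p padding_added(TITLE, 10)    # => 0, width is not greater than the length
p padding_added(TITLE, 25)    # => 6, that is 25 - 19
p TITLE.ljust(22, ".")        # => "Love's Labours Lost..."
p TITLE.rjust(22, ".")        # => "...Love's Labours Lost"
```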
If you want to play both ends to the middle, you are better off using center instead:
title.center 23 # => "  Love's Labours Lost  "
title.center 23, "-" # => "--Love's Labours Lost--"
With one more tip of the hat, I’ll use center to create a comment:
filename = "hack.rb" # => "hack.rb"
filename.size # => 7
filename.center 40-7, "#" # => "#############hack.rb#############"
We’ve been adding whitespace and other characters. What if you just want to get rid of it? Use lstrip, rstrip, and strip (lstrip!, rstrip!, and strip!). Suppose you have a string surrounded by whitespace:
fear = "    Fear is the little darkroom where negatives develop. -- Michael Pritchard    "
Oops. Fell asleep with my thumb on the space bar, twice! I can fix it easily now, starting with the left side (make the change stick to the original string with lstrip!):
fear.lstrip!
# => "Fear is the little darkroom where negatives develop. -- Michael Pritchard    "
Now the right side:
fear.rstrip!
# => "Fear is the little darkroom where negatives develop. -- Michael Pritchard"
Or, starting again from the original string, do the whole thing at once:
fear.strip!
# => "Fear is the little darkroom where negatives develop. -- Michael Pritchard"
strip removes other kinds of whitespace, too:
"\t\tBye, tabs and line endings!\r\n".strip # => "Bye, tabs and line endings!"
Incrementing Strings
The Ruby String class has several methods that let you produce successive strings, that is, strings that increment, starting at the rightmost character. You can increment strings with next and next! (or succ and succ!). I prefer to use next. (The methods ending in ! make in-place changes.) For example:
"a".next [or]
"a".succ # => "b"
Remember, next increments the rightmost character:
"aa".next # => "ab"
It adds a character when it reaches a boundary, or adds a digit or decimal place when appropriate, as shown in these lines:
"z".next # => "aa" # two a's after one z
"zzzz".next # => "aaaaa" # five a's after four z's
"999.0".next # => "999.1" # increment by .1
"999".next # => "1000" # increment from 999 to 1000
We’re not just talking letters here, but any character, based on the character set in use (ASCII in these examples):
" ".next # => "!"
Chain calls of next together (let’s try three):
"0".next.next.next # => "3"
As you saw earlier, next works for numbers represented as strings as well:
"2007".next # => "2008"
Or you can get it to work when numbers are not represented as strings, though the method will come from a different class, not String. For example:
2008.next # => 2009
Instead of from String, this call actually uses the next method from Integer. (The Date, Generator, Integer, and String classes all have next methods.)
You can even use a character code via chr with next:
120.chr # => "x"
120.chr.next # => "y"
The upto method from String, which uses a block, makes it easy to increment. For example, this call to upto prints the English alphabet:
"a".upto("z") { |i| print i } # => abcdefghijklmnopqrstuvwxyz
You could also do this with a for loop and an inclusive range:
for i in "a".."z"
  print i
end
You decide what’s simpler. The for loop takes only slightly more keystrokes (31 versus 29, including whitespace). But I like upto.
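To watch next roll over at a boundary, you can collect a few successors in an array. This is a sketch; successors is a hypothetical helper built on String#next.

```ruby
# Return count strings, starting at start and applying next each time.
# (successors is a hypothetical helper for this sketch.)
def successors(start, count)
  result = [start]
  (count - 1).times { result << result.last.next }
  result
end

p successors("ay", 3)   # => ["ay", "az", "ba"]
p successors("98", 3)   # => ["98", "99", "100"]
p ("a".."e").to_a       # => ["a", "b", "c", "d", "e"]
```

The inclusive range on the last line walks through the same successors that upto visits.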
Converting Strings
You can convert a string into a float (Float) or integer (Fixnum). To convert a string into a float, or, more precisely, an instance of the String class into an instance of Float, use the to_f method:
"200".class # => String
"200".to_f # => 200.0
"200".to_f.class # => Float
Likewise, to convert a string to an integer, use to_i:
"100".class # => String
"100".to_i # => 100
"100".to_i.class # => Fixnum
To convert a string into a symbol (Symbol class), you can use either the to_sym or intern methods.
"name".intern # => :name
"name".to_sym # => :name
The value of the string, not its name, becomes the symbol:
play = "The Merchant of Venice".intern # => :"The Merchant of Venice"
Convert an object to a string with to_s. Ruby calls the to_s method from the class of the object, not the String class (parentheses are optional).
(256.0).class # => Float
(256.0).to_s # => "256.0"
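One thing to keep in mind is that to_i is forgiving: it reads leading digits and ignores the rest. The sketch below compares it with Kernel’s Integer method, which is stricter and raises ArgumentError on trailing text. The helper lenient_and_strict is hypothetical.

```ruby
# Convert text with to_i and with Integer(); nil marks a strict failure.
# (lenient_and_strict is a hypothetical helper for this sketch.)
def lenient_and_strict(text)
  strict = begin
    Integer(text)
  rescue ArgumentError
    nil
  end
  [text.to_i, strict]
end

p lenient_and_strict("100")        # => [100, 100]
p lenient_and_strict("100 days")   # => [100, nil]
p "200".to_f                       # => 200.0
p "name".to_sym                    # => :name
p 256.0.to_s                       # => "256.0"
```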
Regular Expressions
You have already seen regular expressions in action. A regular expression is a special sequence of characters that helps you match or find other strings or sets of strings, using a specialized syntax held in a pattern. The syntax for regular expressions was invented by mathematician Stephen Kleene in the 1950s.
I’ll spend a little time demonstrating some patterns to search for strings. In this little discussion, you’ll learn the fundamentals: how to use basic string patterns, square brackets, alternation, grouping, anchors, shortcuts, repetition operators, and braces. Table 4-1 lists the syntax for regular expressions in Ruby.
We need a little text to munch on. Here are the opening lines of Shakespeare’s 29th sonnet:
opening = "When in disgrace with fortune and men's eyes\nI all alone beweep my outcast state,\n"
Note that this string contains two lines, set off by the newline character \n.
You can match the first line just by using a word in the pattern:
opening.grep(/men/) # => ["When in disgrace with fortune and men's eyes\n"]
By the way, grep is not a String method; it comes from the Enumerable module, which the String class includes in Ruby 1.8, so it is available for processing strings. (Later versions of Ruby dropped this; there you would call grep on opening.lines instead.) grep takes a pattern as an argument, and can also take a block (see http://www.ruby-doc.org/core/classes/Enumerable.html).
When you use a pair of square brackets ([]), you can match any character in the brackets. Let’s try to match the word man or men using []:
opening.grep(/m[ae]n/) # => ["When in disgrace with fortune and men's eyes\n"]
It would also match a line with the word man in it.
Alternation lets you match alternate forms of a pattern using the pipe character (|):
opening.grep(/men|man/) # => ["When in disgrace with fortune and men's eyes\n"]
Grouping uses parentheses to group a subexpression, like this one that contains an alternation:
opening.grep(/m(e|a)n/) # => ["When in disgrace with fortune and men's eyes\n"]
Anchors anchor a pattern to the beginning (^) or end ($) of a line:
opening.grep(/^When in/) # => ["When in disgrace with fortune and men's eyes\n"]
opening.grep(/outcast state,$/) # => ["I all alone beweep my outcast state,\n"]
The ^ means that a match is found when the text When in is at the beginning of a line, and $ will only match outcast state if it is found at the end of a line.
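On Ruby 1.9 and later, strings no longer respond to grep, so a rough equivalent is to grep the array returned by lines. The sketch below makes that assumption; matching_lines is a hypothetical helper, and OPENING holds the same text as opening above.

```ruby
OPENING = "When in disgrace with fortune and men's eyes\n" \
          "I all alone beweep my outcast state,\n"

# Return the lines of text that match pattern.
# (matching_lines is a hypothetical helper for this sketch.)
def matching_lines(text, pattern)
  text.lines.grep(pattern)
end

p matching_lines(OPENING, /m[ae]n/)
# => ["When in disgrace with fortune and men's eyes\n"]
p matching_lines(OPENING, /^I all/)
# => ["I all alone beweep my outcast state,\n"]
p matching_lines(OPENING, /outcast state,$/).length   # => 1
p matching_lines(OPENING, /^outcast/).length          # => 0
```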
One way to specify the beginning and ending of strings in a pattern is with shortcuts. Shortcut syntax is brief: a single character preceded by a backslash. For example, the \d shortcut represents a digit; it is the same as using [0-9] but, well, shorter. Similar to ^, the shortcut \A matches the beginning of a string, not a line:
opening.grep(/\AWhen in/) # => ["When in disgrace with fortune and men's eyes\n"]
Similar to $, the shortcut \z matches the end of a string, not a line. It matches only at the very end, after any trailing newline, so this example strips the final newline with chomp first:
opening.chomp.grep(/outcast state,\z/) # => ["I all alone beweep my outcast state,"]
The shortcut \Z also matches the end of a string, but if the string ends with a newline character (\n), it matches just before that newline as well.
Let’s figure out how to match a phone number in the form (555)123-4567
. Supposing that the string phone
contains a phone number like this, the following pattern will find
it:
phone.grep(/(\(\d\d\d\))?\d\d\d-\d\d\d\d/) # => ["(555)123-4567"]
The backslash precedes the inner parentheses (\(...\)) to let the regexp engine know that these are literal characters. Otherwise, the engine will see the parentheses as enclosing a subexpression. The three \ds in the parentheses represent three digits. The hyphen (-) is just an unambiguous character, so you can use it in the pattern as is.
The question mark (?) is a repetition operator. It indicates zero or one occurrence of the previous pattern. So the phone number you are looking for can have an area code in parentheses, or not. The area-code pattern is surrounded by an unescaped ( and ) so that the ? operator applies to the entire area code. Square brackets would not do this, because they define a character class that matches only a single character. Either form of the phone number, with or without the area code, will work.
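The difference between applying ? to a parenthesized group and applying it to a bracketed character class can be checked with a short sketch. The sample strings and variable names are assumptions. The \A and \z anchors are added so that a partial match cannot hide the difference, and match? assumes Ruby 2.4 or later.

```ruby
# Hypothetical sample data; these names are assumptions for illustration.
with_area    = "(555)123-4567"
without_area = "123-4567"

# Unescaped parentheses group the escaped, literal ones,
# so ? makes the whole area code optional.
grouped = /\A(\(\d\d\d\))?\d\d\d-\d\d\d\d\z/

# Square brackets build a character class: ? then covers just ONE
# character out of "(", ")", or a digit.
char_class = /\A[\(\d\)]?\d\d\d-\d\d\d\d\z/

puts grouped.match?(with_area)       # true
puts grouped.match?(without_area)    # true
puts char_class.match?(with_area)    # false
puts char_class.match?(without_area) # true
```

Without the anchors, the character-class version still finds a match inside (555)123-4567, because it can start at the closing parenthesis, which is why the mistake is easy to miss.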
Here is a way to use ? with just a single character, u:
color.grep(/colou?r/) # => ["I think that colour is just right for you office."]
The plus sign (+) operator indicates one or more of the previous pattern, in this case digits:
phone.grep(/(\(\d+\))?\d+-\d+/) # => ["(555)123-4567"]
Braces ({}) let you specify the exact number of digits, such as \d{3} or \d{4}:
phone.grep(/(\(\d{3}\))?\d{3}-\d{4}/) # => ["(555)123-4567"]
Tip
It is also possible to indicate an “at least” amount with {m,}, and a minimum/maximum number with {m,n}.
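The three brace forms can be compared side by side. This is a minimal sketch with made-up sample strings. It calls grep on an array, which works in current Ruby releases.

```ruby
# Hypothetical inputs chosen for illustration.
samples = ["7", "42", "123", "4567", "89012"]

exactly_three  = samples.grep(/\A\d{3}\z/)    # {m}: exactly three digits
at_least_three = samples.grep(/\A\d{3,}\z/)   # {m,}: three or more digits
two_to_four    = samples.grep(/\A\d{2,4}\z/)  # {m,n}: two to four digits

p exactly_three    # ["123"]
p at_least_three   # ["123", "4567", "89012"]
p two_to_four      # ["42", "123", "4567"]
```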
The String class also has the =~ method and the !~ operator. If =~ finds a match, it returns the offset position where the match starts in the string:
color =~ /colou?r/ # => 13
The !~ operator returns true if it does not match the string, false otherwise:
color !~ /colou?r/ # => false
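A short sketch of both operators, including the no-match case, follows. The sentence is a hypothetical sample rather than the chapter's color string.

```ruby
# Hypothetical sample sentence, not the chapter's color string.
sentence = "Grey is a colour, and gray is a color."

first_offset = sentence =~ /colou?r/   # offset of the first match
no_match     = sentence =~ /purple/    # nil when nothing matches
lacks_purple = sentence !~ /purple/    # true: the pattern does not match
lacks_color  = sentence !~ /colou?r/   # false: the pattern does match

p first_offset   # 10
p no_match       # nil
p lacks_purple   # true
p lacks_color    # false
```

Because =~ returns nil on failure and an integer on success, its result can be used directly as a condition in an if statement.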
Also of interest are the Regexp and MatchData classes. The Regexp class (http://www.ruby-doc.org/core/classes/Regexp.html) lets you create a regular expression object. The MatchData class (http://www.ruby-doc.org/core/classes/MatchData.html) is the type of the special $~ variable, which encapsulates all search results from a pattern match.
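As a rough illustration of how the two classes fit together, the sketch below builds a Regexp object and inspects the MatchData it returns. The pattern and the sample text are assumptions made for the example.

```ruby
# Hypothetical pattern and input, chosen for illustration.
pattern = Regexp.new('(\d{3})-(\d{4})')
match   = pattern.match("Call 123-4567 today")

whole  = match[0]          # the entire matched text
prefix = match[1]          # first parenthesized group
line   = match[2]          # second parenthesized group
before = match.pre_match   # text before the match
after  = match.post_match  # text after the match
last   = $~                # MatchData from the most recent match

p whole, prefix, line, before, after   # "123-4567", "123", "4567", "Call ", " today"
p last[0]                              # "123-4567"
```

When nothing matches, Regexp#match returns nil, so it is worth checking the result before indexing into it.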
This discussion has given you a decent foundation in regular expressions (see Table 4-1 for a listing). With these fundamentals, you can define most any pattern.
Pattern | Description |
/pattern/options | Pattern in slashes, followed by optional options (for example, i, m, or x) |
%r!pattern! | General delimited string for a regular expression, where ! can be an arbitrary character |
^ | Matches beginning of line |
$ | Matches end of line |
. | Matches any character |
\1...\9 | Matches nth grouped subexpression |
\10 | Matches nth grouped subexpression, if already matched; otherwise, refers to octal representation of a character code |
\n, \r, \t, etc. | Matches character in backslash notation |
\w | Matches word character, as in [0-9A-Za-z_] |
\W | Matches nonword character |
\s | Matches whitespace character, as in [ \t\n\r\f] |
\S | Matches nonwhitespace character |
\d | Matches digit, same as [0-9] |
\D | Matches nondigit |
\A | Matches beginning of a string |
\Z | Matches end of a string, or before newline at the end |
\z | Matches end of a string |
\b | Matches word boundary outside [], or backspace (0x08) inside [] |
\B | Matches nonword boundary |
\G | Matches point where last match finished |
[..] | Matches any single character in brackets, such as [ch] |
[^..] | Matches any single character not in brackets |
* | Matches zero or more of previous regular expressions |
*? | Matches zero or more of previous regular expressions (nongreedy) |
+ | Matches one or more of previous regular expressions |
+? | Matches one or more of previous regular expressions (nongreedy) |
{m} | Matches exactly m occurrences of previous regular expression |
{m,} | Matches at least m occurrences of previous regular expression |
{m,n} | Matches at least m but at most n occurrences of previous regular expression |
{m,n}? | Matches at least m but at most n occurrences of previous regular expression (nongreedy) |
? | Matches zero or one of previous regular expressions |
| (vertical bar) | Alternation, such as color|colour |
( ) | Grouping regular expressions or subexpression, such as col(o|ou)r |
(?#..) | Comment |
(?:..) | Grouping without back-references (without remembering matched text) |
(?=..) | Specify position with pattern |
(?!..) | Specify position with pattern negation |
(?>..) | Matches independent pattern without backtracking |
(?imx) | Toggles i, m, or x options on |
(?-imx) | Toggles i, m, or x options off |
(?imx:..) | Toggles i, m, or x options on within parentheses |
(?-imx:..) | Toggles i, m, or x options off within parentheses |
(?ix-ix:..) | Turns on (or off) i and x options within this noncapturing group |
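A few of the less familiar entries in the table (nongreedy repetition, noncapturing groups, and the position patterns) are easier to see in action. The following is a minimal sketch. All sample strings and variable names are assumptions.

```ruby
# Hypothetical sample strings for illustration.
markup = "<b>bold</b> and <i>italic</i>"
prices = "100 USD or 200 EUR"

greedy    = markup[/<.+>/]     # + takes as much as it can
nongreedy = markup[/<.+?>/]    # +? stops at the first possible ">"

# (o|ou) remembers its text; (?:o|ou) groups without remembering.
captured   = /col(o|ou)r/.match("colour")
uncaptured = /col(?:o|ou)r/.match("colour")

# (?=..) requires what follows; (?!..) forbids it. Neither consumes text.
usd_amount   = prices[/\b\d+\b(?= USD)/]
other_amount = prices[/\b\d+\b(?! USD)/]

p greedy                    # "<b>bold</b> and <i>italic</i>"
p nongreedy                 # "<b>"
p captured[1]               # "ou"
p uncaptured.captures       # []
p usd_amount, other_amount  # "100", "200"
```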
1.9 and Beyond
In the versions of Ruby that follow, String will likely:
Add the start_with? and end_with? methods, which will return true if a string starts with or ends with a given prefix or suffix.
Add a clear method that will turn a nonempty string into an empty string.
Add an ord method that will return a character code.
Add the partition and rpartition methods to partition a string at a given separator.
Add a bytes method that will return the bytes of a string, one by one.
Return a single-character string instead of a character code when a string is indexed with [].
Treat characters as potentially more than one byte in length.
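In current Ruby releases these methods are available, so the list above can be tried out directly. The sketch below assumes a recent Ruby with UTF-8 strings. It reuses the title string from earlier in the chapter, and the other names are made up for the example.

```ruby
title = "Much Ado about Nothing"

starts = title.start_with?("Much")    # true
ends   = title.end_with?("Nothing")   # true

parts  = title.partition(" about ")   # split at the first separator
rparts = title.rpartition(" ")        # split at the last separator

code       = "M".ord                  # character code of the first character
first_char = title[0]                 # a one-character string, not a code

accented   = "\u00e9"                 # e with acute accent: one character
byte_list  = accented.bytes           # ...stored as two bytes in UTF-8
char_count = accented.length

scratch = title.dup                   # copy, so title itself is untouched
scratch.clear

p starts, ends            # true, true
p parts                   # ["Much Ado", " about ", "Nothing"]
p rparts                  # ["Much Ado about", " ", "Nothing"]
p code, first_char        # 77, "M"
p byte_list, char_count   # [195, 169], 1
p scratch.empty?          # true
```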
Review Questions
Name two ways to concatenate strings.
What happens when you reverse a palindrome?
How do you iterate over a string?
Name two or more case conversion methods.
What methods would you use to adjust space in a string?
Describe alternation in a regular expression pattern.
What does /\d{3}/ match?
How do you convert a string to an array?
What do you think is the easiest way to create a string?