Ruby Cookbook
Ruby Cookbook By Lucas Carlson, Leonard Richardson
July 2006
Pages: 906

Table of Contents

Chapter 1: Strings
Ruby is a programmer-friendly language. If you are already familiar with object-oriented programming, Ruby should quickly become second nature. If you've struggled with learning object-oriented programming or are not familiar with it, Ruby should make more sense to you than other object-oriented languages because Ruby's methods are consistently named, concise, and generally act the way you expect.
Throughout this book, we demonstrate concepts through interactive Ruby sessions. Strings are a good place to start because not only are they a useful data type, they're easy to create and use. They provide a simple introduction to Ruby, a point of comparison between Ruby and other languages you might know, and an approachable way to introduce important Ruby concepts like duck typing (see Recipe 1.12), open classes (demonstrated in Recipe 1.10), symbols (Recipe 1.7), and even Ruby gems (Recipe 1.20).
If you use Mac OS X or a Unix environment with Ruby installed, go to your command line right now and type irb. If you're using Windows, you can download and install the One-Click Installer from http://rubyforge.org/projects/rubyinstaller/, and do the same from a command prompt (you can also run the fxri program, if that's more comfortable for you). You've now entered an interactive Ruby shell, and you can follow along with the code samples in most of this book's recipes.
Strings in Ruby are much like strings in other dynamic languages like Perl, Python and PHP. They're not too much different from strings in Java and C. Ruby strings are dynamic, mutable, and flexible. Get started with strings by typing this line into your interactive Ruby session:
	string = "My first string"
You should see some output that looks like this:
	=> "My first string"
You typed in a Ruby expression that created a string "My first string", and assigned it to the variable string. The value of that expression is just the new value of string, which is what your interactive Ruby session printed out on the right side of the arrow. Throughout this book, we'll represent this kind of interaction in the following form:
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Building a String from Parts
You want to iterate over a data structure, building a string from it as you do.
There are two efficient solutions. The simplest solution is to start with an empty string, and repeatedly append substrings onto it with the << operator:
	hash = { "key1" => "val1", "key2" => "val2" }
	string = ""
	hash.each { |k,v| string << "#{k} is #{v}\n" }
	puts string
	# key1 is val1
	# key2 is val2
This variant of the simple solution is slightly more efficient, but harder to read:
	string = ""
	hash.each { |k,v| string << k << " is " << v << "\n" }
If your data structure is an array, or easily transformed into an array, it's usually more efficient to use Array#join:
	puts hash.keys.join("\n") + "\n"
	# key1
	# key2
In languages like Python and Java, it's very inefficient to build a string by starting with an empty string and adding each substring onto the end. In those languages, strings are immutable, so adding one string to another builds an entirely new string. Doing this multiple times creates a huge number of intermediary strings, each of which is only used as a stepping stone to the next string. This wastes time and memory.
In those languages, the most efficient way to build a string is always to put the substrings into an array or another mutable data structure, one that expands dynamically rather than by implicitly creating entirely new objects. Once you're done processing the substrings, you get a single string with the equivalent of Ruby's Array#join. In Java, this is the purpose of the StringBuffer class.
In Ruby, though, strings are just as mutable as arrays. Just like arrays, they can expand as needed, without using much time or memory. The fastest solution to this problem in Ruby is usually to forgo a holding array and tack the substrings directly onto a base string. Sometimes using Array#join is faster, but it's usually pretty close, and the << construction is generally easier to understand.
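As a rough sketch of the trade-off described above, the two helpers below build the same report, one with << and one with Array#join. The helper names are hypothetical and not part of the recipe. The third method checks that << keeps extending a single string object rather than creating a new one.

```ruby
# Hypothetical helper: append each pair directly onto one mutable string.
def report_by_appending(pairs)
  result = "".dup   # dup guarantees an unfrozen, mutable string
  pairs.each { |k, v| result << k << " is " << v << "\n" }
  result
end

# Hypothetical helper: collect the lines in an array and join them once.
def report_by_joining(pairs)
  pairs.map { |k, v| "#{k} is #{v}\n" }.join
end

# << mutates its receiver, so the object stays the same throughout.
def same_object_after_append?
  base = String.new("abc")
  id_before = base.object_id
  base << "def"
  base.object_id == id_before
end

pairs = [["key1", "val1"], ["key2", "val2"]]
puts report_by_appending(pairs)
puts report_by_joining(pairs)
puts same_object_after_append?   # true
```

Both helpers return identical text; which one reads better depends on whether the data is already in array form.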
If efficiency is important to you, don't build a new string when you can append items onto an existing string. Constructs like str << 'a' + 'b'
Additional content appearing in this section has been removed.
Substituting Variables into Strings
You want to create a string that contains a representation of a Ruby variable or expression.
Within the string, enclose the variable or expression in curly brackets and prefix it with a hash character.
	number = 5
	"The number is #{number}."                      # => "The number is 5."
	"The number is #{5}."                           # => "The number is 5."
	"The number after #{number} is #{number.next}."
	# => "The number after 5 is 6."
	"The number prior to #{number} is #{number-1}."
	# => "The number prior to 5 is 4."
	"We're ##{number}!"                             # => "We're #5!"
When you define a string by putting it in double quotes, Ruby scans it for special substitution codes. The most common case, so common that you might not even think about it, is that Ruby substitutes a single newline character every time a string contains a backslash followed by the letter n ("\n").
Ruby supports more complex string substitutions as well. Any text kept within the brackets of the special marker #{} (that is, #{text in here}) is interpreted as a Ruby expression. The result of that expression is substituted into the string that gets created. If the result of the expression is not a string, Ruby calls its to_s method and uses that instead.
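To make the to_s rule concrete, here is a small sketch. The Temperature class and the describe method are hypothetical names invented for this illustration. Interpolating any object that is not a string produces whatever its to_s method returns.

```ruby
# Hypothetical class used only to show that interpolation calls to_s.
class Temperature
  def initialize(degrees)
    @degrees = degrees
  end

  def to_s
    "#{@degrees} degrees"
  end
end

def describe(reading)
  "The reading is #{reading}."
end

puts describe(Temperature.new(21))   # The reading is 21 degrees.
puts describe(nil)                   # The reading is .
puts describe(5)                     # The reading is 5.
```

Note that nil interpolates as an empty string, because nil.to_s returns "".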
Once such a string is created, it is indistinguishable from a string created without using the string interpolation feature:
	"#{number}" == '5'                             # => true
You can use string interpolation to run even large chunks of Ruby code inside a string. This extreme example defines a class within a string; its result is the return value of a method defined in the class. You should never have any reason to do this, but it shows the power of this feature.
	%{Here is #{class InstantClass
	   def bar
	      "some text"
	    end
	 end
	 InstantClass.new.bar
	}.}
	# => "Here is some text."
The code run in string interpolations runs in the same context as any other Ruby code in the same location. To take the example above, the InstantClass class has now been defined like any other class, and can be used outside the string that defines it.
Substituting Variables into an Existing String
You want to create a string that contains Ruby expressions or variable substitutions, without actually performing the substitutions. You plan to substitute values into the string later, possibly multiple times with different values each time.
There are two good solutions: printf-style strings, and ERB templates.
Ruby supports a printf-style string format like C's and Python's. Put printf directives into a string and it becomes a template. You can interpolate values into it later using the modulus operator:
	template = 'Oceania has always been at war with %s.'
	template % 'Eurasia'  # => "Oceania has always been at war with Eurasia."
	template % 'Eastasia' # => "Oceania has always been at war with Eastasia."

	'To 2 decimal places: %.2f' % Math::PI       # => "To 2 decimal places: 3.14"
	'Zero-padded: %.5d' % Math::PI               # => "Zero-padded: 00003"
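A template can hold more than one directive. The sketch below assumes Ruby 1.9 or later for the named %<name>s form; the constant and method names are hypothetical. Pass an array to fill positional directives in order, or a hash to fill named ones.

```ruby
# Hypothetical template with three positional directives.
ROW_TEMPLATE = '%-10s|%5d|%8.2f'

# An array supplies one value per directive, in order.
def format_row(name, count, price)
  ROW_TEMPLATE % [name, count, price]
end

# Named references take their values from a hash instead.
def greeting(values)
  'Hello, %<name>s! You have %<count>d new messages.' % values
end

puts format_row('widget', 3, 4.5)
# widget    |    3|    4.50
puts greeting(name: 'Winston', count: 2)
# Hello, Winston! You have 2 new messages.
```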
An ERB template looks something like JSP or PHP code. Most of it is treated as a normal string, but certain control sequences are executed as Ruby code. The control sequence is replaced with either the output of the Ruby code, or the value of its last expression:
	require 'erb'

	template = ERB.new %q{Chunky <%= food %>!}
	food = "bacon"
	template.result(binding)                     # => "Chunky bacon!"
	food = "peanut butter"
	template.result(binding)                     # => "Chunky peanut butter!"
You can omit the call to Kernel#binding if you're not in an irb session:
	puts template.result
	# Chunky peanut butter!
You may recognize this format from the .rhtml files used by Rails views: they use ERB behind the scenes.
An ERB template can reference variables like food before they're defined. When you call ERB#result, or ERB#run, the template is executed according to the current values of those variables.
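One way to see this late evaluation is to wrap the call in a method, so that each call supplies its own binding. This is a sketch only: the MENU_TEMPLATE constant and the chunky method are hypothetical names.

```ruby
require 'erb'

# The template mentions `food`, which does not exist yet at this point.
MENU_TEMPLATE = ERB.new('Chunky <%= food %>!')

# Hypothetical helper: the method parameter `food` is visible to the
# template through the binding of this particular call.
def chunky(food)
  MENU_TEMPLATE.result(binding)
end

puts chunky('bacon')           # Chunky bacon!
puts chunky('peanut butter')   # Chunky peanut butter!
```

Passing the binding explicitly keeps the template's inputs limited to what the method can see.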
Like JSP and PHP code, ERB templates can contain loops and conditionals. Here's a more sophisticated template:
	template = %q{
	<% if problems.empty? %>
	  Looks like your code is clean!
	<% else %>
	  I found the following possible problems with your code:
	  <% problems.each do |problem, line| %>
	    * <%= problem %> on line <%= line %>
	  <% end %>
	<% end %>}.gsub(/^\s+/, '')
	template = ERB.new(template, nil, '<>')

	problems = [["Use of is_a? instead of duck typing", 23],
	            ["eval() is usually dangerous", 44]]
	template.run(binding)
	# I found the following possible problems with your code:
	# * Use of is_a? instead of duck typing on line 23
	# * eval() is usually dangerous on line 44

	problems = []
	template.run(binding)
	# Looks like your code is clean!
Reversing a String by Words or Characters
The letters (or words) of your string are in the wrong order.
To create a new string that contains a reversed version of your original string, use the reverse method. To reverse a string in place, use the reverse! method.
	s = ".sdrawkcab si gnirts sihT"
	s.reverse                            # => "This string is backwards."
	s                                    # => ".sdrawkcab si gnirts sihT"

	s.reverse!                           # => "This string is backwards."
	s                                    # => "This string is backwards."
To reverse the order of the words in a string, split the string into a list of whitespace-separated words, then join the list back into a string.
	s = "order. wrong the in are words These"
	s.split(/(\s+)/).reverse!.join('')   # => "These words are in the wrong order."
	s.split(/\b/).reverse!.join('')      # => "These words are in the wrong. order"
The String#split method takes a regular expression to use as a separator. Each time the separator matches part of the string, the portion of the string before the separator goes into a list. split then resumes scanning the rest of the string. The result is a list of strings found between instances of the separator. The regular expression /(\s+)/ matches one or more whitespace characters; this splits the string on word boundaries, which works for us because we want to reverse the order of the words.
The regular expression \b matches a word boundary. This is not the same as matching whitespace, because it also matches punctuation. Note the difference in punctuation between the two final examples in the Solution.
Because the regular expression /(\s+)/ includes a set of parentheses, the separator strings themselves are included in the returned list. Therefore, when we join the strings back together, we've preserved whitespace. This example shows the difference between including the parentheses and omitting them:
	"Three little words".split(/\s+/)   # => ["Three", "little", "words"]
	"Three little words".split(/(\s+)/)
	# => ["Three", " ", "little", " ", "words"]
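The capturing-group behavior can be packaged as a small helper. This is a minimal sketch; reverse_words is a hypothetical name, not a built-in method.

```ruby
# Hypothetical helper: reverse word order, keeping each run of
# whitespace exactly as it appeared between the words.
def reverse_words(text)
  text.split(/(\s+)/).reverse.join
end

puts reverse_words("order. wrong the in are words These")
# These words are in the wrong order.
puts reverse_words("two  spaces\tand a tab").inspect
# "tab a and\tspaces  two"
```

Because the separators are kept in the list, the double space and the tab survive the round trip, each in its mirrored position.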
  • Recipe 1.9, "Processing a String One Word at a Time," has some regular expressions for alternative definitions of "word"
Representing Unprintable Characters
You need to make reference to a control character, a strange UTF-8 character, or some other character that's not on your keyboard.
Ruby gives you a number of escaping mechanisms to refer to unprintable characters. By using one of these mechanisms within a double-quoted string, you can put any binary character into the string.
You can reference any binary character by encoding its octal representation into the format "\000", or its hexadecimal representation into the format "\x00".
	octal = "\000\001\010\020"
	octal.each_byte { |x| puts x }
	# 0
	# 1
	# 8
	# 16

	hexadecimal = "\x00\x01\x10\x20"
	hexadecimal.each_byte { |x| puts x }
	# 0
	# 1
	# 16
	# 32
This makes it possible to represent UTF-8 characters even when you can't type them or display them in your terminal. Try running this program, and then opening the generated file smiley.html in your web browser:
	open('smiley.html', 'wb') do |f|
	  f << '<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">'
	  f << "\xe2\x98\xBA"
	end
The most common unprintable characters (such as newline) have special mnemonic aliases consisting of a backslash and a letter.
	"\a" == "\x07" # => true # ASCII 0x07 = BEL (Sound system bell)
	"\b" == "\x08" # => true # ASCII 0x08 = BS (Backspace)
	"\e" == "\x1b" # => true # ASCII 0x1B = ESC (Escape)
	"\f" == "\x0c" # => true # ASCII 0x0C = FF (Form feed)
	"\n" == "\x0a" # => true # ASCII 0x0A = LF (Newline/line feed)
	"\r" == "\x0d" # => true # ASCII 0x0D = CR (Carriage return)
	"\t" == "\x09" # => true # ASCII 0x09 = HT (Tab/horizontal tab)
	"\v" == "\x0b" # => true # ASCII 0x0B = VT (Vertical tab)
Ruby stores a string as a sequence of bytes. It makes no difference whether those bytes are printable ASCII characters, binary characters, or a mix of the two.
When Ruby prints out a human-readable string representation of a binary character, it uses the character's \xxx octal representation. Characters with special \x mnemonics are printed as the mnemonic. Printable characters are output as their printable representation, even if another representation was used to create the string.
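Since the printed form of a binary character can vary between Ruby versions, it is sometimes handy to display the bytes yourself. The sketch below uses a hypothetical helper, hex_escape, that shows every byte of a string as a \xNN escape regardless of how the string was written.

```ruby
# Hypothetical helper: show every byte of a string as a \xNN escape.
def hex_escape(string)
  # Single quotes keep the backslash literal in the format string.
  string.each_byte.map { |byte| format('\x%02x', byte) }.join
end

puts hex_escape("\a\b\e\f\n\r\t\v")
# \x07\x08\x1b\x0c\x0a\x0d\x09\x0b
puts hex_escape("A\000")
# \x41\x00
```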
Converting Between Characters and Values
You want to see the ASCII code for a character, or transform an ASCII code into a string.
To see the ASCII code for a specific character as an integer, use the ? operator:
	?a                 # => 97
	?!                 # => 33
	?\n                # => 10
To see the integer value of a particular character in a string, access it as though it were an element of an array:
	'a'[0]             # => 97
	'bad sound'[1]     # => 97
To see the ASCII character corresponding to a given number, call its #chr method. This returns a string containing only one character:
	97.chr              # => "a"
	33.chr              # => "!"
	10.chr              # => "\n"
	0.chr               # => "\000"
	256.chr             # RangeError: 256 out of char range
Though not technically an array, a string acts a lot like an array of Fixnum objects: one Fixnum for each byte in the string. Accessing a single element of the "array" yields a Fixnum for the corresponding byte: for textual strings, this is an ASCII code. Calling String#each_byte lets you iterate over the Fixnum objects that make up a string.
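The results shown in this recipe come from the Ruby 1.8 series. On Ruby 1.9 and later, ?a and 'a'[0] return one-character strings instead of integers. The sketch below assumes such a later version and uses String#ord and String#bytes to get the numbers; the helper names are hypothetical.

```ruby
# Hypothetical helpers for converting between characters and codes
# on Ruby 1.9 or later.
def char_code(character)
  character.ord
end

def codes_to_string(codes)
  codes.map(&:chr).join
end

puts char_code('a')                        # 97
puts 'bad sound'.bytes.first(3).inspect    # [98, 97, 100]
puts codes_to_string([82, 117, 98, 121])   # Ruby
```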
  • Recipe 1.8, "Processing a String One Character at a Time"
Converting Between Strings and Symbols
You want to get a string containing the label of a Ruby symbol, or get the Ruby symbol that corresponds to a given string.
To turn a symbol into a string, use Symbol#to_s, or Symbol#id2name, for which to_s is an alias.
	:a_symbol.to_s                        # => "a_symbol"
	:AnotherSymbol.id2name                # => "AnotherSymbol"
	:"Yet another symbol!".to_s           # => "Yet another symbol!"
You usually reference a symbol by just typing its name. If you're given a string in code and need to get the corresponding symbol, you can use String#intern:
	:dodecahedron.object_id               # => 4565262
	symbol_name = "dodecahedron"
	symbol_name.intern                    # => :dodecahedron
	symbol_name.intern.object_id          # => 4565262
A Symbol is about the most basic Ruby object you can create. It's just a name and an internal ID. Symbols are useful because a given symbol name refers to the same object throughout a Ruby program.
Symbols are often more efficient than strings. Two strings with the same contents are two different objects (one of the strings might be modified later on, and become different), but for any given name there is only one Symbol object. This can save both time and memory.
	"string".object_id          # => 1503030
	"string".object_id          # => 1500330
	:symbol.object_id           # => 4569358
	:symbol.object_id           # => 4569358
If you have n references to a name, you can keep all those references with only one symbol, using only one object's worth of memory. With strings, the same code would use n different objects, all containing the same data. It's also faster to compare two symbols than to compare two strings, because Ruby only has to check the object IDs.
	"string1" == "string2"       # => false
	:symbol1 == :symbol2         # => false
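Object IDs differ from one run to the next, so a more portable way to check identity is Object#equal?, which is true only when both sides are the very same object. The method names in this sketch are hypothetical.

```ruby
# Two separately built strings with equal contents are distinct objects.
def distinct_strings?
  a = String.new("string")
  b = String.new("string")
  a == b && !a.equal?(b)
end

# Converting equal strings to symbols always yields the same object.
def same_symbol?(name)
  name.to_sym.equal?(name.dup.to_sym)
end

puts distinct_strings?               # true
puts same_symbol?("dodecahedron")    # true
puts :dodecahedron.to_s              # dodecahedron
```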
Finally, to quote Ruby hacker Jim Weirich on when to use a string versus a symbol:
  • If the contents (the sequence of characters) of the object are important, use a string.
  • If the identity of the object is important, use a symbol.
  • See Recipe 5.1, "Using Symbols as Hash Keys" for one use of symbols
Processing a String One Character at a Time
You want to process each character of a string individually.
If you're processing an ASCII document, then each byte corresponds to one character. Use String#each_byte to yield each byte of a string as a number, which you can turn into a one-character string:
	'foobar'.each_byte { |x| puts "#{x} = #{x.chr}" }
	# 102 = f
	# 111 = o
	# 111 = o
	# 98 = b
	# 97 = a
	# 114 = r
Use String#scan to yield each character of a string as a new one-character string:
	'foobar'.scan( /./ ) { |c| puts c }
	# f
	# o
	# o
	# b
	# a
	# r
Since a string is a sequence of bytes, you might think that the String#each method would iterate over the sequence, the way Array#each does. But String#each is actually used to split a string on a given record separator (by default, the newline):
	"foo\nbar".each { |x| puts x }
	# foo
	# bar
The string equivalent of Array#each is actually each_byte. A string stores its characters as a sequence of Fixnum objects, and each_byte yields that sequence.
String#each_byte is faster than String#scan, so if you're processing an ASCII file, you might want to use String#each_byte and convert each number passed into the code block back into a string (as seen in the Solution).
String#scan works by applying a given regular expression to a string, and yielding each match to the code block you provide. The regular expression /./ matches every character in the string, in turn.
If you have the $KCODE variable set correctly, then the scan technique will work on UTF-8 strings as well. This is the simplest way to sneak a notion of "character" into Ruby's byte-based strings.
Here's a Ruby string containing the UTF-8 encoding of the French phrase "ça va":
	french = "\xc3\xa7a va"
Even if your terminal can't properly display the character "ç", you can see how the behavior of String#scan changes when you make the regular expression Unicode-aware, or set $KCODE so that Ruby handles all strings as UTF-8:
	french.scan(/./) { |c| puts c }
	#
	#
	# a
	#
	# v
	# a

	french.scan(/./u) { |c| puts c }
	# ç
	# a
	#
	# v
	# a

	$KCODE = 'u'
	french.scan(/./) { |c| puts c }
	# ç
	# a
	#
	# v
	# a
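The $KCODE technique applies to the Ruby 1.8 series. As a sketch for Ruby 1.9 and later, where each string carries its own encoding, String#each_char yields whole characters directly. The characters method is a hypothetical wrapper, and the sample string is built from code points so that it does not depend on the terminal or the source file encoding.

```ruby
# Assumes Ruby 1.9 or later, where each string knows its own encoding.
def characters(string)
  string.each_char.to_a
end

# "ça va" built from code points; pack('U*') produces a UTF-8 string.
french = [0xE7, 0x61, 0x20, 0x76, 0x61].pack('U*')

puts characters(french).length   # 5
puts french.bytes.length         # 6
```

The byte count is one higher than the character count because "ç" occupies two bytes in UTF-8.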
Processing a String One Word at a Time
You want to split a piece of text into words, and operate on each word.
First decide what you mean by "word." What separates one word from another? Only whitespace? Whitespace or punctuation? Is "johnny-come-lately" one word or three? Build a regular expression that matches a single word according to whatever definition you need (there are some samples in the Discussion).
Then pass that regular expression into String#scan. Every word it finds, it will yield to a code block. The word_count method defined below takes a piece of text and creates a histogram of word frequencies. Its regular expression considers a "word" to be a string of Ruby identifier characters: letters, numbers, and underscores.
	class String
	  def word_count
	    frequencies = Hash.new(0)
	    downcase.scan(/\w+/) { |word| frequencies[word] += 1 }
	    return frequencies
	  end
	end

	%{Dogs dogs dog dog dogs.}.word_count
	# => {"dogs"=>3, "dog"=>2}
	%{"I have no shame," I said.}.word_count
	# => {"no"=>1, "shame"=>1, "have"=>1, "said"=>1, "i"=>2}
The regular expression /\w+/ is nice and simple, but you can probably do better for your application's definition of "word." You probably don't consider two words separated by an underscore to be a single word. Some English words, like "pan-fried" and "fo'c'sle", contain embedded punctuation. Here are a few more definitions of "word" in regular expression form:
	# Just like /\w+/, but doesn't consider underscore part of a word.
	/[0-9A-Za-z]+/

	# Anything that's not whitespace is a word.
	/\S+/

	# Accept dashes and apostrophes as parts of words.
	/[-'\w]+/

	# A pretty good heuristic for matching English words.
	/(\w+([-'.]\w+)*)/
The last one deserves some explanation. It matches embedded punctuation within a word, but not at the edges. "Work-in-progress" is recognized as a single word, and "--never--" is recognized as the word "never" surrounded by punctuation. This regular expression can even pick out abbreviations and acronyms such as "Ph.D" and "U.N.C.L.E.", though it can't distinguish between the final period of an acronym and the period that ends a sentence. This means that "E.F.F." will be recognized as the word "E.F.F" and then a nonword period.
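Here is a quick sketch of that heuristic in action. One adjustment is assumed: the groups are written as non-capturing, (?:...), because String#scan returns the captured groups rather than the whole match when a pattern contains capturing parentheses. The constant and method names are hypothetical.

```ruby
# Non-capturing group so that scan returns whole matches
# rather than arrays of captured groups.
ENGLISH_WORD = /\w+(?:[-'.]\w+)*/

def english_words(text)
  text.scan(ENGLISH_WORD)
end

p english_words("Work-in-progress by a Ph.D at the E.F.F.")
# ["Work-in-progress", "by", "a", "Ph.D", "at", "the", "E.F.F"]
p english_words("--never-- say fo'c'sle")
# ["never", "say", "fo'c'sle"]
```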
Changing the Case of a String
Your string is in the wrong case, or no particular case at all.
The String class provides a variety of case-shifting methods:
	s = 'HELLO, I am not here. I WENT to tHe MaRKEt.'
	s.upcase           # => "HELLO, I AM NOT HERE. I WENT TO THE MARKET."
	s.downcase         # => "hello, i am not here. i went to the market."
	s.swapcase         # => "hello, i AM NOT HERE. i went TO ThE mArkeT."
	s.capitalize       # => "Hello, i am not here. i went to the market."
The upcase and downcase methods force all letters in the string to upper- or lowercase, respectively. The swapcase method transforms uppercase letters into lowercase letters and vice versa. The capitalize method makes the first character of the string uppercase, if it's a letter, and makes all other letters in the string lowercase.
All four methods have corresponding methods that modify a string in place rather than creating a new one: upcase!, downcase!, swapcase!, and capitalize!. Assuming you don't need the original string, these methods will save memory, especially if the string is large.
	un_banged = 'Hello world.'
	un_banged.upcase     # => "HELLO WORLD."
	un_banged            # => "Hello world."

	banged = 'Hello world.'
	banged.upcase!       # => "HELLO WORLD."
	banged               # => "HELLO WORLD."
To capitalize a string without lowercasing the rest of the string (for instance, because the string contains proper nouns), you can modify the first character of the string in place. This corresponds to the capitalize! method. If you want something more like capitalize, you can create a new string out of the old one.
	class String
	  def capitalize_first_letter
	    self[0].chr.capitalize + self[1, size]
	  end

	  def capitalize_first_letter!
	    unless self[0] == (c = self[0,1].upcase[0])
	      self[0] = c
	      self
	    end
	    # Return nil if no change was made, like upcase! et al.
	  end
	end

	s = 'i told Alice. She remembers now.'
	s.capitalize_first_letter        # => "I told Alice. She remembers now."
	s                                # => "i told Alice. She remembers now."
	s.capitalize_first_letter!
	s                                # => "I told Alice. She remembers now."
Managing Whitespace
Your string contains too much whitespace, not enough whitespace, or the wrong kind of whitespace.
Use strip to remove whitespace from the beginning and end of a string:
	" \tWhitespace at beginning and end. \t\n\n".strip
	# => "Whitespace at beginning and end."
Add whitespace to one or both ends of a string with ljust, rjust, and center:
	s = "Some text."
	s.center(15)                    # => "  Some text.   "
	s.ljust(15)                     # => "Some text.     "
	s.rjust(15)                     # => "     Some text."
Use the gsub method with a string or regular expression to make more complex changes, such as to replace one type of whitespace with another.
	#Normalize Ruby source code by replacing tabs with spaces
	rubyCode.gsub("\t", "     ")

	#Transform Windows-style newlines to Unix-style newlines
	"Line one\r\nLine two\r\n".gsub("\r\n", "\n")
	# => "Line one\nLine two\n"

	#Transform all runs of whitespace into a single space character
	"\n\rThis string\t\t\tuses\n all\tsorts\nof whitespace.".gsub(/\s+/," ")
	# => " This string uses all sorts of whitespace."
What counts as whitespace? Any of these five characters: space, tab (\t), newline (\n), carriage return (\r), and form feed (\f). The regular expression /\s/ matches any one character from that set. The strip method strips any combination of those characters from the beginning or end of a string.
In rare cases you may need to handle oddball "space" characters like backspace (\b or \010) and vertical tab (\v or \013). These are not part of the \s character group in a regular expression, so use a custom character group to catch these characters.
	" \bIt's whitespace, Jim,\vbut not as we know it.\n".gsub(/[\s\b\v]+/, " ")
	# => " It's whitespace, Jim, but not as we know it. "
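Combining the custom character group with strip gives a general cleanup routine. This is a minimal sketch; squeeze_whitespace is a hypothetical helper name.

```ruby
# Hypothetical helper: collapse every run of whitespace (including
# backspace and vertical tab) to one space, then trim both ends.
def squeeze_whitespace(text)
  # Inside a character class, \b means backspace, not word boundary.
  text.gsub(/[\s\b\v]+/, ' ').strip
end

puts squeeze_whitespace(" \bIt's whitespace, Jim,\vbut not as we know it.\n")
# It's whitespace, Jim, but not as we know it.
```

Calling gsub first means strip only ever has ordinary spaces left to remove.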
To remove whitespace from only one end of a string, use the lstrip or rstrip method:
	s = "   Whitespace madness! "
	s.lstrip                        # => "Whitespace madness! "
	s.rstrip                        # => "   Whitespace madness!"
The methods for adding whitespace to a string (center, ljust, and rjust) take a single argument: the total length of the string they should return, counting the original string and any added whitespace. If
Additional content appearing in this section has been removed.
Testing Whether an Object Is String-Like
You want to see whether you can treat an object as a string.
Check whether the object defines the to_str method.
	'A string'.respond_to? :to_str        # => true
	Exception.new.respond_to? :to_str     # => true
	4.respond_to? :to_str                 # => false
More generally, check whether the object defines the specific method of String you're thinking about calling. If the object defines that method, the right thing to do is usually to go ahead and call the method. This will make your code work in more places:
	def join_to_successor(s)
	  raise ArgumentError, 'No successor method!' unless s.respond_to? :succ
	  return "#{s}#{s.succ}"
	end

	join_to_successor('a')           # => "ab"	
	join_to_successor(4)             # => "45"
	join_to_successor(4.01)
	# ArgumentError: No successor method!
If I'd checked s.is_a? String instead of s.respond_to? :succ, then I wouldn't have been able to call join_to_successor on an integer.
This is the simplest example of Ruby's philosophy of "duck typing:" if an object quacks like a duck (or acts like a string), just go ahead and treat it as a duck (or a string). Whenever possible, you should treat objects according to the methods they define rather than the classes from which they inherit or the modules they include.
Calling obj.is_a? String will tell you whether an object derives from the String class, but it will overlook objects that, though intended to be used as strings, don't inherit from String.
Exceptions, for instance, are essentially strings that have extra information associated with them. But they don't subclass String. Code that uses is_a? String to check for stringness will overlook the essential stringness of Exceptions. Many add-on Ruby modules define other classes that can act as strings: code that calls is_a? String will break when given an instance of one of those classes.
The idea to take to heart here is the general rule of duck typing: to see whether provided data implements a certain method, use respond_to? instead of checking the class. This lets a future user (possibly yourself!) create new classes that offer the same capability, without being tied down to the preexisting class structure. All you have to do is make the method names match up.
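As an illustration of matching method names rather than classes, the sketch below defines a hypothetical Label class that does not inherit from String but does define to_str. Ruby's own String#+ accepts it, because that method asks its argument for to_str instead of checking its class.

```ruby
# Hypothetical class: not a String subclass, but string-like because
# it defines to_str.
class Label
  def initialize(text)
    @text = text
  end

  def to_str
    @text
  end
end

def string_like?(object)
  object.respond_to?(:to_str)
end

puts string_like?(Label.new('x'))        # true
puts Label.new('x').is_a?(String)        # false
puts string_like?(4)                     # false
puts 'Name: ' + Label.new('Alice')       # Name: Alice
```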
Getting the Parts of a String You Want
You want only certain pieces of a string.
To get a substring of a string, call its slice method, or use the array index operator (that is, call the [] method). Either method accepts a Range describing which characters to retrieve, or two Fixnum arguments: the index at which to start, and the length of the substring to be extracted.
	s = 'My kingdom for a string!'
	s.slice(3,7)                      # => "kingdom"
	s[3,7]                            # => "kingdom"
	s[0,3]                            # => "My "
	s[11, 5]                          # => "for a"
	s[11, 17]                         # => "for a string!"
To get the first portion of a string that matches a regular expression, pass the regular expression into slice or []:
	s[/.ing/]                         # => "king"
	s[/str.*/]                        # => "string!"
To access a specific byte of a string as a Fixnum, pass only one argument (the zero-based index of the character) into String#slice or the [] method. To access a specific byte as a single-character string, pass in its index and the number 1.
	s.slice(3)                        # => 107
	s[3]                              # => 107
	107.chr                           # => "k"
	s.slice(3,1)                      # => "k"
	s[3,1]                            # => "k"
To count from the end of the string instead of the beginning, use negative indexes:
	s.slice(-7,3)                     # => "str"
	s[-7,6]                           # => "string"
If the length of your proposed substring exceeds the length of the string, slice or [] will return the entire string after that point. This leads to a simple shortcut for getting the rightmost portion of a string:
	s[15...s.length]                  # => "a string!"
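The Range form mentioned at the start of the Solution can be sketched as follows, using the same sample string. The two helper names are hypothetical. A range ending in -1 runs to the last character, which is another way to write the rightmost-portion shortcut.

```ruby
# Hypothetical helpers built on String#[].
def leftmost(string, count)
  string[0, count]
end

def from_index(string, start)
  string[start..-1]
end

s = 'My kingdom for a string!'
puts s[3..9]              # kingdom
puts s[-7..-2]            # string
puts leftmost(s, 2)       # My
puts from_index(s, 15)    # a string!
```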
  • Recipe 1.9, "Processing a String One Word at a Time"
  • Recipe 1.17, "Matching Strings with Regular Expressions"
Handling International Encodings
You need to handle strings that contain non-ASCII characters: probably Unicode characters encoded in UTF-8.
To use Unicode in Ruby, simply add the following to the beginning of code.
	$KCODE='u'
	require 'jcode'
You can also invoke the Ruby interpreter with arguments that do the same thing:
	$ ruby -Ku -rjcode
If you use a Unix environment, you can add the arguments to the shebang line of your Ruby application:
	#!/usr/bin/ruby -Ku -rjcode
The jcode library overrides most of the methods of String and makes them capable of handling multibyte text. The exceptions are String#length, String#count, and String#size, which are not overridden. Instead jcode defines three new methods: String#jlength, String#jcount, and String#jsize.
Consider a UTF-8 string that encodes six Unicode characters, the fullwidth letters A through F: efbca1 (Ａ), efbca2 (Ｂ), and so on up to efbca6 (Ｆ):
	string = "\xef\xbc\xa1" + "\xef\xbc\xa2" + "\xef\xbc\xa3" +
	         "\xef\xbc\xa4" + "\xef\xbc\xa5" + "\xef\xbc\xa6"
The string contains 18 bytes that encode 6 characters:
	string.size                                          # => 18
	string.jsize                                         # => 6
String#count is a method that takes a string of bytes, and counts how many times those bytes occur in the string. String#jcount takes a string of characters and counts how many times those characters occur in the string:
	string.count "\xef\xbc\xa2"                          # => 13
	string.jcount "\xef\xbc\xa2"                         # => 1
String#count treats "\xef\xbc\xa2" as three separate bytes, and counts the number of times each of those bytes shows up in the string. String#jcount treats the same string as a single character, and looks for that character in the string, finding it only once.
	"\xef\xbc\xa2".length                                # => 3
	"\xef\xbc\xa2".jlength                               # => 1
Apart from these differences, Ruby handles most Unicode behind the scenes. Once you have your data in UTF-8 format, you really don't have to worry. Given that Ruby's creator Yukihiro Matsumoto is Japanese, it is no wonder that Ruby handles Unicode so elegantly.
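$KCODE and jcode belong to Ruby 1.8. As a rough comparison, and assuming Ruby 1.9 or later (where each string carries its own encoding and the j-prefixed methods are not needed), the same six characters can be measured with the built-in methods. The \u escapes below are simply another way to spell the fullwidth letters from the example:

```ruby
# Assumes Ruby 1.9 or later. "\uFF21" is U+FF21, whose UTF-8 bytes are ef bc a1.
string = "\uFF21\uFF22\uFF23\uFF24\uFF25\uFF26"

string.bytesize                                      # => 18
string.length                                        # => 6
string.count("\uFF22")                               # => 1
string.bytes.first(3).map { |b| b.to_s(16) }         # => ["ef", "bc", "a1"]
```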
Additional content appearing in this section has been removed.
Word-Wrapping Lines of Text
You want to turn a string full of miscellaneous whitespace into a string formatted with linebreaks at appropriate intervals, so that the text can be displayed in a window or sent as an email.
The simplest way to add newlines to a piece of text is to use a regular expression like the following.
	def wrap(s, width=78)
	  s.gsub(/(.{1,#{width}})(\s+|\Z)/, "\\1\n")
	end

	wrap("This text is too short to be wrapped.")
	# => "This text is too short to be wrapped.\n"

	puts wrap("This text is not too short to be wrapped.", 20)
	# This text is not too
	# short to be wrapped.

	puts wrap("These ten-character columns are stifling my creativity!", 10)
	# These
	# ten-character
	# columns
	# are
	# stifling
	# my
	# creativity!
The code given in the Solution preserves the original formatting of the string, inserting additional line breaks where necessary. This works well when you want to preserve the existing formatting while squishing everything into a smaller space:
	poetry = %q{It is an ancient Mariner,
	And he stoppeth one of three.
	"By thy long beard and glittering eye,
	Now wherefore stopp'st thou me?}

	puts wrap(poetry, 20)
	# It is an ancient
	# Mariner,
	# And he stoppeth one
	# of three.
	# "By thy long beard
	# and glittering eye,
	# Now wherefore
	# stopp'st thou me?
But sometimes the existing whitespace isn't important, and preserving it makes the result look bad:
	prose = %q{I find myself alone these days, more often than not,
	watching the rain run down nearby windows. How long has it been
	raining? The newspapers now print the total, but no one reads them
	anymore.}

	puts wrap(prose, 60)
	# I find myself alone these days, more often than not,
	# watching the rain run down nearby windows. How long has it
	# been
	# raining? The newspapers now print the total, but no one
	# reads them
	# anymore.
Looks pretty ragged. In this case, we want to replace the original newlines with new ones. The simplest way to do this is to preprocess the string with another regular expression:
	def reformat_wrapped(s, width=78)
	 s.gsub(/\s+/, " ").gsub(/(.{1,#{width}})( |\Z)/, "\\1\n")
	end
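As a quick check of this approach, here is a sketch that runs reformat_wrapped on the same prose sample. The method and the text are repeated so the example stands alone:

```ruby
def reformat_wrapped(s, width=78)
  # Collapse every run of whitespace to one space, then wrap at word boundaries.
  s.gsub(/\s+/, " ").gsub(/(.{1,#{width}})( |\Z)/, "\\1\n")
end

prose = %q{I find myself alone these days, more often than not,
watching the rain run down nearby windows. How long has it been
raining? The newspapers now print the total, but no one reads them
anymore.}

puts reformat_wrapped(prose, 60)
# I find myself alone these days, more often than not,
# watching the rain run down nearby windows. How long has it
# been raining? The newspapers now print the total, but no one
# reads them anymore.
```

Because the original newlines are collapsed to spaces first, every line break in the output comes from the wrapping step alone.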
Additional content appearing in this section has been removed.
Generating a Succession of Strings
You want to iterate over a series of alphabetically-increasing strings as you would over a series of numbers.
If you know both the start and end points of your succession, you can simply create a range and use Range#each, as you would for numbers:
	('aa'..'ag').each { |x| puts x }
	# aa
	# ab
	# ac
	# ad
	# ae
	# af
	# ag
The method that generates the successor of a given string is String#succ. If you don't know the end point of your succession, you can define a generator that uses succ, and break from the generator when you're done.
	def endless_string_succession(start)
	  while true
	    yield start
	    start = start.succ
	  end
	end
This code iterates over an endless succession of strings, stopping when the last two letters are the same:
	endless_string_succession('fol') do |x|
	  puts x
	  break if x[-1] == x[-2]
	end
	# fol
	# fom
	# fon
	# foo
Imagine a string as an odometer. Each character position of the string has a separate dial, and the current odometer reading is your string. Each dial always shows the same kind of character. A dial that starts out showing a number will always show a number. A dial that starts out showing an uppercase letter will always show an uppercase letter.
The string succession operation increments the odometer. It moves the rightmost dial forward one space. This might make the rightmost dial wrap around to the beginning: if that happens, the dial directly to its left is also moved forward one space. This might make that dial wrap around to the beginning, and so on:
	'89999'.succ                   # => "90000"
	'nzzzz'.succ                   # => "oaaaa"
When the leftmost dial wraps around, a new dial is added to the left of the odometer. The new dial is always of the same type as the old leftmost dial. If the old leftmost dial showed capital letters, then so will the new leftmost dial:
	'Zzz'.succ                     # => "AAaa"
Lowercase letters wrap around from "z" to "a". If the first character is a lowercase letter, then when it wraps around, an "a" is added on to the beginning of the string:
	'z'.succ                       # => "aa"
	'aa'.succ                      # => "ab"
	'zz'.succ                      # => "aaa"
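Because each dial keeps its own type, a string that mixes digits and letters rolls over position by position. A few more cases, chosen here for illustration:

```ruby
'a9'.succ                      # => "b0"
'Az'.succ                      # => "Ba"
'1999zzz'.succ                 # => "2000aaa"
'ZZZ9999'.succ                 # => "AAAA0000"
```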
Additional content appearing in this section has been removed.
Matching Strings with Regular Expressions
You want to know whether or not a string matches a certain pattern.
You can usually describe the pattern as a regular expression. The =~ operator tests a string against a regular expression:
	string = 'This is a 30-character string.'

	if string =~ /([0-9]+)-character/ and $1.to_i == string.length
	  "Yes, there are #$1 characters in that string."
	end
	# => "Yes, there are 30 characters in that string."
You can also use Regexp#match:
	match = Regexp.compile('([0-9]+)-character').match(string)
	if match && match[1].to_i == string.length
	  "Yes, there are #{match[1]} characters in that string."
	end
	# => "Yes, there are 30 characters in that string."
You can check a string against a series of regular expressions with a case statement:
	string = "123"

	case string
	when /^[a-zA-Z]+$/
	  "Letters"
	when /^[0-9]+$/
	  "Numbers"
	else
	  "Mixed"
	end
	# => "Numbers"
Regular expressions are a cryptic but powerful minilanguage for string matching and substring extraction. They've been around for a long time in Unix utilities like sed, but Perl was the first general-purpose programming language to include them. Now almost all modern languages support Perl-style regular expressions.
Ruby provides several ways of initializing regular expressions. The following all create equivalent Regexp objects:
	/something/
	Regexp.new("something")
	Regexp.compile("something")
	%r{something}
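One way to confirm the equivalence is to compare the objects with Regexp#==, which checks the pattern source and options. A small sketch; the variable names are only for illustration:

```ruby
a = /something/
b = Regexp.new("something")
c = Regexp.compile("something")
d = %r{something}

[b, c, d].all? { |re| re == a }   # => true
a.source                          # => "something"
```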
The following modifiers are also of note.
Regexp::IGNORECASE (i): Makes matches case-insensitive.
Regexp::MULTILINE (m): Normally, a regexp matches against a single line of a string. This modifier causes a regexp to treat line breaks like any other character.
Regexp::EXTENDED (x)
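Each modifier can be given either as a letter after the closing slash or as a constant passed to Regexp.new. Here is a small sketch of the first two modifiers; the test strings are made up for illustration:

```ruby
"HELLO" =~ /hello/                                   # => nil
"HELLO" =~ /hello/i                                  # => 0
"HELLO" =~ Regexp.new("hello", Regexp::IGNORECASE)   # => 0

"one\ntwo" =~ /one.two/                              # => nil
"one\ntwo" =~ /one.two/m                             # => 0
```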
Additional content appearing in this section has been removed.
Replacing Multiple Patterns in a Single Pass
You want to perform multiple, simultaneous search-and-replace operations on a string.
Use the Regexp.union method to aggregate the regular expressions you want to match into one big regular expression that matches any of them. Pass the big regular expression into String#gsub, along with a code block that takes a MatchData object. You can detect which of your search terms actually triggered the regexp match, and choose the appropriate replacement term:
	class String
	  def mgsub(key_value_pairs=[].freeze)
	    regexp_fragments = key_value_pairs.collect { |k,v| k }
	    gsub(Regexp.union(*regexp_fragments)) do |match|
	      key_value_pairs.detect{|k,v| k =~ match}[1]
	    end
	  end
	end
Here's a simple example:
	"GO HOME!".mgsub([[/.*GO/i, 'Home'], [/home/i, 'is where the heart is']])
	# => "Home is where the heart is!"
This example replaces all letters with pound signs, and all pound signs with the letter P:
	"Here is number #123".mgsub([[/[a-z]/i, '#'], [/#/, 'P']])
	# => "#### ## ###### P123"
The naive solution is to simply string together multiple gsub calls. The following examples, copied from the solution, show why this is often a bad idea:
	"GO HOME!".gsub(/.*GO/i, 'Home').gsub(/home/i, 'is where the heart is')
	# => "is where the heart is is where the heart is!"

	"Here is number #123".gsub(/[a-z]/i, "#").gsub(/#/, "P")
	# => "PPPP PP PPPPPP P123"
In both cases, our replacement strings turned out to match the search term of a later gsub call. Our replacement strings were themselves subject to search-and-replace. In the first example, the conflict can be fixed by reversing the order of the substitutions. The second example shows a case where reversing the order won't help. You need to do all your replacements in a single pass over the string.
The mgsub method will take a hash, but it's safer to pass in an array of key-value pairs. This is because elements in a hash come out in no particular order, so you can't control the order of substitution. Here's a demonstration of the problem:
	"between".mgsub(/ee/ => 'AA', /e/ => 'E') # Bad code
	# => "bEtwEEn"

	"between".mgsub([[/ee/, 'AA'], [/e/, 'E']]) # Good code
	# => "bEtwAAn"
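The order matters because the combined regular expression tries its alternatives from left to right at each position. Here is a small sketch of that behavior on its own, using angle brackets only to make each match visible:

```ruby
# /ee/ is listed first, so the double "e" is matched as one unit.
"between".gsub(Regexp.union(/ee/, /e/)) { |m| "<#{m}>" }
# => "b<e>tw<ee>n"

# /e/ is listed first, so it wins at every position and /ee/ never fires.
"between".gsub(Regexp.union(/e/, /ee/)) { |m| "<#{m}>" }
# => "b<e>tw<e><e>n"
```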
Additional content appearing in this section has been removed.
Validating an Email Address
You need to see whether an email address is valid.
Here's a sampling of valid email addresses you might encounter:
	test_addresses = [ #The following are valid addresses according to RFC822.
	                   'joe@example.com', 'joe.bloggs@mail.example.com',
	                   'joe+ruby-mail@example.com', 'joe(and-mary)@example.museum',
	                   'joe@localhost',
Here are some invalid email addresses you might encounter:
	                   # Complete the list with some invalid addresses
	                   'joe', 'joe@', '@example.com',
	                   'joe@example@example.com',
	                   'joe and mary@example.com' ]
And here are some regular expressions that do an okay job of filtering out bad email addresses. The first one does very basic checking for ill-formed addresses:
	valid = '[^ @]+' # Exclude characters always invalid in email addresses
	username_and_machine = /^#{valid}@#{valid}$/

	test_addresses.collect { |i| i =~ username_and_machine }
	# => [0, 0, 0, 0, 0, nil, nil, nil, nil, nil]
The second one prohibits the use of local-network addresses like "joe@localhost". Most applications should prohibit such addresses.
	username_and_machine_with_tld = /^#{valid}@#{valid}\.#{valid}$/

	test_addresses.collect { |i| i =~ username_and_machine_with_tld }
	# => [0, 0, 0, 0, nil, nil, nil, nil, nil, nil]
However, the odds are good that you're solving the wrong problem.
Most email address validation is done with naive regular expressions like the ones given above. Unfortunately, these regular expressions are usually written too strictly, and reject many email addresses. This is a common source of frustration for people with unusual email addresses like joe(and-mary)@example.museum, or people taking advantage of special features of email, as in joe+ruby-mail@example.com. The regular expressions given above err on the opposite side: they'll accept some syntactically invalid email addresses, but they won't reject valid addresses.
Why not give a simple regular expression that always works? Because there's no such thing. The definition of the syntax is anything but simple. Perl hacker Paul Warren defined a 6343-character regular expression for Perl's Mail::RFC822::Address module, and even it needs some preprocessing to accept absolutely every allowable email address. Warren's regular expression will work unaltered in Ruby, but if you really want it, you should go online and find it, because it would be foolish to try to type it in.
Additional content appearing in this section has been removed.
Classifying Text with a Bayesian Analyzer
You want to classify chunks of text by example: an email message is either spam or not spam, a joke is either funny or not funny, and so on.
Use Lucas Carlson's Classifier library, available as the classifier gem. It provides a naive Bayesian classifier, and one that implements Latent Semantic Indexing, a more advanced technique.
The interface for the naive Bayesian classifier is very straightforward. You create a Classifier::Bayes object with some classifications, and train it on text chunks whose classification is known:
	require 'rubygems'
	require 'classifier'

	classifier = Classifier::Bayes.new('Spam', 'Not spam')

	classifier.train_spam 'are you in the market for viagra? we sell viagra'
	classifier.train_not_spam 'hi there, are we still on for lunch?'
You can then feed the classifier text chunks whose classification is unknown, and have it guess:
	classifier.classify "we sell the cheapest viagra on the market"
	# => "Spam"
	classifier.classify "lunch sounds great"
	# => "Not spam"
Bayesian analysis is based on probabilities. When you train the classifier, you are giving it a set of words and the classifier keeps track of how often words show up in each category. In the simple spam filter built in the Solution, the frequency hash looks like the @categories variable below:
	classifier
	# => #<Classifier::Bayes:0xb7cec7c8
	#       @categories={:"Not spam"=>
	#                      { :lunch=>1, :for=>1, :there=>1,
	#                        :"?"=>1, :still=>1, :","=>1 },
	#                    :Spam=>
	#                      { :market=>1, :for=>1, :viagra=>2, :"?"=>1, :sell=>1 }
	#                   },
	#       @total_words=12>
These hashes are used to build probability calculations. Note that since we mentioned the word "viagra" twice in spam messages, there is a 2 in the "Spam" frequency hash for that word. That makes it more spam-like than other words like "for" (which also shows up in nonspam) or "sell" (which only shows up once in spam). The classifier can apply these probabilities to previously unseen text and guess at a classification for it.
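To make the counting idea concrete, here is a minimal sketch of a word-frequency classifier in plain Ruby. It is not the classifier gem's implementation: the TinyBayes class, its tokenizer, the add-one smoothing, and the choice to ignore how common each category is are all assumptions made for illustration.

```ruby
# Hypothetical, simplified stand-in for the idea; not the classifier gem's code.
class TinyBayes
  def initialize(*categories)
    # One word-frequency hash per category, defaulting to zero.
    @counts = {}
    categories.each { |c| @counts[c] = Hash.new(0) }
  end

  def train(category, text)
    tokenize(text).each { |word| @counts[category][word] += 1 }
  end

  def classify(text)
    words = tokenize(text)
    vocabulary_size = @counts.values.flat_map(&:keys).uniq.size
    best = @counts.max_by do |_category, freq|
      total = freq.values.sum
      # Sum of log probabilities, with add-one smoothing for unseen words.
      words.sum { |w| Math.log((freq[w] + 1.0) / (total + vocabulary_size)) }
    end
    best.first
  end

  private

  # Assumption: lowercase alphabetic runs are good enough as "words" here.
  def tokenize(text)
    text.downcase.scan(/[a-z]+/)
  end
end

tiny = TinyBayes.new('Spam', 'Not spam')
tiny.train('Spam', 'are you in the market for viagra? we sell viagra')
tiny.train('Not spam', 'hi there, are we still on for lunch?')

tiny.classify('we sell the cheapest viagra on the market')   # => "Spam"
tiny.classify('lunch sounds great')                          # => "Not spam"
```

Working with logarithms keeps the product of many small probabilities from underflowing, and the smoothing keeps a single unseen word from forcing a category's score to zero.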
Additional content appearing in this section has been removed.
Chapter 2: Numbers
Numbers are as fundamental to computing as breath is to human life. Even programs that have nothing to do with math need to count the items in a data structure, display average running times, or use numbers as a source of randomness. Ruby makes it easy to represent numbers, letting you breathe easy and tackle the harder problems of programming.
An issue that comes up when you're programming with numbers is that there are several different implementations of "number," optimized for different purposes: 32-bit integers, floating-point numbers, and so on. Ruby tries to hide these details from you, but it's important to know about them because they often manifest as mysteriously incorrect calculations.
The first distinction is between small numbers and large ones. If you've used other programming languages, you probably know that you must use different data types to hold small numbers and large numbers (assuming that the language supports large numbers at all). Ruby has different classes for small numbers (Fixnum) and large numbers (Bignum), but you don't usually have to worry about the difference. When you type in a number, Ruby sees how big it is and creates an object of the appropriate class.
	1000.class                           # => Fixnum
	10000000000.class                    # => Bignum
	(2**30 - 1).class                    # => Fixnum
	(2**30).class                        # => Bignum
When you perform arithmetic, Ruby automatically does any needed conversions. You don't have to worry about the difference between small and large numbers:
	small = 1000
	big = small ** 5                     # => 1000000000000000
	big.class                            # => Bignum
	smaller = big / big                  # => 1
	smaller.class                        # => Fixnum
The other major distinction is between whole numbers (integers) and fractional numbers. Like most modern programming languages, Ruby implements the IEEE floating-point standard for representing fractional numbers. If you type a number that includes a decimal point, Ruby creates a Float object instead of a
Additional content appearing in this section has been removed.
Parsing a Number from a String
Given a string that contains some representation of a number, you want to get the corresponding integer or floating-point value.
Use String#to_i to turn a string into an integer. Use String#to_f to turn a string into a floating-point number.
	'400'.to_i                           # => 400
	'3.14'.to_f                          # => 3.14
	'1.602e-19'.to_f                     # => 1.602e-19
Unlike Perl and PHP, Ruby does not automatically make a number out of a string that contains a number. You must explicitly call a conversion method that tells Ruby how you want the string to be converted.
Along with to_i and to_f, there are other ways to convert strings into numbers. If you have a string that represents a hexadecimal or octal number, you can call String#hex or String#oct to get the decimal equivalent. This is the same as passing the base of the number into to_i:
	'405'.oct                            # => 261
	'405'.to_i(8)                        # => 261
	'405'.hex                            # => 1029
	'405'.to_i(16)                       # => 1029
	'fed'.hex                            # => 4077
	'fed'.to_i(16)                       # => 4077
If to_i, to_f, hex, or oct find a character that can't be part of the kind of number they're looking for, they stop processing the string at that character and return the number so far. If the string's first character is unusable, the result is zero.
	"13: a baker's dozen".to_i                         # => 13
	'1001 Nights'.to_i                                 # => 1001
	'The 1000 Nights and a Night'.to_i                 # => 0
	'60.50 Misc. Agricultural Equipment'.to_f          # => 60.5
	'$60.50'.to_f                                      # => 0.0
	'Feed the monster!'.hex                            # => 65261
	'I fed the monster at Canoga Park Waterslides'.hex # => 0
	'0xA2Z'.hex                                        # => 162
	'-10'.oct                                          # => -8
	'-109'.oct                                         # => -8
	'3.14'.to_i                                        # => 3
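If silently getting a partial number or zero is not what you want, Kernel#Integer and Kernel#Float are stricter: they raise ArgumentError when the whole string is not a valid number. Here is a short sketch; strict_to_i is a hypothetical helper name used only for this example:

```ruby
Integer('400')                       # => 400
Float('3.14')                        # => 3.14

# Hypothetical helper: return nil instead of raising on malformed input.
def strict_to_i(str)
  Integer(str)
rescue ArgumentError
  nil
end

strict_to_i('1001')                  # => 1001
strict_to_i('1001 Nights')           # => nil
strict_to_i('3.14')                  # => nil
```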
Additional content appearing in this section has been removed.
Comparing Floating-Point Numbers
Floating-point numbers are not suitable for exact comparison. Often, two numbers that should be equal are actually slightly different. The Ruby interpreter can make seemingly nonsensical assertions when floating-point numbers are involved:
	1.8 + 0.1                            # => 1.9
	1.8 + 0.1 == 1.9                     # => false
	1.8 + 0.1 > 1.9                      # => true
You want to do comparison operations approximately, so that floating-point numbers infinitesimally close together can be treated equally.
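One common way to sketch such a comparison is to treat two floats as equal when their difference is tiny relative to their size. The approx_equal? name and the default tolerance are assumptions chosen for illustration, not values taken from this recipe:

```ruby
# Hypothetical helper: true when a and b differ by at most `tolerance`,
# measured relative to the larger of the two magnitudes.
def approx_equal?(a, b, tolerance = 1e-9)
  return true if a == b
  (a - b).abs <= tolerance * [a.abs, b.abs].max
end

approx_equal?(1.8 + 0.1, 1.9)        # => true
approx_equal?(1.9, 1.91)             # => false
```

A relative tolerance scales with the numbers being compared, so the same default works for very large and very small values; a comparison against zero may need an absolute tolerance instead.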
You can avoid this problem altogether by using BigDecimal numbers instead of floats (see Recipe 2.3). BigDecima