irb. If you're using Windows, you can download and install the One-Click Installer from http://rubyforge.org/projects/rubyinstaller/, and do the same from a command prompt (you can also run the fxri program, if that's more comfortable for you). You've now entered an interactive Ruby shell, and you can follow along with the code samples in most of this book's recipes.
string = "My first string"
=> "My first string"
string. The value of that expression is just the new value of string, which is what your interactive Ruby session printed out on the right side of the arrow. Throughout this book, we'll represent this kind of interaction in the following form:
<< operator:
hash = { "key1" => "val1", "key2" => "val2" }
string = ""
hash.each { |k,v| string << "#{k} is #{v}\n" }
puts string
# key1 is val1
# key2 is val2
string = ""
hash.each { |k,v| string << k << " is " << v << "\n" }
Array#join:
puts hash.keys.join("\n") + "\n"
# key1
# key2
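As a minimal sketch of the two approaches side by side (the variable names `appended` and `joined` are illustrative, not from the recipe), both build the same text. The exact order of the lines assumes Ruby 1.9 or later, where hashes preserve insertion order.

```ruby
hash = { "key1" => "val1", "key2" => "val2" }

# Approach 1: append each piece to one mutable string with <<.
appended = String.new
hash.each { |k, v| appended << "#{k} is #{v}\n" }

# Approach 2: collect the pieces in an array, then join them once.
joined = hash.map { |k, v| "#{k} is #{v}" }.join("\n") + "\n"

puts appended == joined
```

Either form avoids creating a new intermediate string for every concatenation, which is what repeated use of + would do.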
Array#join. In Java, this is the purpose of the StringBuffer class.
Array#join is faster, but it's usually pretty close, and the << construction is generally easier to understand.
str << 'a' + 'b'
number = 5
"The number is #{number}." # => "The number is 5."
"The number is #{5}." # => "The number is 5."
"The number after #{number} is #{number.next}."
# => "The number after 5 is 6."
"The number prior to #{number} is #{number-1}."
# => "The number prior to 5 is 4."
"We're ##{number}!" # => "We're #5!"
to_s method and uses that instead.
"#{number}" == '5' # => true
%{Here is #{class InstantClass
def bar
"some text"
end
end
InstantClass.new.bar
}.}
# => "Here is some text."
InstantClass class has now been defined like any other class, and can be used outside the string that defines it.
printf-style strings, and ERB templates.
printf-style string format like C's and Python's. Put printf directives into a string and it becomes a template. You can interpolate values into it later using the modulus operator:
template = 'Oceania has always been at war with %s.'
template % 'Eurasia' # => "Oceania has always been at war with Eurasia."
template % 'Eastasia' # => "Oceania has always been at war with Eastasia."
'To 2 decimal places: %.2f' % Math::PI # => "To 2 decimal places: 3.14"
'Zero-padded: %.5d' % Math::PI # => "Zero-padded: 00003"
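When a template has more than one directive, the values can be supplied as an array. The following is a small sketch with made-up template text and values; `Kernel#format` is shown as an equivalent spelling of the modulus operator.

```ruby
# An array on the right-hand side supplies one value per directive.
template = '%s has %d items costing %.2f each'
line = template % ['The cart', 3, 9.5]

# Kernel#format takes the same directives, with the values as arguments.
same_line = format(template, 'The cart', 3, 9.5)

# A width with a leading zero pads an integer with zeros.
padded = '%05d' % 42

puts line
puts padded
```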
require 'erb'
template = ERB.new %q{Chunky <%= food %>!}
food = "bacon"
template.result(binding) # => "Chunky bacon!"
food = "peanut butter"
template.result(binding) # => "Chunky peanut butter!"
Kernel#binding if you're not in an irb session:
puts template.result
# Chunky peanut butter!
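One way to keep template variables out of the top-level scope is to render inside a method, so that `binding` captures only that method's locals. This is a sketch; `render_snack` is a hypothetical helper name, not part of ERB.

```ruby
require 'erb'

# Hypothetical helper: the method's own local variables (here, food)
# are what the template sees, because binding captures the method scope.
def render_snack(template_text, food)
  ERB.new(template_text).result(binding)
end

puts render_snack('Chunky <%= food %>!', 'bacon')
```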
rhtml files used by Rails views: they use ERB behind the scenes.
food before they're defined. When you call ERB#result, or ERB#run, the template is executed according to the current values of those variables.
template = %q{
<% if problems.empty? %>
Looks like your code is clean!
<% else %>
I found the following possible problems with your code:
<% problems.each do |problem, line| %>
* <%= problem %> on line <%= line %>
<% end %>
<% end %>}.gsub(/^\s+/, '')
template = ERB.new(template, nil, '<>')
problems = [["Use of is_a? instead of duck typing", 23],
["eval() is usually dangerous", 44]]
template.run(binding)
# I found the following possible problems with your code:
# * Use of is_a? instead of duck typing on line 23
# * eval() is usually dangerous on line 44
problems = []
template.run(binding)
# Looks like your code is clean!
reverse method. To reverse a string in place, use the reverse! method.
s = ".sdrawkcab si gnirts sihT"
s.reverse # => "This string is backwards."
s # => ".sdrawkcab si gnirts sihT"
s.reverse! # => "This string is backwards."
s # => "This string is backwards."
s = "order. wrong the in are words These"
s.split(/(\s+)/).reverse!.join('') # => "These words are in the wrong order."
s.split(/\b/).reverse!.join('') # => "These words are in the wrong. order"
String#split method takes a regular expression to use as a separator. Each time the separator matches part of the string, the portion of the string before the separator goes into a list. split then resumes scanning the rest of the string. The result is a list of strings found between instances of the separator. The regular expression /(\s+)/ matches one or more whitespace characters; this splits the string on word boundaries, which works for us because we want to reverse the order of the words.
\b matches a word boundary. This is not the same as matching whitespace, because it also matches punctuation. Note the difference in punctuation between the two final examples in the Solution.
/(\s+)/ includes a set of parentheses, the separator strings themselves are included in the returned list. Therefore, when we join the strings back together, we've preserved whitespace. This example shows the difference between including the parentheses and omitting them:
"Three little words".split(/\s+/) # => ["Three", "little", "words"]
"Three little words".split(/(\s+)/) # => ["Three", " ", "little", " ", "words"]
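The word-reversal technique can be wrapped in a small method. This is a sketch; `reverse_words` is a hypothetical name, and it uses the non-destructive reverse so the intermediate array is simply discarded.

```ruby
# Hypothetical helper: reverses word order while keeping the original
# whitespace, because the capturing group makes split return separators.
def reverse_words(text)
  text.split(/(\s+)/).reverse.join
end

puts reverse_words("order. wrong the in are words These")
```

Because the separators are kept, runs of several spaces between words survive the round trip unchanged.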
octal = "\000\001\010\020"
octal.each_byte { |x| puts x }
# 0
# 1
# 8
# 16
hexadecimal = "\x00\x01\x10\x20"
hexadecimal.each_byte { |x| puts x }
# 0
# 1
# 16
# 32
open('smiley.html', 'wb') do |f|
f << '<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">'
f << "\xe2\x98\xBA"
end
"\a" == "\x07" # => true # ASCII 0x07 = BEL (Sound system bell)
"\b" == "\x08" # => true # ASCII 0x08 = BS (Backspace)
"\e" == "\x1b" # => true # ASCII 0x1B = ESC (Escape)
"\f" == "\x0c" # => true # ASCII 0x0C = FF (Form feed)
"\n" == "\x0a" # => true # ASCII 0x0A = LF (Newline/line feed)
"\r" == "\x0d" # => true # ASCII 0x0D = CR (Carriage return)
"\t" == "\x09" # => true # ASCII 0x09 = HT (Tab/horizontal tab)
"\v" == "\x0b" # => true # ASCII 0x0B = VT (Vertical tab)
\xxx octal representation. Characters with special \x mnemonics are printed as the mnemonic. Printable characters are output as their printable representation, even if another representation was used to create the string.
? operator:
?a # => 97
?! # => 33
?\n # => 10
'a'[0] # => 97
'bad sound'[1] # => 97
#chr method. This returns a string containing only one character:
97.chr # => "a"
33.chr # => "!"
10.chr # => "\n"
0.chr # => "\000"
256.chr # RangeError: 256 out of char range
Fixnum objects: one Fixnum for each byte in the string. Accessing a single element of the "array" yields a Fixnum for the corresponding byte: for textual
strings, this is an ASCII code. Calling String#each_byte lets you iterate over the Fixnum objects that make up a string.
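The round trip between a string and its byte values can be sketched with each_byte and chr, which behave the same way across Ruby versions. The variable names here are illustrative.

```ruby
# Collect the numeric value of each byte in the string.
codes = []
'bad'.each_byte { |byte| codes << byte }

# Integer#chr turns each code back into a one-character string.
rebuilt = codes.map { |code| code.chr }.join

p codes
puts rebuilt
```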
Symbol#to_s, or Symbol#id2name, for which to_s is an alias.
:a_symbol.to_s # => "a_symbol"
:AnotherSymbol.id2name # => "AnotherSymbol"
:"Yet another symbol!".to_s # => "Yet another symbol!"
String.intern:
:dodecahedron.object_id # => 4565262
symbol_name = "dodecahedron"
symbol_name.intern # => :dodecahedron
symbol_name.intern.object_id # => 4565262
Symbol is about the most basic Ruby object you can create. It's just a name and an internal ID. Symbols are useful because a given symbol name refers to the same object throughout a Ruby program.
Symbol object. This can save both time and memory.
"string".object_id # => 1503030
"string".object_id # => 1500330
:symbol.object_id # => 4569358
:symbol.object_id # => 4569358
"string1" == "string2" # => false
:symbol1 == :symbol2 # => false
String#each_byte to yield each byte of a string as a number, which you can turn into a one-character string:
'foobar'.each_byte { |x| puts "#{x} = #{x.chr}" }
# 102 = f
# 111 = o
# 111 = o
# 98 = b
# 97 = a
# 114 = r
String#scan to yield each character of a string as a new one-character string:
'foobar'.scan( /./ ) { |c| puts c }
# f
# o
# o
# b
# a
# r
String#each method would iterate over the sequence, the way Array#each does. But String#each is actually used to split a string on a given record separator (by default, the newline):
"foo\nbar".each { |x| puts x }
# foo
# bar
Array#each method is actually each_byte. A string stores its characters as a sequence of Fixnum objects, and each_byte yields that sequence.
String#each_byte is faster than String#scan, so if you're processing an ASCII file, you might want to use String#each_byte and convert to a string every number passed into the code block (as seen in the Solution).
String#scan works by applying a given regular expression to a string, and yielding each match to the code block you provide. The regular expression /./ matches every character in the string, in turn.
$KCODE variable set correctly, then the scan technique will work on UTF-8 strings as well. This is the simplest way to sneak a notion of "character" into Ruby's byte-based strings.
french = "\xc3\xa7a va"
String#scan changes when you make the regular expression Unicode-aware, or set $KCODE so that Ruby handles all strings as UTF-8:
french.scan(/./) { |c| puts c }
#
#
# a
#
# v
# a
french.scan(/./u) { |c| puts c }
# ç
# a
#
# v
# a
$KCODE = 'u'
french.scan(/./) { |c| puts c }
# ç
# a
#
# v
# a
String#scan. Every word it finds, it will yield to a code block. The word_count method defined below takes a piece of text and creates a histogram of word frequencies. Its regular expression considers a "word" to be a string of Ruby identifier characters: letters, numbers, and underscores.
class String
def word_count
frequencies = Hash.new(0)
downcase.scan(/\w+/) { |word| frequencies[word] += 1 }
return frequencies
end
end
%{Dogs dogs dog dog dogs.}.word_count
# => {"dogs"=>3, "dog"=>2}
%{"I have no shame," I said.}.word_count
# => {"no"=>1, "shame"=>1, "have"=>1, "said"=>1, "i"=>2}
/\w+/ is nice and simple, but you can probably do better for your application's definition of "word." You probably don't consider two words separated by an underscore to be a single word. Some English words, like "pan-fried" and "fo'c'sle", contain embedded punctuation. Here are a few more definitions of "word" in regular expression form:
# Just like /\w+/, but doesn't consider underscore part of a word.
/[0-9A-Za-z]+/
# Anything that's not whitespace is a word.
/[^\s]+/
# Accept dashes and apostrophes as parts of words.
/[-'\w]+/
# A pretty good heuristic for matching English words.
/(\w+([-'.]\w+)*)/
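To compare definitions of "word" on the same text, the pattern can be made a parameter. This is a sketch; `count_words` is a hypothetical standalone variant of word_count, and it assumes patterns without capturing groups, since String#scan yields the group contents rather than the whole match when groups are present.

```ruby
# Hypothetical standalone variant of word_count: the pattern is a
# parameter, so each definition of "word" can be tried on the same text.
def count_words(text, word_pattern = /\w+/)
  frequencies = Hash.new(0)
  text.downcase.scan(word_pattern) { |word| frequencies[word] += 1 }
  frequencies
end

text = "Pan-fried fish, pan-fried."
p count_words(text)              # counts "pan" and "fried" separately
p count_words(text, /[-'\w]+/)   # counts "pan-fried" as one word
```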
s = 'HELLO, I am not here. I WENT to tHe MaRKEt.'
s.upcase # => "HELLO, I AM NOT HERE. I WENT TO THE MARKET."
s.downcase # => "hello, i am not here. i went to the market."
s.swapcase # => "hello, i AM NOT HERE. i went TO ThE mArkeT."
s.capitalize # => "Hello, i am not here. i went to the market."
upcase and downcase methods force all letters in the string to upper- or lowercase, respectively. The swapcase method transforms uppercase letters into lowercase letters and vice versa. The capitalize method makes the first character of the string uppercase, if it's a letter, and makes all other letters in the string lowercase.
upcase!, downcase!, swapcase!, and capitalize!. Assuming you don't need the original string, these methods will save memory, especially if the string is large.
un_banged = 'Hello world.'
un_banged.upcase # => "HELLO WORLD."
un_banged # => "Hello world."
banged = 'Hello world.'
banged.upcase! # => "HELLO WORLD."
banged # => "HELLO WORLD."
capitalize! method. If you want something more like capitalize, you can create a new string out of the old one.
class String
  def capitalize_first_letter
    self[0].chr.capitalize + self[1, size]
  end

  def capitalize_first_letter!
    unless self[0] == (c = self[0,1].upcase[0])
      self[0] = c
      self
    end
    # Return nil if no change was made, like upcase! et al.
  end
end
s = 'i told Alice. She remembers now.'
s.capitalize_first_letter # => "I told Alice. She remembers now."
s # => "i told Alice. She remembers now."
s.capitalize_first_letter!
s # => "I told Alice. She remembers now."
strip to remove whitespace from the beginning and end of a string:
" \tWhitespace at beginning and end. \t\n\n".strip
ljust, rjust, and center:
s = "Some text."
s.center(15)
s.ljust(15)
s.rjust(15)
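To make the padding visible, here is a small sketch that captures the results of the three methods. The optional second argument (a padding string) is a standard feature of these methods; the variable names are illustrative.

```ruby
s = "Some text."            # 10 characters long

centered = s.center(15)     # pads both sides with spaces
left     = s.ljust(15)      # pads on the right
right    = s.rjust(15)      # pads on the left

# An optional second argument supplies the padding string.
ruled = s.center(16, '*')

p centered
p left
p right
p ruled
```

When the padding cannot be split evenly, center puts the extra character on the right.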
gsub method with a string or regular expression to make more complex changes, such as to replace one type of whitespace with another.
#Normalize Ruby source code by replacing tabs with spaces
rubyCode.gsub("\t", " ")
#Transform Windows-style newlines to Unix-style newlines
"Line one\r\nLine two\r\n".gsub("\r\n", "\n")
# => "Line one\nLine two\n"
#Transform all runs of whitespace into a single space character
"\n\rThis string\t\t\tuses\n all\tsorts\nof whitespace.".gsub(/\s+/," ")
# => " This string uses all sorts of whitespace."
\t), newline (\n), carriage return (\r), and form feed (\f). The regular expression /\s/ matches any one character from that set. The strip method strips any combination of those characters from the beginning or end of a string.
\b or \010) and vertical tab (\v or \013). These are not part of the \s character group in a regular expression, so use a custom character group to catch these characters.
" \bIt's whitespace, Jim,\vbut not as we know it.\n".gsub(/[\s\b\v]+/, " ")
# => " It's whitespace, Jim, but not as we know it. "
lstrip or rstrip method:
s = " Whitespace madness! "
s.lstrip # => "Whitespace madness! "
s.rstrip # => " Whitespace madness!"
center, ljust, and rjust) take a single argument: the total length of the string they should return, counting the original string and any added whitespace. If
to_str method.
'A string'.respond_to? :to_str # => true
Exception.new.respond_to? :to_str # => true
4.respond_to? :to_str # => false
String you're thinking about calling. If the object defines that method, the right thing to do is usually to go ahead and call the method. This will make your code work in more places:
def join_to_successor(s)
raise ArgumentError, 'No successor method!' unless s.respond_to? :succ
return "#{s}#{s.succ}"
end
join_to_successor('a') # => "ab"
join_to_successor(4) # => "45"
join_to_successor(4.01)
# ArgumentError: No successor method!
s.is_a? String instead of s.respond_to? :succ, then I wouldn't have been able to call join_to_successor on an integer.
obj.is_a? String will tell you whether an object derives from the String class, but it will overlook objects that, though intended to be used as strings, don't inherit from String.
Exceptions, for instance, are essentially strings that have extra information associated with them. But they don't subclass String. Code that uses is_a? String to check for stringness will overlook the essential stringness of Exceptions. Many add-on Ruby modules define other classes that can act as strings: code that calls is_a? String will break when given an instance of one of those classes.
respond_to? instead of checking the class. This lets a future user (possibly yourself!) create new classes that offer the same capability, without being tied down to the preexisting class structure. All you have to do is make the method names match up.
slice method, or use the array index operator (that is, call the [] method). Either method accepts a Range describing which characters to retrieve, or two Fixnum arguments: the index at which to start, and the length of the substring to be extracted.
s = 'My kingdom for a string!'
s.slice(3,7) # => "kingdom"
s[3,7] # => "kingdom"
s[0,3] # => "My "
s[11, 5] # => "for a"
s[11, 17] # => "for a string!"
slice or []:
s[/.ing/] # => "king"
s[/str.*/] # => "string!"
Fixnum, pass only one argument (the zero-based index of the character) into the String#slice or [] method. To access a specific byte as a single-character string, pass in its index and the number 1.
s.slice(3) # => 107
s[3] # => 107
107.chr # => "k"
s.slice(3,1) # => "k"
s[3,1] # => "k"
s.slice(-7,3) # => "str"
s[-7,6] # => "string"
slice or [] will return the entire string after that point. This leads to a simple shortcut for getting the rightmost portion of a string:
s[15...s.length] # => "a string!"
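The rightmost-portion shortcut can be sketched as a method that takes a character count instead of a starting index. `rightmost` is a hypothetical helper name; it clamps the start index so that asking for more characters than the string holds returns the whole string.

```ruby
# Hypothetical helper: returns the last n characters of s,
# or the whole string when it is shorter than n.
def rightmost(s, n)
  start = [s.length - n, 0].max
  s[start, n]
end

s = 'My kingdom for a string!'
puts rightmost(s, 7)
```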
$KCODE='u'
require 'jcode'
$ ruby -Ku -rjcode
#!/usr/bin/ruby -Ku -rjcode
jcode library overrides most of the methods of String and makes them capable of handling multibyte text. The exceptions are String#length, String#count, and String#size, which are not overridden. Instead jcode defines three new methods: String#jlength, String#jcount, and String#jsize.
efbca1 (A), efbca2 (B), and so on up to UTF-8 efbca6 (F):
string = "\xef\xbc\xa1" + "\xef\xbc\xa2" + "\xef\xbc\xa3" + "\xef\xbc\xa4" + "\xef\xbc\xa5" + "\xef\xbc\xa6"
string.size # => 18
string.jsize # => 6
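The jcode library belongs to Ruby 1.8. As a sketch under the assumption that you are running Ruby 1.9 or later (where strings carry an encoding and jcode is not available), the same distinction between bytes and characters shows up as String#bytesize versus String#length:

```ruby
# Six fullwidth letters A-F, written with \u escapes so the string is UTF-8.
string = "\uFF21\uFF22\uFF23\uFF24\uFF25\uFF26"

byte_count  = string.bytesize        # 3 bytes per character
char_count  = string.length          # counts characters, not bytes
first_bytes = string.bytes.first(3)  # the bytes of the first character

puts byte_count
puts char_count
p first_bytes
```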
String#count is a method that takes a string of bytes, and counts how many times those bytes occur in the string. String#jcount takes a string of characters and counts how many times those characters occur in the string:
string.count "\xef\xbc\xa2" # => 13
string.jcount "\xef\xbc\xa2" # => 1
"\xef\xbc\xa2" as three separate bytes, and counts the number of times each of those bytes shows up in the string. String#jcount treats the same string as a single character, and looks for that character in the string, finding it only once.
"\xef\xbc\xa2".length # => 3
"\xef\xbc\xa2".jlength # => 1
def wrap(s, width=78)
s.gsub(/(.{1,#{width}})(\s+|\Z)/, "\\1\n")
end
wrap("This text is too short to be wrapped.")
# => "This text is too short to be wrapped.\n"
puts wrap("This text is not too short to be wrapped.", 20)
# This text is not too
# short to be wrapped.
puts wrap("These ten-character columns are stifling my creativity!", 10)
# These
# ten-character
# columns
# are
# stifling
# my
# creativity!
poetry = %q{It is an ancient Mariner,
And he stoppeth one of three.
"By thy long beard and glittering eye,
Now wherefore stopp'st thou me?}
puts wrap(poetry, 20)
# It is an ancient
# Mariner,
# And he stoppeth one
# of three.
# "By thy long beard
# and glittering eye,
# Now wherefore
# stopp'st thou me?
prose = %q{I find myself alone these days, more often than not,
watching the rain run down nearby windows. How long has it been
raining? The newspapers now print the total, but no one reads them
anymore.}
puts wrap(prose, 60)
# I find myself alone these days, more often than not,
# watching the rain run down nearby windows. How long has it
# been
# raining? The newspapers now print the total, but no one
# reads them
# anymore.
def reformat_wrapped(s, width=78)
s.gsub(/\s+/, " ").gsub(/(.{1,#{width}})( |\Z)/, "\\1\n")
end
Range#each, as you would for numbers:
('aa'..'ag').each { |x| puts x }
# aa
# ab
# ac
# ad
# ae
# af
# ag
String#succ. If you don't know the end point of your succession, you can define a generator that uses succ, and break from the generator when you're done.
def endless_string_succession(start)
  while true
    yield start
    start = start.succ
  end
end
endless_string_succession('fol') do |x|
puts x
break if x[-1] == x[-2]
end
# fol
# fom
# fon
# foo
'89999'.succ # => "90000"
'nzzzz'.succ # => "oaaaa"
'Zzz'.succ # => "AAaa"
'z'.succ # => "aa"
'aa'.succ # => "ab"
'zz'.succ # => "aaa"
=~ operator tests a string against a regular expression:
string = 'This is a 30-character string.'
if string =~ /([0-9]+)-character/ and $1.to_i == string.length
  "Yes, there are #$1 characters in that string."
end
# => "Yes, there are 30 characters in that string."
Regexp#match:
match = Regexp.compile('([0-9]+)-character').match(string)
if match && match[1].to_i == string.length
"Yes, there are #{match[1]} characters in that string."
end
# => "Yes, there are 30 characters in that string."
case statement:
string = "123"
case string
when /^[a-zA-Z]+$/
  "Letters"
when /^[0-9]+$/
  "Numbers"
else
  "Mixed"
end
# => "Numbers"
sed, but Perl was the first general-purpose programming language to include them. Now almost all modern languages have support for Perl-style regular expressions.
Regexp objects:
/something/
Regexp.new("something")
Regexp.compile("something")
%r{something}
| Regexp::IGNORECASE | i | Makes matches case-insensitive. |
| Regexp::MULTILINE | m | Normally, a regexp matches against a single line of a string. This will cause a regexp to treat line breaks like any other character. |
| Regexp::EXTENDED | x | |
Regexp.union method to aggregate the regular expressions you want to match into one big regular expression that matches any of them. Pass the big regular expression into String#gsub, along with a code block that takes a MatchData object. You can detect which of your search terms actually triggered the regexp match, and choose the appropriate replacement term:
class String
def mgsub(key_value_pairs=[].freeze)
regexp_fragments = key_value_pairs.collect { |k,v| k }
gsub(
Regexp.union(*regexp_fragments)) do |match|
key_value_pairs.detect{|k,v| k =~ match}[1]
end
end
end
"GO HOME!".mgsub([[/.*GO/i, 'Home'], [/home/i, 'is where the heart is']])
# => "Home is where the heart is!"
"Here is number #123".mgsub([[/[a-z]/i, '#'], [/#/, 'P']])
# => "#### ## ###### P123"
gsub calls. The following examples, copied from the solution, show why this is often a bad idea:
"GO HOME!".gsub(/.*GO/i, 'Home').gsub(/home/i, 'is where the heart is')
# => "is where the heart is is where the heart is!"
"Here is number #123".gsub(/[a-z]/i, "#").gsub(/#/, "P")
# => "PPPP PP PPPPPP P123"
gsub call. Our replacement strings were themselves subject to search-and-replace. In the first example, the conflict can be fixed by reversing the order of the substitutions. The second example shows a case where reversing the order won't help. You need to do all your replacements in a single pass over the string.
mgsub method will take a hash, but it's safer to pass in an array of key-value pairs. This is because elements in a hash come out in no particular order, so you can't control the order of substitution. Here's a demonstration of the problem:
"between".mgsub(/ee/ => 'AA', /e/ => 'E') # Bad code
# => "bEtwEEn"
"between".mgsub([[/ee/, 'AA'], [/e/, 'E']]) # Good code
# => "bEtwAAn"
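If you would rather not reopen the String class, the same single-pass idea can be sketched as a standalone method. `multi_gsub` is a hypothetical name; the logic mirrors the union-then-detect approach described above and expects an ordered array of pattern and replacement pairs.

```ruby
# Hypothetical standalone version of the single-pass substitution:
# pairs is an ordered array of [pattern, replacement] entries.
def multi_gsub(text, pairs)
  union = Regexp.union(*pairs.map { |pattern, _| pattern })
  text.gsub(union) do |match|
    # Use the first pattern, in the caller's order, that matches this text.
    pairs.detect { |pattern, _| pattern =~ match }[1]
  end
end

puts multi_gsub("between", [[/ee/, 'AA'], [/e/, 'E']])
```

Replacement text is never rescanned, so a replacement that happens to match another pattern is left alone.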
test_addresses = [ #The following are valid addresses according to RFC822.
  'joe@example.com', 'joe.bloggs@mail.example.com',
  'joe+ruby-mail@example.com', 'joe(and-mary)@example.museum',
  'joe@localhost',
  # Complete the list with some invalid addresses
  'joe', 'joe@', '@example.com',
  'joe@example@example.com',
  'joe and mary@example.com' ]
valid = '[^ @]+' # Exclude characters always invalid in email addresses
username_and_machine = /^#{valid}@#{valid}$/
test_addresses.collect { |i| i =~ username_and_machine }
# => [0, 0, 0, 0, 0, nil, nil, nil, nil, nil]
username_and_machine_with_tld = /^#{valid}@#{valid}\.#{valid}$/
test_addresses.collect { |i| i =~ username_and_machine_with_tld }
# => [0, 0, 0, 0, nil, nil, nil, nil, nil, nil]
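The looser of the two checks can be wrapped in a predicate that returns true or false instead of a match position. This is a sketch; `PLAUSIBLE_ADDRESS` and `plausible_address?` are hypothetical names, and like the patterns above it only rules out obviously malformed strings rather than validating against RFC822.

```ruby
# Hypothetical wrapper around the username-and-machine check; \A and \z
# anchor the whole string rather than a single line.
PLAUSIBLE_ADDRESS = /\A[^ @]+@[^ @]+\z/

def plausible_address?(address)
  !(address =~ PLAUSIBLE_ADDRESS).nil?
end

puts plausible_address?('joe@localhost')
puts plausible_address?('joe and mary@example.com')
```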
Classifier library, available as the classifier gem. It provides a naive Bayesian classifier, and one that implements Latent Semantic Indexing, a more advanced technique.
Classifier::Bayes object with some classifications, and train it on text chunks whose classification is known:
require 'rubygems'
require 'classifier'
classifier = Classifier::Bayes.new('Spam', 'Not spam')
classifier.train_spam 'are you in the market for viagra? we sell viagra'
classifier.train_not_spam 'hi there, are we still on for lunch?'
classifier.classify "we sell the cheapest viagra on the market" # => "Spam"
classifier.classify "lunch sounds great" # => "Not spam"
@categories variable below:
classifier
# => #<Classifier::Bayes:0xb7cec7c8
# @categories={:"Not spam"=>
# { :lunch=>1, :for=>1, :there=>1,
# :"?"=>1, :still=>1, :","=>1 },
# :Spam=>
# { :market=>1, :for=>1, :viagra=>2, :"?"=>1, :sell=>1 }
# },
# @total_words=12>
Fixnum) and large numbers (Bignum), but you don't usually have to worry about the difference. When you type in a number, Ruby sees how big it is and creates an object of the appropriate class.
1000.class # => Fixnum
10000000000.class # => Bignum
(2**30 - 1).class # => Fixnum
(2**30).class # => Bignum
small = 1000
big = small ** 5 # => 1000000000000000
big.class # => Bignum
smaller = big / big # => 1
smaller.class # => Fixnum
Float object instead of a
String#to_i to turn a string into an integer. Use String#to_f to turn a string into a floating-point number.
'400'.to_i # => 400
'3.14'.to_f # => 3.14
'1.602e-19'.to_f # => 1.602e-19
to_i and to_f, there are other ways to convert strings into numbers. If you have a string that represents a hex or octal number, you can call String#hex or String#oct to get the decimal equivalent. This is the same as passing the base of the number into to_i:
'405'.oct # => 261
'405'.to_i(8) # => 261
'405'.hex # => 1029
'405'.to_i(16) # => 1029
'fed'.hex # => 4077
'fed'.to_i(16) # => 4077
to_i, to_f, hex, or oct find a character that can't be part of the kind of number they're looking for, they stop processing the string at that character and return the number so far. If the string's first character is unusable, the result is zero.
"13: a baker's dozen".to_i # => 13
'1001 Nights'.to_i # => 1001
'The 1000 Nights and a Night'.to_i # => 0
'60.50 Misc. Agricultural Equipment'.to_f # => 60.5
'$60.50'.to_f # => 0.0
'Feed the monster!'.hex # => 65261
'I fed the monster at Canoga Park Waterslides'.hex # => 0
'0xA2Z'.hex # => 162
'-10'.oct # => -8
'-109'.oct # => -8
'3.14'.to_i # => 3
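When silently stopping at the first bad character is not what you want, Kernel#Integer offers a stricter conversion that raises ArgumentError instead. The following sketch wraps it; `strict_integer` is a hypothetical helper name.

```ruby
# Hypothetical helper: returns an Integer, or nil when the whole string
# is not a valid integer literal (Kernel#Integer raises ArgumentError).
def strict_integer(text)
  Integer(text)
rescue ArgumentError
  nil
end

p strict_integer('400')
p strict_integer('1001 Nights')
```

Kernel#Float behaves the same way for floating-point strings.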
1.8 + 0.1 # => 1.9
1.8 + 0.1 == 1.9 # => false
1.8 + 0.1 > 1.9 # => true
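One common workaround, sketched here under the assumption that a small absolute tolerance suits your data, is to compare floats by their difference rather than with ==. `approximately_equal?` is a hypothetical helper, and the default tolerance is an arbitrary choice.

```ruby
# Hypothetical helper: treat two floats as equal when they differ by
# less than a small tolerance (the default here is an arbitrary choice).
def approximately_equal?(a, b, tolerance = 1e-9)
  (a - b).abs < tolerance
end

puts approximately_equal?(1.8 + 0.1, 1.9)
puts approximately_equal?(1.9, 2.0)
```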
BigDecimal numbers instead of floats (see Recipe 2.3).
BigDecima