A Tour of Ruby

This section is a guided, but meandering, tour through some of the most interesting features of Ruby. Everything discussed here will be documented in detail later in the book, but this first look will give you the flavor of the language.

Ruby Is Object-Oriented

We’ll begin with the fact that Ruby is a completely object-oriented language. Every value is an object, even simple numeric literals and the values true, false, and nil (nil is a special value that indicates the absence of value; it is Ruby’s version of null). Here we invoke a method named class on these values. Comments begin with # in Ruby, and the => arrows in the comments indicate the value returned by the commented code (this is a convention used throughout this book):

1.class      # => Fixnum: the number 1 is a Fixnum (Integer in Ruby 2.4 and later)
0.0.class    # => Float: floating-point numbers have class Float
true.class   # => TrueClass: true is the singleton instance of TrueClass
false.class  # => FalseClass
nil.class    # => NilClass

In many languages, function and method invocations require parentheses, but there are no parentheses in any of the code above. In Ruby, parentheses are usually optional and they are commonly omitted, especially when the method being invoked takes no arguments. The fact that the parentheses are omitted in the method invocations here makes them look like references to named fields or named variables of the object. This is intentional, but the fact is, Ruby is very strict about encapsulation of its objects; there is no access to the internal state of an object from outside the object. Any such access must be mediated by an accessor method, such as the class method shown above.
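As a minimal sketch of optional parentheses and accessor methods, consider the following. The Point class and its x and y accessors are hypothetical names invented for this illustration; they are not part of Ruby's core library.

```ruby
# Hypothetical class, for illustration only
class Point
  def initialize(x, y)
    @x, @y = x, y       # State lives in instance variables
  end

  def x                 # An accessor method: the way outside code reads @x
    @x
  end

  def y                 # An accessor method for @y
    @y
  end
end

pt = Point.new(3, 4)
pt.x      # => 3: looks like a field reference, but it is a method call
pt.x()    # => 3: the same call with explicit parentheses
```

Both invocations are the same method call; the version without parentheses is simply the more common style.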

Blocks and Iterators

The fact that we can invoke methods on integers isn’t just an esoteric aspect of Ruby. It is actually something that Ruby programmers do with some frequency:

3.times { print "Ruby! " }   # Prints "Ruby! Ruby! Ruby! "
1.upto(9) {|x| print x }     # Prints "123456789"

times and upto are methods implemented by integer objects. They are a special kind of method known as an iterator, and they behave like loops. The code within curly braces—known as a block—is associated with the method invocation and serves as the body of the loop. The use of iterators and blocks is another notable feature of Ruby; although the language does support an ordinary while loop, it is more common to perform loops with constructs that are actually method calls.
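To compare the two looping styles, here is a small sketch that builds the same array twice. The method names count_with_while and count_with_upto are hypothetical and exist only for this comparison.

```ruby
# Hypothetical method: collect 1..limit with an ordinary while loop
def count_with_while(limit)
  result = []
  i = 1
  while i <= limit
    result << i
    i += 1
  end
  result
end

# Hypothetical method: the same loop written with the upto iterator and a block
def count_with_upto(limit)
  result = []
  1.upto(limit) {|n| result << n }
  result
end

count_with_while(3)   # => [1, 2, 3]
count_with_upto(3)    # => [1, 2, 3]
```

The iterator version needs no loop counter of its own, which is one reason this style is usually preferred.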

Integers are not the only values that have iterator methods. Arrays (and similar “enumerable” objects) define an iterator named each, which invokes the associated block once for each element in the array. Each invocation of the block is passed a single element from the array:

a = [3, 2, 1]     # This is an array literal
a[3] = a[2] - 1   # Use square brackets to query and set array elements
a.each do |elt|   # each is an iterator. The block has a parameter elt
  print elt+1     # Prints "4321"
end               # This block was delimited with do/end instead of {}

Various other useful iterators are defined on top of each:

a = [1,2,3,4]                # Start with an array
b = a.map {|x| x*x }         # Square elements: b is [1,4,9,16]
c = a.select {|x| x%2==0 }   # Select even elements: c is [2,4]
a.inject do |sum,x|          # Compute the sum of the elements => 10
  sum + x 
end
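Because each of these iterators returns an ordinary value, they can be chained. The sketch below is one way to combine them; the method name sum_of_even_squares is hypothetical, and passing 0 to inject as an explicit starting value is a choice made here so that an empty array yields 0 rather than nil.

```ruby
# Hypothetical method: chain select, map, and inject
def sum_of_even_squares(numbers)
  numbers.select {|x| x % 2 == 0 }       # Keep the even elements
         .map {|x| x * x }               # Square each of them
         .inject(0) {|sum, x| sum + x }  # Add them up, starting from 0
end

sum_of_even_squares([1, 2, 3, 4])   # => 20: 2*2 + 4*4
sum_of_even_squares([])             # => 0
```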

Hashes, like arrays, are a fundamental data structure in Ruby. As their name implies, they are based on the hashtable data structure and serve to map arbitrary key objects to value objects. (To put this another way, we can say that a hash associates arbitrary value objects with key objects.) Hashes use square brackets, like arrays do, to query and set values in the hash. Instead of using an integer index, they expect key objects within the square brackets. Like the Array class, the Hash class also defines an each iterator method. This method invokes the associated block of code once for each key/value pair in the hash, and (this is where it differs from Array) passes both the key and the value as parameters to the block:

h = {                         # A hash that maps number names to digits
  :one => 1,                  # The "arrows" show mappings: key=>value
  :two => 2                   # The colons indicate Symbol literals
}  
h[:one]                       # => 1.  Access a value by key
h[:three] = 3                 # Add a new key/value pair to the hash
h.each do |key,value|         # Iterate through the key/value pairs
  print "#{value}:#{key}; "   # Note variables substituted into string 
end                           # Prints "1:one; 2:two; 3:three; "

Ruby’s hashes can use any object as a key, but Symbol objects are the most commonly used. Symbols are immutable, interned strings. They can be compared by identity rather than by textual content (because two distinct Symbol objects will never have the same content).
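The difference between identity and content can be seen with the equal? method, which tests whether two references point to the same object. This is a small sketch; the strings are built with String.new so that the example does not depend on how a particular Ruby version treats string literals.

```ruby
# Two symbol literals with the same text are the very same object
:ruby.equal?(:ruby)            # => true: equal? tests object identity

# Two separately created strings with the same text are distinct objects
s1 = String.new("ruby")
s2 = String.new("ruby")
s1 == s2                       # => true: same content
s1.equal?(s2)                  # => false: different objects

# Converting equal strings to symbols yields the same Symbol each time
s1.to_sym.equal?(s2.to_sym)    # => true
```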

The ability to associate a block of code with a method invocation is a fundamental and very powerful feature of Ruby. Although its most obvious use is for loop-like constructs, it is also useful for methods that only invoke the block once. For example:

File.open("data.txt") do |f| # Open named file and pass stream to block
  line = f.readline          # Use the stream to read from the file
end                          # Stream automatically closed at block end

t = Thread.new do       # Run this block in a new thread
  File.read("data.txt") # Read a file in the background
end                     # File contents available as thread value
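You can write methods of this kind yourself. The following sketch imitates the setup-and-cleanup pattern of File.open with a hypothetical method named with_log; it is an illustration of the idea, not how File.open is actually implemented.

```ruby
# Hypothetical helper: runs the block once and always performs cleanup
def with_log(log)
  log << "open"          # Setup happens before the block runs
  result = yield log     # Invoke the block exactly once
  result                 # Return whatever the block returned
ensure
  log << "close"         # Cleanup runs even if the block raises
end

events = []
value = with_log(events) {|l| l << "work"; 42 }
# value is 42 and events is ["open", "work", "close"]
```

The ensure clause is what guarantees the cleanup step, in the same spirit as the automatic closing of the stream in the File.open example.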

As an aside, notice that the Hash.each example previously included this interesting line of code:

print "#{value}:#{key}; "    # Note variables substituted into string

Double-quoted strings can include arbitrary Ruby expressions delimited by #{ and }. The value of the expression within these delimiters is converted to a string (by calling its to_s method, which is supported by all objects). The resulting string is then used to replace the expression text and its delimiters in the string literal. This substitution of expression values into strings is usually called string interpolation.
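To see the role of to_s, here is a brief sketch with a hypothetical Temperature class that defines its own to_s method.

```ruby
# Hypothetical class with its own to_s method
class Temperature
  def initialize(degrees)
    @degrees = degrees
  end

  def to_s                  # Called implicitly by string interpolation
    "#{@degrees} degrees"
  end
end

t = Temperature.new(21)
"It is #{t} outside"        # => "It is 21 degrees outside"
"1 + 2 = #{1 + 2}"          # => "1 + 2 = 3": any expression may appear
```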

Expressions and Operators in Ruby

Ruby’s syntax is expression-oriented. Control structures such as if that would be called statements in other languages are actually expressions in Ruby. They have values like other simpler expressions do, and we can write code like this:

minimum = if x < y then x else y end

Although all “statements” in Ruby are actually expressions, they do not all return meaningful values. A while loop, for example, is an expression that normally returns the value nil. A method definition returns nil in Ruby 1.8 and 1.9; since Ruby 2.1 it returns the name of the method as a Symbol.
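As a further sketch of expression-oriented code, the two hypothetical methods below rely on the value of a control structure as their return value. The names size_label and loop_value are invented for this example.

```ruby
# Hypothetical method: the value of the case expression is the return value
def size_label(n)
  case
  when n < 10  then "small"
  when n < 100 then "medium"
  else              "large"
  end
end

# Hypothetical method: the last expression is a while loop, whose value is nil
def loop_value
  i = 0
  while i < 3
    i += 1
  end
end

size_label(5)      # => "small"
size_label(500)    # => "large"
loop_value         # => nil
```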

As in most languages, expressions in Ruby are usually built out of values and operators. For the most part, Ruby’s operators will be familiar to anyone who knows C, Java, JavaScript, or any similar programming language. Here are examples of some commonplace and some more unusual Ruby operators:

1 + 2                    # => 3: addition
1 * 2                    # => 2: multiplication
1 + 2 == 3               # => true: == tests equality
2 ** 1024                # 2 to the power 1024: Ruby has arbitrary size ints
"Ruby" + " rocks!"       # => "Ruby rocks!": string concatenation
"Ruby! " * 3             # => "Ruby! Ruby! Ruby! ": string repetition
"%d %s" % [3, "rubies"]  # => "3 rubies": Python-style, printf formatting
max = x > y ? x : y      # The conditional operator

Many of Ruby’s operators are implemented as methods, and classes can define (or redefine) these methods however they want. (They can’t define completely new operators, however; there is only a fixed set of recognized operators.) As examples, notice that the + and * operators behave differently for integers and strings, and you can define them any way you want in your own classes. The << operator is another good example. The integer classes Fixnum and Bignum (unified as Integer in Ruby 2.4 and later) use this operator for the bitwise left-shift operation, following the C programming language. At the same time (following C++), other classes, such as strings, arrays, and streams, use this operator for an append operation. If you create a new class that can have values appended to it in some way, it is a very good idea to define <<.
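Here is a minimal sketch of a class that defines << and + for itself. TagList is a hypothetical class invented for this illustration, and returning self from << is a design choice made here so that appends can be chained.

```ruby
# Hypothetical class: a list of tags that supports appending with <<
class TagList
  def initialize
    @tags = []
  end

  def <<(tag)        # Define the append operator
    @tags << tag
    self             # Return self so that appends can be chained
  end

  def +(other)       # Define + to combine two lists into a new one
    combined = TagList.new
    (to_a + other.to_a).each {|tag| combined << tag }
    combined
  end

  def to_a           # Expose a copy of the tags, not the internal array
    @tags.dup
  end
end

list = TagList.new
list << :ruby << :tour         # Chained appends
(list + list).to_a             # => [:ruby, :tour, :ruby, :tour]
```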

One of the most powerful operators to override is []. The Array and Hash classes use this operator to access array elements by index and hash values by key. But you can define [] in your classes for any purpose you want. You can even define it as a method that expects multiple arguments, comma-separated between the square brackets. (The Array class accepts an index and a length between the square brackets to indicate a subarray or “slice” of the array.) And if you want to allow square brackets to be used on the lefthand side of an assignment expression, you can define the corresponding []= operator. The value on the righthand side of the assignment will be passed as the final argument to the method that implements this operator.
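The following sketch defines [] and []= with two index arguments. Grid is a hypothetical class invented for this example, and it performs no bounds checking.

```ruby
# Hypothetical class: a fixed-size grid indexed by row and column
class Grid
  def initialize(rows, cols)
    @cols = cols
    @cells = Array.new(rows * cols, 0)   # All cells start at 0
  end

  def [](row, col)                # Called for grid[row, col]
    @cells[row * @cols + col]
  end

  def []=(row, col, value)        # Called for grid[row, col] = value
    @cells[row * @cols + col] = value
  end
end

g = Grid.new(2, 3)
g[1, 2] = 7      # Invokes []= with the arguments 1, 2, and 7
g[1, 2]          # => 7
g[0, 0]          # => 0
```

Note that the assigned value arrives as the last argument of []=, after the two index arguments.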

Methods

Methods are defined with the def keyword. The return value of a method is the value of the last expression evaluated in its body:

def square(x)   # Define a method named square with one parameter x
  x*x           # Return x squared
end             # End of the method

When a method, like the one above, is defined outside of a class or a module, it is effectively a global function rather than a method to be invoked on an object. (Technically, however, a method like this becomes a private method of the Object class.) Methods can also be defined on individual objects by prefixing the name of the method with the object on which it is defined. Methods like these are known as singleton methods, and they are how Ruby defines class methods:

def Math.square(x)  # Define a class method of the Math module
  x*x
end

The Math module is part of the core Ruby library, and this code adds a new method to it. This is a key feature of Ruby—classes and modules are “open” and can be modified and extended at runtime.

Method parameters may have default values specified, and methods may accept arbitrary numbers of arguments.
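As a brief sketch of both features, the hypothetical methods greet and total below use a default parameter value and a * parameter, respectively.

```ruby
# Hypothetical method with a default value for its second parameter
def greet(name, greeting = "Hello")
  "#{greeting}, #{name}!"
end

greet("Ruby")             # => "Hello, Ruby!"
greet("Ruby", "Goodbye")  # => "Goodbye, Ruby!"

# Hypothetical method that accepts any number of arguments.
# The * prefix gathers them into an array named numbers.
def total(*numbers)
  numbers.inject(0) {|sum, n| sum + n }
end

total()                   # => 0
total(1, 2, 3)            # => 6
```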

Assignment

The (nonoverridable) = operator in Ruby assigns a value to a variable:

x = 1

Assignment can be combined with other operators such as + and -:

x += 1          # Increment x: note Ruby does not have ++.
y -= 1          # Decrement y: no -- operator, either.

Ruby supports parallel assignment, allowing more than one value and more than one variable in assignment expressions:

x, y = 1, 2     # Same as x = 1; y = 2
a, b = b, a     # Swap the value of two variables
x,y,z = [1,2,3] # Array elements automatically assigned to variables

Methods in Ruby are allowed to return more than one value, and parallel assignment is helpful in conjunction with such methods. For example:

# Define a method to convert Cartesian (x,y) coordinates to Polar
def polar(x,y)
  theta = Math.atan2(y,x)   # Compute the angle
  r = Math.hypot(x,y)       # Compute the distance
  [r, theta]                # The last expression is the return value
end

# Here's how we use this method with parallel assignment
distance, angle = polar(2,2)

Methods that end with an equals sign (=) are special because Ruby allows them to be invoked using assignment syntax. If an object o has a method named x=, then the following two lines of code do the very same thing:

o.x=(1)         # Normal method invocation syntax
o.x = 1         # Method invocation through assignment
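A minimal sketch of such an object follows. The Counter class is hypothetical; it defines a reader method x and a writer method x=.

```ruby
# Hypothetical class with a reader method x and a writer method x=
class Counter
  def initialize
    @x = 0
  end

  def x             # Reader
    @x
  end

  def x=(value)     # Writer: invoked by assignment syntax
    @x = value
  end
end

o = Counter.new
o.x = 5           # Calls the x= method
o.x += 1          # Shorthand for o.x = o.x + 1: calls x, then x=
o.x               # => 6
```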

Punctuation Suffixes and Prefixes

We saw previously that methods whose names end with = can be invoked by assignment expressions. Ruby methods can also end with a question mark or an exclamation mark. A question mark is used to mark predicates: methods that return a Boolean value. For example, the Array and Hash classes both define methods named empty? that test whether the data structure has any elements. An exclamation mark at the end of a method name is used to indicate that caution is required with the use of the method. A number of core Ruby classes define pairs of methods with the same name, except that one ends with an exclamation mark and one does not. Usually, the method without the exclamation mark returns a modified copy of the object it is invoked on, and the one with the exclamation mark is a mutator method that alters the object in place. The Array class, for example, defines methods sort and sort!.
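The following short sketch shows a predicate method and the sort and sort! pair in action:

```ruby
a = [3, 1, 2]
a.empty?          # => false: a predicate method
[].empty?         # => true

b = a.sort        # Returns a sorted copy
b                 # => [1, 2, 3]
a                 # => [3, 1, 2]: the original is unchanged

a.sort!           # Sorts the array in place
a                 # => [1, 2, 3]
```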

In addition to these punctuation characters at the end of method names, you’ll notice punctuation characters at the start of Ruby variable names: global variables are prefixed with $, instance variables are prefixed with @, and class variables are prefixed with @@. These prefixes can take a little getting used to, but after a while you may come to appreciate the fact that the prefix tells you the scope of the variable. The prefixes are required in order to disambiguate Ruby’s very flexible grammar. One way to think of variable prefixes is that they are one price we pay for being able to omit parentheses around method invocations.
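The three prefixes can be seen side by side in this sketch. The Visitor class and the $greeting global are hypothetical names used only for illustration.

```ruby
$greeting = "hi"          # Global variable: visible everywhere

# Hypothetical class, for illustration only
class Visitor
  @@count = 0             # Class variable: shared by all instances

  def initialize(name)
    @name = name          # Instance variable: one per object
    @@count += 1
  end

  def greet
    "#{$greeting}, #{@name}"
  end

  def self.count          # Class method that reads the class variable
    @@count
  end
end

Visitor.new("Ann").greet  # => "hi, Ann"
Visitor.new("Bob")
Visitor.count             # => 2
```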

Regexp and Range

We mentioned arrays and hashes earlier as fundamental data structures in Ruby. We demonstrated the use of numbers and strings as well. Two other datatypes are worth mentioning here. A Regexp (regular expression) object describes a textual pattern and has methods for determining whether a given string matches that pattern or not. And a Range represents the values (usually integers) between two endpoints. Regular expressions and ranges have a literal syntax in Ruby:

/[Rr]uby/        # Matches "Ruby" or "ruby"
/\d{5}/          # Matches 5 consecutive digits
1..3             # All x where 1 <= x <= 3
1...3            # All x where 1 <= x < 3

Regexp and Range objects define the normal == operator for testing equality. In addition, they also define the === operator for testing matching and membership. Ruby’s case statement (like the switch statement of C or Java) matches its expression against each of the possible cases using ===, so this operator is often called the case equality operator. It leads to conditional tests like these:

# Determine US generation name based on birth year
# Case expression tests ranges with ===
generation = case birthyear
             when 1946..1963 then "Baby Boomer"
             when 1964..1976 then "Generation X"
             when 1978..2000 then "Generation Y"
             else nil
             end

# A method to ask the user to confirm something
def are_you_sure?                  # Define a method. Note question mark!
  while true                       # Loop until we explicitly return
    print "Are you sure? [y/n]: "  # Ask the user a question
    response = gets                # Get her answer
    case response                  # Begin case conditional
    when /^[yY]/                   # If response begins with y or Y
      return true                  # Return true from the method
    when /^[nN]/, /^$/             # If response begins with n,N or is empty
      return false                 # Return false
    end
  end
end
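The === operator can also be invoked directly, which makes it easier to see what a case statement is doing behind the scenes. A short sketch:

```ruby
(1..3) === 2            # => true: 2 is a member of the range
(1..3) === 5            # => false
(1...3) === 3           # => false: the endpoint is excluded
/[Rr]uby/ === "ruby"    # => true: the string matches the pattern
/\d{5}/ === "1234"      # => false: fewer than 5 consecutive digits
(1..3) == (1..3)        # => true: == compares the ranges themselves
```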

Classes and Modules

A class is a collection of related methods that operate on the state of an object. An object’s state is held by its instance variables: variables whose names begin with @ and whose values are specific to that particular object. The following code defines an example class named Sequence and demonstrates how to write iterator methods and define operators:

#
# This class represents a sequence of numbers characterized by the three
# parameters from, to, and by. The numbers x in the sequence obey the
# following two constraints:
#
#    from <= x <= to
#    x = from + n*by, where n is an integer
# 
class Sequence
  # This is an enumerable class; it defines an each iterator below.
  include Enumerable   # Include the methods of this module in this class

  # The initialize method is special; it is automatically invoked to
  # initialize newly created instances of the class
  def initialize(from, to, by)
    # Just save our parameters into instance variables for later use
    @from, @to, @by = from, to, by  # Note parallel assignment and @ prefix
  end

  # This is the iterator required by the Enumerable module
  def each
    x = @from       # Start at the starting point
    while x <= @to  # While we haven't reached the end
      yield x       # Pass x to the block associated with the iterator
      x += @by      # Increment x
    end
  end

  # Define the length method (following arrays) to return the number of
  # values in the sequence
  def length
    return 0 if @from > @to       # Note if used as a statement modifier 
    Integer((@to-@from)/@by) + 1  # Compute and return length of sequence
  end

  # Define another name for the same method.
  # It is common for methods to have multiple names in Ruby
  alias size length  # size is now a synonym for length

  # Override the array-access operator to give random access to the sequence
  def [](index)
    return nil if index < 0 # Return nil for negative indexes
    v = @from + index*@by   # Compute the value
    if v <= @to             # If it is part of the sequence
      v                     # Return it
    else                    # Otherwise...
      nil                   # Return nil
    end
  end

  # Override arithmetic operators to return new Sequence objects
  def *(factor)
    Sequence.new(@from*factor, @to*factor, @by*factor)
  end

  def +(offset)
    Sequence.new(@from+offset, @to+offset, @by)
  end
end

Here is some code that uses this Sequence class:

s = Sequence.new(1, 10, 2)  # From 1 to 10 by 2's
s.each {|x| print x }       # Prints "13579"
print s[s.size-1]           # Prints 9
t = (s+1)*2                 # From 4 to 22 by 4's

The key feature of our Sequence class is its each iterator. If we are only interested in the iterator method, there is no need to define the whole class. Instead, we can simply write an iterator method that accepts the from, to, and by parameters. Instead of making this a global function, let’s define it in a module of its own:

module Sequences                   # Begin a new module
  def self.fromtoby(from, to, by)  # A singleton method of the module
    x = from
    while x <= to
      yield x
      x += by
    end
  end
end

With the iterator defined this way, we write code like this:

Sequences.fromtoby(1, 10, 2) {|x| print x }  # Prints "13579"

An iterator like this makes it unnecessary to create a Sequence object to iterate a sequence of numbers. But the name of the method is quite long, and its invocation syntax is unsatisfying. What we really want is a way to iterate numeric Range objects by steps other than 1. One of the amazing features of Ruby is that its classes, even the built-in core classes, are open: any program can add methods to them. So we really can define a new iterator method for ranges:

class Range                  # Open an existing class for additions
  def by(step)               # Define an iterator named by
    x = self.begin           # Start at one endpoint of the range
    if exclude_end?          # For ... ranges that exclude the end
      while x < self.end     # Test with the < operator
        yield x
        x += step
      end
    else                     # Otherwise, for .. ranges that include the end
      while x <= self.end    # Test with <= operator
        yield x
        x += step
      end
    end
  end                        # End of method definition
end                          # End of class modification

# Examples
(0..10).by(2) {|x| print x}  # Prints "0246810"
(0...10).by(2) {|x| print x} # Prints "02468"

This by method is convenient but unnecessary; the Range class already defines an iterator named step that serves the same purpose. The core Ruby API is a rich one, and it is worth taking the time to study the platform (see Chapter 9) so you don’t end up spending time writing methods that have already been implemented for you!
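For comparison, here is a brief sketch of the built-in step iterator applied to the same two ranges as the by examples:

```ruby
inclusive = []
(0..10).step(2) {|x| inclusive << x }   # inclusive is [0, 2, 4, 6, 8, 10]

exclusive = []
(0...10).step(2) {|x| exclusive << x }  # exclusive is [0, 2, 4, 6, 8]
```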

Ruby Surprises

Every language has features that trip up programmers who are new to the language. Here we describe two of Ruby’s surprising features.

Ruby’s strings are mutable, which may be surprising to Java programmers in particular. The []= operator allows you to alter the characters of a string or to insert, delete, and replace substrings. The << operator allows you to append to a string, and the String class defines various other methods that alter strings in place. Because strings are mutable, string literals in a program are not unique objects. If you include a string literal within a loop, it evaluates to a new object on each iteration of the loop. Call the freeze method on a string (or on any object) to prevent any future modifications to that object.
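The sketch below exercises these operations. It builds the string with String.new so that the example does not depend on how a particular Ruby version treats string literals, and it rescues RuntimeError on the assumption of Ruby 1.9 or later (recent versions raise FrozenError, which is a subclass of RuntimeError).

```ruby
s = String.new("Ruby")   # Build a string we are free to modify
s[0] = "r"               # Replace a character: s is now "ruby"
s << " rocks"            # Append in place: s is now "ruby rocks"
s[0, 4] = "It"           # Replace a substring: s is now "It rocks"

s.freeze                 # No further modifications allowed
s.frozen?                # => true

error = begin
  s << "!"               # Modifying a frozen string raises an exception
  nil
rescue RuntimeError => e # Assumes Ruby 1.9+; FrozenError in recent versions
  e
end
# error now holds the exception, and s is still "It rocks"
```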

Ruby’s conditionals and loops (such as if and while) evaluate conditional expressions to determine which branch to evaluate or whether to continue looping. Conditional expressions often evaluate to true or false, but this is not required. The value of nil is treated the same as false, and any other value is the same as true. This is likely to surprise C programmers who expect 0 to work like false, and JavaScript programmers who expect the empty string "" to be the same as false.
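A small sketch makes the rule easy to check. The helper method truthy? is a hypothetical name; it simply reports which branch of an if a value selects.

```ruby
# Hypothetical helper that reports how a conditional treats a value
def truthy?(value)
  if value then true else false end
end

truthy?(nil)     # => false
truthy?(false)   # => false
truthy?(0)       # => true: unlike C, zero is not false
truthy?("")      # => true: unlike JavaScript, the empty string is not false
truthy?([])      # => true: empty collections are not false, either
```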