The Ruby Programming Language

By David Flanagan, Yukihiro Matsumoto



Table of Contents

Chapter 1: Introduction
Ruby is a dynamic programming language with a complex but expressive grammar and a core class library with a rich and powerful API. Ruby draws inspiration from Lisp, Smalltalk, and Perl, but uses a grammar that is easy for C and Java™ programmers to learn. Ruby is a pure object-oriented language, but it is also suitable for procedural and functional programming styles. It includes powerful metaprogramming capabilities and can be used to create domain-specific languages or DSLs.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
A Tour of Ruby
This section is a guided, but meandering, tour through some of the most interesting features of Ruby. Everything discussed here will be documented in detail later in the book, but this first look will give you the flavor of the language.
We’ll begin with the fact that Ruby is a completely object-oriented language. Every value is an object, even simple numeric literals and the values true, false, and nil (nil is a special value that indicates the absence of value; it is Ruby’s version of null). Here we invoke a method named class on these values. Comments begin with # in Ruby, and the => arrows in the comments indicate the value returned by the commented code (this is a convention used throughout this book):
1.class      # => Fixnum: the number 1 is a Fixnum
0.0.class    # => Float: floating-point numbers have class Float
true.class   # => TrueClass: true is the singleton instance of TrueClass
false.class  # => FalseClass
nil.class    # => NilClass
In many languages, function and method invocations require parentheses, but there are no parentheses in any of the code above. In Ruby, parentheses are usually optional and they are commonly omitted, especially when the method being invoked takes no arguments. The fact that the parentheses are omitted in the method invocations here makes them look like references to named fields or named variables of the object. This is intentional, but the fact is, Ruby is very strict about encapsulation of its objects; there is no access to the internal state of an object from outside the object. Any such access must be mediated by an accessor method, such as the class method shown above.
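The two points above, optional parentheses and accessor-mediated state, can be sketched with a small class. Point and its x method are hypothetical names chosen for illustration; they are not part of the book's examples.

```ruby
# Point is a hypothetical class used only for illustration.
class Point
  def initialize(x, y)
    @x, @y = x, y        # Instance variables are hidden inside the object
  end

  # Accessor method: the only way to read @x from outside the object
  def x
    @x
  end
end

pt = Point.new(3, 4)
puts pt.x                 # Looks like a field reference, but is a method call
puts pt.x()               # The same invocation with explicit parentheses
puts pt.respond_to?(:y)   # false: no accessor was defined for @y
```

Because no y method was defined, code outside the object has no direct way to read @y.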
The fact that we can invoke methods on integers isn’t just an esoteric aspect of Ruby. It is actually something that Ruby programmers do with some frequency:
3.times { print "Ruby! " }   # Prints "Ruby! Ruby! Ruby! "
1.upto(9) {|x| print x }     # Prints "123456789"
times and upto are methods implemented by integer objects. They are a special kind of method known as an
Try Ruby
We hope our tour of Ruby’s key features has piqued your interest and you are eager to try Ruby out. To do that, you’ll need a Ruby interpreter, and you’ll also want to know how to use three tools—irb, ri, and gem—that are bundled with the interpreter. This section explains how to get and use them.
The official web site for Ruby is http://www.ruby-lang.org. If Ruby is not already on your computer, you can follow the download link on the ruby-lang.org home page for instructions on downloading and installing the standard C-based reference implementation of Ruby.
Once you have Ruby installed, you can invoke the Ruby interpreter with the ruby command:
% ruby -e 'puts "hello world!"'
hello world!
The -e command-line option causes the interpreter to execute a single specified line of Ruby code. More commonly, you’d place your Ruby program in a file and tell the interpreter to invoke it:
% ruby hello.rb
hello world!
About This Book
As its title implies, this book covers the Ruby programming language and aspires to do so comprehensively and accessibly. This edition of the book covers language versions 1.8 and 1.9. Ruby blurs the distinction between language and platform, and so our coverage of the language includes a detailed overview of the core Ruby API. But this book is not an API reference and does not cover the core classes comprehensively. Also, this is not a book about Ruby frameworks (like Rails), nor a book about Ruby tools (like rake and gem).
This chapter concludes with a heavily commented extended example demonstrating a nontrivial Ruby program. The chapters that follow cover Ruby from the bottom up:
  • Chapter 2 covers the lexical and syntactic structure of Ruby, including basic issues like character set, case sensitivity, and reserved words.
  • Chapter 3 explains the kinds of data (numbers, strings, ranges, arrays, and so on) that Ruby programs can manipulate, and it covers the basic features of all Ruby objects.
  • Chapter 4 covers primary expressions in Ruby (literals, variable references, invocations, and assignments), and it explains the operators used to combine primary expressions into compound expressions.
  • Chapter 5 explains conditionals, loops (including blocks and iterator methods), exceptions, and the other Ruby expressions that would be called statements or control structures in other languages.
  • Chapter 6 formally documents Ruby’s method definition and invocation syntax, and it also covers the invocable objects known as procs and lambdas. This chapter includes an explanation of closures and an exploration of functional programming techniques in Ruby.
  • Chapter 7 explains how to define classes and modules in Ruby. Classes are fundamental to object-oriented programming, and this chapter also covers topics such as inheritance, method visibility, mixin modules, and the method name resolution algorithm.
A Sudoku Solver in Ruby
This chapter concludes with a nontrivial Ruby application to give you a better idea of what Ruby programs actually look like. We’ve chosen a Sudoku solver as a good short to medium-length program that demonstrates a number of features of Ruby. Don’t expect to understand every detail of the example, but do read through the code; it is very thoroughly commented, and you should have little difficulty following along.
Example. A Sudoku solver in Ruby
#
# This module defines a Sudoku::Puzzle class to represent a 9x9
# Sudoku puzzle and also defines exception classes raised for 
# invalid input and over-constrained puzzles. This module also defines 
# the method Sudoku.solve to solve a puzzle. The solve method uses
# the Sudoku.scan method, which is also defined here.
# 
# Use this module to solve Sudoku puzzles with code like this:
#
#  require 'sudoku'
#  puts Sudoku.solve(Sudoku::Puzzle.new(ARGF.readlines))
#
module Sudoku

  #
  # The Sudoku::Puzzle class represents the state of a 9x9 Sudoku puzzle.
  # 
  # Some definitions and terminology used in this implementation: 
  #
  # - Each element of a puzzle is called a "cell".
  # - Rows and columns are numbered from 0 to 8, and the coordinates [0,0]
  #   refer to the cell in the upper-left corner of the puzzle.
  # - The nine 3x3 subgrids are known as "boxes" and are also numbered from
  #   0 to 8, ordered from left to right and top to bottom. The box in
  #   the upper-left is box 0. The box in the upper-right is box 2. The
  #   box in the middle is box 4. The box in the lower-right is box 8.
  # 
  # Create a new puzzle with Sudoku::Puzzle.new, specifying the initial
  # state as a string or as an array of strings. The string(s) should use
  # the characters 1 through 9 for the given values, and '.' for cells
  # whose value is unspecified. Whitespace in the input is ignored.
  #
  # Read and write access to individual cells of the puzzle is through the
  # [] and []= operators, which expect two-dimensional [row,column] indexing.
  # These methods use numbers (not characters) 0 to 9 for cell contents.
  # 0 represents an unknown value.
  # 
  # The has_duplicates? predicate returns true if the puzzle is invalid
  # because any row, column, or box includes the same digit twice.
  #
  # The each_unknown method is an iterator that loops through the cells of
  # the puzzle and invokes the associated block once for each cell whose
  # value is unknown.
  #
  # The possible method returns an array of integers in the range 1..9.
  # The elements of the array are the only values allowed in the specified
  # cell. If this array is empty, then the puzzle is over-specified and 
  # cannot be solved. If the array has only one element, then that element
  # must be the value for that cell of the puzzle.
  #
  class Puzzle

    # These constants are used for translating between the external 
    # string representation of a puzzle and the internal representation.
    ASCII = ".123456789"
    BIN = "\000\001\002\003\004\005\006\007\010\011"

    # This is the initialization method for the class. It is automatically
    # invoked on new Puzzle instances created with Puzzle.new. Pass the input
    # puzzle as an array of lines or as a single string. Use ASCII digits 1
    # to 9 and use the '.' character for unknown cells. Whitespace, 
    # including newlines, will be stripped.
    def initialize(lines)
      if (lines.respond_to? :join)  # If argument looks like an array of lines
        s = lines.join              # Then join them into a single string
      else                          # Otherwise, assume we have a string
        s = lines.dup               # And make a private copy of it
      end

      # Remove whitespace (including newlines) from the data
      # The '!' in gsub! indicates that this is a mutator method that
      # alters the string directly rather than making a copy.
      s.gsub!(/\s/, "")  # /\s/ is a Regexp that matches any whitespace

      # Raise an exception if the input is the wrong size.
      # Note that we use unless instead of if, and use it in modifier form.
      raise Invalid, "Grid is the wrong size" unless s.size == 81
      
      # Check for invalid characters, and save the location of the first.
      # Note that we assign and test the value assigned at the same time.
      if i = s.index(/[^123456789\.]/)
        # Include the invalid character in the error message.
        # Note the Ruby expression inside #{} in string literal.
        raise Invalid, "Illegal character #{s[i,1]} in puzzle"
      end

      # The following two lines convert our string of ASCII characters
      # to an array of integers, using two powerful String methods.
      # The resulting array is stored in the instance variable @grid
      # The number 0 is used to represent an unknown value.
      s.tr!(ASCII, BIN)      # Translate ASCII characters into bytes
      @grid = s.unpack('c*') # Now unpack the bytes into an array of numbers

      # Make sure that the rows, columns, and boxes have no duplicates.
      raise Invalid, "Initial puzzle has duplicates" if has_duplicates?
    end

    # Return the state of the puzzle as a string of 9 lines with 9 
    # characters (plus newline) each.  
    def to_s
      # This method is implemented with a single line of Ruby magic that
      # reverses the steps in the initialize() method. Writing dense code
      # like this is probably not good coding style, but it demonstrates
      # the power and expressiveness of the language.
      #
      # Broken down, the line below works like this:
      # (0..8).collect invokes the code in curly braces 9 times--once
      # for each row--and collects the return value of that code into an
      # array. The code in curly braces takes a subarray of the grid
      # representing a single row and packs its numbers into a string.
      # The join() method joins the elements of the array into a single
      # string with newlines between them. Finally, the tr() method
      # translates the binary string representation into ASCII digits.
      (0..8).collect{|r| @grid[r*9,9].pack('c9')}.join("\n").tr(BIN,ASCII)
    end

    # Return a duplicate of this Puzzle object.
    # This method overrides Object.dup to copy the @grid array.
    def dup
      copy = super       # Make a shallow copy by calling Object.dup
      @grid = @grid.dup  # Give self a fresh array; copy keeps the old one
      copy               # Return the copied object
    end

    # We override the array access operator to allow access to the 
    # individual cells of a puzzle. Puzzles are two-dimensional,
    # and must be indexed with row and column coordinates.
    def [](row, col)
      # Convert two-dimensional (row,col) coordinates into a one-dimensional
      # array index and get and return the cell value at that index
      @grid[row*9 + col]
    end

    # This method allows the array access operator to be used on the 
    # lefthand side of an assignment operation. It sets the value of 
    # the cell at (row, col) to newvalue.
    def []=(row, col, newvalue)
      # Raise an exception unless the new value is in the range 0 to 9.
      unless (0..9).include? newvalue
        raise Invalid, "illegal cell value" 
      end
      # Set the appropriate element of the internal array to the value.
      @grid[row*9 + col] = newvalue
    end

    # This array maps from one-dimensional grid index to box number.
    # It is used in the method below. The name BoxOfIndex begins with a 
    # capital letter, so this is a constant. Also, the array has been
    # frozen, so it cannot be modified.
    BoxOfIndex = [
      0,0,0,1,1,1,2,2,2,0,0,0,1,1,1,2,2,2,0,0,0,1,1,1,2,2,2,
      3,3,3,4,4,4,5,5,5,3,3,3,4,4,4,5,5,5,3,3,3,4,4,4,5,5,5,
      6,6,6,7,7,7,8,8,8,6,6,6,7,7,7,8,8,8,6,6,6,7,7,7,8,8,8
    ].freeze

    # This method defines a custom looping construct (an "iterator") for
    # Sudoku puzzles.  For each cell whose value is unknown, this method
    # passes ("yields") the row number, column number, and box number to the 
    # block associated with this iterator.
    def each_unknown
      0.upto 8 do |row|             # For each row
        0.upto 8 do |col|           # For each column
          index = row*9+col         # Cell index for (row,col)
          next if @grid[index] != 0 # Move on if we know the cell's value 
          box = BoxOfIndex[index]   # Figure out the box for this cell
          yield row, col, box       # Invoke the associated block
        end
      end
    end

    # Returns true if any row, column, or box has duplicates.
    # Otherwise returns false. Duplicates in rows, columns, or boxes are not
    # allowed in Sudoku, so a return value of true means an invalid puzzle.
    def has_duplicates?
      # uniq! returns nil if all the elements in an array are unique.
      # So if uniq! returns something then the board has duplicates.
      0.upto(8) {|row| return true if rowdigits(row).uniq! }
      0.upto(8) {|col| return true if coldigits(col).uniq! }
      0.upto(8) {|box| return true if boxdigits(box).uniq! }
      
      false  # If all the tests have passed, then the board has no duplicates
    end

    # This array holds a set of all Sudoku digits. Used below.
    AllDigits = [1, 2, 3, 4, 5, 6, 7, 8, 9].freeze

    # Return an array of all values that could be placed in the cell 
    # at (row,col) without creating a duplicate in the row, column, or box.
    # Note that the + operator on arrays does concatenation but that the - 
    # operator performs a set difference operation.
    def possible(row, col, box)
      AllDigits - (rowdigits(row) + coldigits(col) + boxdigits(box))
    end

    private  # All methods after this line are private to the class

    # Return an array of all known values in the specified row.
    def rowdigits(row)
      # Extract the subarray that represents the row and remove all zeros.
      # Array subtraction removes every 0 but keeps any duplicate digits,
      # which is what lets has_duplicates? detect them with uniq!.
      @grid[row*9,9] - [0]
    end

    # Return an array of all known values in the specified column.
    def coldigits(col)
      result = []                # Start with an empty array
      col.step(80, 9) {|i|       # Loop from col by nines up to 80
        v = @grid[i]             # Get value of cell at that index
        result << v if (v != 0)  # Add it to the array if non-zero
      }
      result                     # Return the array
    end

    # Map box number to the index of the upper-left corner of the box.
    BoxToIndex = [0, 3, 6, 27, 30, 33, 54, 57, 60].freeze

    # Return an array of all the known values in the specified box.
    def boxdigits(b)
      # Convert box number to index of upper-left corner of the box.
      i = BoxToIndex[b]
      # Return an array of values, with 0 elements removed.
      [
        @grid[i],    @grid[i+1],  @grid[i+2],
        @grid[i+9],  @grid[i+10], @grid[i+11],
        @grid[i+18], @grid[i+19], @grid[i+20]
      ] - [0]
    end
  end  # This is the end of the Puzzle class

  # An exception of this class indicates invalid input.
  class Invalid < StandardError
  end

  # An exception of this class indicates that a puzzle is over-constrained
  # and that no solution is possible.
  class Impossible < StandardError
  end

  #
  # This method scans a Puzzle, looking for unknown cells that have only
  # a single possible value. If it finds any, it sets their value. Since
  # setting a cell alters the possible values for other cells, it 
  # continues scanning until it has scanned the entire puzzle without 
  # finding any cells whose value it can set.
  #
  # This method returns three values. If it solves the puzzle, all three 
  # values are nil. Otherwise, the first two values returned are the row and
  # column of a cell whose value is still unknown. The third value is the
  # set of values possible at that row and column. This is a minimal set of
  # possible values: there is no unknown cell in the puzzle that has fewer
  # possible values. This complex return value enables a useful heuristic 
  # in the solve() method: that method can guess at values for cells where
  # the guess is most likely to be correct.
  # 
  # This method raises Impossible if it finds a cell for which there are
  # no possible values. This can happen if the puzzle is over-constrained,
  # or if the solve() method below has made an incorrect guess.
  #
  # This method mutates the specified Puzzle object in place.
  # If has_duplicates? is false on entry, then it will be false on exit.
  #
  def Sudoku.scan(puzzle)
    unchanged = false  # This is our loop variable

    # Loop until we've scanned the whole board without making a change.
    until unchanged 
      unchanged = true      # Assume no cells will be changed this time
      rmin,cmin,pmin = nil  # Track cell with minimal possible set
      min = 10              # More than the maximal number of possibilities

      # Loop through cells whose value is unknown.
      puzzle.each_unknown do |row, col, box|
        # Find the set of values that could go in this cell
        p = puzzle.possible(row, col, box)
        
        # Branch based on the size of the set p. 
        # We care about 3 cases: p.size==0, p.size==1, and p.size > 1.
        case p.size
        when 0  # No possible values means the puzzle is over-constrained
          raise Impossible
        when 1  # We've found a unique value, so set it in the grid
          puzzle[row,col] = p[0] # Set that position on the grid to the value
          unchanged = false      # Note that we've made a change
        else    # For any other number of possibilities
          # Keep track of the smallest set of possibilities.
          # But don't bother if we're going to repeat this loop.
          if unchanged && p.size < min
            min = p.size                    # Current smallest size
            rmin, cmin, pmin = row, col, p  # Note parallel assignment
          end
        end
      end
    end
      
    # Return the cell with the minimal set of possibilities.
    # Note multiple return values.
    return rmin, cmin, pmin
  end

  # Solve a Sudoku puzzle using simple logic, if possible, but fall back
  # on brute-force when necessary. This is a recursive method. It either
  # returns a solution or raises an exception. The solution is returned
  # as a new Puzzle object with no unknown cells. This method does not 
  # modify the Puzzle it is passed. Note that this method cannot detect
  # an under-constrained puzzle.
  def Sudoku.solve(puzzle)
    # Make a private copy of the puzzle that we can modify.
    puzzle = puzzle.dup

    # Use logic to fill in as much of the puzzle as we can.
    # This method mutates the puzzle we give it, but always leaves it valid.
    # It returns a row, a column, and set of possible values at that cell.
    # Note parallel assignment of these return values to three variables.
    r,c,p = scan(puzzle)

    # If we solved it with logic, return the solved puzzle.
    return puzzle if r == nil
    
    # Otherwise, try each of the values in p for cell [r,c].
    # Since we're picking from a set of possible values, the guess leaves
    # the puzzle in a valid state. The guess will either lead to a solution
    # or to an impossible puzzle. We'll know we have an impossible
    # puzzle if a recursive call to scan throws an exception. If this happens
    # we need to try another guess, or re-raise an exception if we've tried
    # all the options we've got.
    p.each do |guess|        # For each value in the set of possible values
      puzzle[r,c] = guess    # Guess the value
      
      begin
        # Now try (recursively) to solve the modified puzzle.
        # This recursive invocation will call scan() again to apply logic
        # to the modified board, and will then guess another cell if needed.
        # Remember that solve() will either return a valid solution or 
        # raise an exception.  
        return solve(puzzle)  # If it returns, we just return the solution
      rescue Impossible
        next                  # If it raises an exception, try the next guess
      end
    end

    # If we get here, then none of our guesses worked out
    # so we must have guessed wrong sometime earlier.
    raise Impossible
  end
end
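Two idioms from the listing are easier to see in isolation: the tr/unpack conversion used by initialize and to_s, and the way array subtraction interacts with uniq! in has_duplicates?. The following is a minimal sketch, not part of the Sudoku module; the helper names encode_row and decode_row are hypothetical.

```ruby
# Hypothetical helpers that mirror the conversion in Puzzle#initialize
# and Puzzle#to_s, applied to a single row of text.
def encode_row(text)
  ascii = ".123456789"
  bin   = "\000\001\002\003\004\005\006\007\010\011"
  text.tr(ascii, bin).unpack('c*')   # Characters -> bytes -> integers
end

def decode_row(cells)
  ascii = ".123456789"
  bin   = "\000\001\002\003\004\005\006\007\010\011"
  cells.pack('c*').tr(bin, ascii)    # Integers -> bytes -> characters
end

cells = encode_row("53..7....")
p cells                  # [5, 3, 0, 0, 7, 0, 0, 0, 0]
puts decode_row(cells)   # 53..7....

# Array subtraction drops the zeros but keeps repeated digits...
known = [5, 3, 5, 0] - [0]
p known                  # [5, 3, 5]
# ...so uniq! returns an array (truthy) when a digit repeats, nil otherwise.
p known.uniq!            # [5, 3]
p([5, 3].uniq!)          # nil
```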
Chapter 2: The Structure and Execution of Ruby Programs
This chapter explains the structure of Ruby programs. It starts with the lexical structure, covering tokens and the characters that comprise them. Next, it covers the syntactic structure of a Ruby program, explaining how expressions, control structures, methods, classes, and so on are written as a series of tokens. Finally, the chapter describes files of Ruby code, explaining how Ruby programs can be split across multiple files and how the Ruby interpreter executes a file of Ruby code.
Lexical Structure
The Ruby interpreter parses a program as a sequence of tokens. Tokens include comments, literals, punctuation, identifiers, and keywords. This section introduces these types of tokens and also includes important information about the characters that comprise the tokens and the whitespace that separates the tokens.
Comments in Ruby begin with a # character and continue to the end of the line. The Ruby interpreter ignores the # character and any text that follows it (but does not ignore the newline character, which is meaningful whitespace and may serve as a statement terminator). If a # character appears within a string or regular expression literal, then it is simply part of the string or regular expression and does not introduce a comment:
# This entire line is a comment
x = "#This is a string"               # And this is a comment
y = /#This is a regular expression/   # Here's another comment
Multiline comments are usually written simply by beginning each line with a separate # character:
#
# This class represents a Complex number
# Despite its name, it is not complex at all.
#
Note that Ruby has no equivalent of the C-style /*...*/ comment. There is no way to embed a comment in the middle of a line of code.

Embedded documents

Ruby supports another style of multiline comment known as an embedded document. These start on a line that begins =begin and continue until (and include) a line that begins =end. Any text that appears after =begin or =end is part of the comment and is also ignored, but that extra text must be separated from the =begin and =end by at least one space.
Embedded documents are a convenient way to comment out long blocks of code without prefixing each line with a # character:
=begin Someone needs to fix the broken code below!
    Any code here is commented out
=end
Note that embedded documents only work if the = signs are the first characters of each line:
# =begin This used to begin a comment. Now it is itself commented out!
    
Syntactic Structure
So far, we’ve discussed the tokens of a Ruby program and the characters that make them up. Now we move on to briefly describe how those lexical tokens combine into the larger syntactic structures of a Ruby program. This section describes the syntax of Ruby programs, from the simplest expressions to the largest modules. This section is, in effect, a roadmap to the chapters that follow.
The basic unit of syntax in Ruby is the expression. The Ruby interpreter evaluates expressions, producing values. The simplest expressions are primary expressions, which represent values directly. Number and string literals, described earlier in this chapter, are primary expressions. Other primary expressions include certain keywords such as true, false, nil, and self. Variable references are also primary expressions; they evaluate to the value of the variable.
More complex values can be written as compound expressions:
[1,2,3]                # An Array literal
{1=>"one", 2=>"two"}   # A Hash literal
1..3                   # A Range literal
Operators are used to perform computations on values, and compound expressions are built by combining simpler subexpressions with operators:
1         # A primary expression
x         # Another primary expression
x = 1     # An assignment expression
x = x + 1 # An expression with two operators
Chapter 4 covers operators and expressions, including variables and assignment.
Expressions can be combined with Ruby’s keywords to create statements, such as the if statement for conditionally executing code and the while statement for repeatedly executing code:
if x < 10 then   # If this expression is true
  x = x + 1      # Then execute this statement
end              # Marks the end of the conditional

while x < 10 do  # While this expression is true...
  print x        # Execute this statement
  x = x + 1      # Then execute this statement
end              # Marks the end of the loop
In Ruby, these statements are technically expressions, but there is still a useful distinction between expressions that affect the control flow of a program and those that do not. Chapter 5 explains Ruby’s control structures.
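A short sketch of what it means for statements to be expressions: an if or while can appear on the righthand side of an assignment, or serve as a method's return value. The method names size_label and count_to are made up for this illustration.

```ruby
def size_label(x)
  # The whole if expression evaluates to the value of the branch taken,
  # and that value becomes the return value of the method.
  if x < 10 then "small" else "large" end
end

def count_to(limit)
  i = 0
  # A while loop is an expression too; when it ends normally its value is nil.
  loop_value = while i < limit do i += 1 end
  [i, loop_value]
end

puts size_label(7)    # small
puts size_label(42)   # large
p count_to(3)         # [3, nil]
```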
File Structure
There are only a few rules about how a file of Ruby code must be structured. These rules are related to the deployment of Ruby programs and are not directly relevant to the language itself.
First, if a Ruby program contains a “shebang” comment, to tell the (Unix-like) operating system how to execute it, that comment must appear on the first line.
Second, if a Ruby program contains a “coding” comment (as described under Program Encoding), that comment must appear on the first line or on the second line if the first line is a shebang.
Third, if a file contains a line that consists of the single token __END__ with no whitespace before or after, then the Ruby interpreter stops processing the file at that point. The remainder of the file may contain arbitrary data that the program can read using the IO stream object DATA, which is a global constant.
Ruby programs are not required to fit in a single file. Many programs load additional Ruby code from external libraries, for example. Programs use require to load code from another file. require searches for specified modules of code against a search path, and prevents any given module from being loaded more than once.
The following code illustrates each of these points of Ruby file structure:
#!/usr/bin/ruby -w          shebang comment
# -*- coding: utf-8 -*-     coding comment
require 'socket'            load networking library

  ...                       program code goes here

__END__                     mark end of code
  ...                       program data goes here
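The load-once behavior of require can be observed directly. This sketch writes a throwaway file to a temporary directory and requires it twice; the file name load_counter_demo.rb and the global $demo_loads are hypothetical, chosen only for the demonstration.

```ruby
require 'tmpdir'

# Create a one-line Ruby file that counts how often it is executed,
# then require it twice and report what happened.
def require_twice
  Dir.mktmpdir do |dir|
    path = File.join(dir, "load_counter_demo.rb")   # hypothetical file
    File.open(path, "w") { |f| f.puts "$demo_loads = $demo_loads.to_i + 1" }
    before = $demo_loads.to_i
    first  = require(path)    # true: the file was loaded and executed
    second = require(path)    # false: already loaded, so it is skipped
    [first, second, $demo_loads - before]
  end
end

p require_twice   # [true, false, 1]
```

The return value of require is how it signals whether the file was actually loaded by that call.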
Program Encoding
At the lowest level, a Ruby program is simply a sequence of characters. Ruby’s lexical rules are defined using characters of the ASCII character set. Comments begin with the # character (ASCII code 35), for example, and allowed whitespace characters are horizontal tab (ASCII 9), newline (10), vertical tab (11), form feed (12), carriage return (13), and space (32). All Ruby keywords are written using ASCII characters, and all operators and other punctuation are drawn from the ASCII character set.
By default, the Ruby interpreter assumes that Ruby source code is encoded in ASCII. This is not required, however; the interpreter can also process files that use other encodings, as long as those encodings can represent the full set of ASCII characters. In order for the Ruby interpreter to be able to interpret the bytes of a source file as characters, it must know what encoding to use. Ruby files can identify their own encodings or you can tell the interpreter how they are encoded. Doing so is explained shortly.
The Ruby interpreter is actually quite flexible about the characters that appear in a Ruby program. Certain ASCII characters have specific meanings, and certain ASCII characters are not allowed in identifiers, but beyond that, a Ruby program may contain any characters allowed by the encoding. We explained earlier that identifiers may contain characters outside of the ASCII character set. The same is true for comments and string and regular expression literals: they may contain any characters other than the delimiter character that marks the end of the comment or literal. In ASCII-encoded files, strings may include arbitrary bytes, including those that represent nonprinting control characters. (Using raw bytes like this is not recommended, however; Ruby string literals support escape sequences so that arbitrary characters can be included by numeric code instead.) If the file is written using the UTF-8 encoding, then comments, strings, and regular expressions may include arbitrary Unicode characters. If the file is encoded using the Japanese SJIS or EUC encodings, then strings may include Kanji characters.
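As a small illustration of the recommendation above, the following sketch writes nonprinting and non-ASCII characters with escape sequences instead of raw bytes. The variable names are arbitrary, and the \u escape assumes Ruby 1.9 or later.

```ruby
bell = "\a"       # a control character written as an escape, not a raw byte
tab  = "\x09"     # the same idea, using a hexadecimal byte value
euro = "\u20AC"   # a Unicode code point (assumes Ruby 1.9 or later)

bell_code  = bell.ord        # numeric code of the bell character
euro_chars = euro.length     # counted in characters
euro_bytes = euro.bytesize   # counted in bytes of the UTF-8 encoding
```

The source text stays pure ASCII, yet the resulting strings contain the intended characters.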
Program Execution
Ruby is a scripting language. This means that Ruby programs are simply lists, or scripts, of statements to be executed. By default, these statements are executed sequentially, in the order they appear. Ruby’s control structures (described elsewhere in the book) alter this default execution order and allow statements to be executed conditionally or repeatedly, for example.
Programmers who are used to traditional static compiled languages like C or Java may find this slightly confusing. There is no special main method in Ruby from which execution begins. The Ruby interpreter is given a script of statements to execute, and it begins executing at the first line and continues to the last line.
(Actually, that last statement is not quite true. The Ruby interpreter first scans the file for BEGIN statements, and executes the code in their bodies. Then it goes back to line 1 and starts executing sequentially. BEGIN is covered in more detail elsewhere in the book.)
Another difference between Ruby and compiled languages has to do with module, class, and method definitions. In compiled languages, these are syntactic structures that are processed by the compiler. In Ruby, they are statements like any other. When the Ruby interpreter encounters a class definition, it executes it, causing a new class to come into existence. Similarly, when the Ruby interpreter encounters a method definition, it executes it, causing a new method to be defined. Later in the program, the interpreter will probably encounter and execute a method invocation expression for the method, and this will cause the statements in the method body to be executed.
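The order of events described above can be traced with a short script. This is a sketch only: the Greeter class and the $trace global are hypothetical names used to record what runs when, and the code is assumed to run at the top level of a file.

```ruby
$trace = []                     # hypothetical global that records the order of events

$trace << :before_class
class Greeter                   # executing this statement creates the class
  $trace << :in_class_body      # ordinary statements may appear in a class body
  def hello                     # executing def defines the method; its body does not run yet
    $trace << :in_method_body
    "hello"
  end
end
$trace << :after_class

greeting = Greeter.new.hello    # only now are the statements in the method body executed
```

After this runs, $trace holds :before_class, :in_class_body, :after_class, and :in_method_body, in that order.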
The Ruby interpreter is invoked from the command line and given a script to execute. Very simple one-line scripts are sometimes written directly on the command line. More commonly, however, the name of the file containing the script is specified. The Ruby interpreter reads the file and executes the script. It first executes any BEGIN blocks. Then it starts at the first line of the file and continues until one of the following happens:
Chapter 3: Datatypes and Objects
In order to understand a programming language, you have to know what kinds of data it can manipulate and what it can do with that data. This chapter is about the values manipulated by Ruby programs. It begins with comprehensive coverage of numeric and textual values. Next, it explains arrays and hashes—two important data structures that are a fundamental part of Ruby. The chapter then moves on to explain ranges, symbols, and the special values true, false, and nil. All Ruby values are objects, and this chapter concludes with detailed coverage of the features that all objects share.
The classes described in this chapter are the fundamental datatypes of the Ruby language. This chapter explains the basic behavior of those types: how literal values are written in a program, how integer and floating-point arithmetic work, how textual data is encoded, how values can serve as hash keys, and so on. Although we cover numbers, strings, arrays, and hashes here, this chapter makes no attempt to explain the APIs defined by those types. Instead, another chapter demonstrates those APIs by example, and it also covers many other important (but nonfundamental) classes.
Numbers
Ruby includes five built-in classes for representing numbers, and the standard library includes three more numeric classes that are sometimes useful. The following figure shows the class hierarchy.
Figure: Numeric class hierarchy
All number objects in Ruby are instances of Numeric. All integers are instances of Integer. If an integer value fits within 31 bits (on most implementations), it is an instance of Fixnum. Otherwise, it is a Bignum. Bignum objects represent integers of arbitrary size, and if the result of an operation on Fixnum operands is too big to fit in a Fixnum, that result is transparently converted to a Bignum. Similarly, if the result of an operation on Bignum objects falls within the range of Fixnum, then the result is a Fixnum. Real numbers are approximated in Ruby with the Float class, which uses the native floating-point representation of the platform.
The Complex class represents complex numbers, of course. BigDecimal represents real numbers with arbitrary precision, using a decimal representation rather than a binary representation. And Rational represents rational numbers: one integer divided by another. In Ruby 1.8 these classes are in the standard library. In Ruby 1.9, Complex and Rational are built-in.
All numeric objects are immutable; there are no methods that allow you to change the value held by the object. If you pass a reference to a numeric object to a method, you need not worry that the method will modify the object. Fixnum objects are commonly used, and Ruby implementations typically treat them as immediate values rather than as references. Because numbers are immutable, however, there is really no way to tell the difference.
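The transparent conversion between small and large integers can be sketched as follows. The example deliberately tests against Integer and Numeric rather than printing class names, because the concrete class reported differs by version: the Ruby versions described here report Fixnum or Bignum, while Ruby 2.4 and later report Integer for both. The variable names are arbitrary.

```ruby
small = 2**10                  # fits comfortably in a machine word
big   = 2**100                 # far too large for a machine word

small_is_integer = small.is_a?(Integer)
big_is_integer   = big.is_a?(Integer)
float_is_numeric = 1.5.is_a?(Numeric)

back_to_small = (big + 1) - big    # the result shrinks back to an ordinary small integer
```

No overflow occurs at any step; the arithmetic is exact regardless of magnitude.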
An integer literal is simply a sequence of digits:
0
123
12345678901234567890
If the integer value fits within the range of the Fixnum class, the value is a Fixnum. Otherwise, it is a Bignum, which supports integers of any size. Underscores may be inserted into integer literals (though not at the beginning or end), and this feature is sometimes used as a thousands separator.
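A minimal sketch of underscores in integer literals (the variable names are arbitrary):

```ruby
billion   = 1_000_000_000     # underscores group the digits for readability
same      = 1000000000        # the same value without separators
as_string = 1_000.to_s        # the underscores are not part of the value
```

The underscores affect only how the literal reads in source code.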
Text
Text is represented in Ruby by objects of the String class. Strings are mutable objects, and the String class defines a powerful set of operators and methods for extracting substrings, inserting and deleting text, searching, replacing, and so on. Ruby provides a number of ways to express string literals in your programs, and some of them support a powerful string interpolation syntax by which the values of arbitrary Ruby expressions can be substituted into string literals. The sections that follow explain string literals and string operators. The full string API is covered elsewhere in the book.
Textual patterns are represented in Ruby as Regexp objects, and Ruby defines a syntax for including regular expressions literally in your programs. The code /[a-z]\d+/, for example, represents a single lowercase letter followed by one or more digits. Regular expressions are a commonly used feature of Ruby, but regexps are not a fundamental datatype in the way that numbers, strings, and arrays are. Regular expression syntax and the Regexp API are documented elsewhere in the book.
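As a quick illustration, the regular expression mentioned above can be tried against a few strings. The sample strings are invented for this sketch.

```ruby
pattern = /[a-z]\d+/

at_start  = ("x42" =~ pattern)                 # index of the match
in_middle = ("order a17 shipped" =~ pattern)   # the match need not start the string
matched   = "order a17 shipped"[pattern]       # the matching text itself
uppercase = ("X42" =~ pattern)                 # no lowercase letter before the digits
no_digits = ("abc" =~ pattern)                 # no digits at all
```

The =~ operator returns the index of the first match, or nil when nothing matches.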
Ruby provides quite a few ways to embed strings literally into your programs.

Single-quoted string literals

The simplest string literals are enclosed in single quotes (the apostrophe character). The text within the quote marks is the value of the string:
'This is a simple Ruby string literal'
If you need to place an apostrophe within a single-quoted string literal, precede it with a backslash so that the Ruby interpreter does not think that it terminates the string:
'Won\'t you read O\'Reilly\'s book?'
The backslash also works to escape another backslash, so that the second backslash is not itself interpreted as an escape character. Here are some situations in which you need to use a double backslash:
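The following sketch shows how backslashes behave in single-quoted literals. These particular strings are illustrative choices, not the examples from the original text.

```ruby
apostrophe = 'Won\'t'          # the backslash escapes the quote: Won't
doubled    = 'a\\b'            # two backslashes yield one: a\b
untouched  = 'a\b'             # before other characters the backslash is kept: a\b
trailing   = 'ends with \\'    # a backslash at the end must be doubled

lengths = [apostrophe.length, doubled.length, untouched.length, trailing.length]
```

Note that doubled and untouched are the same three-character string; the double backslash is only required where a single one would otherwise escape the character that follows it.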
Arrays
An array is a sequence of values that allows values to be accessed by their position, or index, in the sequence. In Ruby, the first value in an array has index 0. The size and length methods return the number of elements in an array. The last element of the array is at index size-1. Negative index values count from the end of the array, so the last element of an array can also be accessed with an index of -1. The second-to-last has an index of -2, and so on. If you attempt to read an element beyond the end of an array (with an index >= size) or before the beginning of an array (with an index < -size), Ruby simply returns nil and does not raise an exception.
Ruby’s arrays are untyped and mutable. The elements of an array need not all be of the same class, and they can be changed at any time. Furthermore, arrays are dynamically resizeable; you can append elements to them and they grow as needed. If you assign a value to an element beyond the end of the array, the array is automatically extended with nil elements. (It is an error, however, to assign a value to an element before the beginning of an array.)
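The indexing rules from the last two paragraphs can be sketched in a few lines. The array contents and variable names are arbitrary.

```ruby
a = [10, 20, 30]

first        = a[0]             # first element
last         = a[a.size - 1]    # last element, by positive index
also_last    = a[-1]            # last element, by negative index
past_end     = a[3]             # reading beyond the end yields nil
before_start = a[-4]            # reading before the beginning yields nil

a[5] = 60                       # assigning beyond the end pads the gap with nil

error = begin
  a[-7] = 0                     # assigning before the beginning is an error
  nil
rescue IndexError => e
  e
end
```

After the assignment to a[5], the array is [10, 20, 30, nil, nil, 60], and the final assignment raises an IndexError.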
An array literal is a comma-separated list of values, enclosed in square brackets:
[1, 2, 3]         # An array that holds three Fixnum objects
[-10...0, 0..10,] # An array of two ranges; trailing commas are allowed
[[1,2],[3,4],[5]] # An array of nested arrays
[x+y, x-y, x*y]   # Array elements can be arbitrary expressions
[]                # The empty array has size 0
Ruby includes a special-case syntax for expressing array literals whose elements are short strings without spaces:
words = %w[this is a test]  # Same as: ['this', 'is', 'a', 'test']
open = %w| ( [ { < |        # Same as: ['(', '[', '{', '<']
white = %W(\s \t \r \n)     # Same as: ["\s", "\t", "\r", "\n"]
%w and %W introduce an array literal, much like %q and %Q introduce a String literal. In particular, the delimiter rules for %w and %W are the same as for %q and %Q. Within the delimiters, no quotation marks are required around the array element strings, and no commas are required between the elements. Array elements are delimited by whitespace.
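Because %w follows the rules of %q and %W follows those of %Q, the two differ in how they treat interpolation and escapes. A brief sketch, using an arbitrary variable named name:

```ruby
name = "Ruby"

plain        = %w(hello #{name})      # like single quotes: no interpolation
interpolated = %W(hello #{name} \t)   # like double quotes: interpolation and escapes
```

Here plain contains the literal text #{name} as its second element, while interpolated contains "Ruby" followed by a tab character.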
Hashes
A hash is a data structure that maintains a set of objects known as keys, and associates a value with each key. Hashes are also known as maps because they map keys to values. They are sometimes called associative arrays because they associate values with each of the keys, and can be thought of as arrays in which the array index can be any object instead of an integer. An example makes this clearer:
# This hash will map the names of digits to the digits themselves
numbers = Hash.new     # Create a new, empty, hash object
numbers["one"] = 1     # Map the String "one" to the Fixnum 1
numbers["two"] = 2     # Note that we are using array notation here
numbers["three"] = 3

sum = numbers["one"] + numbers["two"]  # Retrieve values like this
This introduction to hashes documents Ruby’s hash literal syntax and explains the requirements for an object to be used as a hash key. More information on the API defined by the Hash class is provided elsewhere in the book.
A hash literal is written as a comma-separated list of key/value pairs, enclosed within curly braces. Keys and values are separated with a two-character “arrow”: =>. The Hash object created earlier could also be created with the following literal:
numbers = { "one" => 1, "two" => 2, "three" => 3 }
In general, Symbol objects work more efficiently as hash keys than strings do:
numbers = { :one => 1, :two => 2, :three => 3 }
Symbols are immutable interned strings, written as colon-prefixed identifiers; they are explained in more detail later in this chapter.
Ruby 1.8 allows commas in place of arrows, but this deprecated syntax is no longer supported in Ruby 1.9:
numbers = { :one, 1, :two, 2, :three, 3 } # Same, but harder to read
Both Ruby 1.8 and Ruby 1.9 allow a single trailing comma at the end of the key/value list:
numbers = { :one => 1, :two => 2, } # Extra comma ignored
Ruby 1.9 supports a very useful and succinct hash literal syntax when the keys are symbols. In this case, the colon moves to the end of the hash key and replaces the arrow:
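A minimal sketch of the syntax just described, assuming Ruby 1.9 or later and reusing the numbers hash from the earlier examples:

```ruby
numbers = { one: 1, two: 2, three: 3 }               # colon follows each key
arrows  = { :one => 1, :two => 2, :three => 3 }      # equivalent arrow form

same = (numbers == arrows)
keys = numbers.keys                                   # the keys are ordinary symbols
```

The two literals produce equal hashes; only the notation differs.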
Ranges
A Range object represents the values between a start value and an end value. Range literals are written by placing two or three dots between the start and end value. If two dots are used, then the range is inclusive and the end value is part of the range. If three dots are used, then the range is exclusive and the end value is not part of the range:
1..10      # The integers 1 through 10, including 10
1.0...10.0 # The numbers between 1.0 and 10.0, excluding 10.0 itself
Test whether a value is included in a range with the include? method (but see below for a discussion of alternatives):
cold_war = 1945..1989
cold_war.include? birthdate.year
Implicit in the definition of a range is the notion of ordering. If a range is the values between two endpoints, there obviously must be some way to compare values to those endpoints. In Ruby, this is done with the comparison operator <=>, which compares its two operands and evaluates to -1, 0, or 1, depending on their relative order (or equality). Classes that have an ordering, such as the numeric classes and String, define the <=> operator. A value can only be used as a range endpoint if it responds to this operator. The endpoints of a range and the values “in” the range are typically all of the same class. Technically, however, any value that is compatible with the <=> operators of the range endpoints can be considered a member of the range.
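The relationship between <=> and range membership can be sketched with numeric ranges. The variable names are arbitrary, and the example sticks to numeric endpoints.

```ruby
inclusive = 1..10
exclusive = 1...10

end_in_inclusive = inclusive.include?(10)    # two dots: the end value is a member
end_in_exclusive = exclusive.include?(10)    # three dots: the end value is excluded
float_member     = exclusive.include?(9.5)   # a Float compares with Integer endpoints

less         = (1 <=> 2)       # left operand is smaller
equal        = (2 <=> 2)       # operands are equal
greater      = ("b" <=> "a")   # strings are ordered too
incomparable = (1 <=> "a")     # nil: no ordering between these values
```

The Float 9.5 counts as a member of an Integer range because it can be compared with both endpoints.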
The primary purpose for ranges is comparison: to be able to determine whether a value is in or out of the range. An important secondary purpose is iteration: if the class of the endpoints of a range defines a succ method (for successor), then there is a discrete set of range members, and they can be iterated with each, step, and Enumerable methods. Consider the range 'a'..'c', for example:
r = 'a'..'c'
r.each {|l| print "[#{l}]"}     # Prints "[a][b][c]"
r.step(2) { |l| print "[#{l}]"} # Prints "[a][c]"
r.to_a                          # => ['a','b','c']: Enumerable defines to_a
Symbols
A typical implementation of a Ruby interpreter maintains a symbol table in which it stores the names of all the classes, methods, and variables it knows about. This allows such an interpreter to avoid most string comparisons: it refers to method names (for example) by their position in this symbol table. This turns a relatively expensive string operation into a relatively cheap integer operation.
These symbols are not purely internal to the interpreter; they can also be used by Ruby programs. A Symbol object refers to a symbol. A symbol literal is written by prefixing an identifier or string with a colon:
:symbol                   # A Symbol literal
:"symbol"                 # The same literal
:'another long symbol'    # Quotes are useful for symbols with spaces
s = "string"
sym = :"#{s}"             # The Symbol :string
Symbols also have a %s literal syntax that allows arbitrary delimiters in the same way that %q and %Q can be used for string literals:
%s["]     # Same as :'"'
Symbols are often used to refer to method names in reflective code. For example, we want to know if some object has an each method:
o.respond_to? :each
Here’s another example. It tests whether a given object responds to a specified method, and, if so, invokes that method:
name = :size
if o.respond_to? name
  o.send(name)
end
You can convert a String to a Symbol using the intern or to_sym methods. And you can convert a Symbol back into a String with the to_s method or its alias id2name:
str = "string"     # Begin with a string
sym = str.intern   # Convert to a symbol
sym = str.to_sym   # Another way to do the same thing
str = sym.to_s     # Convert back to a string
str = sym.id2name  # Another way to do it
Two strings may hold the same content and yet be completely distinct objects. This is never the case with symbols. Two strings with the same content will both convert to exactly the same Symbol object. Two distinct Symbol objects will always have different content.
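The difference in identity between strings and symbols can be checked with equal?, which tests whether two references point to the same object. In this sketch the strings are created with dup so that they are guaranteed to be separate objects; the names a and b are arbitrary.

```ruby
a = "status".dup
b = "status".dup

same_content   = (a == b)                      # the strings hold the same text
same_string    = a.equal?(b)                   # but they are distinct objects
same_symbol    = a.to_sym.equal?(b.to_sym)     # both convert to one Symbol object
matches_literal = a.intern.equal?(:status)     # which is also the literal :status
```

Only same_string is false; every other comparison is true.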
Whenever you write code that uses strings not for their textual content but as a kind of unique identifier, consider using symbols instead. Rather than writing a method that expects an argument to be either the string “AM” or “PM”, for example, you could write it to expect the symbol
True, False, and Nil
We saw earlier that true, false, and nil are keywords in Ruby. true and false are the two Boolean values, and they represent truth and falsehood, yes and no, on and off. nil is a special value reserved to indicate the absence of value.
Each of these keywords evaluates to a special object. true evaluates to an object that is a singleton instance of TrueClass. Likewise, false and nil are singleton instances of FalseClass and NilClass. Note that there is no Boolean class in Ruby. TrueClass and FalseClass both have Object as their superclass.
If you want to check whether a value is nil, you can simply compare it to nil, or use the method nil?:
o == nil   # Is o nil?
o.nil?     # Another way to test
Note that true, false, and nil refer to objects, not numbers. false and nil are not the same thing as 0, and true is not the same thing as 1. When Ruby requires a Boolean value, nil behaves like false, and any value other than nil or false behaves like true.
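The truthiness rule can be sketched with a small helper. The method name truthy? is hypothetical and exists only for this illustration.

```ruby
# Hypothetical helper: reports how Ruby treats a value where a Boolean is required
def truthy?(value)
  value ? true : false
end

zero_is_true    = truthy?(0)       # 0 is not false in Ruby
empty_is_true   = truthy?("")      # neither is the empty string
nil_is_false    = !truthy?(nil)    # nil behaves like false
false_is_false  = !truthy?(false)

classes = [true.class, false.class, nil.class]
```

Programmers coming from C should note in particular that 0 behaves like true.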
Objects
Ruby is a very pure object-oriented language: all values are objects, and there is no distinction between primitive types and object types as there is in many other languages. In Ruby, all objects inherit from a class named Object and share the methods defined by that class. This section explains the common features of all objects in Ruby. It is dense in parts, but it’s required reading; the information here is fundamental.
When we work with objects in Ruby, we are really working with object references. It is not the object itself we manipulate but a reference to it. When we assign a value to a variable, we are not copying an object “into” that variable; we are merely storing a reference to an object into that variable. Some code makes this clear:
s = "Ruby" # Create a String object. Store a reference to it in s.
t = s      # Copy the reference to t. s and t both refer to the same object.
t[-1] = "" # Modify the object through the reference in t.
print s    # Access the modified object through s. Prints "Rub". 
t = "Java" # t now refers to a different object.
print s,t  # Prints "RubJava".
When you pass an object to a method in Ruby, it is an object reference that is passed to the method. It is not the object itself, and it is not a reference to the reference to the object. Another way to say this is that method arguments are passed by value rather than by reference, but that the values passed are object references.
Because object references are passed to methods, methods can use those references to modify the underlying object. These modifications are then visible when the method returns.
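The distinction between modifying an object through a reference and rebinding a reference can be sketched with two methods. The names shout and replace are hypothetical, and String.new is used so that the string is an ordinary mutable object.

```ruby
def shout(text)
  text << "!"        # mutates the object that the caller's variable also refers to
end

def replace(text)
  text = "Java"      # rebinds only this method's local variable
  text
end

s = String.new("Ruby")
shout(s)                 # the caller sees this modification
returned = replace(s)    # the caller's variable is unaffected by the reassignment
```

After both calls, s is "Ruby!" while returned is "Java": the method received a copy of the reference, so it could change the object but not what s refers to.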

Immediate values

We’ve said that all values in Ruby are objects and all objects are manipulated by reference. In the reference implementation, however, Fixnum and Symbol objects are actually “immediate values” rather than references. Neither of these classes has mutator methods, so Fixnum and Symbol objects are immutable, which means that there is really no way to tell that they are manipulated by value rather than by reference.
Chapter 4: Expressions and Operators
An expression is a chunk of Ruby code that the Ruby interpreter can evaluate to produce a value. Here are some sample expressions:
2                  # A numeric literal
x                  # A local variable reference
Math.sqrt(2)       # A method invocation
x = Math.sqrt(2)   # Assignment
x*x                # Multiplication with the * operator
As you can see, primary expressions (such as literals, variable references, and method invocations) can be combined into larger expressions with operators, such as the assignment operator and the multiplication operator.
Many programming languages distinguish between low-level expressions and higher-level statements, such as conditionals and loops. In these languages, statements control the flow of a program, but they do not have values. They are executed, rather than evaluated. In Ruby, there is no clear distinction between statements and expressions; everything in Ruby, including class and method definitions, can be evaluated as an expression and will return a value. It is still useful, however, to distinguish syntax typically used as expressions from syntax typically used as statements. Ruby expressions that affect flow-of-control, and Ruby expressions that define methods and classes, are documented in other chapters.
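To illustrate that constructs treated as statements in other languages yield values in Ruby, here is a small sketch. The variable names and values are arbitrary.

```ruby
n = 7
parity = if n.even? then "even" else "odd" end   # an if expression has a value

score = 85
grade = case score                               # so does a case expression
        when 90..100 then "A"
        when 80...90 then "B"
        else "C"
        end

root = (x = Math.sqrt(16))                       # an assignment evaluates to the assigned value
```

Each construct is used on the right-hand side of an assignment, which would not be possible if it were a statement without a value.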
This chapter covers the simpler, more traditional sort of expressions. The simplest expressions are literal values, which we have already documented. This chapter explains variable and constant references, method invocations, assignment, and expressions created by combining smaller expressions with operators.
Literals are values such as 1.0, 'hello world', and [] that are embedded directly into your program text. We introduced and documented them in detail earlier in the book.
It is worth noting that many literals, such as numbers, are primary expressions—the simplest possible expressions not composed of simpler expressions. Other literals, such as array and hash literals and double-quoted strings that use interpolation, include subexpressions and are therefore not primary expressions.