Working with Files

Ruby offers a whole slew of options for file management tasks, which can make it hard to decide on the best approach for a given job. In this section, we'll cover two key tasks while looking at three of Ruby's standard libraries.

First, you’ll learn how to use the pathname and fileutils libraries to traverse your filesystem using a clean cross-platform approach that rivals the power of popular *nix shells without sacrificing compatibility. We’ll then move on to how to use tempfile to automate handling of temporary file resources within your scripts. These practical tips will help you write platform-agnostic Ruby code that’ll work out of the box on more systems, while still managing to make your job easier.

Using Pathname and FileUtils

If you are using Ruby to write administrative scripts, you will almost inevitably need to do some file management along the way. It can be tempting to drop down into the shell to move and rename directories, search for files in a complex directory structure, or handle other common tasks that involve ferrying files from one place to another. However, Ruby provides some great tools that let you avoid this.

The pathname and fileutils standard libraries provide virtually everything you need for file management. The best way to demonstrate their capabilities is by example, so we’ll now take a look at some code and then break it down piece by piece.

To illustrate Pathname, let's look at a small tool I've built for doing local installations of libraries found on GitHub. This script, called mooch, clones a Git repository into a convenient place within your project (a vendor/ directory) and can optionally check out a specific version. It can also set up a stub file that, when required, adds your vendored packages to the load path. Sample usage looks something like this:

$ mooch init lib/my_project
$ mooch sandal/prawn  0.2.3
$ mooch ruport/ruport 1.6.1

With that in place, the following works without loading RubyGems:

>> require "./lib/my_project/dependencies"
=> true
>> require "prawn"
=> true
>> require "ruport"
=> true
>> Prawn::VERSION
=> "0.2.3"
>> Ruport::VERSION
=> "1.6.1"

Although this script is pretty useful, that’s not what we’re here to talk about. Instead, let’s focus on how this sort of thing is built, as it shows a practical example of using Pathname to manipulate files and folders. I’ll start by showing you the whole script, and then we’ll walk through it part by part:

#!/usr/bin/env ruby
require "pathname"

WORKING_DIR = Pathname.getwd
LOADER = %Q{
  require "pathname"

  Pathname.glob("#{WORKING_DIR}/vendor/*/*/") do |dir|
    lib = dir + "lib"
    $LOAD_PATH.push(lib.directory? ? lib : dir)
  end
}

if ARGV[0] == "init"
  lib = Pathname.new(ARGV[1])
  lib.mkpath
  (lib + 'dependencies.rb').open("w") do |file|
    file.write LOADER
  end
else
  vendor = Pathname.new("vendor")
  vendor.mkpath
  Dir.chdir(vendor.realpath)
  # Clone username/project into vendor/username/project.
  system("git", "clone", "https://github.com/#{ARGV[0]}.git", ARGV[0])
  if ARGV[1]
    Dir.chdir(ARGV[0])
    system("git", "checkout", ARGV[1])
  end
end

As you can see, it’s not a ton of code, even though it does a lot. Let’s shine the spotlight on the interesting Pathname bits:

WORKING_DIR = Pathname.getwd

Here we simply assign the initial working directory to a constant, which we use when generating the dependencies.rb stub script via mooch init. This is quick-and-dirty code generation, and you can see the full stub as stored in LOADER:

LOADER = %Q{
  require "pathname"

  Pathname.glob("#{WORKING_DIR}/vendor/*/*/") do |dir|
    lib = dir + "lib"
    $LOAD_PATH.push(lib.directory? ? lib : dir)
  end
}

This stub does something fun. It looks for a vendor folder inside the directory where mooch init was run, and then uses a single glob to find folders two levels deep, matching the GitHub convention of username/project. For each project, it checks for a lib folder (the common Ruby convention) and adds that folder to the load path; if no lib folder is present, it adds the project folder itself instead.
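The glob-and-fallback logic is easy to try in isolation. The following is a minimal sketch that builds a throwaway vendor/ tree with Dir.mktmpdir and runs the same loop against it; the project someone/plain_project and the file notes.txt are hypothetical, made up only to exercise both branches.

```ruby
require "pathname"
require "tmpdir"

found = Dir.mktmpdir do |root|
  root = Pathname.new(root)

  # A throwaway vendor/ tree: one project with lib/, one without (hypothetical),
  # plus a stray file at the same depth.
  (root + "vendor/sandal/prawn/lib").mkpath
  (root + "vendor/someone/plain_project").mkpath
  File.write(root + "vendor/someone/notes.txt", "not a project")

  dirs = []
  # The trailing slash restricts the glob to directories, so notes.txt is skipped.
  Pathname.glob("#{root}/vendor/*/*/") do |dir|
    lib = dir + "lib"
    dirs << (lib.directory? ? lib : dir)
  end

  # Report paths relative to the scratch root, sorted for stable output.
  dirs.map { |d| d.relative_path_from(root).to_s }.sort
end

puts found
```

The prawn project contributes its lib folder, while the project without one contributes its top-level folder.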

This code shows off a few of Pathname's niceties. For one, you can construct new paths simply by appending strings with +, as shown here:

lib = dir + "lib"

In addition, a simple Pathname#directory? call tells us whether the path we've built actually points to a directory on the filesystem. Together, these make traversal downright easy, as you can see in the preceding code.
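Pathname#+ is smarter than plain string concatenation, which is part of why it reads so cleanly. A small sketch of its behavior follows; the paths are only examples and need not exist, and the results assume a Unix-style filesystem.

```ruby
require "pathname"

base = Pathname.new("/usr")

# + joins components and inserts the separator for you.
joined   = base + "bin/ruby"     # => /usr/bin/ruby
# An absolute right-hand side replaces the left-hand side entirely.
replaced = base + "/etc/passwd"  # => /etc/passwd
# ".." is resolved lexically; the filesystem is never consulted.
parent   = base + "../etc"       # => /etc

# directory?, by contrast, does check the actual filesystem.
here_is_dir = Pathname.new(Dir.pwd).directory?

puts joined, replaced, parent, here_is_dir
```

Because + never touches the disk, it is cheap to build candidate paths like dir + "lib" and only then ask directory? whether they exist.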

This simple stub may be a bit dense, but once you get the hang of Pathname, you can see that it's quite powerful. Let's look at a couple more tricks, focusing this time on the code that actually writes this snippet to a file:

lib = Pathname.new(ARGV[1])
lib.mkpath
(lib + 'dependencies.rb').open("w") do |file|
  file.write LOADER
end

Recall the invocation from earlier:

$ mooch init lib/my_project

Here, ARGV[1] is lib/my_project, so the preceding code builds a path relative to the current working directory and then creates the folder structure. Pathname#mkpath works much like mkdir -p on *nix: it creates any intermediate directories as needed and doesn't complain if the structure already exists, both of which are what we want here.
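The mkdir -p comparison is easy to check. This minimal sketch, run in a scratch directory from Dir.mktmpdir, calls mkpath twice on the same nested path and contrasts it with plain Dir.mkdir, which raises Errno::EEXIST when the directory is already there.

```ruby
require "pathname"
require "tmpdir"

created, mkdir_raised = Dir.mktmpdir do |root|
  target = Pathname.new(root) + "lib/my_project"

  # One call creates both lib/ and lib/my_project/, like `mkdir -p`.
  target.mkpath
  # A second call is a harmless no-op.
  target.mkpath

  # Plain Dir.mkdir, by contrast, refuses to reuse an existing directory.
  raised = begin
    Dir.mkdir(target)
    false
  rescue Errno::EEXIST
    true
  end

  [target.directory?, raised]
end

puts created        # prints "true"
puts mkdir_raised   # prints "true"
```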

Once the directories exist, we need to create our dependencies.rb file and populate it with the string in LOADER. Pathname#open accepts the same arguments as File.open(), including the block form, which closes the file for us when the block finishes.
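As a quick sanity check of that equivalence, the sketch below writes a stand-in stub through Pathname#open and reads it back with Pathname#read, again inside a scratch directory. The stub text is a placeholder, not mooch's real LOADER. On Ruby 2.1 and later, Pathname#write offers a one-line alternative to the open-and-write pair.

```ruby
require "pathname"
require "tmpdir"

# Placeholder content standing in for LOADER.
stub = "# generated loader stub\n"

contents = Dir.mktmpdir do |root|
  file = Pathname.new(root) + "dependencies.rb"

  # Pathname#open hands its arguments to File.open, so the block form
  # closes the file automatically once the block returns.
  file.open("w") { |f| f.write(stub) }

  # Pathname#read likewise mirrors File.read.
  file.read
end

puts contents == stub   # prints "true"
```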

In the code that actually downloads and vendors libraries from GitHub, we see the same techniques in use yet again, this time mixed in with some shell commands and Dir.chdir. As this doesn’t introduce anything new, we can skip over the details.
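One detail worth noting: mooch calls Dir.chdir without a block, which changes the working directory for the rest of the process. That is harmless in a short script that exits right afterward, but when a directory change should be temporary, the block form is a safer habit. A minimal sketch, using a scratch directory from Dir.mktmpdir:

```ruby
require "pathname"
require "tmpdir"

inside_name, restored = Dir.mktmpdir do |root|
  vendor = Pathname.new(root) + "vendor"
  vendor.mkpath
  before = Dir.pwd

  # With a block, Dir.chdir switches directories only for the duration of the
  # block and switches back afterward, even if the block raises.
  inside = Dir.chdir(vendor) { Dir.pwd }

  [File.basename(inside), Dir.pwd == before]
end

puts inside_name   # prints "vendor"
puts restored      # prints "true"
```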

Before we move on to discussing temporary files, we’ll take a quick look at FileUtils. The purpose of this module is to provide a Unix-like interface to file manipulation tasks, and a quick look at its method list will show that it does a good job of this:

cd(dir, options)
cd(dir, options) {|dir| .... }
pwd()
mkdir(dir, options)
mkdir(list, options)
mkdir_p(dir, options)
mkdir_p(list, options)
rmdir(dir, options)
rmdir(list, options)
ln(old, new, options)
ln(list, destdir, options)
ln_s(old, new, options)
ln_s(list, destdir, options)
ln_sf(src, dest, options)
cp(src, dest, options)
cp(list, dir, options)
cp_r(src, dest, options)
cp_r(list, dir, options)
mv(src, dest, options)
mv(list, dir, options)
rm(list, options)
rm_r(list, options)
rm_rf(list, options)
install(src, dest, mode = <src's>, options)
chmod(mode, list, options)
chmod_R(mode, list, options)
chown(user, group, list, options)
chown_R(user, group, list, options)
touch(list, options)
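To give a feel for how these read in practice, here is a short sketch that exercises a few of the methods above in a scratch directory; the file and folder names are made up for illustration. It also uses the noop and verbose options that many of these methods accept.

```ruby
require "fileutils"
require "tmpdir"

survived_noop, removed = Dir.mktmpdir do |root|
  Dir.chdir(root) do
    # Each call mirrors its shell namesake: mkdir -p, touch, cp -r, mv.
    FileUtils.mkdir_p("vendor/sandal/prawn/lib")
    FileUtils.touch("vendor/sandal/prawn/lib/prawn.rb")
    FileUtils.cp_r("vendor/sandal/prawn", "backup")
    FileUtils.mv("backup", "prawn_backup")

    # noop: true skips the actual work, which is handy for dry runs.
    FileUtils.rm_rf("prawn_backup", noop: true)
    still_there = File.exist?("prawn_backup/lib/prawn.rb")

    # verbose: true echoes the equivalent shell command to stderr.
    FileUtils.rm_rf("prawn_backup", verbose: true)
    [still_there, !File.exist?("prawn_backup")]
  end
end

puts survived_noop   # prints "true"
puts removed         # prints "true"
```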

You’ll see a bit more of FileUtils later on in the chapter when we talk about atomic saves. But before we jump into advanced file management techniques, let’s review another important foundational tool: the tempfile standard library.
