Explorers Ahoy!

It’s hard to compare intrepid explorers like Ferdinand Magellan, James Cook, and Roald Amundsen with someone, well, like me. While these adventurers braved the elements, wild nature, and unknown dangers to discover new worlds (at least for their civilization), my biggest physical achievement to date would probably be completing a 10-kilometer charity quarter-marathon—walking.

The explorers of old had it good, of course, when it came to choices of unexplored places to stake their claim on. Christopher Columbus only had to sail due west from Europe, and he discovered two entire continents. For us, there are far fewer choices. There isn’t much landmass on Earth that is yet unexplored; even the Mariana Trench, the deepest part of the world’s oceans, has been conquered.

But explorer I am, and explorer you will be in this book. While much of the known physical world has been conquered (see Figure 1), the unknown still looms over most of us.


Figure 1. The Scott expedition to the South Pole (photo from the Public Domain Review)

We are all born with a sense of wonder and amazement at the world around us. Many of us just learn to turn it off as we grow older and jaded. I believe this is partly because we don’t understand what goes on in the world around us well enough, and thus we don’t care either. Click the remote and the TV turns on—why and how does that work? The first time we tried to ask, we were probably given a blank stare or waved away—who cares as long as you can watch the next season of American Idol? That soon grows to be our reaction as well.

Well, in this book, I’ll take you along winding paths to bring back the original, wide-eyed person you were. We’ll find the magic again, and hopefully at the end of the book, you’ll continue where we leave off and make your own way in that journey of exploration and discovery.

Data, Data, Everywhere

We are swamped with data every minute and second of our lives. I don’t mean this metaphorically, and I am not simply waxing lyrical about big data either.

In fact, we’re so swamped that our brains have adapted by shutting out our surroundings for a very short while several times every second. In a phenomenon called saccadic masking, the brain suppresses visual processing during a fast eye movement (a saccade) so that we never perceive the blurred images that sweep across the retina. Blurred images are not very useful, so the brain discards them, rendering us effectively blind (without our realizing it) during a saccade.

There is much similarity between saccadic masking and the way we process data today. The data comes so fast, so frequently that we often mask it away. There is a lot of data around us that we can extract and analyze to find answers, but the problem has always been how to do this.

In the (distant) past, it was always geniuses who had that knack of unlocking secrets with data and insight, along with the serendipitous few who simply stumbled on the answers. Not so anymore. Although intelligence is still a prerequisite, the arrival of computers and programming has freed us from the mundane, repetitive, and mind-numbing tasks of processing data to extract nuggets of information.

Only, it hasn’t.

At least not for most people, anyway. The exceptions are scientists and mathematicians, who long ago pounced on the tools that enable them to do their work much more efficiently. If you’re someone from these two camps, you are likely already taking full advantage of the power of computers.

However, for programmers and many other people, writing computer programs started with providing tools for businesses and for improving business processes. It’s all about using computers to reduce cost, increase revenue, and improve efficiency. For many professional programmers, coding is a job. It’s drudgery, low-level menial work that puts food on the table. We have forgotten the promise of computers and the power of programming for discovery.

Bringing the World to Us

This book is an attempt to bring back that wonder and sense of discovery. I want this book to uncover things that you didn’t know, or didn’t understand. I want it to help you discover new worlds within the existing world we see every day. Finally, I want it to enable you to explore the mundane and learn new things through programming and analyzing data.

While sometimes the world we explore in this book is the real world, more often it’s not. It’s hard to explore the whole wide world with just bits and bytes. So if we can’t explore the world we live in, we’ll create our own worlds and explore those—in other words, we’ll use simulations.

Simulations are an excellent way of exploring things that we cannot control. We do this all the time. When we were young, we often created make-believe worlds and lived in them. Doing this enabled us to understand the real world better. We still do this today, through the magic of television (especially serials and soap operas) and movies—where we live through the characters we see on the screen. And for better or worse, simulations like television affect our real lives and even our dreams. For example, a survey by the American Psychological Association found that only 20% of people in their 60s (who grew up before color television was popular) recalled having bright and vivid dreams. However, 80% of people under the age of 30 confirmed that their dreams were in full color.[1]

In this book, we will use simulations to create experiments, isolate factors, and propose hypotheses to explain the results of the experiments. You might or might not agree with the experiments I describe or the hypotheses I suggest, but that doesn’t really matter. What I would like you to get out of our journey together is the realization that there is more to programming than business as usual, more than building business solutions and processes. What I hope to achieve is for you eventually to design your own experiments, run through them, and discover your own worlds.

Packing Your Bags

So what do you need on this journey of discovery, this grand adventure through programming and analyzing data? Tools, of course. They will be the subject of the next two chapters. These are not the only tools available to you, but they are the ones we will be using in this book.

The two tools we will use are Ruby and R. I’ve chosen them for specific purposes. Ruby is easy to learn and to read, perfectly suited to explain concepts in human-readable code. I will be using Ruby to write simulations and to do preprocessing to get data. R, on the other hand, is great for analyzing data and for generating charts for visualization.

Although you don’t need to be a Ruby or R programmer to be able to appreciate this book, I have assumed a basic understanding of programming. Specifically, I assume you have completed a computer science or related course or have done some simple programming in any programming language.

For the rest of the book, every chapter is more or less self-sufficient. Each chapter explores an idea, starting from the realization that a question exists and then attempting to answer it with either a simulation or some processing that brings out the data. We then analyze this data and draw certain conclusions based on our analysis.

The ideas are drawn from diverse fields, ranging from economics to evolution, from healthcare to workplace design (in this case, figuring out the correct number of restrooms in an office). Some ideas are grander than others, and some ideas can be quite personal. The reason for this diversity is to show that the possibilities for exploration are limited only by our creativity.
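To give a rough feel for what such an exploration can look like, here is a minimal sketch of a toy restroom simulation in Ruby. It is not taken from any chapter of the book: the method names (simulate_day, wait_table), the parameters, and every number in it are assumptions made up purely for illustration, and the model is deliberately crude (a person who finds every stall busy is simply counted as having waited, and is not queued).

```ruby
# Hypothetical toy simulation (illustrative only, not the book's code):
# what fraction of restroom visits find every stall already occupied?

# Simulates one working day in one-minute steps.
# Returns the fraction of visits that found no free stall.
def simulate_day(people:, stalls:, rng:, minutes: 480, visit_chance: 0.01, visit_length: 3)
  busy_until = Array.new(stalls, 0) # minute at which each stall is free again
  visits = 0
  waits = 0

  minutes.times do |minute|
    people.times do
      # Each person independently needs a break this minute with a small chance.
      next unless rng.rand < visit_chance

      visits += 1
      free = busy_until.index { |free_at| free_at <= minute }
      if free
        busy_until[free] = minute + visit_length
      else
        waits += 1 # simplification: counted as a wait, but not queued
      end
    end
  end

  visits.zero? ? 0.0 : waits.to_f / visits
end

# Runs the simulation for 1..max_stalls stalls, reusing the same seed so that
# every run sees the same sequence of random draws.
def wait_table(people:, max_stalls:, seed: 42)
  (1..max_stalls).map do |stalls|
    [stalls, simulate_day(people: people, stalls: stalls, rng: Random.new(seed))]
  end
end

# Print the results as CSV, a format that R can read for analysis and charting.
if __FILE__ == $PROGRAM_NAME
  puts "stalls,wait_fraction"
  wait_table(people: 50, max_stalls: 4).each do |stalls, fraction|
    puts format("%d,%.3f", stalls, fraction)
  end
end
```

The sketch follows the division of labor described earlier: Ruby produces the data, and the CSV output could then be loaded into R to chart how the wait fraction changes as stalls are added.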

Each chapter usually starts off small, and we gradually add on layers of complexity to flesh out its central idea. The hypotheses, conclusions, and results from the experiments surrounding the base idea are incidental. You might, for example, agree or disagree with my conclusions and interpretation of the results. For this book at least, the journey is more important than the results.

With that, we’re off! Have fun with the next two chapters, and enjoy the rest of the explorations, intrepid explorer!

Conventions Used in This Book

The following typographical conventions are used in this book:


Italic

Indicates new terms, URLs, email addresses, filenames, and file extensions.

Constant width

Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

Constant width bold

Shows commands or other text that should be typed literally by the user; also used for emphasis within program listings.

Constant width italic

Shows text that should be replaced with user-supplied values or by values determined by context.


This icon signifies a tip, suggestion, or general note.


This icon indicates a warning or caution.

Using Code Examples

All examples and related files in this book may be downloaded from GitHub.

This book is here to help you get your job done. In general, you may use the code in this book in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing a CD-ROM of examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission.

We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “Exploring Everyday Things with R and Ruby by Sau Sheong Chang (O’Reilly). Copyright 2012 Sau Sheong Chang, 978-1-449-31515-3.”

If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at .

Safari® Books Online


Safari Books Online is an on-demand digital library that delivers expert content in both book and video form from the world’s leading authors in technology and business.

Technology professionals, software developers, web designers, and business and creative professionals use Safari Books Online as their primary resource for research, problem solving, learning, and certification training.

Safari Books Online offers a range of product mixes and pricing programs for organizations, government agencies, and individuals. Subscribers have access to thousands of books, training videos, and prepublication manuscripts in one fully searchable database from publishers like O’Reilly Media, Prentice Hall Professional, Addison-Wesley Professional, Microsoft Press, Sams, Que, Peachpit Press, Focal Press, Cisco Press, John Wiley & Sons, Syngress, Morgan Kaufmann, IBM Redbooks, Packt, Adobe Press, FT Press, Apress, Manning, New Riders, McGraw-Hill, Jones & Bartlett, Course Technology, and dozens more. For more information about Safari Books Online, please visit us online.

How to Contact Us

Please address comments and questions concerning this book to the publisher:

O’Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-998-9938 (in the United States or Canada)
707-829-0515 (international or local)
707-829-0104 (fax)

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at:

To comment or ask technical questions about this book, send email to:

For more information about our books, courses, conferences, and news, see our website at

Find us on Facebook:

Follow us on Twitter:

Watch us on YouTube:


Acknowledgments

This is the part where I finally get to thank the people who helped me create the book you now hold in your hands. Writing a book is never the sole effort of a lonely author, as I have learned over the years, but the collective work of the author, a professional team, and a community of reviewers and supporters. In no particular order, I would like to thank:

  • Mike Hendrickson for agreeing to this rather different type of programming book. It was a wild shot sending in the book proposal, and I didn’t really expect it to be picked up, but it was.

  • Andy Oram for being patient with a first-time O’Reilly author, arranging really long-distance Skype calls halfway around the world, and waking up really early to speak to me every Tuesday evening.

  • Kristen Borg, Rachel Monaghan, and the whole production editing team for doing such an awesome and professional job with the book.

  • Jeremy Leipzig, Ivan Tan, Patrick Haller, and Judith Myerson for their help in doing the technical reviews and giving great advice. In particular, I badgered Patrick Haller with emails about his comments on my R scripts. Thanks, Patrick!

  • Rully Santosa, Chen Way Yen, Ng Tze Yang, Kelvin Teh, George Goh, and the rest of the HP Labs Singapore Applied Research team, off whom I have bounced countless ideas and who have given me innumerable remarks. Special thanks to Rully, Way Yen, and George for their feedback on Chapter 6, In a Heartbeat.

  • The Ruby community, especially the Singapore Ruby Brigade, where I made and continue to make good friends with common interests in exploring the world through Ruby. It’s a great community to be in, and I relish the (now) annual RedDotRubyConf organized by the ever-efficient Andy Croll.

Finally, I would like to dedicate this book to my family, who is my inspiration and my motivation in everything I do. To my lovely wife Wooi Ying, who has been patient yet again (for the third time), thanks for understanding why I simply have to understand everything and how it works. To my soon-to-be teenage son Kai Wen, I hope this book will also be an inspiration to you in being the wide-eyed explorer that I have been all my life.

[1] Okada, Hitoshi, Kazuo Matsuoka, and Takao Hatakeyama. “Life Span Differences in Color Dreaming.” Dreaming 21, no. 3 (2011), 213–220.
