Preface

Working with data is about producing knowledge. Whether that knowledge is consumed by a person or acted on by a machine, our goal as professionals working with data is to use observations to learn about how the world works. We want to turn information into insights, and asking the right questions ensures that we’re creating insights about the right things. The purpose of this book is to help us understand that these are our goals and that we are not alone in this pursuit.

I work as a data strategy consultant. I help people figure out what problems they are trying to solve, how to solve them, and what to do with them once the problems are “solved.” This book grew out of the recognition that the problem of asking good questions and knowing how to put the answers together is not a new one. This problem—the problem of turning observations into knowledge—is one that has been worked on again and again and again by experts in a variety of disciplines. We have much to learn from them.

People use data to create knowledge for a wide variety of purposes. There is no single goal for all data work, just as there is no single job description that encapsulates it. Consider this incomplete list of things that can be done better with data:

  • Answering a factual question
  • Telling a story
  • Exploring a relationship
  • Discovering a pattern
  • Making a case for a decision
  • Automating a process
  • Judging an experiment

Doing each of these well in a data-driven way draws on different strengths and skills. The most obvious are what you might call the “hard skills” of working with data: data cleaning, mathematical modeling, visualization, model or graph interpretation, and so on.[1]

What is missing from most conversations is how important the “soft skills” are for making data useful: determining what problem one is actually trying to solve, organizing results into something useful, translating vague problems or questions into precisely answerable ones, figuring out what may have been left out of an analysis, combining multiple lines of argument into one useful result…the list could go on. These skills, as much as knowledge of the latest tools or newest algorithms, separate the data scientist who can take direction from the data scientist who can give it.

Some of this is clearly experience: experience working within an organization, experience solving problems, experience presenting the results. But these are also skills that many other disciplines have taught before. We are not alone in needing them. Just as data scientists did not invent statistics or computer science, we do not need to invent techniques for asking good questions or organizing complex results. We can draw inspiration from other fields and adapt their methods to the problems we face. Design, argument studies, critical thinking, national intelligence, problem-solving heuristics, education theory, program evaluation, and various parts of the humanities each have insights that data science can learn from.

Data science is already a field of bricolage. Swaths of engineering, statistics, machine learning, and graphic communication are already fundamental parts of the data science canon. They are necessary, but they are not sufficient. If we look further afield and incorporate ideas from the “softer” intellectual disciplines, we can make data science successful and help it be more than just this decade’s fad.

A focus on why rather than how already pervades the work of the best data professionals. The broader principles outlined here may not be new to them, though the specifics likely will be.

This book consists of six chapters. Chapter 1 covers a framework for scoping data projects. Chapter 2 discusses how to pin down the details of an idea, receive feedback, and begin prototyping. Chapter 3 covers the tools of argument, which make it easier to ask good questions, build projects in stages, and communicate results. Chapter 4 covers data-specific patterns of reasoning, which make it easier to figure out what to focus on and how to build more useful arguments. Chapter 5 gives a longer treatment to one large family of argument patterns: causal reasoning. Chapter 6 provides several longer examples that tie together the material from the previous chapters. Finally, Appendix A offers a list of further reading to give you places to go from here.

Conventions Used in This Book

The following typographical convention is used in this book:

Italic
Indicates new terms, URLs, email addresses, filenames, and file extensions.

Safari® Books Online

Note

Safari Books Online is an on-demand digital library that delivers expert content in both book and video form from the world’s leading authors in technology and business.

Technology professionals, software developers, web designers, and business and creative professionals use Safari Books Online as their primary resource for research, problem solving, learning, and certification training.

Safari Books Online offers a range of product mixes and pricing programs for organizations, government agencies, and individuals. Subscribers have access to thousands of books, training videos, and prepublication manuscripts in one fully searchable database from publishers like O’Reilly Media, Prentice Hall Professional, Addison-Wesley Professional, Microsoft Press, Sams, Que, Peachpit Press, Focal Press, Cisco Press, John Wiley & Sons, Syngress, Morgan Kaufmann, IBM Redbooks, Packt, Adobe Press, FT Press, Apress, Manning, New Riders, McGraw-Hill, Jones & Bartlett, Course Technology, and dozens more. For more information about Safari Books Online, please visit us online.

How to Contact Us

Please address comments and questions concerning this book to the publisher:

O’Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-998-9938 (in the United States or Canada)
707-829-0515 (international or local)
707-829-0104 (fax)

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at http://oreil.ly/thinking-with-data.

To comment or ask technical questions about this book, send email to .

For more information about our books, courses, conferences, and news, see our website at http://www.oreilly.com.

Find us on Facebook: http://facebook.com/oreilly

Follow us on Twitter: http://twitter.com/oreillymedia

Watch us on YouTube: http://www.youtube.com/oreillymedia

Acknowledgments

I would be remiss not to mention some of the fantastic people who have helped make this book possible. Juan-Pablo Velez has been invaluable in refining my ideas. Jon Bruner, Matt Wallaert, Mike Dewar, Brian Eoff, Jake Porway, Sam Rayachoti, Willow Brugh, Chris Wiggins, Claudia Perlich, and John Matthews provided me with key insights that I hope I have incorporated well.

Jay Garlapati, Shauna Gordon-McKeon, Michael Stone, Brian Eoff, Dave Goodsmith, and David Flatow provided me with very helpful feedback on drafts. Ann Spencer was a fantastic editor. It was wonderful to know that there was always someone in my corner. Thank you also to Solomon Roberts, Gabe Gaster, emily barger, Miklos Abert, Laci Babai, and Gordon Kindlmann, who were each crucial in setting me on the path that gave me math. Thank you also to Christian Rudder, who taught me so much, not least the value of instinct. As always, all errors and mistakes are mine alone. Thanks as well to everyone who helped me but whose names I neglected to write down.

At last I understand why every author in every book on my shelf thanks their family. My wonderful partner, Sarah, has been patient, kind, and helpful at every stage of this process, and my loving parents and sister have been a source of comfort and strength as I made this book a reality. My father especially has been a great source of ideas to me. He set me off on this path as a kid when he patiently explained to me the idea of “metacognition,” or thinking about thinking. It would be hard to be grateful enough.



[1] See “A Taxonomy of Data Science” by Hilary Mason and Chris Wiggins (http://www.dataists.com/2010/09/a-taxonomy-of-data-science/) and “From Data Mining to Knowledge Discovery in Databases” by Usama Fayyad et al. (AI Magazine, Fall 1996).
