Doing Data Science
by Cathy O'Neil, Rachel Schutt
October 2013
405 pages
O'Reilly Media, Inc.

Chapter 3. Algorithms

In the previous chapter we discussed in general how models are used in data science. In this chapter, we dive into algorithms.

An algorithm is a procedure, a set of steps or rules, for accomplishing a task. Algorithms are among the fundamental building blocks of computer science: the basis of the design of elegant and efficient code, data preparation and processing, and software engineering.

Some of the basic types of tasks that algorithms address are sorting, searching, and graph-based computational problems. Although a given task, such as sorting a list of objects, could be handled by several different algorithms, there is some notion of “best” as measured by efficiency and computational time, which matters especially when you’re dealing with massive amounts of data and building consumer-facing products.
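
To make the notion of “best” concrete, here is a minimal sketch that sorts the same list two ways and times each. It is an illustration, not something from the book: the helper names `insertion_sort` and `time_call`, the list size, and the random seed are all assumptions chosen for the example. Insertion sort takes roughly quadratic time on random input, while Python's built-in `sorted` runs in O(n log n), so the gap widens as the list grows.

```python
import random
import time


def insertion_sort(items):
    """Return a sorted copy of items using insertion sort (O(n^2) comparisons)."""
    result = list(items)  # copy so the caller's list is left untouched
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        # Shift larger elements one slot right to make room for current.
        while j >= 0 and result[j] > current:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = current
    return result


def time_call(func, data):
    """Run func(data) once and return (output, elapsed seconds)."""
    start = time.perf_counter()
    output = func(data)
    return output, time.perf_counter() - start


random.seed(0)  # hypothetical seed, only so repeated runs use the same data
data = [random.random() for _ in range(2000)]

slow_result, slow_seconds = time_call(insertion_sort, data)
fast_result, fast_seconds = time_call(sorted, data)

# Both approaches solve the same task; they differ only in cost.
print("same output:", slow_result == fast_result)
print(f"insertion sort: {slow_seconds:.4f} s, built-in sorted: {fast_seconds:.4f} s")
```

The exact timings depend on the machine, so none are quoted here; what should hold is that both calls return the same ordering while the built-in sort finishes far sooner on larger inputs.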

Efficient algorithms that work sequentially or in parallel are the basis of pipelines to process and prepare data. With respect to data science, there are at least three classes of algorithms one should be aware of:

  1. Data munging, preparation, and processing algorithms, such as sorting, MapReduce, or Pregel.

    We would characterize these types of algorithms as data engineering, and while we devote a chapter to this, it’s not the emphasis of this book. This is not to say that you won’t be doing data wrangling and munging—just that we don’t emphasize the algorithmic aspect of it.

  2. Optimization algorithms for parameter estimation, ...
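
As an illustration of the first class in the list above, the MapReduce pattern can be sketched in a few lines of plain Python. This is a single-process toy under stated assumptions, not how a distributed framework is implemented: the function names `map_phase`, `shuffle_phase`, and `reduce_phase` and the two sample documents are hypothetical, chosen only to show the three stages of a word count.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group the emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Combine each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical input: each string stands in for one document or file split.
documents = ["data science uses data", "algorithms process data"]

# The map step is independent per document, which is what allows it to run in parallel.
pairs = [pair for doc in documents for pair in map_phase(doc)]
word_counts = reduce_phase(shuffle_phase(pairs))

print(word_counts["data"])  # 3
```

In a real system the map and reduce steps would run on many machines and the shuffle would move data across the network; the sketch only shows the shape of the computation.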

ISBN: 9781449363871