Chapter 7 Data Science

Now that we have so much more data, and this data is being stored longer and in more accessible formats, demand for data scientists is growing sharply across many fields and sectors. The term “data scientist” can refer to specific training and background (with more and more advanced degree programs cropping up), but for the purposes of this discussion, let’s assume that data scientists are those who are asked to extract insight, draw conclusions, and make predictions from data. They analyze and transform data, and they build models and databases. Some of those acting in data science capacities have relatively little formal training in the field. We certainly hope that everyone engaged in data science understands statistics well enough to avoid dubious methods and erroneous conclusions.

We covered some tools specific to genomic analysis in Chapter 2. In this chapter we explore NoSQL database offerings and statistical tools for 21st-century data science. Data science is a vast topic that we cannot cover exhaustively here. For more information, we suggest consulting:

  • Nathan Marz and James Warren, Big Data: Principles and Best Practices of Scalable Real-Time Data Systems. Shelter Island, NY: Manning Publications, 2015.
  • Sandy Ryza, Uri Laserson, Sean Owen, and Josh Wills, Advanced Analytics with Spark: Patterns for ...
