The Big Book of Data Science. Part I: Data Processing

by David Lopez, Eugenia Robles
September 2025
Intermediate to advanced
420 pages
50h 55m
English
Cyberblue Media

Overview

Excellent books already exist on software programming for data processing and data transformation, for instance Wes McKinney’s. This book, drawing on my own industrial and teaching experience, tries to flatten the steep learning curve newcomers to the field must climb before they are ready to tackle real data science and AI challenges. In this regard, it differs from other books in that:

It assumes zero software programming knowledge. This instructional design is intentional given the book’s aim to open the practice of data science to anyone interested in data exploration and analysis irrespective of their previous background.

It follows an incremental approach to ease the assimilation of sometimes arcane software techniques for manipulating data.

It is practice-oriented, so readers can apply what they learn in their daily work.

It illustrates how to use generative AI to help you become a more productive data scientist and AI engineer.

By reading this book and working through its labs, you will develop the software programming skills required to contribute to the data understanding and data preparation stages of any data-related project. You will become proficient at manipulating and transforming datasets in industrial contexts and at producing clean, reliable datasets that can drive accurate analysis and informed decision-making. Moreover, you will be prepared to develop and deploy dashboards and visualizations that support the insights and conclusions presented in the deployment stage.

Data modelling and evaluation are not covered in this book. We are working on a second installment of the series that illustrates the application of statistical and machine learning techniques to derive insights from data.

Publisher Resources

ISBN: 9798993039404