Skip to Content
Kubeflow for Machine Learning
book

Kubeflow for Machine Learning

by Trevor Grant, Holden Karau, Boris Lublinsky, Richard Liu, Ilan Filonenko
October 2020
Intermediate to advanced
261 pages
6h 19m
English
O'Reilly Media, Inc.
Content preview from Kubeflow for Machine Learning

Chapter 9. Case Study Using Multiple Tools

In this chapter we’re going to discuss what to do if you need to use “other” tools for your particular data science pipeline. Python has a plethora of tools for handling a wide array of data formats. RStats has a large repository of advanced math functions. Scala is the default language of big data processing engines such as Apache Spark and Apache Flink. Legacy programs that would be costly to reproduce exist in any number of languages.

A very important benefit of Kubeflow is that users no longer need to choose which language is best for their entire pipeline but can instead use the best language for each job (as long as the language and code are containerizable).

We will demonstrate these concepts through a comprehensive example denoising CT scans. Low-dose CT scans allow clinicians to use the scans as a diagnostic tool by delivering a fraction of the radiation dose—however, these scans often suffer from an increase in white noise. CT scans come in a format known as DICOM, and we’ll use a container with a specialized library called pydicom to load and process the data into a numpy matrix.

Several methods for denoising CT scans exist; however, they often focus on the mathematical justification, not the implementation. We will present an open source method that uses a singular value decomposition (SVD) to break the image into components, the “least important” of which are often the noise. We use Apache Spark with the Apache Mahout library ...

Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.

Read now

Unlock full access

More than 5,000 organizations count on O’Reilly

AirBnbBlueOriginElectronic ArtsHomeDepotNasdaqRakutenTata Consultancy Services

QuotationMarkO’Reilly covers everything we've got, with content to help us build a world-class technology community, upgrade the capabilities and competencies of our teams, and improve overall team performance as well as their engagement.
Julian F.
Head of Cybersecurity
QuotationMarkI wanted to learn C and C++, but it didn't click for me until I picked up an O'Reilly book. When I went on the O’Reilly platform, I was astonished to find all the books there, plus live events and sandboxes so you could play around with the technology.
Addison B.
Field Engineer
QuotationMarkI’ve been on the O’Reilly platform for more than eight years. I use a couple of learning platforms, but I'm on O'Reilly more than anybody else. When you're there, you start learning. I'm never disappointed.
Amir M.
Data Platform Tech Lead
QuotationMarkI'm always learning. So when I got on to O'Reilly, I was like a kid in a candy store. There are playlists. There are answers. There's on-demand training. It's worth its weight in gold, in terms of what it allows me to do.
Mark W.
Embedded Software Engineer

You might also like

Machine Learning Bookcamp

Machine Learning Bookcamp

Alexey Grigoriev
Machine Learning for High-Risk Applications

Machine Learning for High-Risk Applications

Patrick Hall, James Curtis, Parul Pandey

Publisher Resources

ISBN: 9781492050117Errata Page