Kubeflow for Machine Learning

by Trevor Grant, Holden Karau, Boris Lublinsky, Richard Liu, Ilan Filonenko
October 2020
Intermediate to advanced
261 pages
6h 19m
English
O'Reilly Media, Inc.

Chapter 9. Case Study Using Multiple Tools

In this chapter we’re going to discuss what to do if you need to use “other” tools for your particular data science pipeline. Python has a plethora of tools for handling a wide array of data formats. R has a large repository of advanced math functions. Scala is the default language of big data processing engines such as Apache Spark and Apache Flink. Legacy programs that would be costly to reproduce exist in any number of languages.

A very important benefit of Kubeflow is that users no longer need to choose which language is best for their entire pipeline but can instead use the best language for each job (as long as the language and code are containerizable).

We will demonstrate these concepts through a comprehensive example: denoising CT scans. Low-dose CT scans let clinicians use CT as a diagnostic tool while delivering only a fraction of the usual radiation dose; however, these scans often suffer from increased white noise. CT scans come in a format known as DICOM, and we’ll use a container with a specialized library called pydicom to load the data and process it into a numpy matrix.
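As a rough illustration of the loading step, the sketch below reads a single DICOM file with pydicom and returns its pixel data as a numpy matrix. The function name `load_ct_slice` is hypothetical and not taken from the book; `pydicom.dcmread` and the `pixel_array` attribute are part of pydicom's public API. Treat it as a minimal sketch for one slice, not the chapter's actual pipeline step.

```python
import numpy as np


def load_ct_slice(path):
    """Read one DICOM file and return its pixel data as a 2-D float matrix.

    `load_ct_slice` is a hypothetical helper name used for illustration.
    """
    # Imported inside the function so that only the container that actually
    # reads DICOM files needs pydicom installed.
    import pydicom

    dataset = pydicom.dcmread(path)
    # pixel_array decodes the stored pixel data into a numpy array.
    return dataset.pixel_array.astype(np.float64)
```

Calling `load_ct_slice("scan.dcm")` on a single-frame CT file (the path here is a placeholder) would yield a matrix with one row per image row and one column per image column, ready to pass to later pipeline steps.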

Several methods for denoising CT scans exist; however, they often focus on the mathematical justification, not the implementation. We will present an open source method that uses a singular value decomposition (SVD) to break the image into components, the “least important” of which are often the noise. We use Apache Spark with the Apache Mahout library ...
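To make the SVD idea concrete, here is a minimal single-machine sketch using numpy in place of the distributed Spark and Mahout tooling the chapter refers to. It keeps only the largest singular components of an image matrix and discards the rest, on the assumption that the discarded components are mostly noise. The function name `svd_denoise` and the `rank` parameter are illustrative assumptions, not names from the book.

```python
import numpy as np


def svd_denoise(image, rank):
    """Rebuild `image` from its `rank` largest singular components.

    Hypothetical helper: a plain truncated SVD on one 2-D matrix,
    not the distributed method described in the chapter.
    """
    # Thin SVD: image == u @ diag(s) @ vt, singular values sorted descending.
    u, s, vt = np.linalg.svd(image, full_matrices=False)
    # Drop the "least important" components and multiply the rest back together.
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]
```

Choosing `rank` is the main design decision: too small a value blurs real structure, while too large a value keeps the noise. In practice it would be tuned by inspecting the singular values or the reconstructed images.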


Publisher Resources

ISBN: 9781492050117