Live online training

Accelerate and migrate your data science to GPU with RAPIDS

Achieve GPU performance the easy way with RAPIDS, NVIDIA’s open source Python framework

Adam Breindel

Using a GPU for general data computation isn’t new, but until recently, it required special skills and was prohibitively complex and expensive. The rise of deep learning has made it easier for nonspecialist engineers to solve common problems using basic Python while leveraging the speed of GPUs. With the ongoing development of the open source data science framework RAPIDS (used by Anaconda, Walmart, Databricks, IBM, and Uber, among many others), it’s now possible for data scientists and engineers with typical Python knowledge and experience to move their workflows to the GPU easily.

Jumpstart your RAPIDS knowledge with expert Adam Breindel. You’ll see how RAPIDS lets you use skills you already have (like working with tabular data in SQL or pandas and building models with scikit-learn) while gaining vast speedups from GPU compute.

What you'll learn and how you can apply it

By the end of this live online course, you’ll understand:

  • The enormous performance improvements enabled by GPU computation on “regular” business data
  • How RAPIDS enables extremely easy use of GPU compute in Python—no writing CUDA kernels or working with low-level interfaces to hardware
  • How to scale up (with one or more GPUs) as well as scale out (to multiple servers) to handle any data challenge

And you’ll be able to:

  • Transform data and perform feature engineering and extraction with RAPIDS
  • Train or tune ML models with RAPIDS
  • Perform graph computations, clustering, dimensionality reduction, and other techniques
  • Realize a huge improvement in compute speed

This training course is for you because...

  • You’re a data scientist or engineer.
  • You work with large amounts of data or need much faster performance on modest-sized datasets.
  • You want to become a lead or architect for the next generation of data-intensive applications.

Prerequisites

  • A basic understanding of Python, NumPy, pandas, scikit-learn, and SQL

About your instructor

  • Adam Breindel consults and teaches courses on Apache Spark, data engineering, machine learning, AI, and deep learning. He supports instructional initiatives as a senior instructor at Databricks, has taught classes on Apache Spark and deep learning for O’Reilly, and runs a business helping large firms and startups implement data and ML architectures. Adam’s first full-time job in tech was neural net–based fraud detection, deployed at North America’s largest banks; since then, he’s worked with numerous startups, where he’s enjoyed getting to build things like mobile check-in for two of America’s five biggest airlines years before the iPhone came out. He’s also worked in entertainment, insurance, and retail banking; on web, embedded, and server apps; and on clustering architectures, APIs, and streaming analytics.

Schedule

The timeframes are only estimates and may vary according to how the class is progressing.

Introduction and overview (30 minutes)

  • Lecture: How GPUs differ from CPUs and why they offer such enormous performance; NumPy versus CuPy and PyTorch
  • Group discussion: Basic tabular computation on GPU
  • Hands-on exercise: Explore what’s on your wish list and what’s missing
  • Q&A
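
As a rough illustration of the "NumPy versus CuPy" topic, the sketch below runs the same array code on either library. It is not course material: the `standardize` helper and the sample data are made up for illustration, and the fallback to NumPy is an assumption so the snippet also runs on a machine without a usable GPU.

```python
# Sketch: identical array code on GPU (CuPy) or CPU (NumPy).
try:
    import cupy as xp
    xp.zeros(1)  # touch the device; raises if no usable GPU is present
    backend = "cupy"
except Exception:
    import numpy as xp  # CPU fallback (assumption for portability)
    backend = "numpy"


def standardize(values):
    """Scale each column to zero mean and unit (population) variance."""
    arr = xp.asarray(values, dtype=xp.float64)
    return (arr - arr.mean(axis=0)) / arr.std(axis=0)


# Hypothetical sample data: three rows, two columns.
data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
scaled = standardize(data)

# Copy back to host memory for printing; NumPy arrays pass through unchanged.
scaled_host = scaled.get() if backend == "cupy" else scaled
print(backend, scaled_host.tolist())
```

Only the import differs between the two paths; the function body relies on calls that both libraries provide under the same names.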

BlazingSQL (30 minutes)

  • Lecture: The GPU-accelerated SQL engine; querying files and data lakes
  • Group discussion: Current SQL engines and catalogs in use (Hive, Spark, etc.)
  • Hands-on exercise: Write an SQL report and run it on GPU
  • Q&A

Break (5 minutes)

cuDF: The heart of RAPIDS (30 minutes)

  • Lecture: CUDA-enabled data frames that work like pandas; performing common data engineering and processing tasks with cuDF
  • Group discussion: Current and future features
  • Hands-on exercise: Work with BlazingSQL result sets as CUDA DataFrames
  • Q&A
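
To give a feel for "data frames that work like pandas," here is a minimal sketch that filters and aggregates a small table with whichever library is available. The `orders` table and its column names are hypothetical, and the fallback to pandas is an assumption so the snippet runs without a GPU.

```python
# Sketch: the same filter-and-aggregate code with cuDF or pandas.
try:
    import cudf as xdf
    xdf.Series([1])  # touch the device; raises if no usable GPU is present
    backend = "cudf"
except Exception:
    import pandas as xdf  # CPU fallback (assumption for portability)
    backend = "pandas"

# Hypothetical sample table.
orders = xdf.DataFrame({
    "store": ["north", "south", "north", "south", "north"],
    "amount": [10.0, 20.0, 30.0, 40.0, 50.0],
})

# Keep orders of at least 20, then total the amounts per store.
large = orders[orders["amount"] >= 20.0]
totals = large.groupby("store")["amount"].sum().sort_index()

# Bring the result to host memory; a pandas Series passes through unchanged.
totals_host = totals.to_pandas() if backend == "cudf" else totals
print(totals_host.to_dict())
```

The boolean mask, `groupby`, and `sum` calls are written once and work on either backend, which is the point of a pandas-like interface.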

cuML: High-level machine learning tools (30 minutes)

  • Lecture: ML on the GPU with the ease of scikit-learn; feature engineering helpers
  • Group discussion: Current algorithm support
  • Hands-on exercise: Train a model on a GPU
  • Q&A
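
The phrase "with the ease of scikit-learn" refers to the familiar `fit`/`predict` estimator pattern. The sketch below fits a linear regression with cuML when a GPU is usable and falls back to scikit-learn otherwise. The data is synthetic and the fallback is an assumption for portability, not part of the course.

```python
# Sketch: scikit-learn-style fit/predict with cuML or scikit-learn.
import numpy as np

try:
    import cupy
    cupy.zeros(1)  # touch the device; raises if no usable GPU is present
    from cuml.linear_model import LinearRegression
    backend = "cuml"
except Exception:
    from sklearn.linear_model import LinearRegression  # CPU fallback
    backend = "sklearn"

# Synthetic data following y = 2x + 1 exactly.
X = np.array([[0.0], [1.0], [2.0], [3.0]], dtype=np.float32)
y = 2.0 * X[:, 0] + 1.0

model = LinearRegression()
model.fit(X, y)

# Predict for an unseen input, x = 4.
prediction = float(model.predict(np.array([[4.0]], dtype=np.float32))[0])
print(backend, round(prediction, 3))
```

Apart from the import, the estimator is constructed, fitted, and queried the same way on both backends.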

Break (5 minutes)

cuGraph: Graph analytics overview (20 minutes)

  • Lecture: Building graphs; PageRank; breadth-first search
  • Group discussion: Built-in algorithms
  • Hands-on exercise: Finding shortest paths
  • Q&A
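
For readers unfamiliar with the techniques named here, the following plain-Python sketch shows how breadth-first search finds a shortest path in an unweighted graph. It deliberately does not use cuGraph and makes no claim about cuGraph's API; the `bfs_shortest_path` helper and the edge list are hypothetical.

```python
# Sketch: breadth-first search for a shortest path (plain Python, no GPU).
from collections import deque


def bfs_shortest_path(edges, source, target):
    """Return one shortest path in an undirected, unweighted graph, or None."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)

    parents = {source: None}  # also serves as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk parent links back to the source, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in adjacency.get(node, []):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None  # target is unreachable from source


# Hypothetical edge list.
edges = [("a", "b"), ("b", "c"), ("a", "d"), ("d", "e"), ("e", "c"), ("c", "f")]
path = bfs_shortest_path(edges, "a", "f")
print(path)
```

Because BFS visits vertices in order of increasing distance from the source, the first time it reaches the target it has found a path with the fewest edges.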

End-to-end problem solving (30 minutes)

  • Lecture: Scaling to multiple GPUs or nodes; generating visualizations
  • Group discussion: Integrating with existing data engineering environments (e.g., Hadoop)
  • Q&A