Scaling Python with Ray

by Holden Karau, Boris Lublinsky
Released December 2022
Publisher(s): O'Reilly Media, Inc.
ISBN: 9781098118808

Book description

Serverless computing enables developers to concentrate solely on their applications rather than worrying about where they are deployed. With the Ray general-purpose serverless implementation in Python, programmers and data scientists can hide servers, implement stateful applications, support direct communication between tasks, and access hardware accelerators.

In this book, authors Holden Karau and Boris Lublinsky show you how to scale existing Python applications and pipelines, allowing you to stay in the Python ecosystem while avoiding single points of failure and manual scheduling. If your data processing has grown beyond what a single computer can handle, this book is for you.

Written by experienced software architecture practitioners, Scaling Python with Ray is ideal for software architects and developers eager to explore successful case studies and learn more about decision and measurement effectiveness. This book covers distributed processing (the pure Python implementation of serverless) and shows you how to:

  • Implement stateful applications with Ray actors
  • Build workflow management in Ray
  • Use Ray as a unified platform for batch and streaming
  • Implement advanced data processing with Ray
  • Apply microservices with the Ray platform
  • Implement reliable Ray applications

Table of contents

  1. Preface
    1. Acknowledgments
    2. A note on responsibility
    3. License
    4. Code Examples
  2. 1. What Is Ray?
    1. Why Do You Need It?
    2. Where Can You Run Ray?
    3. Running Your Code with Ray
    4. Where Does It Fit in the Ecosystem?
      1. “Big” Data / Scalable Dataframes
      2. Machine Learning
      3. Workflow Scheduling
      4. Streaming
      5. Interactive
    5. What Ray Is Not
    6. Conclusion
  3. 2. Getting Started with Ray (Locally)
    1. Installing
      1. Installing (for x86 and Mac ARM)
    2. Hello Worlds
      1. Ray Remote (Task/Futures) Hello World
      2. Data Hello World
      3. Actor Hello World
    3. Conclusion
  4. 3. Ray Remote Functions
    1. Understanding Essentials of Ray Remote Functions
    2. Composition of Remote Ray Functions
    3. Ray Remote Best Practices
    4. Bringing It Together with an Example
    5. Conclusion
  5. 4. Remote Actors
    1. Understanding the Actor model
    2. Basic Ray Remote Actor
    3. Implementing Actor’s Persistence
    4. Scaling Ray Remote Actors
    5. Ray Remote Actors’ Best Practices
    6. Conclusion
  6. 5. Ray Design Details
    1. Fault Tolerance
    2. Ray Objects
    3. Serialization/Pickling
      1. Cloudpickle
      2. Apache Arrow
    4. Resources/Vertical Scaling
    5. Autoscaler
    6. Placement Groups - Organizing your tasks & actors
    7. Namespaces
    8. Managing Dependencies with Runtime Environments
    9. Deploying Ray Applications with the Ray Job API
    10. Conclusion
  7. 6. Implementing streaming applications
    1. Apache Kafka
      1. Basic Kafka concepts
      2. Kafka APIs
    2. Using Kafka with Ray
      1. Scaling our implementation
    3. Building stream processing applications with Ray
      1. Implementing stateful stream processing
    4. Going beyond Kafka
    5. Conclusion
  8. 7. Implementing Microservices
    1. Microservice architecture in Ray
      1. Deployment
      2. Additional deployment capabilities
    2. Using Ray Serve for model serving
      1. Simple model service example
      2. Considerations for model serving implementations
      3. Implementing Speculative Model serving using Ray microservice framework
    3. Conclusion
  9. 8. Advanced Data With Ray
    1. Creating and Saving Ray Datasets
    2. Using Ray Datasets with Different Tools
    3. Tools on Ray Datasets
      1. Pandas-Like DataFrames with Dask
      2. Indexing
      3. Shuffles
      4. Embarrassingly Parallel Operations
      5. Working with Multiple DataFrames
      6. What does not work
      7. What’s Slower
      8. Handling Recursive Algorithms
      9. What other functions are different
      10. Pandas Like DataFrames with Modin
      11. Big Data with Spark
      12. Working with Local Tools
    4. Built-in Ray DataSet operations
    5. How Ray Datasets are Implemented
    6. Conclusion
