Chapter 12. Ray in the Enterprise

Deploying software in enterprise environments often comes with additional requirements, especially regarding security. Enterprise deployments tend to involve multiple stakeholders and need to serve a larger group of scientists and engineers. While not required, many enterprise clusters have some form of multitenancy to allow more efficient use of resources (including human resources, such as operational staff).

Ray Dependency Security Issues

Unfortunately, Ray’s default requirements file brings in some insecure libraries. Many enterprise environments have some kind of container scanning or similar system to detect such issues. In some cases, you can simply remove or upgrade the flagged dependencies, but when Ray bundles the dependencies in its wheel (e.g., the Apache Log4j issue), limiting yourself to prebuilt wheels has serious drawbacks. If you find a Java or native library flagged, you will need to rebuild Ray from source with the version upgraded. Derwen.ai has an example of doing this for Docker in its ray_base repo.
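For the simpler case, where a flagged dependency is an ordinary Python package that can be upgraded in place, a small check along the following lines can confirm whether an environment still contains an old version. This is only a rough sketch using the standard library: the package name and minimum version in MINIMUM_SAFE are hypothetical placeholders for whatever your scanner reports, the version parsing handles only plain dotted numeric versions, and it cannot see Java or native libraries bundled inside the Ray wheel.

```python
import re
from importlib import metadata


def parse_version(version: str) -> tuple:
    """Turn '2.17.1' into (2, 17, 1), stopping at the first non-numeric piece."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def is_older(current: str, minimum: str) -> bool:
    """Return True if `current` is a lower version than `minimum`."""
    a, b = parse_version(current), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '2.0' and '2.0.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b


def find_outdated(minimums: dict, installed: dict) -> dict:
    """Return {name: (installed_version, minimum)} for packages below their minimum."""
    flagged = {}
    for name, minimum in minimums.items():
        current = installed.get(name)
        if current is not None and is_older(current, minimum):
            flagged[name] = (current, minimum)
    return flagged


def installed_versions() -> dict:
    """Map each installed distribution name (lowercased) to its version."""
    result = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:
            result[name.lower()] = dist.version
    return result


# Hypothetical entry standing in for a scanner's findings.
MINIMUM_SAFE = {"examplepkg": "2.0.0"}

if __name__ == "__main__":
    outdated = find_outdated(MINIMUM_SAFE, installed_versions())
    for name, (current, minimum) in outdated.items():
        print(f"{name}: installed {current}, need >= {minimum}")
```

Running a check like this inside the container image, after installing Ray, gives a quick signal before the full scanner runs. It does not replace the scanner.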

Interacting with the Existing Tools

Enterprise deployments often involve interaction with existing tools and the data they produce. One potential integration point is Ray’s dataset-generic Arrow interface, which lets you exchange data with other tools. When data is stored “at rest,” Parquet is the best format for interaction with other tools.

Using Ray with CI/CD Tools

When working in large teams, ...
