Appendix C. Debugging with Ray
Depending on your debugging techniques, moving to distributed systems could require a new set of techniques. Thankfully, tools like Pdb and PyCharm allow you to connect remote debuggers, and Ray’s local mode can allow you to use your existing debugging tools in many other situations. Some errors happen outside Python, making them more difficult to debug, like container out-of-memory (OOM) errors, segmentation faults, and other native errors.
Note
Some components of this appendix are shared with Scaling Python with Dask, as they are general good advice for debugging all types of distributed systems.
General Debugging Tips with Ray
You likely have your own standard debugging techniques for working with Python code, and this appendix is not meant to replace them. Here are some general techniques that make more sense with Ray:
-
Break up failing functions into smaller functions. Since
ray.remote
schedules on the block of a function, smaller functions make it easier to isolate the problem. -
Be careful about any unintended scope capture.
-
Sample data and try to reproduce it locally (local debugging is often easier).
-
Use Mypy for type checking. While we haven’t included types in all our examples, liberal type usage can catch tricky errors in production code.
-
When issues appear regardless of parallelization, debug your code in single-threaded mode, where it can be easier to understand what’s going on.
Now, with those additional general tips, ...
Get Scaling Python with Ray now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.