5 Practicing scalability and performance

This chapter covers

  • Developing a realistic, performant data science project iteratively
  • Using the compute layer to power demanding operations, such as parallelized model training
  • Optimizing the performance of numerical Python code
  • Using various techniques to make your workflows more scalable and performant

In the previous chapter, we discussed how scalability is not only about handling more demanding algorithms or more data. At the organizational level, the infrastructure should also scale to a large number of projects developed by a large number of people. We recognized that scalability and performance are separate concerns: you can have one without the other. In fact, the different dimensions ...
