16. Software Fundamentals

16.1 Introduction

We just went over a number of hardware bottlenecks you should take into account as you work on applications and data science pipelines. In this chapter, we’ll talk about important software-level bottlenecks that arise from how data is stored.

Each of the following topics is an additional consideration as you design your pipeline. Examining your use case in light of these concepts will help you choose a storage engine that fits it better.

16.2 Paging

In Chapter 15 we touched on how the inefficient storage of information on disk relates to overhead in retrieving that data. But how does that affect your application? Page/block size is well ...
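As a rough illustration of why page size matters to an application, the following sketch estimates how many pages a single read overlaps and how many bytes are fetched per byte actually requested. It is a minimal sketch under stated assumptions: storage is read in whole fixed-size pages, the helper names `pages_touched` and `read_amplification` are hypothetical, and the 4096-byte page size in the usage lines is an assumed value (the default falls back to `mmap.PAGESIZE`, the memory page size reported by the operating system).

```python
import mmap


def pages_touched(offset: int, length: int, page_size: int = mmap.PAGESIZE) -> int:
    """Count the fixed-size pages overlapped by a read of `length` bytes at `offset`.

    Assumption: the storage layer always fetches whole pages.
    """
    if length <= 0:
        return 0
    first_page = offset // page_size
    last_page = (offset + length - 1) // page_size
    return last_page - first_page + 1


def read_amplification(offset: int, length: int, page_size: int = mmap.PAGESIZE) -> float:
    """Bytes fetched divided by bytes requested for one read."""
    if length <= 0:
        raise ValueError("length must be positive")
    return pages_touched(offset, length, page_size) * page_size / length


# Hypothetical usage with an assumed 4096-byte page:
# a 100-byte record that fits inside one page still costs a full page read,
# and the same record straddling a page boundary costs two.
aligned = pages_touched(0, 100, page_size=4096)
straddling = pages_touched(4090, 100, page_size=4096)
ratio = read_amplification(0, 100, page_size=4096)
```

Under these assumptions, small records that cross page boundaries roughly double the work of a read, which is one reason record layout and page size are worth considering together.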
