16. Software Fundamentals

16.1 Introduction

We just went over a number of hardware bottlenecks you should take into account while you work on applications and data science pipelines. In this chapter, we’ll talk about important bottlenecks at the software level with regard to how data is stored.

Each of the following topics is an additional consideration as you design your pipeline. Examining your use case in light of these concepts will help you choose a storage engine better suited to it.

16.2 Paging

In Chapter 15 we touched on how the inefficient storage of information on disk relates to overhead in retrieving that data. But how does that affect your application? Page/block size is well ...
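To make the retrieval overhead concrete, here is a minimal sketch of how a small read can pull in much more data than was requested when storage is accessed a whole page at a time. The helper names `pages_touched` and `read_amplification` are hypothetical and used only for illustration. The default page size comes from Python's `mmap.PAGESIZE`, which reports the operating system's memory page size; a given storage engine or file system may use a different block size.

```python
import mmap


def pages_touched(offset: int, length: int, page_size: int = mmap.PAGESIZE) -> int:
    """Count the pages overlapped by the byte range [offset, offset + length)."""
    if length <= 0:
        return 0
    first_page = offset // page_size
    last_page = (offset + length - 1) // page_size
    return last_page - first_page + 1


def read_amplification(offset: int, length: int, page_size: int = mmap.PAGESIZE) -> float:
    """Ratio of bytes fetched (whole pages) to bytes actually requested."""
    if length <= 0:
        return 0.0
    return pages_touched(offset, length, page_size) * page_size / length


if __name__ == "__main__":
    print("OS page size in bytes:", mmap.PAGESIZE)
    # A 100-byte record that straddles a page boundary costs two full pages
    # (page size passed explicitly here so the example does not depend on the host).
    print(pages_touched(4090, 100, page_size=4096))
    print(read_amplification(4090, 100, page_size=4096))
```

Under these assumptions, a record that fits inside one page costs a single page read, while the same record placed across a boundary costs two. This is one reason record layout and alignment can matter when comparing storage engines.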

