Chapter 7. Input/Output Operations

It is a capital mistake to theorize before one has data.

Sherlock Holmes

As a general rule, the majority of data, be it in a finance context or any other application area, is stored on hard disk drives (HDDs) or some other form of permanent storage device, like solid-state drives (SSDs) or hybrid disk drives. Storage capacities have been steadily increasing over the years, while costs per storage unit (e.g., per megabyte) have been steadily falling.

At the same time, stored data volumes have been increasing at a much faster pace than the typical random access memory (RAM) available even in the largest machines. This makes it necessary not only to write data to disk for permanent storage, but also to compensate for insufficient RAM by swapping data from RAM to disk and back.

Input/output (I/O) operations are therefore generally very important tasks when it comes to finance applications and data-intensive applications in general. Often they represent the bottleneck for performance-critical computations, since I/O operations generally cannot move data fast enough from the disk to the RAM[28] and from the RAM back to the disk. In a sense, CPUs are often “starving” due to slow I/O operations.
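To get a rough feeling for this gap, the following minimal sketch times an in-memory computation against writing the same data to disk and reading it back. It is an illustration under stated assumptions, not a rigorous benchmark: the helper name time_roundtrip, the sample data, and the file name data.pkl are all hypothetical, and only the Python standard library (pickle, tempfile, time) is used.

```python
import os
import pickle
import tempfile
import time


def time_roundtrip(data, path):
    """Pickle data to path, read it back; return (write_s, read_s, restored)."""
    start = time.perf_counter()
    with open(path, "wb") as f:
        pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)
    write_s = time.perf_counter() - start

    start = time.perf_counter()
    with open(path, "rb") as f:
        restored = pickle.load(f)
    read_s = time.perf_counter() - start
    return write_s, read_s, restored


# Hypothetical sample data: 200,000 floats standing in for a price series.
data = [float(i) for i in range(200_000)]

# A purely in-memory computation for comparison.
start = time.perf_counter()
total = sum(data)
compute_s = time.perf_counter() - start

# Write to and read from a temporary file that is removed afterwards.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.pkl")
    write_s, read_s, restored = time_roundtrip(data, path)
    size_mb = os.path.getsize(path) / 1e6

print(f"in-memory sum:       {compute_s:.4f} s")
print(f"write {size_mb:.1f} MB to disk: {write_s:.4f} s")
print(f"read back from disk: {read_s:.4f} s")
```

The timings depend heavily on the hardware, the file system, and operating system caching (a file that was just written is often served from the page cache rather than from the physical disk), so the numbers should be read as indicative only. The timings also include the cost of serializing and deserializing the list, not just the raw transfer.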

Although the majority of today’s financial and corporate analytics efforts are confronted with “big” data (e.g., of petascale size), single analytics tasks generally use data (sub)sets that fall in the “mid” data category. A recent study concluded:

Our measurements as well as ...
