Chapter 4

CPU and Memory Interaction

We have now measured CPU instruction times and memory access times. How do these interact?

Consider a matrix multiply program working on two double-precision arrays of dimension 1024 x 1024, writing to a third array of the same size, and running on our sample server as described in Appendix A—an x86 CPU with

32KB 8-way L1 data cache,

256KB 8-way L2 cache, and

3MB 12-way L3 cache,

all with a line size of 64 bytes. Figure 4.1 shows the L1 cache layout: 64 sets (32KB / (8 ways × 64 bytes)), each 8-way associative. For 64-byte lines, address bits <5:0> select the byte within a line, and address bits <11:6> select a set. A given memory line can go into any of the eight ways within that set. Each vertical way in this L1 cache holds ...
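The cache geometry above can be checked with a little arithmetic. The Python sketch below is illustrative only. The names `CacheLevel` and `column_sets` are hypothetical. It assumes simple modulo set indexing, although many x86 L3 caches actually hash addresses across slices. It also treats the array as physically contiguous from a hypothetical base address 0. That matters for L2 and L3, whose index bits extend above the 4KB page offset. The sketch derives the number of sets and the index bits for each level. It then counts how many sets a walk down one column of a 1024 x 1024 array of doubles touches, since consecutive rows are 1024 × 8 = 8KB apart.

```python
# Sketch of set-associative cache indexing, using the sizes from the text.
# Assumptions (hypothetical): set = line number modulo number of sets (no
# L3 slice hashing), and the array is physically contiguous from `base`.
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheLevel:
    name: str
    size_bytes: int
    ways: int
    line_bytes: int

    @property
    def num_sets(self) -> int:
        # sets = capacity / (associativity * line size)
        return self.size_bytes // (self.ways * self.line_bytes)

    def offset_bits(self) -> int:
        # Bits selecting the byte within a line: 6 bits, i.e. <5:0>, for 64B lines.
        return self.line_bytes.bit_length() - 1

    def set_bits(self) -> int:
        # Bits selecting the set; num_sets is a power of two for these caches.
        return self.num_sets.bit_length() - 1

    def set_index(self, addr: int) -> int:
        # Assumed modulo indexing: drop the byte-in-line bits, keep the set bits.
        return (addr // self.line_bytes) % self.num_sets


L1 = CacheLevel("L1D", 32 * 1024, 8, 64)
L2 = CacheLevel("L2", 256 * 1024, 8, 64)
L3 = CacheLevel("L3", 3 * 1024 * 1024, 12, 64)

N = 1024                # matrix dimension from the text
ELEM = 8                # bytes per double
ROW_STRIDE = N * ELEM   # 8192 bytes between B[k][j] and B[k+1][j]


def column_sets(cache: CacheLevel, base: int = 0, j: int = 0) -> set:
    """Distinct sets touched when walking column j of a row-major
    N x N array of doubles that starts at hypothetical address `base`."""
    return {cache.set_index(base + k * ROW_STRIDE + j * ELEM) for k in range(N)}


for c in (L1, L2, L3):
    hi = c.offset_bits() + c.set_bits() - 1
    used = column_sets(c)
    print(f"{c.name}: {c.num_sets} sets, byte bits <{c.offset_bits() - 1}:0>, "
          f"set bits <{hi}:{c.offset_bits()}>, a column walk uses "
          f"{len(used)} set(s) = {len(used) * c.ways} lines for {N} elements")
```

Under these assumptions every element of a column lands in the same L1 set, because the 8KB row stride leaves address bits <11:6> unchanged. Only 8 of the column's 1024 lines can therefore be resident in L1 at once. The L2 and L3 figures depend on the physical page mapping and on any index hashing the real hardware uses.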
