15
High-Performance Programming with Data-Oriented Design
Noel Llopis
Snappy Touch
Common programming wisdom used to encourage delaying optimizations until
later in the project, and then optimizing only those parts that were obvious
bottlenecks in the profiler. That approach worked well with glaring
inefficiencies, like particularly slow algorithms or code that is called many
times per frame. In a time when CPU clock cycles were a good indication of
performance, that was a good approach to follow. Hardware has changed a lot
since then, and we have all experienced the situation where, after fixing the
obvious culprits, no single function stands out in the profiler but performance
remains subpar. Data-oriented design helps address this problem by architecting
the game with memory accesses and parallelization in mind from the beginning.
15.1 Modern Hardware
Modern hardware can be characterized by having multiple execution cores and
deep memory hierarchies. The complex memory hierarchies exist because of the
gap between CPU power and memory access times. Gone are the days when CPU
instructions took about the same time as a main memory access. Instead, this
gap continues to increase and shows no signs of stopping (see Figure 15.1).
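To put a rough number on that gap, the sketch below takes the two growth rates labeled in Figure 15.1 (CPU performance doubling every 2 years, DRAM performance every 6) at face value and compounds them. The function names are hypothetical, and real hardware does not follow such smooth curves, so treat the result as an order-of-magnitude illustration only.

```cpp
#include <cassert>
#include <cmath>

// Relative performance after `years` for something that doubles every
// `doubling_period` years. Both parameters are in years.
double relative_performance(double years, double doubling_period)
{
    assert(doubling_period > 0.0);
    return std::pow(2.0, years / doubling_period);
}

// Ratio of CPU improvement to DRAM improvement after `years`,
// assuming the two rates read off Figure 15.1.
double cpu_memory_gap(double years)
{
    const double cpu_doubling_years  = 2.0;  // assumption: "CPU: 2x every 2 years"
    const double dram_doubling_years = 6.0;  // assumption: "DRAM: 2x every 6 years"
    return relative_performance(years, cpu_doubling_years) /
           relative_performance(years, dram_doubling_years);
}
```

Under these assumptions the gap itself doubles every 3 years: cpu_memory_gap(24.0) evaluates to 256, because the CPU improves 4096 times over that span while DRAM improves only 16 times.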
Different parts of the memory hierarchy have different access times. The
smaller levels closer to the CPU are the fastest, whereas main memory can be
very large but is also very slow. Table 15.1 lists some common access times for
different levels of the hierarchy on modern platforms.
Figure 15.1. Relative CPU and memory performance over time.
With these kinds of access times, it’s very likely that the CPU is going to
stall waiting to read data from memory. All of a sudden, performance is
determined not so much by how efficient the program executing on the CPU is,
but by how efficiently it uses memory.
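A small sketch can make this concrete. The two functions below perform exactly the same additions over the same data and differ only in the order in which they touch memory. The Grid type and both function names are hypothetical, chosen for illustration, and integer cells are used so that both traversals produce identical sums.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A rows x cols grid stored contiguously in row-major order,
// so the cell at (r, c) lives at cells[r * cols + c].
struct Grid
{
    std::size_t rows;
    std::size_t cols;
    std::vector<int> cells;
};

// Visits cells in the order they are laid out in memory: consecutive
// iterations touch adjacent addresses.
long long sum_row_major(const Grid& g)
{
    assert(g.cells.size() == g.rows * g.cols);
    long long total = 0;
    for (std::size_t r = 0; r < g.rows; ++r)
        for (std::size_t c = 0; c < g.cols; ++c)
            total += g.cells[r * g.cols + c];
    return total;
}

// Same additions, but consecutive iterations are g.cols elements apart,
// so the traversal jumps through memory instead of walking it.
long long sum_column_major(const Grid& g)
{
    assert(g.cells.size() == g.rows * g.cols);
    long long total = 0;
    for (std::size_t c = 0; c < g.cols; ++c)
        for (std::size_t r = 0; r < g.rows; ++r)
            total += g.cells[r * g.cols + c];
    return total;
}
```

On a grid much larger than the caches, the strided version will typically run noticeably slower even though the instruction count is essentially the same. How large the difference is depends on the platform and the grid dimensions, so it is worth measuring rather than assuming.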
Barring a radical technology change, this is not a situation that’s about to
change anytime soon. We’ll continue getting more powerful, wider CPUs and
larger memories that are going to make memory access even more problematic in
the future.
Looking at code from a memory access point of view, the worst-case situation
would be a program accessing heterogeneous trees of data scattered all over
memory, executing different code at each node. There we get not just constant
data cache misses but also bad instruction cache utilization, because each node
calls different functions. Does that sound like a familiar situation? That’s
how most modern games are architected: large trees of different kinds of
objects with polymorphic behavior.
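The pattern described above might look like the following sketch, shown next to a flat, homogeneous alternative. None of this code comes from the chapter: Node, Mover, Fader, Movers, and both update functions are hypothetical names, and the flat layout is just one possible way to arrange the same data contiguously.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// A tree of separately allocated nodes of different types, each updated
// through a virtual call.
struct Node
{
    virtual ~Node() = default;
    virtual void update(float dt) = 0;
    std::vector<std::unique_ptr<Node>> children;
};

struct Mover : Node
{
    float position = 0.0f;
    float velocity = 0.0f;
    void update(float dt) override { position += velocity * dt; }
};

struct Fader : Node
{
    float alpha = 1.0f;
    float rate = 0.0f;
    void update(float dt) override { alpha -= rate * dt; }
};

// Depth-first walk: each step follows a pointer to a node that may live
// anywhere on the heap, then jumps to whichever update() that node has.
void update_tree(Node& node, float dt)
{
    node.update(dt);
    for (auto& child : node.children)
        update_tree(*child, dt);
}

// Contrast: the data for one kind of object kept in contiguous arrays
// and processed by a single loop running a single piece of code.
struct Movers
{
    std::vector<float> positions;
    std::vector<float> velocities;
};

void update_movers(Movers& m, float dt)
{
    assert(m.positions.size() == m.velocities.size());
    for (std::size_t i = 0; i < m.positions.size(); ++i)
        m.positions[i] += m.velocities[i] * dt;
}
```

Both versions compute the same positions. The difference is that update_tree spreads its reads and its code across memory, while update_movers streams through two arrays with the same few instructions.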
What’s even worse is that bad memory access patterns will bring a program
down to its metaphorical knees, yet that’s not a problem that’s likely to
appear anywhere in the profiler. Instead, it results in the common situation of
everything being slower than we expected, without our being able to point to a
particular spot. That’s because there isn’t a single place that we can fix.
Instead, we need to change the whole architecture, preferably from the
beginning, and use a data-oriented approach.
[Figure 15.1: relative performance (logarithmic axis, 1 to 10,000) plotted against year (1980 to 2005). CPU: 2× every 2 years. DRAM: 2× every 6 years. The gap between the two curves widens over time.]
