Chapter 24
Overview of I/O Benchmarking
Katie Antypas and Yushu Yao
National Energy Research Scientific Computing Center, Lawrence Berkeley
National Laboratory
24.1 Introduction
24.2 I/O Benchmarking
24.3 Why Profile I/O in Scientific Applications?
24.4 Brief Introduction to I/O Profilers
24.5 I/O Profiling at NERSC
24.5.1 Application Profiling Case Studies
24.5.1.1 Checkpointing Too Frequently
24.5.1.2 Reading Small Input Files from Every Rank
24.5.1.3 Using the Wrong File System
24.6 Conclusion
Bibliography
24.1 Introduction
For users of HPC systems, I/O remains a challenge in achieving high
performance on large-scale parallel systems. There are numerous reasons for
I/O bottlenecks. First, an I/O subsystem may be undersized for a particular
HPC compute partition. A major challenge for HPC centers is deciding how much
of the budget to devote to each component of a system. The right balance
between the I/O partition and the compute partition depends on the system’s
workload as well as its scheduling policies. Second, depending on how a
system is architected, concurrent applications may share limited I/O
resources, leading to lower performance. With multiple concurrent
applications, contention for I/O subsystem resources such as I/O nodes,
network components, metadata servers, and spinning disks can increase
latency and reduce bandwidth. Last, how a user reads and writes data can
greatly affect application performance (also discussed in Chapters 19–23).
A user who performs I/O in a non-optimal manner may see low performance as a
result.
An application that performs many small writes may run into lock contention