CHAPTER 10 Performance and Productivity

Knowing the technical intricacies of designing high-performance large-scale systems is important, but it is equally important to understand how to quantify performance and find bottlenecks. In this chapter, we look at different performance models, profiling techniques, and the factors that affect performance. Productivity goes hand in hand with performance, so we also examine the pros and cons of programming languages, libraries, and special hardware.

Performance Metrics

Understanding the performance of large-scale data-intensive applications is necessary to decide on their resource requirements. The performance of a cluster is not simply the sum of its individual parts. Metrics have been developed to measure hardware performance so that we can specify our requirements to vendors.

Furthermore, we need to measure the performance of our applications to see whether they are efficient in using the available resources. In a distributed setting, resources can be wasted easily, and finding issues is a labor-intensive process. There are standard metrics used by application developers to measure the performance of distributed parallel applications.
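
As a rough illustration of what such application-level metrics can look like, the sketch below computes speedup and parallel efficiency from measured run times. These two metrics are widely used for parallel programs, but choosing them here is an assumption, since the passage above does not name specific metrics. The function names and the timings in the usage note are hypothetical.

```python
def speedup(serial_seconds: float, parallel_seconds: float) -> float:
    """Ratio of the serial run time to the parallel run time."""
    if serial_seconds <= 0 or parallel_seconds <= 0:
        raise ValueError("run times must be positive")
    return serial_seconds / parallel_seconds


def parallel_efficiency(serial_seconds: float,
                        parallel_seconds: float,
                        workers: int) -> float:
    """Speedup divided by the number of workers (1.0 means ideal scaling)."""
    if workers <= 0:
        raise ValueError("workers must be positive")
    return speedup(serial_seconds, parallel_seconds) / workers
```

For example, with made-up timings of 100 seconds on one worker and 25 seconds on 8 workers, `speedup(100.0, 25.0)` is 4.0 and `parallel_efficiency(100.0, 25.0, 8)` is 0.5, which would suggest that half of the added capacity is not turning into useful work.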

System Performance Metrics

Several metrics have been developed to measure the performance of clusters that run large-scale computations. We use them to gauge the general capabilities of a cluster.

  • FLOPS—Even though there is no single number that can accurately represent ...
