Chapter 5. Optimizing Cloud Data Lake Architectures for Performance
Simplicity is the ultimate sophistication.
Leonardo da Vinci
Performance in its simplest terms can be defined as the timeliness of work completed. Having said that, this is probably one of the most loaded terms when it comes to cloud services, simply because there is no single measure for performance. In this chapter, we will peel back the layers of performance, building a good understanding of what performance means, the various dimensions associated with measuring performance when it comes to a cloud data lake, and the strategies that help optimize and tune your cloud data lake for the best performance. We will also use Klodars Corporation to illustrate these concepts and strategies.
Basics of Measuring Performance
When thinking of performance, I can say with a certain degree of confidence that you are assuming something related to speed, such as a runner crossing the finish line with a personal record. The goal in such a case is to successfully complete the task and achieve a desired outcome, meeting or exceeding the spectators' expectations. In a similar vein, in a cloud data lake, performance refers to the process of setting targets for the tasks to be done and ensuring that the tasks are completed within the set targets.
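To make the idea of completing a task within a set target concrete, here is a minimal Python sketch. The names `TaskResult` and `run_with_target` are hypothetical and introduced only for illustration. The sketch assumes the target is expressed as a wall-clock duration in seconds, which is only one of several ways a target could be defined.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class TaskResult:
    """Outcome of one timed task (hypothetical helper type)."""
    name: str
    elapsed_seconds: float
    target_seconds: float

    @property
    def met_target(self) -> bool:
        # The task "performs" if it finished within its target.
        return self.elapsed_seconds <= self.target_seconds


def run_with_target(name: str, task: Callable[[], None],
                    target_seconds: float) -> TaskResult:
    """Run a task once and record how long it took against its target."""
    start = time.perf_counter()  # monotonic, high-resolution clock
    task()
    elapsed = time.perf_counter() - start
    return TaskResult(name, elapsed, target_seconds)


if __name__ == "__main__":
    # Assumed example: a stand-in "job" that simply sleeps briefly.
    result = run_with_target("sample job", lambda: time.sleep(0.1), 1.0)
    print(f"{result.name}: {result.elapsed_seconds:.3f}s "
          f"(target {result.target_seconds}s, met={result.met_target})")
```

In practice, a real data lake job would replace the sleeping stand-in, and the target would come from whatever expectations the consumers of that job have agreed on.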
The performance of a task has two aspects to it, and any measure of performance needs to incorporate these two elements:
- Response time: How long did it take for the task ...