Effective Multi-Tenant Distributed Systems

Book Description

Organizations are eager to capitalize on real-time data analysis, move beyond batch processing for time-critical insights, and run big data workloads in a predictable, reliable way. But performance has been an issue for distributed systems like Hadoop, especially when a single cluster serves multiple tenants or multiple workloads. The worst part? You may not even know you have a performance issue.

In this report, Chad Carson and Sean Suchter from Pepperdata describe the performance challenges of running multi-tenant distributed computing environments, especially within a Hadoop context. After examining pros and cons of current solutions for these problems, you’ll learn how to use real-time, intelligent software that tracks and dynamically adjusts each application’s usage of physical hardware. Get ahead of your Hadoop operations for faster, better decision-making and faster, better business returns.

With this report, you’ll explore:

  • How Hadoop and other multi-tenant distributed systems work, and why performance matters
  • Business-visible symptoms of performance problems: late jobs, inconsistent runtimes, and underutilized hardware
  • Scheduling challenges in multi-tenant systems
  • Symptoms and solutions for CPU performance limitations
  • Physical and virtual limits of node memory—and what happens when you run out
  • Identifying and solving problems caused by disk and network performance limits and other typical bottlenecks
  • Solutions for monitoring performance and accurately allocating cluster costs among users and business units

Table of Contents

  1. Introduction to Multi-Tenant Distributed Systems
    1. The Benefits of Distributed Systems
    2. Performance Problems in Distributed Systems
      1. Scheduling
      2. Hardware Bottlenecks
    3. Lack of Visibility Within Multi-Tenant Distributed Systems
    4. The Impact on Business from Performance Problems
    5. Scope of This Book
      1. Hadoop: An Example Distributed System
      2. Terminology
  2. Scheduling in Distributed Systems
    1. Introduction
    2. Dominant Resource Fairness Scheduling
    3. Aggressive Scheduling for Busy Queues
    4. Special Scheduling Treatment for Small Jobs
    5. Workload-Specific Scheduling Considerations
    6. Inefficiencies in Scheduling
      1. The Need to be Conservative with Memory
      2. Inability to Effectively Schedule the Use of Other Resources
      3. Deadlock and Starvation
      4. Waste Due to Speculative Execution
    7. Summary
  3. CPU Performance Considerations
    1. Introduction
    2. Algorithm Efficiency
    3. Kernel Scheduling
      1. Intentional or Accidental Bad Actors
      2. Applying the Control Mechanisms in Multi-Tenant Distributed Systems
    4. I/O Waiting and CPU Cache Impacts
    5. Summary
  4. Memory Usage in Distributed Systems
    1. Introduction
    2. Physical Versus Virtual Memory
    3. Node Thrashing
      1. Detecting and Avoiding Thrashing
    4. Kernel Out-Of-Memory Killer
    5. Implications of Memory-Intensive Workloads for Multi-Tenant Distributed Systems
      1. Solutions
    6. Summary
  5. Disk Performance: Identifying and Eliminating Bottlenecks
    1. Introduction
    2. Overview of Disk Performance Limits
    3. Disk Behavior When Using Multiple Disks
    4. Disk Performance in Multi-Tenant Distributed Systems
    5. Controlling Disk I/O Usage to Improve Performance for High-Priority Applications
      1. Basic Disk I/O Prioritization Tools and Their Limitations
      2. Effective Control of Disk I/O Usage
    6. Solid-State Drives and Distributed Systems
    7. Measuring Performance and Diagnosing Problems
    8. Summary
  6. Network Performance Limits: Causes and Solutions
    1. Introduction
    2. Bandwidth Problems in Distributed Systems
      1. Hadoop’s Solution to Network Bottlenecks: Move Computation to the Data
      2. Why Network Quality of Service Does Not Solve the Problem of Network Bottlenecks
      3. Controlling Network Usage on a Per-Application Basis
    3. Other Network-Related Bottlenecks and Problems
    4. Measuring Network Performance and Debugging Problems
      1. ping and mtr
      2. Retransmissions
    5. Summary
  7. Other Bottlenecks in Distributed Systems
    1. Introduction
    2. NameNode Contention
    3. ResourceManager Contention
    4. ZooKeeper
    5. Locks
    6. External Databases and Related Systems
    7. DNS Servers
    8. Summary
  8. Monitoring Performance: Challenges and Solutions
    1. Introduction
    2. Why Monitor?
    3. What to Monitor
    4. Systems and Performance Aspects of Monitoring
      1. Handling Huge Amounts of Metrics Data
      2. Reliability of the Monitoring System
      3. Some Commonly Used Monitoring Systems
    5. Algorithmic and Logical Aspects of Monitoring
      1. Challenges Specific to Multi-Tenant Distributed Systems
    6. Measuring the Effect of Attempted Improvements
    7. Allocating Cluster Costs Across Tenants
    8. Summary
  9. Conclusion: Performance Challenges and Solutions for Effective Multi-Tenant Distributed Systems