Understanding Big Data Scalability: Big Data Scalability Series, Part I

Book Description

Get Started Scaling Your Database Infrastructure for High-Volume Big Data Applications 

“Understanding Big Data Scalability presents the fundamentals of scaling databases from a single node to large clusters. It provides a practical explanation of what ‘Big Data’ systems are, and fundamental issues to consider when optimizing for performance and scalability. Cory draws on many years of experience to explain issues involved in working with data sets that can no longer be handled with single, monolithic relational databases.... His approach is particularly relevant now that relational data models are making a comeback via SQL interfaces to popular NoSQL databases and Hadoop distributions.... This book should be especially useful to database practitioners new to scaling databases beyond traditional single node deployments.” —Brian O’Krafka, software architect 

Understanding Big Data Scalability presents a solid foundation for scaling Big Data infrastructure and helps you address each crucial factor associated with optimizing performance in scalable and dynamic Big Data clusters.

Database expert Cory Isaacson offers practical, actionable insights for every technical professional who must scale a database tier for high-volume applications. Focusing on today’s most common Big Data applications, he introduces proven ways to manage unprecedented data growth from widely diverse sources and to deliver real-time processing at levels that were inconceivable until recently.

Isaacson explains why databases slow down, reviews each major technique for scaling database applications, and identifies the key rules of database scalability that every architect should follow.

You’ll find insights and techniques proven with all types of database engines and environments, including SQL, NoSQL, and Hadoop. Two start-to-finish case studies walk you through planning and implementation, offering specific lessons for formulating your own scalability strategy. Coverage includes:

  • Understanding the true causes of database performance degradation in today’s Big Data environments

  • Scaling smoothly to petabyte-class databases and beyond

  • Defining database clusters for maximum scalability and performance

  • Integrating NoSQL or columnar databases that aren’t “drop-in” replacements for RDBMSes

  • Scaling application components: solutions and options for each tier

  • Recognizing when to scale your data tier—a decision with enormous consequences for your application environment

  • Why data relationships may be even more important in non-relational databases

  • Why virtually every database scalability implementation still relies on sharding, and how to choose the best approach

  • How to set clear objectives for architecting high-performance Big Data implementations 

The Big Data Scalability Series is a comprehensive, four-part series covering many facets of database performance and scalability. Understanding Big Data Scalability is the first book in the series.

Learn more and join the conversation about Big Data scalability at bigdatascalability.com.

Table of Contents

Big Data Scalability Series
  eBook I: Understanding Big Data Scalability
1. Introduction
  What you will Learn
  The Challenge of Big Data
  Today’s Big Data Explosion
  Background for this Book
  Why the focus on database sharding?
  Summary
2. Why Databases Slow Down
  The Database Slowdown Curve
  A Hard-won Lesson
  The Enemies of Database Performance
  How to Identify Database Slowdown Issues
  Summary
3. What is Big Data?
  What is Big Data Anyhow?
  Sources of Big Data
  Summary
4. Big Data in the Real World
  Big Data in the Real World
  ActiveStandards
  FullContact
  Social Point
  Summary
5. Scaling your Application
  The Goals of a Scalable Application Platform
  The Excitement of a High-Growth Success
  Application Scalability Fundamentals
  A Typical Online Application Architecture
  Analytics Application Architectures
  Scaling an Analytics Application
  How to Scale a Traditional Online Application
  Summary
6. When to Scale your Database
  The Last Mile of Application Scalability
  How do you Know When to Scale your Database?
  Options for Increasing Database Performance
  Indications of the Need for Scale
  Summary
7. All Data is Relational
  Relational Data Overview
  The Meaning of Data
  Relationships Matter
  Why Data Modelling is Critical to Success
  Summary
8. It’s all about Sharding
  Sharding: The Ultimate Answer to Database Slowdown
  The Laws of Databases
  Sharding Defined
  Black-Box Sharding
  Relational Sharding
  Summary
9. Scaling Big Data: The Endgame
  The Game of Big Data Scalability
  Scaling Big Data Theory
  The Big Data Endgame
  Data Locality
  Summary