Get Started Scaling Your Database Infrastructure for High-Volume Big Data Applications
“Understanding Big Data Scalability presents the fundamentals of scaling databases from a single node to large clusters. It provides a practical explanation of what ‘Big Data’ systems are, and fundamental issues to consider when optimizing for performance and scalability. Cory draws on many years of experience to explain issues involved in working with data sets that can no longer be handled with single, monolithic relational databases.... His approach is particularly relevant now that relational data models are making a comeback via SQL interfaces to popular NoSQL databases and Hadoop distributions.... This book should be especially useful to database practitioners new to scaling databases beyond traditional single node deployments.” —Brian O’Krafka, software architect
Understanding Big Data Scalability presents a solid foundation for scaling Big Data infrastructure and helps you address each crucial factor associated with optimizing performance in scalable and dynamic Big Data clusters.
Database expert Cory Isaacson offers practical, actionable insights for every technical professional who must scale a database tier for high-volume applications. Focusing on today’s most common Big Data applications, he introduces proven ways to manage unprecedented data growth from widely diverse sources and to deliver real-time processing at levels that were inconceivable until recently.
Isaacson explains why databases slow down, reviews each major technique for scaling database applications, and identifies the key rules of database scalability that every architect should follow.
You’ll find insights and techniques proven with all types of database engines and environments, including SQL, NoSQL, and Hadoop. Two start-to-finish case studies walk you through planning and implementation, offering specific lessons for formulating your own scalability strategy.
Coverage includes:
- Understanding the true causes of database performance degradation in today’s Big Data environments
- Scaling smoothly to petabyte-class databases and beyond
- Defining database clusters for maximum scalability and performance
- Integrating NoSQL or columnar databases that aren’t “drop-in” replacements for RDBMSes
- Scaling application components: solutions and options for each tier
- Recognizing when to scale your data tier, a decision with enormous consequences for your application environment
- Why data relationships may be even more important in non-relational databases
- Why virtually every database scalability implementation still relies on sharding, and how to choose the best approach
- How to set clear objectives for architecting high-performance Big Data implementations
The Big Data Scalability Series is a comprehensive four-part series covering many facets of database performance and scalability. Understanding Big Data Scalability is the first book in the series.
Learn more and join the conversation about Big Data scalability at bigdatascalability.com.
Table of contents
- About This eBook
- Title Page
- Copyright Page
- Praise for Understanding Big Data Scalability
- About the Author
- 1. Introduction
- 2. Why Databases Slow Down
- 3. What is Big Data?
- 4. Big Data in the Real World
- 5. Scaling Your Application
- 6. When to Scale Your Database
- 7. All Data Is Relational
- 8. It’s All About Sharding
- 9. Scaling Big Data: The Endgame
- Title: Understanding Big Data Scalability: Big Data Scalability Series, Part I
- Release date: July 2014
- Publisher(s): Pearson
- ISBN: 9780133599121