Chapter 17. Working with Large Databases
In the early days of relational databases, hard drive capacity was measured in megabytes, and databases were generally easy to administer simply because they couldn’t get very large. Today, however, a single hard drive can hold 15 TB, a modern disk array can store more than 4 PB of data, and storage in the cloud is essentially limitless. While relational databases face various challenges as data volumes continue to grow, strategies such as partitioning, clustering, and sharding allow companies to keep using relational databases by spreading data across multiple storage tiers and servers. Other companies have decided to move to big data platforms such as Hadoop to handle huge data volumes. This chapter looks at some of these strategies, with an emphasis on techniques for scaling relational databases.
Partitioning
When exactly does a database table become “too big”? Ask 10 different data architects, administrators, or developers, and you will likely get 10 different answers. Most people, however, would agree that the following tasks become more difficult and/or time-consuming as a table grows past a few million rows:
- Query execution requiring full table scans
- Index creation/rebuild
- Data archival/deletion
- Generation of table/index statistics
- Table relocation (e.g., move to a different tablespace)
- Database backups
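Partitioning addresses several of the tasks above by splitting one logical table into smaller physical pieces that can be scanned, indexed, archived, or backed up independently. As a minimal sketch (using MySQL syntax and a hypothetical `sales` table, neither of which appears in the text above), a table can be range-partitioned by year so that archival becomes a metadata operation rather than a row-by-row delete:

```sql
-- Hypothetical sales table, range-partitioned by the year of sale_date.
-- Each partition is a separate physical segment that can be managed
-- independently of the others.
CREATE TABLE sales (
    sale_id   INT NOT NULL,
    sale_date DATE NOT NULL,
    amount    DECIMAL(9,2)
)
PARTITION BY RANGE (YEAR(sale_date)) (
    PARTITION p2021 VALUES LESS THAN (2022),
    PARTITION p2022 VALUES LESS THAN (2023),
    PARTITION p2023 VALUES LESS THAN (2024),
    PARTITION pmax  VALUES LESS THAN (MAXVALUE)
);

-- Archival/deletion of an entire year of data becomes a quick
-- metadata operation, instead of a DELETE that touches every row:
ALTER TABLE sales DROP PARTITION p2021;
```

Queries that filter on `sale_date` can also benefit: the server can skip partitions whose ranges cannot match, so a "full table scan" is reduced to a scan of one partition.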
These tasks can start as routine when a database is small, then become ...