High Performance MySQL, 3rd Edition

by Baron Schwartz, Peter Zaitsev, Vadim Tkachenko

March 2012

Intermediate to advanced

823 pages

29h 40m

English

O'Reilly Media, Inc.

How This Book Is Organized; A Broad Overview; Building a Solid Foundation; Configuring Your Application; MySQL as an Infrastructure Component; Miscellaneous Useful Topics; Software Versions and Availability; Conventions Used in This Book; Using Code Examples; Safari® Books Online; How to Contact Us; Acknowledgments for the Third Edition; Acknowledgments for the Second Edition; From Baron; From Peter; From Vadim; From Arjen; Acknowledgments for the First Edition; From Jeremy; From Derek
MySQL’s Logical Architecture; Connection Management and Security; Optimization and Execution; Concurrency Control; Read/Write Locks; Lock Granularity; Table locks; Row locks; Transactions; Isolation Levels; Deadlocks; Transaction Logging; Transactions in MySQL; AUTOCOMMIT; Mixing storage engines in transactions; Implicit and explicit locking; Multiversion Concurrency Control; MySQL’s Storage Engines; The InnoDB Engine; InnoDB’s history; InnoDB overview; The MyISAM Engine; Storage; MyISAM features; Compressed MyISAM tables; MyISAM performance; Other Built-in MySQL Engines; The Archive engine; The Blackhole engine; The CSV engine; The Federated engine; The Memory engine; The Merge storage engine; The NDB Cluster engine; Third-Party Storage Engines; OLTP storage engines; Column-oriented storage engines; Community storage engines; Selecting the Right Engine; Logging; Read-only or read-mostly tables; Order processing; Bulletin boards and threaded discussion forums; CD-ROM applications; Large data volumes; Table Conversions; ALTER TABLE; Dump and import; CREATE and SELECT; A MySQL Timeline; MySQL’s Development Model; Summary
Why Benchmark?; Benchmarking Strategies; What to Measure; Benchmarking Tactics; Designing and Planning a Benchmark; How Long Should the Benchmark Last?; Capturing System Performance and Status; Getting Accurate Results; Running the Benchmark and Analyzing Results; The Importance of Plotting; Benchmarking Tools; Full-Stack Tools; Single-Component Tools; Benchmarking Examples; http_load; MySQL Benchmark Suite; sysbench; The sysbench CPU benchmark; The sysbench file I/O benchmark; The sysbench OLTP benchmark; Other sysbench features; dbt2 TPC-C on the Database Test Suite; Percona’s TPCC-MySQL Tool; Summary
Introduction to Performance Optimization; Optimization Through Profiling; Interpreting the Profile; Profiling Your Application; Instrumenting PHP Applications; Profiling MySQL Queries; Profiling a Server’s Workload; Capturing MySQL’s queries to a log; Analyzing the query log; Profiling a Single Query; Using SHOW PROFILE; Using SHOW STATUS; Using the slow query log; Using the Performance Schema; Using the Profile for Optimization; Diagnosing Intermittent Problems; Single-Query Versus Server-Wide Problems; Using SHOW GLOBAL STATUS; Using SHOW PROCESSLIST; Using query logging; Making sense of the findings; Capturing Diagnostic Data; The diagnostic trigger; What kinds of data should you collect?; Interpreting the data; A Case Study in Diagnostics; Other Profiling Tools; Using the USER_STATISTICS Tables; Using strace; Summary
Choosing Optimal Data Types; Whole Numbers; Real Numbers; String Types; VARCHAR and CHAR types; BLOB and TEXT types; Using ENUM instead of a string type; Date and Time Types; Bit-Packed Data Types; Choosing Identifiers; Special Types of Data; Schema Design Gotchas in MySQL; Normalization and Denormalization; Pros and Cons of a Normalized Schema; Pros and Cons of a Denormalized Schema; A Mixture of Normalized and Denormalized; Cache and Summary Tables; Materialized Views; Counter Tables; Speeding Up ALTER TABLE; Modifying Only the .frm File; Building MyISAM Indexes Quickly; Summary
Indexing Basics; Types of Indexes; B-Tree indexes; Types of queries that can use a B-Tree index; Hash indexes; Building your own hash indexes; Handling hash collisions; Spatial (R-Tree) indexes; Full-text indexes; Other types of index; Benefits of Indexes; Indexing Strategies for High Performance; Isolating the Column; Prefix Indexes and Index Selectivity; Multicolumn Indexes; Choosing a Good Column Order; Clustered Indexes; Comparison of InnoDB and MyISAM data layout; MyISAM’s data layout; InnoDB’s data layout; Inserting rows in primary key order with InnoDB; Covering Indexes; Using Index Scans for Sorts; Packed (Prefix-Compressed) Indexes; Redundant and Duplicate Indexes; Unused Indexes; Indexes and Locking; An Indexing Case Study; Supporting Many Kinds of Filtering; Avoiding Multiple Range Conditions; Optimizing Sorts; Index and Table Maintenance; Finding and Repairing Table Corruption; Updating Index Statistics; Reducing Index and Data Fragmentation; Summary
Why Are Queries Slow?; Slow Query Basics: Optimize Data Access; Are You Asking the Database for Data You Don’t Need?; Is MySQL Examining Too Much Data?; Response time; Rows examined and rows returned; Rows examined and access types; Ways to Restructure Queries; Complex Queries Versus Many Queries; Chopping Up a Query; Join Decomposition; Query Execution Basics; The MySQL Client/Server Protocol; Query states; The Query Cache; The Query Optimization Process; The parser and the preprocessor; The query optimizer; Table and index statistics; MySQL’s join execution strategy; The execution plan; The join optimizer; Sort optimizations; The Query Execution Engine; Returning Results to the Client; Limitations of the MySQL Query Optimizer; Correlated Subqueries; When a correlated subquery is good; UNION Limitations; Index Merge Optimizations; Equality Propagation; Parallel Execution; Hash Joins; Loose Index Scans; MIN() and MAX(); SELECT and UPDATE on the Same Table; Query Optimizer Hints; Optimizing Specific Types of Queries; Optimizing COUNT() Queries; What COUNT() does; Myths about MyISAM; Simple optimizations; Using an approximation; More complex optimizations; Optimizing JOIN Queries; Optimizing Subqueries; Optimizing GROUP BY and DISTINCT; Optimizing GROUP BY WITH ROLLUP; Optimizing LIMIT and OFFSET; Optimizing SQL_CALC_FOUND_ROWS; Optimizing UNION; Static Query Analysis; Using User-Defined Variables; Optimizing ranking queries; Avoiding retrieving the row just modified; Counting UPDATEs and INSERTs; Making evaluation order deterministic; Writing a lazy UNION; Other uses for variables; Case Studies; Building a Queue Table in MySQL; Computing the Distance Between Points; Using User-Defined Functions; Summary
Partitioned Tables; How Partitioning Works; Types of Partitioning; How to Use Partitioning; What Can Go Wrong; Optimizing Queries; Merge Tables; Views; Updatable Views; Performance Implications of Views; Limitations of Views; Foreign Key Constraints; Storing Code Inside MySQL; Stored Procedures and Functions; Triggers; Events; Preserving Comments in Stored Code; Cursors; Prepared Statements; Prepared Statement Optimization; The SQL Interface to Prepared Statements; Limitations of Prepared Statements; User-Defined Functions; Plugins; Character Sets and Collations; How MySQL Uses Character Sets; Defaults for creating objects; Settings for client/server communication; How MySQL compares values; Special-case behaviors; Choosing a Character Set and Collation; How Character Sets and Collations Affect Queries; Full-Text Searching; Natural-Language Full-Text Searches; Boolean Full-Text Searches; Full-Text Changes in MySQL 5.1; Full-Text Tradeoffs and Workarounds; Full-Text Configuration and Optimization; Distributed (XA) Transactions; Internal XA Transactions; External XA Transactions; The MySQL Query Cache; How MySQL Checks for a Cache Hit; How the Cache Uses Memory; When the Query Cache Is Helpful; How to Configure and Maintain the Query Cache; Reducing fragmentation; Improving query cache usage; InnoDB and the Query Cache; General Query Cache Optimizations; Alternatives to the Query Cache; Summary
How MySQL’s Configuration Works; Syntax, Scope, and Dynamism; Side Effects of Setting Variables; Getting Started; Iterative Optimization by Benchmarking; What Not to Do; Creating a MySQL Configuration File; Inspecting MySQL Server Status Variables; Configuring Memory Usage; How Much Memory Can MySQL Use?; Per-Connection Memory Needs; Reserving Memory for the Operating System; Allocating Memory for Caches; The InnoDB Buffer Pool; The MyISAM Key Caches; The MyISAM key block size; The Thread Cache; The Table Cache; The InnoDB Data Dictionary; Configuring MySQL’s I/O Behavior; InnoDB I/O Configuration; The InnoDB transaction log; Log file size and the log buffer; How InnoDB flushes the log buffer; How InnoDB opens and flushes log and data files; The InnoDB tablespace; Configuring the tablespace; Old row versions and the tablespace; The doublewrite buffer; Other I/O configuration options; MyISAM I/O Configuration; Configuring MySQL Concurrency; InnoDB Concurrency Configuration; MyISAM Concurrency Configuration; Workload-Based Configuration; Optimizing for BLOB and TEXT Workloads; Optimizing for Filesorts; Completing the Basic Configuration; Safety and Sanity Settings; Advanced InnoDB Settings; Summary

What Limits MySQL’s Performance?; How to Select CPUs for MySQL; Which Is Better: Fast CPUs or Many CPUs?; CPU Architecture; Scaling to Many CPUs and Cores; Balancing Memory and Disk Resources; Random Versus Sequential I/O; Caching, Reads, and Writes; What’s Your Working Set?; Finding an Effective Memory-to-Disk Ratio; Choosing Hard Disks; Solid-State Storage; An Overview of Flash Memory; Flash Technologies; Benchmarking Flash Storage; Solid-State Drives (SSDs); Using RAID with SSDs; PCIe Storage Devices; Other Types of Solid-State Storage; When Should You Use Flash?; Using Flashcache; Optimizing MySQL for Solid-State Storage; Choosing Hardware for a Replica; RAID Performance Optimization; RAID Failure, Recovery, and Monitoring; Balancing Hardware RAID and Software RAID; RAID Configuration and Caching; The RAID stripe chunk size; The RAID cache; Storage Area Networks and Network-Attached Storage; SAN Benchmarks; Using a SAN over NFS or SMB; MySQL Performance on a SAN; Should You Use a SAN?; Using Multiple Disk Volumes; Network Configuration; Choosing an Operating System; Choosing a Filesystem; Choosing a Disk Queue Scheduler; Threading; Swapping; Operating System Status; How to Read vmstat Output; How to Read iostat Output; Other Helpful Tools; A CPU-Bound Machine; An I/O-Bound Machine; A Swapping Machine; An Idle Machine; Summary
Replication Overview; Problems Solved by Replication; How Replication Works; Setting Up Replication; Creating Replication Accounts; Configuring the Master and Replica; Starting the Replica; Initializing a Replica from Another Server; Recommended Replication Configuration; Replication Under the Hood; Statement-Based Replication; Row-Based Replication; Statement-Based or Row-Based: Which Is Better?; Replication Files; Sending Replication Events to Other Replicas; Replication Filters; Replication Topologies; Master and Multiple Replicas; Master-Master in Active-Active Mode; Master-Master in Active-Passive Mode; Master-Master with Replicas; Ring Replication; Master, Distribution Master, and Replicas; Tree or Pyramid; Custom Replication Solutions; Selective replication; Separating functions; Data archiving; Using replicas for full-text searches; Read-only replicas; Emulating multisource replication; Creating a log server; Replication and Capacity Planning; Why Replication Doesn’t Help Scale Writes; When Will Replicas Begin to Lag?; Plan to Underutilize; Replication Administration and Maintenance; Monitoring Replication; Measuring Replication Lag; Determining Whether Replicas Are Consistent with the Master; Resyncing a Replica from the Master; Changing Masters; Planned promotions; Unplanned promotions; Locating the desired log positions; Switching Roles in a Master-Master Configuration; Replication Problems and Solutions; Errors Caused by Data Corruption or Loss; Using Nontransactional Tables; Mixing Transactional and Nontransactional Tables; Nondeterministic Statements; Different Storage Engines on the Master and Replica; Data Changes on the Replica; Nonunique Server IDs; Undefined Server IDs; Dependencies on Nonreplicated Data; Missing Temporary Tables; Not Replicating All Updates; Lock Contention Caused by InnoDB Locking Selects; Writing to Both Masters in Master-Master Replication; Excessive Replication Lag; Don’t duplicate the expensive part of writes; Do writes in parallel outside of replication; Prime the cache for the replication thread; Oversized Packets from the Master; Limited Replication Bandwidth; No Disk Space; Replication Limitations; How Fast Is Replication?; Advanced Features in MySQL Replication; Other Replication Technologies; Summary
What Is Scalability?; A Formal Definition; Scaling MySQL; Planning for Scalability; Buying Time Before Scaling; Scaling Up; Scaling Out; Functional partitioning; Data sharding; Choosing a partitioning key; Multiple partitioning keys; Querying across shards; Allocating data, shards, and nodes; Arranging shards on nodes; Fixed allocation; Dynamic allocation; Mixing dynamic and fixed allocation; Explicit allocation; Rebalancing shards; Generating globally unique IDs; Tools for sharding; Scaling by Consolidation; Scaling by Clustering; MySQL Cluster (NDB Cluster); Clustrix; ScaleBase; GenieDB; Akiban; Scaling Back; Keeping active data separate; Load Balancing; Connecting Directly; Splitting reads and writes in replication; Changing the application configuration; Changing DNS names; Moving IP addresses; Introducing a Middleman; Load balancers; Load-balancing algorithms; Adding and removing servers in the pool; Load Balancing with a Master and Multiple Replicas; Summary
What Is High Availability?; What Causes Downtime?; Achieving High Availability; Improving Mean Time Between Failures; Improving Mean Time to Recovery; Avoiding Single Points of Failure; Shared Storage or Replicated Disk; Synchronous MySQL Replication; MySQL Cluster; Percona XtraDB Cluster; Replication-Based Redundancy; Failover and Failback; Promoting a Replica or Switching Roles; Virtual IP Addresses or IP Takeover; Middleman Solutions; Handling Failover in the Application; Summary
Benefits, Drawbacks, and Myths of the Cloud; The Economics of MySQL in the Cloud; MySQL Scaling and HA in the Cloud; The Four Fundamental Resources; MySQL Performance in Cloud Hosting; Benchmarks for MySQL in the Cloud; MySQL Database as a Service (DBaaS); Amazon RDS; Other DBaaS Solutions; Summary
Common Problems; Web Server Issues; Finding the Optimal Concurrency; Caching; Caching Below the Application; Application-Level Caching; Cache Control Policies; Cache Object Hierarchies; Pregenerating Content; The Cache as an Infrastructure Component; Using HandlerSocket and memcached Access; Extending MySQL; Alternatives to MySQL; Summary
Why Backups?; Defining Recovery Requirements; Designing a MySQL Backup Solution; Online or Offline Backups?; Logical or Raw Backups?; Logical backups; Raw backups; What to Back Up; Incremental and differential backups; Storage Engines and Consistency; Data consistency; File consistency; Replication; Managing and Backing Up Binary Logs; The Binary Log Format; Purging Old Binary Logs Safely; Backing Up Data; Making a Logical Backup; SQL dumps; Delimited file backups; Filesystem Snapshots; How LVM snapshots work; Prerequisites and configuration; Creating, mounting, and removing an LVM snapshot; LVM snapshots for online backups; Lock-free InnoDB backups with LVM snapshots; Planning for LVM backups; Other uses and alternatives; Recovering from a Backup; Restoring Raw Files; Starting MySQL after restoring raw files; Restoring Logical Backups; Loading SQL files; Loading delimited files; Point-in-Time Recovery; More Advanced Recovery Techniques; Delayed replication for fast recovery; Recovering with a log server; InnoDB Crash Recovery; Causes of InnoDB corruption; How to recover corrupted InnoDB data; Backup and Recovery Tools; MySQL Enterprise Backup; Percona XtraBackup; mylvmbackup; Zmanda Recovery Manager; mydumper; mysqldump; Scripting Backups; Summary
Interface Tools; Command-Line Utilities; SQL Utilities; Monitoring Tools; Open Source Monitoring Tools; Commercial Monitoring Systems; Command-Line Monitoring with Innotop; Summary
Percona Server; MariaDB; Drizzle; Other MySQL Variants; Summary
System Variables; SHOW STATUS; Thread and Connection Statistics; Binary Logging Status; Command Counters; Temporary Files and Tables; Handler Operations; MyISAM Key Buffer; File Descriptors; Query Cache; SELECT Types; Sorts; Table Locking; InnoDB-Specific; Plugin-Specific; SHOW ENGINE INNODB STATUS; Header; SEMAPHORES; LATEST FOREIGN KEY ERROR; LATEST DETECTED DEADLOCK; TRANSACTIONS; FILE I/O; INSERT BUFFER AND ADAPTIVE HASH INDEX; LOG; BUFFER POOL AND MEMORY; ROW OPERATIONS; SHOW PROCESSLIST; SHOW ENGINE INNODB MUTEX; Replication Status; The INFORMATION_SCHEMA; InnoDB Tables; Tables in Percona Server; The Performance Schema; Summary
Copying Files; A Naïve Example; A One-Step Method; Avoiding Encryption Overhead; Other Options; File Copy Benchmarks
Invoking EXPLAIN; Rewriting Non-SELECT Queries; The Columns in EXPLAIN; The id Column; The select_type Column; The table Column; Derived tables and unions; An example of complex SELECT types; The type Column; The possible_keys Column; The key Column; The key_len Column; The ref Column; The rows Column; The filtered Column; The Extra Column; Tree-Formatted Output; Improvements in MySQL 5.6
Lock Waits at the Server Level; Table Locks; Finding out who holds a lock; The Global Read Lock; Name Locks; User Locks; Lock Waits in InnoDB; Using the INFORMATION_SCHEMA Tables
A Typical Sphinx Search; Why Use Sphinx?; Efficient and Scalable Full-Text Searching; Applying WHERE Clauses Efficiently; Finding the Top Results in Order; Optimizing GROUP BY Queries; Generating Parallel Result Sets; Scaling; Aggregating Sharded Data; Architectural Overview; Installation Overview; Typical Partition Use; Special Features; Phrase Proximity Ranking; Support for Attributes; Filtering; The SphinxSE Pluggable Storage Engine; Advanced Performance Control; Practical Implementation Examples; Full-Text Searching on Mininova.org; Full-Text Searching on BoardReader.com; Optimizing Selects on Sahibinden.com; Optimizing GROUP BY on BoardReader.com; Optimizing Sharded JOIN Queries on Grouply.com; Summary

Content preview from High Performance MySQL, 3rd Edition

Chapter 1. MySQL Architecture and History

MySQL is very different from other database servers, and its architectural characteristics make it useful for a wide range of purposes as well as making it a poor choice for others. MySQL is not perfect, but it is flexible enough to work well in very demanding environments, such as web applications. At the same time, MySQL can power embedded applications, data warehouses, content indexing and delivery software, highly available redundant systems, online transaction processing (OLTP), and much more.

To get the most from MySQL, you need to understand its design so that you can work with it, not against it. MySQL is flexible in many ways. For example, you can configure it to run well on a wide range of hardware, and it supports a variety of data types. However, MySQL’s most unusual and important feature is its storage-engine architecture, whose design separates query processing and other server tasks from data storage and retrieval. This separation of concerns lets you choose how your data is stored and what performance, features, and other characteristics you want.

This chapter provides a high-level overview of the MySQL server architecture, the major differences between the storage engines, and why those differences are important. We’ll finish with some historical context and benchmarks. We’ve tried to explain MySQL by simplifying the details and showing examples. This discussion will be useful for those new to database servers as well as readers ...