O'Reilly logo

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

PostgreSQL 9.6 High Performance

Book Description

Enhance the performance of your PostgreSQL system with this handy guide while avoiding common pitfalls that can slow it down.

About This Book
  • Learn the right techniques to obtain optimal PostgreSQL database performance, ranging from initial design to routine maintenance
  • Fine tune the performance of your queries and avoid the common pitfalls that can slow your system down
  • Contains tips and tricks on scaling successful database installations, and ensuring a highly available PostgreSQL solution
Who This Book Is For

This book is for intermediate to advanced database administrators and developers who use or plan to exploit the features of PostgreSQL in the best possible manner. While administrators can benefit from the topics related to the installation, configuration, and optimization of the server, developers will learn how to write optimal queries and address performance issues in their database design. This book will also benefit the PostgreSQL internal architects in being able to monitor the performance using benchmarking tools.

What You Will Learn
  • Learn the best practices to configure your PostgreSQL 9.6 database for optimal performance
  • Write optimal queries and techniques to detect performance issue in queries
  • Fine tune the performance of your queries using benchmarking and indexing techniques
  • Ensure high performance and a highly available database using the scaling and replication techniques
  • Discover how to make informed speed and reliability trade-offs
  • Handle increasing database workloads without any hassle
  • Use monitoring insights to continuously rework the design and configuration for best performance
In Detail

Database administrators and developers spend years learning techniques to configure their PostgreSQL database servers for optimal performance, mostly when they encounter performance issues. Scalability and high availability of the database solution is equally important these days. This book will show you how to configure new database installations and optimize existing database server installations using PostgreSQL 9.6.

You will start with the basic concepts of database performance, because all successful database applications are destined to eventually run into issues when scaling up their performance. You will not only learn to optimize your database and queries for optimal performance, but also detect the real performance bottlenecks using PostgreSQL tools and some external tools. Next, you will learn how to benchmark your hardware and tune your operating system. Optimize your queries against the database with the help of right indexes, and monitor every layer, ranging from hardware to queries. Moving on, you will see how connection pooling, caching, partitioning, and replication will help you handle increasing database workloads.

Achieving high database performance is not easy, but you can learn it by using the right guide—PostgreSQL 9.6 High Performance.

Style and approach

This book has been organized in such a manner that will help you understand basic PostgreSQL 9.6 performance tuning to advanced-level configuration. There are many real-world problems explained in this book and explained in clear language, because improving database performance requires an equal mix of understanding theoretical concepts and working through hands-on examples.

Downloading the example code for this book. You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the code file.

Table of Contents

  1. Preface
    1. What this book covers
    2. What you need for this book
    3. Who this book is for
    4. Conventions
    5. Reader feedback
    6. Customer support
      1. Downloading the example code
      2. Errata
      3. Piracy
      4. Questions
  2. PostgreSQL Versions
    1. Performance of historical PostgreSQL releases
      1. Choosing a version to deploy
      2. Upgrading to a new major version
        1. Minor version upgrades
    2. PostgreSQL or another database?
    3. PostgreSQL Tools
      1. PostgreSQL contrib
        1. Finding contrib modules on your system
        2. Installing a contrib module from source
        3. Using a contrib module
        4. Using a PostgreSQL's Extensions
      2. pgFoundry
      3. Additional PostgreSQL related software
    4. PostgreSQL application scaling life cycle
    5. Performance tuning as a practice
    6. Summary
  3. Database Hardware
    1. Balancing hardware spending
      1. CPUs
      2. Memory
      3. Disks
        1. RAID
        2. Drive error handling
        3. Drive firmware and RAID
        4. SSDs
      4. Disk controllers
        1. Hardware and software RAID
        2. Recommended disk controllers
        3. Attached storage - SAN and NAS
    2. Reliable controller and disk setup
      1. Write-back caches
        1. Sources of write-back caching
        2. Disk controller monitoring
        3. Disabling drive write caches
      2. Performance impact of write-through caching
    3. Summary
  4. Database Hardware Benchmarking
    1. CPU and memory benchmarking
      1. Memtest86+
      2. STREAM memory testing
        1. STREAM and Intel versus AMD
      3. CPU benchmarking
      4. Sources of slow memory and processors
    2. Physical disk performance
      1. Random access and I/O per second
      2. Sequential access and ZCAV
        1. Short stroking
      3. Commit rate
        1. PostgreSQL test_fsync
        2. The INSERT rate
        3. Windows commit rate
    3. Disk benchmarking tools
      1. Basic disk benchmarking using HD Tune
        1. Short stroking tests
        2. Measuring IOPS using HD Tune
        3. Unpredictable performance and Windows
      2. Disk throughput in case of sequential read and write
      3. bonnie++
        1. Bonnie++ 2.0
        2. Bonnie++ ZCAV
      4. sysbench
      5. Seek rate
        1. Removing test files
        2. fsync commit rate
      6. Complicated disk benchmarks
    4. Sample disk results
      1. Disk-performance expectations
        1. Sources of slow disk and array performance
    5. Summary
  5. Disk Setup
    1. Maximum filesystem sizes
    2. Filesystem crash recovery
      1. Journaling filesystems
    3. Linux filesystems
      1. ext2
      2. ext3
      3. ext4
      4. XFS
      5. Other Linux filesystems
      6. Write barriers
        1. Drive support for barriers
        2. Filesystem support for barriers
      7. General Linux filesystem tuning
        1. Read-ahead
        2. File access times
        3. Read caching and swapping
        4. Write cache sizing
        5. I/O scheduler elevator
    4. Solaris and FreeBSD filesystems
      1. Solaris UFS
      2. FreeBSD UFS2
      3. ZFS
      4. FAT32
      5. NTFS
        1. Adjusting mounting behaviour
    5. Disk layout for PostgreSQL
      1. Symbolic links
      2. Tablespaces
      3. Database directory tree
        1. Temporary files
      4. Disk arrays, RAID, and disk layout
        1. Disk layout guidelines
    6. Summary
  6. Memory for Database Caching
    1. Memory units in the postgresql.conf file
    2. Increasing UNIX shared memory parameters for larger buffer sizes
      1. Kernel semaphores
      2. Estimating shared memory allocation
      3. Inspecting the database cache
      4. Installing pg_buffercache into a database
      5. Database disk layout
      6. Creating a new block in a database
      7. Writing dirty blocks to disk
    3. Crash recovery and the buffer cache
      1. Checkpoint processing basics
      2. Write-ahead log and recovery processing
      3. Checkpoint timing
        1. Checkpoint spikes
        2. Spread checkpoints
      4. Database block lifecycle
        1. Dirty block write paths
    4. Database buffer cache versus operating system cache
      1. Doubly cached data
        1. Inspecting the OS cache
      2. Checkpoint overhead
      3. Starting size guidelines
        1. Platform, version, and workload limitations
    5. Analyzing buffer cache contents
      1. Inspection of the buffer cache queries
        1. Top relations in the cache
        2. Summary by usage count
        3. Buffer contents summary with percentages
        4. Buffer usage count distribution
      2. Using buffer cache inspection for sizing feedback
    6. Summary
  7. Server Configuration Tuning
    1. Interacting with the live configuration
      1. Defaults and reset values
      2. Allowed change context
      3. Reloading the configuration file
        1. Commented-out settings
        2. Server-wide settings
      4. Database connections
        1. listen_addresses
        2. max_connections
      5. Shared memory
        1. shared_buffers
        2. Free space map settings
      6. Logging
        1. log_line_prefix
        2. log_statement
        3. log_min_duration_statement
      7. Vacuuming and statistics
        1. autovacuum
        2. Enabling autovacuum
        3. maintainance_work_mem
        4. default_statistics_target
      8. Checkpoints
        1. checkpoint_timeout
        2. checkpoint_completion_target
        3. wal_buffers
        4. wal_sync_method
      9. PITR and WAL replication
        1. Per-client settings
        2. effective_cache_size
        3. synchronous_commit
        4. work_mem
        5. random_page_cost
        6. constraint_exclusion
      10. Tunables to avoid
        1. fsync
        2. full_page_writes
        3. commit_delay and commit_siblings
        4. max_prepared_transactions
        5. Query enable parameters
          1. New server tuning
          2. Dedicated server guidelines
          3. Shared server guidelines
          4. pgtune
    2. Summary
  8. Routine Maintenance
    1. Transaction visibility with multiversion concurrency control
      1. Visibility computation internals
      2. Updates
      3. Row lock conflicts
        1. Serialization
      4. Deletions
      5. Advantages of MVCC
      6. Disadvantages of MVCC
      7. Transaction ID wraparound
    2. Vacuum
      1. Vacuum implementation
        1. Regular vacuum
        2. Returning free disk space
        3. Full vacuum
        4. Removing tuple timeout
        5. HOT
      2. Cost-based vacuuming
      3. Autovacuum
        1. Autovacuum logging
        2. Autovacuum monitoring
        3. Autovacuum triggering
        4. Per-table adjustments
      4. Common vacuum and autovacuum problems
        1. Autovacuum is running even though it was turned off
        2. Autovacuum is constantly running
        3. Out of memory errors
        4. Not keeping up on a busy server
        5. Autovacuum is too disruptive
        6. Long-running transactions
        7. Free space map exhaustion
        8. Recovering from major problems
    3. Autoanalyze
    4. Index bloat
      1. Measuring index bloat
      2. Fixing the index bloat
    5. Dump and restore
    6. Vacuuming the database/table
    7. CLUSTER
    8. Reindexing
    9. Detailed data and index page monitoring
    10. Monitoring query logs
      1. Basic PostgreSQL log setup
        1. Log collection
        2. log_line_prefix
        3. Multi-line queries
        4. Using syslog for log messages
        5. CSV logging
      2. Logging difficult queries
        1. auto_explain
      3. Log file analysis
        1. Normalized query fingerprints
        2. pg_stat_statements
        3. pgFouine
        4. pqa
        5. eqpa
        6. pgsi
        7. mk-query-digest
    11. Summary
  9. Database Benchmarking
    1. pgbench default tests
      1. Table definition
      2. Scale detection
      3. Query script definition
      4. Configuring the database server for pgbench
        1. Sample server configuration
          1. Running pgbench manually
    2. Graphing results with pgbench-tools
      1. Configuring pgbench-tools
    3. Sample pgbench test results
      1. SELECT-only test
      2. TPC-B-like test (Read/Write)
      3. Latency analysis
    4. Sources for bad results and variation
      1. Developer PostgreSQL builds
      2. Worker threads and pgbench program limitations
    5. pgbench custom tests
      1. Insert speed test
    6. Transaction processing performance council benchmarks
    7. Summary
  10. Database Indexing
    1. Indexing example walkthrough
      1. Measuring query disk and index block statistics
      2. Running the example
      3. Sample data setup
      4. Simple index lookups
      5. Full table scans
      6. Index creation
      7. Lookup with an inefficient index
      8. Combining indexes
        1. Planning for plan changes
      9. Clustering against an index
      10. Explain with buffer counts
    2. Index creation and maintenance
      1. Unique indexes
      2. Concurrent index creation
      3. Clustering an index
        1. Fillfactor
      4. Reindexing
    3. Index types
      1. B-tree
        1. Text operator classes
      2. Hash
      3. GIN
      4. GiST
      5. SP-GiST
      6. BRIN
      7. Index only scans
    4. count(*)
    5. Visibility map
    6. Advanced index use
      1. Multicolumn indexes
      2. Indexes for sorting
      3. Partial indexes
      4. Expression-based indexes
      5. Indexing for full-text search
    7. Summary
  11. Query Optimization
    1. Sample datasets
      1. Pagila
      2. Dell Store 2
    2. EXPLAIN basics
      1. Timing overhead
      2. Hot and cold cache behavior
        1. Clearing the cache
    3. Query plan node structure
      1. Basic cost computation
        1. Estimated costs and real-world costs
      2. Visual explain
      3. Verbose output
      4. Machine readable explain output
      5. Planning analysis tools
    4. Assembling row sets
      1. Tuple ID
        1. Object ID
      2. Sequential scan
      3. Index scan
      4. Index only scans
      5. Bitmap heap and index scans
    5. Processing nodes
      1. Sort
      2. Limit
        1. Offsets
      3. Aggregate
      4. HashAggregate
      5. Unique
        1. WindowAgg
      6. Result
      7. Append
      8. Group
      9. Subquery scan and subplan
        1. Subquery conversion and IN lists
      10. Set operations
      11. Materialize
    6. CTE scan
    7. Copy command
      1. COPY TO
      2. COPY FROM
      3. COPY FREEZE
    8. Joins
      1. Nested loop
        1. Nested loop with inner-index scan
      2. Merge join
        1. Nested Loop and Merge Join materialization
      3. Hash joins
        1. Hash semi and anti joins
      4. Join ordering
        1. Forcing join order
        2. Join removal
        3. Genetic query optimizer
    9. Collecting statistics
      1. Viewing and estimating with statistics
      2. Statistics targets
        1. Adjusting a column target
        2. Distinct values
      3. Difficult areas to estimate
    10. Other query planning parameters
      1. effective_cache_size
      2. work_mem
      3. constraint_exclusion
      4. cursor_tuple_fraction
    11. Executing other statement types
    12. Improving queries
      1. Optimizing for fully cached datasets
      2. Testing for query equivalence
      3. Disabling optimizer features
      4. Working around optimizer bugs
      5. Avoiding plan restructuring with OFFSET
      6. External trouble spots
    13. SQL limitations
      1. Numbering rows in SQL
      2. Using window functions for numbering
      3. Using window functions for cumulatives
    14. Summary
  12. Database Activity and Statistics
    1. Statistics views
    2. Cumulative and live views
    3. Table statistics
      1. Table I/O
    4. Index statistics
      1. Index I/O
    5. Database wide totals
    6. Connections and activity
    7. Locks
      1. Virtual transactions
      2. Decoding lock information
      3. Table-level Lock Modes
      4. Transaction lock waits
      5. Table lock waits
      6. Logging lock information
        1. Deadlocks
        2. Disk usage
        3. Monitoring Buffer, background writer, and checkpoint activity
      7. Saving pg_stat_bgwriter snapshots
      8. Tuning using background writer statistics
    8. Summary
  13. Monitoring and Trending
    1. UNIX monitoring tools
      1. Sample setup
      2. vmstat
      3. iostat
        1. iotop for Linux
        2. Examples of good performance
        3. Overloaded system samples
      4. top
        1. Solaris top replacements
        2. htop for Linux
      5. SysStat and sar
        1. Enabling sysstat and its optional features
        2. Graphing with kSar
    2. Windows monitoring tools
      1. Task manager
        1. Sysinternals tools
      2. Windows system monitor
        1. Saving Windows system monitor data
    3. Trending software
      1. Types of monitoring and trending software
        1. Storing historical trend data
      2. Nagios
        1. Nagios and PostgreSQL
        2. Nagios and Windows
      3. Cacti
        1. Cacti and PostgreSQL
        2. Cacti and Windows
        3. Munin
      4. Other trending packages
        1. pgStatspack
        2. Zenoss
        3. Hyperic HQ
        4. Reconnoiter
        5. Staplr
        6. SNMP tools
    4. Summary
  14. Pooling and Caching
    1. Connection pooling
      1. Pooling connection counts
      2. pgpool-II
      3. Connection pooling
      4. Replication
      5. Watch-dog
      6. Failover
      7. Load balancing
      8. pgBouncer
        1. Application server pooling
        2. Database caching
      9. What is memcached
      10. pgmemcache
    2. Summary
  15. Scaling with Replication
    1. Hot standby
      1. Terminology
      2. Setting up WAL shipping
      3. Streaming replication
      4. Cascading replication
      5. Synchronous replication
      6. Tuning Hot Standby
    2. Replication queue managers
      1. Slony
      2. Londiste
      3. Read scaling with replication queue software
      4. Special application requirements
      5. Bucardo
      6. pglogical
      7. xDB
      8. pgpool-II
    3. Other interesting replication projects
    4. Replications solution comparison
    5. Summary
  16. Partitioning Data
    1. Table inheritance
    2. dellstore2 database
    3. Partitioning in PostgreSQL
    4. Range partitioning
    5. Determine a key field to partition over
    6. Sizing the partitions
      1. List partitioning
    7. Redirecting INSERT statements to the partitions
      1. Dynamic trigger functions
      2. Partition rules
    8. Empty partition query plans
    9. Date change update trigger
    10. Live migration of a partitioned table
    11. Partitioned queries
    12. Creating new partitions
      1. Scheduled creation
      2. Dynamic creation
    13. Partitioning advantages
    14. Common partitioning mistakes
    15. Horizontal partitioning with PL/Proxy
      1. Hash generation
      2. Scaling with PL/Proxy
        1. Sharding
      3. Scaling with GridSQL
    16. Summary
  17. Database Profiling
    1. Profiling using gprof
    2. Debugging using Valgrind
    3. Visual Studio
    4. Profiling using DTrace
      1. DTrace on FreeBSD
      2. Linux SystemTap emulation of DTrace
    5. Summary
  18. Avoiding Common Problems
    1. Bulk loading
      1. Loading methods
        1. External loading programs
      2. Tuning for bulk loads
      3. Skipping WAL acceleration
      4. Recreate indexes and add constraints
      5. Parallel restore
      6. Post load clean up
    2. Common performance issues
      1. Counting rows
      2. Unexplained writes
      3. Slow function and prepared statement execution
      4. PL/pgSQL benchmarking
      5. High foreign key overhead
      6. Trigger memory use
      7. Heavy statistics collector overhead
        1. Targeted statistics resets
      8. Materialized views
    3. Summary
  19. Performance Features by Release
    1. Aggressive PostgreSQL version upgrades
    2. Performance features in version 8.1
    3. Performance features in version 8.2
    4. Performance features in version 8.3
    5. Performance features in version 8.4
    6. Performance features in version 9.0
      1. Replication
      2. Queries and EXPLAIN
      3. Database development
      4. Configuration and monitoring
    7. Performance features in version 9.1
    8. Performance features in version 9.2
    9. Performance features in version 9.3
    10. Performance features in version 9.4
    11. Performance features in version 9.5
    12. Performance features in version 9.6
    13. Summary