Performance Optimization and Tuning Techniques for IBM Processors, including IBM POWER8

Book description

This IBM® Redbooks® publication gathers the correct technical information and lays out simple guidance for optimizing code performance on IBM POWER8™ systems that run the AIX®, IBM i, or Linux operating systems. Many straightforward performance optimizations can be made with minimal effort and without extensive previous experience or in-depth knowledge.

The POWER8 processor contains many new and important performance features, such as support for eight hardware threads in each core and support for transactional memory. POWER8 is a strict superset of IBM POWER7+™, so all of the performance features of POWER7+, such as multiple page sizes, also appear in POWER8. Much of the technical information and guidance for optimizing performance on POWER8 that is presented in this guide also applies to POWER7+ and earlier processors, except where the guide explicitly indicates that a feature is new in POWER8.

This guide focuses on optimizations that tend to improve performance across a broad set of IBM POWER® processor chips and systems. Specific guidance is given for the POWER8 processor; however, the general guidance also applies to the IBM POWER7+, IBM POWER7®, IBM POWER6®, IBM POWER5, and even earlier processors.

This guide is intended for personnel who are responsible for performing migration and implementation activities on IBM POWER8-based servers, including system administrators, system architects, network administrators, information architects, and database administrators (DBAs).

Table of contents

  1. Front cover
  2. Notices
    1. Trademarks
  3. Preface
    1. Authors
    2. Now you can become a published author, too!
    3. Comments welcome
    4. Stay connected to IBM Redbooks
  4. Chapter 1. Optimization and tuning on IBM POWER8
    1. 1.1 Introduction
    2. 1.2 Outline of this guide
    3. 1.3 Conventions that are used in this guide
    4. 1.4 Background
    5. 1.5 Optimizing performance on POWER8
      1. 1.5.1 Lightweight tuning and optimization guidelines
      2. 1.5.2 Deployment guidelines
      3. 1.5.3 Deep performance optimization guidelines
  5. Chapter 2. The POWER8 processor
    1. 2.1 Introduction to the POWER8 processor
    2. 2.2 Using POWER8 features
      1. 2.2.1 Multi-core and multi-thread
      2. 2.2.2 Multipage size support: Page sizes (4 KB, 64 KB, 16 MB, and 16 GB)
      3. 2.2.3 Efficient use of cache and memory
      4. 2.2.4 Transactional memory (TM)
      5. 2.2.5 Vector Scalar eXtension (VSX)
      6. 2.2.6 Decimal floating point
      7. 2.2.7 In-core cryptography and integrity enhancements
      8. 2.2.8 On-chip accelerators
      9. 2.2.9 Storage synchronization (sync, lwsync, lwarx, stwcx, and eieio)
      10. 2.2.10 Fixed-point load and store quadword instructions
      11. 2.2.11 Instruction fusion
      12. 2.2.12 Event-based branches (or user-level fast interrupts)
      13. 2.2.13 Power management and system performance
    3. 2.3 Related publications
  6. Chapter 3. The POWER Hypervisor
    1. 3.1 Introduction to the POWER8 Hypervisor
    2. 3.2 POWER8 virtualization
      1. 3.2.1 Virtual processors
      2. 3.2.2 Page table sizes for LPARs
      3. 3.2.3 Placing LPAR resources to attain higher memory affinity
      4. 3.2.4 Active memory expansion
      5. 3.2.5 Optimizing Resource Placement: Dynamic Platform Optimizer
      6. 3.2.6 Partition compatibility mode
    3. 3.3 Related publications
  7. Chapter 4. AIX
    1. 4.1 Introduction
    2. 4.2 Using Power features with AIX
      1. 4.2.1 Multi-core and multi-thread
      2. 4.2.2 Multipage size support on AIX
      3. 4.2.3 Efficient use of cache
      4. 4.2.4 Transactional memory (TM)
      5. 4.2.5 Vector Scalar eXtension (VSX)
      6. 4.2.6 Decimal floating point (DFP)
      7. 4.2.7 On-chip encryption accelerator
    3. 4.3 AIX operating system-specific optimizations
      1. 4.3.1 Malloc
      2. 4.3.2 Pthread tunables
      3. 4.3.3 pollset
      4. 4.3.4 File system performance benefits
      5. 4.3.5 Direct I/O
      6. 4.3.6 Concurrent I/O (CIO)
      7. 4.3.7 Asynchronous I/O
      8. 4.3.8 I/O completion ports
      9. 4.3.9 shmat versus mmap
      10. 4.3.10 Large segment tunable aliasing (LSA)
      11. 4.3.11 64-bit versus 32-bit ABIs
      12. 4.3.12 Sleep and wake-up primitives (thread_wait and thread_post)
      13. 4.3.13 Shared versus private loads
      14. 4.3.14 Workload partitions (WPARs) shared License Program Product (LPP) installs
    4. 4.4 AIX preferred practices
      1. 4.4.1 AIX preferred practices that are applicable to all Power Systems generations
      2. 4.4.2 AIX preferred practices that are applicable to POWER7 and POWER8
    5. 4.5 Related publications
  8. Chapter 5. IBM i
    1. 5.1 Introduction
    2. 5.2 Using Power features with IBM i
      1. 5.2.1 Multi-core and multi-thread
      2. 5.2.2 Multipage size support on IBM i
      3. 5.2.3 Vector Scalar eXtension (VSX)
      4. 5.2.4 Decimal floating point
    3. 5.3 IBM i operating system-specific optimizations
      1. 5.3.1 IBM i advanced optimization techniques
      2. 5.3.2 Performance management on IBM i
    4. 5.4 Related publications
  9. Chapter 6. Linux
    1. 6.1 Introduction
    2. 6.2 Using Power features with Linux
      1. 6.2.1 Multi-core and multi-thread
      2. 6.2.2 Multipage size support on Linux
      3. 6.2.3 Efficient use of cache
      4. 6.2.4 Transactional memory (TM)
      5. 6.2.5 Vector Scalar eXtension (VSX)
      6. 6.2.6 Decimal floating point (DFP)
      7. 6.2.7 Event-based branches
    3. 6.3 Linux operating system-specific optimizations
      1. 6.3.1 GCC, toolchain, and IBM Advance Toolchain
      2. 6.3.2 Tuning and optimizing malloc
      3. 6.3.3 Large TOC -mcmodel=medium optimization
      4. 6.3.4 POWER7 based distro considerations
      5. 6.3.5 Split-core considerations
      6. 6.3.6 KVM on Power considerations
    4. 6.4 Related publications
  10. Chapter 7. Compilers and optimization tools for C, C++, and Fortran
    1. 7.1 Compiler versions and optimization levels
    2. 7.2 Advanced compiler optimization techniques
      1. 7.2.1 Common prerequisites
      2. 7.2.2 XL compiler family
      3. 7.2.3 GCC compiler family
    3. 7.3 Capitalizing on POWER8 features with the XL and GCC compilers
      1. 7.3.1 In-core cryptography
      2. 7.3.2 Compiler support for VSX
      3. 7.3.3 Built-in functions for storage synchronization
      4. 7.3.4 Data Streams Control Register (DSCR) controls
      5. 7.3.5 Transactional memory (TM)
    4. 7.4 IBM Feedback Directed Program Restructuring (FDPR)
      1. 7.4.1 Introduction
      2. 7.4.2 FDPR supported environments
      3. 7.4.3 Acceptable input formats
      4. 7.4.4 General operation
      5. 7.4.5 Instrumentation and profiling
      6. 7.4.6 Optimization
    5. 7.5 Using the Advance Toolchain with IBM XLC and XLF
    6. 7.6 Related publications
  11. Chapter 8. Java
    1. 8.1 Java levels
    2. 8.2 32-bit versus 64-bit Java
    3. 8.3 Memory and page size considerations
      1. 8.3.1 Medium and large pages for Java heap and code cache
      2. 8.3.2 Configuring large pages for Java heap and code cache
      3. 8.3.3 Prefetching
      4. 8.3.4 Compressed references
      5. 8.3.5 JIT code cache
      6. 8.3.6 Shared classes
      7. 8.3.7 In-core Advanced Encryption Standard (AES) acceleration
      8. 8.3.8 Transactional memory (TM)
      9. 8.3.9 Runtime instrumentation
    4. 8.4 Java garbage collection tuning
      1. 8.4.1 GC strategy: Optthruput
      2. 8.4.2 GC strategy: Optavgpause
      3. 8.4.3 GC strategy: Gencon
      4. 8.4.4 GC strategy: Balanced
      5. 8.4.5 Optimal heap size
    5. 8.5 Application scaling
      1. 8.5.1 Choosing the correct SMT mode
      2. 8.5.2 Using resource sets (RSETS)
      3. 8.5.3 Java lock reservation
      4. 8.5.4 Java GC threads
      5. 8.5.5 Java concurrent marking
    6. 8.6 Related publications
  12. Chapter 9. DB2
    1. 9.1 DB2 and the POWER processor
    2. 9.2 Taking advantage of the POWER processor
      1. 9.2.1 Affinitization
      2. 9.2.2 Page sizes
      3. 9.2.3 Decimal arithmetics
      4. 9.2.4 Using SMT priorities for internal lock implementation
      5. 9.2.5 SIMD
    3. 9.3 Capitalizing on the compilers and optimization tools for POWER
      1. 9.3.1 Whole-program analysis and profile-based optimizations
      2. 9.3.2 Feedback directed program restructuring (FDPR)
    4. 9.4 Capitalizing on POWER virtualization
      1. 9.4.1 DB2 virtualization
      2. 9.4.2 DB2 in an AIX workload partition
    5. 9.5 Capitalizing on the AIX system libraries
      1. 9.5.1 Using the thread_post_many API
      2. 9.5.2 File systems
    6. 9.6 Capitalizing on performance tooling
      1. 9.6.1 High-level investigation
      2. 9.6.2 Low-level investigation
    7. 9.7 Conclusion
    8. 9.8 Related publications
  13. Chapter 10. WebSphere Application Server
    1. 10.1 IBM WebSphere on Power Systems
    2. 10.2 Performance and functional considerations
      1. 10.2.1 Installation
      2. 10.2.2 Deployment
      3. 10.2.3 Performance
      4. 10.2.4 Performance analysis, problem determination, and diagnostic tests
  14. Appendix A. Analyzing malloc usage under AIX
    1. Introduction
    2. How to collect malloc usage information
  15. Appendix B. Performance tooling and empirical performance analysis
    1. Introduction
    2. Performance advisors
    3. Power Virtualization Performance (PowerVP)
    4. AIX
    5. Linux
    6. Java (either AIX or Linux)
  16. Back cover
  17. IBM System x Reference Architecture for Hadoop: IBM InfoSphere BigInsights Reference Architecture
    1. Introduction
    2. Business problem and business value
    3. Reference architecture use
    4. Requirements
    5. InfoSphere BigInsights predefined configuration
    6. InfoSphere BigInsights HBase predefined configuration
    7. Deployment considerations
    8. Customizing the predefined configurations
    9. Predefined configuration bill of materials
    10. References
    11. The team who wrote this paper
    12. Now you can become a published author, too!
    13. Stay connected to IBM Redbooks
  18. Notices
    1. Trademarks

Product information

  • Title: Performance Optimization and Tuning Techniques for IBM Processors, including IBM POWER8
  • Author(s):
  • Release date: July 2014
  • Publisher(s): IBM Redbooks
  • ISBN: None