Fundamentals of Parallel Multicore Architecture

Book description

Although multicore is now a mainstream architecture, few textbooks cover parallel multicore architectures. Filling this gap, Fundamentals of Parallel Multicore Architecture provides all the material for a graduate or senior undergraduate course that focuses on the architecture of multicore processors. The book is also useful as a reference.

Table of contents

  1. Cover
  2. Half Title
  3. Series Page
  4. Title Page
  5. Copyright Page
  6. Dedication
  7. Table of Contents
  8. Preface
  9. Acknowledgement
  10. About the Author
  11. List of Abbreviations
  12. 1 Perspectives on Multicore Architectures
    1. 1.1 The Origin of the Multicore Architecture
      1. 1.1.1 Power Consumption Issue
    2. 1.2 Perspectives on Parallel Computers
      1. 1.2.1 Flynn’s Taxonomy of Parallel Computers
      2. 1.2.2 Classes of MIMD Parallel Computers
    3. 1.3 Future Multicore Architectures
    4. 1.4 Exercises
  13. 2 Perspectives on Parallel Programming
    1. 2.1 Limits on Parallel Program Performance
    2. 2.2 Parallel Programming Models
      1. 2.2.1 Comparing Shared Memory and Message Passing Models
      2. 2.2.2 A Simple Example
      3. 2.2.3 Other Programming Models
    3. 2.3 Exercises
  14. 3 Shared Memory Parallel Programming
    1. 3.1 Steps in Parallel Programming
    2. 3.2 Dependence Analysis
      1. 3.2.1 Loop-Level Dependence Analysis
      2. 3.2.2 Iteration-Space Traversal Graph and Loop-Carried Dependence Graph
    3. 3.3 Identifying Parallel Tasks in Loop Structures
      1. 3.3.1 Parallelism between Loop Iterations and DOALL Parallelism
      2. 3.3.2 DOACROSS: Synchronized Parallelism between Loop Iterations
      3. 3.3.3 Parallelism Across Statements in a Loop
      4. 3.3.4 DOPIPE: Parallelism Across Statements of a Loop
    4. 3.4 Identifying Parallelism at Other Levels
    5. 3.5 Identifying Parallelism through Algorithm Knowledge
    6. 3.6 Determining the Scope of Variables
      1. 3.6.1 Privatization
      2. 3.6.2 Reduction Variables and Operation
      3. 3.6.3 Summary of Criteria
    7. 3.7 Synchronization
    8. 3.8 Assigning Tasks to Threads
    9. 3.9 Mapping Threads to Processors
    10. 3.10 A Brief Introduction to OpenMP
    11. 3.11 Exercises
  15. 4 Parallel Programming for Linked Data Structures
    1. 4.1 Parallelization Challenges in LDS
      1. 4.1.1 Loop-Level Parallelization is Insufficient
    2. 4.2 Approaches to Parallelization of LDS
      1. 4.2.1 Parallelizing Computation vs. Traversal
      2. 4.2.2 Parallelizing Operations on the Data Structure
    3. 4.3 Parallelization Techniques for Linked Lists
      1. 4.3.1 Parallelization among Readers
      2. 4.3.2 Parallelism among LDS Traversals
      3. 4.3.3 Fine-Grain Lock Approach
    4. 4.4 The Role of Transactional Memory
    5. 4.5 Exercises
  16. 5 Introduction to Memory Hierarchy Organization
    1. 5.1 Motivation for Memory Hierarchy
    2. 5.2 Basic Architectures of a Cache
      1. 5.2.1 Placement Policy
      2. 5.2.2 Replacement Policy
      3. 5.2.3 Write Policy
      4. 5.2.4 Inclusion Policy on Multi-Level Caches
      5. 5.2.5 Unified/Split/Banked Cache Organization and Cache Pipelining
      6. 5.2.6 Cache Addressing and Translation Lookaside Buffer
      7. 5.2.7 Non-Blocking Cache
    3. 5.3 Cache Performance
      1. 5.3.1 The Power Law of Cache Misses
      2. 5.3.2 Stack Distance Profile
      3. 5.3.3 Cache Performance Metrics
    4. 5.4 Prefetching
      1. 5.4.1 Stride and Sequential Prefetching
      2. 5.4.2 Prefetching in Multiprocessor Systems
    5. 5.5 Cache Design in Multicore Architecture
    6. 5.6 Physical Cache Organization
      1. 5.6.1 United Cache Organization
      2. 5.6.2 Distributed Cache Organization
      3. 5.6.3 Hybrid United+Distributed Cache Organization
    7. 5.7 Logical Cache Organization
      1. 5.7.1 Hashing Function
      2. 5.7.2 Improving Distance Locality of Shared Cache
      3. 5.7.3 Capacity Sharing in the Private Cache Organization
    8. 5.8 Case Studies
      1. 5.8.1 IBM Power7 Memory Hierarchy
      2. 5.8.2 Comparing AMD Shanghai and Intel Barcelona’s Memory Hierarchy
    9. 5.9 Exercises
  17. 6 Introduction to Shared Memory Multiprocessors
    1. 6.1 The Cache Coherence Problem
    2. 6.2 Memory Consistency Problem
    3. 6.3 Synchronization Problem
    4. 6.4 Exercises
  18. 7 Basic Cache Coherence Issues
    1. 7.1 Overview
      1. 7.1.1 Basic Support for Bus-Based Multiprocessors
    2. 7.2 Cache Coherence in Bus-Based Multiprocessors
      1. 7.2.1 Coherence Protocol for Write Through Caches
      2. 7.2.2 MSI Protocol with Write Back Caches
      3. 7.2.3 MESI Protocol with Write Back Caches
      4. 7.2.4 MOESI Protocol with Write Back Caches
      5. 7.2.5 Update-Based Protocol with Write Back Caches
    3. 7.3 Impact of Cache Design on Cache Coherence Performance
    4. 7.4 Performance and Other Practical Issues
      1. 7.4.1 Prefetching and Coherence Misses
      2. 7.4.2 Multi-Level Caches
      3. 7.4.3 Snoop Filtering
    5. 7.5 Broadcast Protocol with Point-to-Point Interconnect
    6. 7.6 Exercises
  19. 8 Hardware Support for Synchronization
    1. 8.1 Lock Implementations
      1. 8.1.1 Evaluating the Performance of Lock Implementations
      2. 8.1.2 The Need for Atomic Instructions
      3. 8.1.3 Test and Set Lock
      4. 8.1.4 Test and Test and Set Lock
      5. 8.1.5 Load Linked and Store Conditional Lock
      6. 8.1.6 Ticket Lock
      7. 8.1.7 Array-Based Queuing Lock
      8. 8.1.8 Qualitative Comparison of Lock Implementations
    2. 8.2 Barrier Implementations
      1. 8.2.1 Sense-Reversing Centralized Barrier
      2. 8.2.2 Combining Tree Barrier
      3. 8.2.3 Hardware Barrier Implementation
    3. 8.3 Transactional Memory
    4. 8.4 Exercises
  20. 9 Memory Consistency Models
    1. 9.1 Programmers’ Intuition
    2. 9.2 Architecture Mechanisms for Ensuring Sequential Consistency
      1. 9.2.1 Basic SC Implementation on a Bus-Based Multiprocessor
      2. 9.2.2 Techniques to Improve SC Performance
    3. 9.3 Relaxed Consistency Models
      1. 9.3.1 Safety Net
      2. 9.3.2 Processor Consistency
      3. 9.3.3 Weak Ordering
      4. 9.3.4 Release Consistency
      5. 9.3.5 Lazy Release Consistency
    4. 9.4 Synchronization in Different Memory Consistency Models
    5. 9.5 Exercises
  21. 10 Advanced Cache Coherence Issues
    1. 10.1 Directory Coherence Protocols
    2. 10.2 Overview of Directory Coherence Protocol
      1. 10.2.1 Directory Format and Location
    3. 10.3 Basic Directory Cache Coherence Protocol
    4. 10.4 Implementation Correctness and Performance
      1. 10.4.1 Handling Races Due to Out-of-Sync Directory State
      2. 10.4.2 Handling Races Due to Non-Instantaneous Processing of a Request
      3. 10.4.3 Write Propagation and Transaction Serialization
      4. 10.4.4 Synchronization Support
      5. 10.4.5 Memory Consistency Models
    5. 10.5 Contemporary Design Issues
      1. 10.5.1 Dealing with Imprecise Directory Information
      2. 10.5.2 Granularity of Coherence
      3. 10.5.3 System Partitioning
      4. 10.5.4 Accelerating Thread Migration
    6. 10.6 Exercises
  22. 11 Interconnection Network Architecture
    1. 11.1 Link, Channel, and Latency
    2. 11.2 Network Topology
    3. 11.3 Routing Policies and Algorithms
    4. 11.4 Router Architecture
    5. 11.5 Case Study: Alpha 21364 Network Architecture
    6. 11.6 Multicore Design Issues
      1. 11.6.1 Contemporary Design Issues
    7. 11.7 Exercises
  23. 12 SIMT Architecture
    1. 12.1 SIMT Programming Model
    2. 12.2 Mapping SIMT Workloads to SIMT Cores
    3. 12.3 SIMT Core Architecture
      1. 12.3.1 Scalar ISA
      2. 12.3.2 SIMDization/Vectorization: Warp Formation
      3. 12.3.3 Fine-Grain Multithreading (Warp-Level Parallelism)
      4. 12.3.4 Microarchitecture
      5. 12.3.5 Pipeline Execution
      6. 12.3.6 Control Flow Processing
      7. 12.3.7 Memory Systems
    4. 12.4 Exercises
  24. 13 Ask the Experts
  25. Bibliography
  26. Index

Product information

  • Title: Fundamentals of Parallel Multicore Architecture
  • Author(s): Yan Solihin
  • Release date: November 2015
  • Publisher(s): Chapman and Hall/CRC
  • ISBN: 9781498753418