Algorithms and Parallel Computing

Book description

There is a gap between the performance potential of parallel hardware and the performance that can be attained using today's parallel program development tools. These tools still require manual intervention by the programmer to parallelize the code. Programming a parallel computer demands closer study of the target algorithm or application than traditional sequential programming does: the programmer must be aware of the communication patterns and data dependencies of the algorithm or application. This book provides techniques for exploring the possible ways to program a parallel computer for a given application.

Table of contents

  1. Cover
  2. Title page
  3. Copyright page
  4. Dedication
  5. Preface
    1. ABOUT THIS BOOK
    2. CHAPTER ORGANIZATION AND OVERVIEW
    3. ACKNOWLEDGMENTS
    4. COMMENTS AND SUGGESTIONS
  6. List of Acronyms
  7. Chapter 1 Introduction
    1. 1.1 INTRODUCTION
    2. 1.2 TOWARD AUTOMATING PARALLEL PROGRAMMING
    3. 1.3 ALGORITHMS
    4. 1.4 PARALLEL COMPUTING DESIGN CONSIDERATIONS
    5. 1.5 PARALLEL ALGORITHMS AND PARALLEL ARCHITECTURES
    6. 1.6 RELATING PARALLEL ALGORITHM AND PARALLEL ARCHITECTURE
    7. 1.7 IMPLEMENTATION OF ALGORITHMS: A TWO-SIDED PROBLEM
    8. 1.8 MEASURING BENEFITS OF PARALLEL COMPUTING
    9. 1.9 AMDAHL’S LAW FOR MULTIPROCESSOR SYSTEMS
    10. 1.10 GUSTAFSON–BARSIS’S LAW
    11. 1.11 APPLICATIONS OF PARALLEL COMPUTING
  8. Chapter 2 Enhancing Uniprocessor Performance
    1. 2.1 INTRODUCTION
    2. 2.2 INCREASING PROCESSOR CLOCK FREQUENCY
    3. 2.3 PARALLELIZING ALU STRUCTURE
    4. 2.4 USING MEMORY HIERARCHY
    5. 2.5 PIPELINING
    6. 2.6 VERY LONG INSTRUCTION WORD (VLIW) PROCESSORS
    7. 2.7 INSTRUCTION-LEVEL PARALLELISM (ILP) AND SUPERSCALAR PROCESSORS
    8. 2.8 MULTITHREADED PROCESSOR
  9. Chapter 3 Parallel Computers
    1. 3.1 INTRODUCTION
    2. 3.2 PARALLEL COMPUTING
    3. 3.3 SHARED-MEMORY MULTIPROCESSORS (UNIFORM MEMORY ACCESS [UMA])
    4. 3.4 DISTRIBUTED-MEMORY MULTIPROCESSOR (NONUNIFORM MEMORY ACCESS [NUMA])
    5. 3.5 SIMD PROCESSORS
    6. 3.6 SYSTOLIC PROCESSORS
    7. 3.7 CLUSTER COMPUTING
    8. 3.8 GRID (CLOUD) COMPUTING
    9. 3.9 MULTICORE SYSTEMS
    10. 3.10 SM
    11. 3.11 COMMUNICATION BETWEEN PARALLEL PROCESSORS
    12. 3.12 SUMMARY OF PARALLEL ARCHITECTURES
  10. Chapter 4 Shared-Memory Multiprocessors
    1. 4.1 INTRODUCTION
    2. 4.2 CACHE COHERENCE AND MEMORY CONSISTENCY
    3. 4.3 SYNCHRONIZATION AND MUTUAL EXCLUSION
  11. Chapter 5 Interconnection Networks
    1. 5.1 INTRODUCTION
    2. 5.2 CLASSIFICATION OF INTERCONNECTION NETWORKS BY LOGICAL TOPOLOGIES
    3. 5.3 INTERCONNECTION NETWORK SWITCH ARCHITECTURE
  12. Chapter 6 Concurrency Platforms
    1. 6.1 INTRODUCTION
    2. 6.2 CONCURRENCY PLATFORMS
    3. 6.3 CILK++
    4. 6.4 OpenMP
    5. 6.5 COMPUTE UNIFIED DEVICE ARCHITECTURE (CUDA)
  13. Chapter 7 Ad Hoc Techniques for Parallel Algorithms
    1. 7.1 INTRODUCTION
    2. 7.2 DEFINING ALGORITHM VARIABLES
    3. 7.3 INDEPENDENT LOOP SCHEDULING
    4. 7.4 DEPENDENT LOOPS
    5. 7.5 LOOP SPREADING FOR SIMPLE DEPENDENT LOOPS
    6. 7.6 LOOP UNROLLING
    7. 7.7 PROBLEM PARTITIONING
    8. 7.8 DIVIDE-AND-CONQUER (RECURSIVE PARTITIONING) STRATEGIES
    9. 7.9 PIPELINING
  14. Chapter 8 Nonserial–Parallel Algorithms
    1. 8.1 INTRODUCTION
    2. 8.2 COMPARING DAG AND DCG ALGORITHMS
    3. 8.3 PARALLELIZING NSPA ALGORITHMS REPRESENTED BY A DAG
    4. 8.4 FORMAL TECHNIQUE FOR ANALYZING NSPAs
    5. 8.5 DETECTING CYCLES IN THE ALGORITHM
    6. 8.6 EXTRACTING SERIAL AND PARALLEL ALGORITHM PERFORMANCE PARAMETERS
    7. 8.7 USEFUL THEOREMS
    8. 8.8 PERFORMANCE OF SERIAL AND PARALLEL ALGORITHMS ON PARALLEL COMPUTERS
  15. Chapter 9 z-Transform Analysis
    1. 9.1 INTRODUCTION
    2. 9.2 DEFINITION OF z-TRANSFORM
    3. 9.3 THE 1-D FIR DIGITAL FILTER ALGORITHM
    4. 9.4 SOFTWARE AND HARDWARE IMPLEMENTATIONS OF THE z-TRANSFORM
    5. 9.5 DESIGN 1: USING HORNER’S RULE FOR BROADCAST INPUT AND PIPELINED OUTPUT
    6. 9.6 DESIGN 2: PIPELINED INPUT AND BROADCAST OUTPUT
    7. 9.7 DESIGN 3: PIPELINED INPUT AND OUTPUT
  16. Chapter 10 Dependence Graph Analysis
    1. 10.1 INTRODUCTION
    2. 10.2 THE 1-D FIR DIGITAL FILTER ALGORITHM
    3. 10.3 THE DEPENDENCE GRAPH OF AN ALGORITHM
    4. 10.4 DERIVING THE DEPENDENCE GRAPH FOR AN ALGORITHM
    5. 10.5 THE SCHEDULING FUNCTION FOR THE 1-D FIR FILTER
    6. 10.6 NODE PROJECTION OPERATION
    7. 10.7 NONLINEAR PROJECTION OPERATION
    8. 10.8 SOFTWARE AND HARDWARE IMPLEMENTATIONS OF THE DAG TECHNIQUE
  17. Chapter 11 Computational Geometry Analysis
    1. 11.1 INTRODUCTION
    2. 11.2 MATRIX MULTIPLICATION ALGORITHM
    3. 11.3 THE 3-D DEPENDENCE GRAPH AND COMPUTATION DOMAIN D
    4. 11.4 THE FACETS AND VERTICES OF D
    5. 11.5 THE DEPENDENCE MATRICES OF THE ALGORITHM VARIABLES
    6. 11.6 NULLSPACE OF DEPENDENCE MATRIX: THE BROADCAST SUBDOMAIN B
    7. 11.7 DESIGN SPACE EXPLORATION: CHOICE OF BROADCASTING VERSUS PIPELINING VARIABLES
    8. 11.8 DATA SCHEDULING
    9. 11.9 PROJECTION OPERATION USING THE LINEAR PROJECTION OPERATOR
    10. 11.10 EFFECT OF PROJECTION OPERATION ON DATA
    11. 11.11 THE RESULTING MULTITHREADED/MULTIPROCESSOR ARCHITECTURE
    12. 11.12 SUMMARY OF WORK DONE IN THIS CHAPTER
  18. Chapter 12 Case Study: One-Dimensional IIR Digital Filters
    1. 12.1 INTRODUCTION
    2. 12.2 THE 1-D IIR DIGITAL FILTER ALGORITHM
    3. 12.3 THE IIR FILTER DEPENDENCE GRAPH
    4. 12.4 z-DOMAIN ANALYSIS OF 1-D IIR DIGITAL FILTER ALGORITHM
  19. Chapter 13 Case Study: Two- and Three-Dimensional Digital Filters
    1. 13.1 INTRODUCTION
    2. 13.2 LINE AND FRAME WRAPAROUND PROBLEMS
    3. 13.3 2-D RECURSIVE FILTERS
    4. 13.4 3-D DIGITAL FILTERS
  20. Chapter 14 Case Study: Multirate Decimators and Interpolators
    1. 14.1 INTRODUCTION
    2. 14.2 DECIMATOR STRUCTURES
    3. 14.3 DECIMATOR DEPENDENCE GRAPH
    4. 14.4 DECIMATOR SCHEDULING
    5. 14.5 DECIMATOR DAG FOR s1 = [1 0]
    6. 14.6 DECIMATOR DAG FOR s2 = [1 −1]
    7. 14.7 DECIMATOR DAG FOR s3 = [1 1]
    8. 14.8 POLYPHASE DECIMATOR IMPLEMENTATIONS
    9. 14.9 INTERPOLATOR STRUCTURES
    10. 14.10 INTERPOLATOR DEPENDENCE GRAPH
    11. 14.11 INTERPOLATOR SCHEDULING
    12. 14.12 INTERPOLATOR DAG FOR s1 = [1 0]
    13. 14.13 INTERPOLATOR DAG FOR s2 = [1 −1]
    14. 14.14 INTERPOLATOR DAG FOR s3 = [1 1]
    15. 14.15 POLYPHASE INTERPOLATOR IMPLEMENTATIONS
  21. Chapter 15 Case Study: Pattern Matching
    1. 15.1 INTRODUCTION
    2. 15.2 EXPRESSING THE ALGORITHM AS A REGULAR ITERATIVE ALGORITHM (RIA)
    3. 15.3 OBTAINING THE ALGORITHM DEPENDENCE GRAPH
    4. 15.4 DATA SCHEDULING
    5. 15.5 DAG NODE PROJECTION
    6. 15.6 DESIGN 1: DESIGN SPACE EXPLORATION WHEN s = [1 1]t
    7. 15.7 DESIGN 2: DESIGN SPACE EXPLORATION WHEN s = [1 −1]t
    8. 15.8 DESIGN 3: DESIGN SPACE EXPLORATION WHEN s = [1 0]t
  22. Chapter 16 Case Study: Motion Estimation for Video Compression
    1. 16.1 INTRODUCTION
    2. 16.2 FBMAS
    3. 16.3 DATA BUFFERING REQUIREMENTS
    4. 16.4 FORMULATION OF THE FBMA
    5. 16.5 HIERARCHICAL FORMULATION OF MOTION ESTIMATION
    6. 16.6 HARDWARE DESIGN OF THE HIERARCHY BLOCKS
  23. Chapter 17 Case Study: Multiplication over GF(2m)
    1. 17.1 INTRODUCTION
    2. 17.2 THE MULTIPLICATION ALGORITHM IN GF(2m)
    3. 17.3 EXPRESSING FIELD MULTIPLICATION AS AN RIA
    4. 17.4 FIELD MULTIPLICATION DEPENDENCE GRAPH
    5. 17.5 DATA SCHEDULING
    6. 17.6 DAG NODE PROJECTION
    7. 17.7 DESIGN 1: USING d1 = [1 0]t
    8. 17.8 DESIGN 2: USING d2 = [1 1]t
    9. 17.9 DESIGN 3: USING d3 = [1 −1]t
    10. 17.10 APPLICATIONS OF FINITE FIELD MULTIPLIERS
  24. Chapter 18 Case Study: Polynomial Division over GF(2)
    1. 18.1 INTRODUCTION
    2. 18.2 THE POLYNOMIAL DIVISION ALGORITHM
    3. 18.3 THE LFSR DEPENDENCE GRAPH
    4. 18.4 DATA SCHEDULING
    5. 18.5 DAG NODE PROJECTION
    6. 18.6 DESIGN 1: DESIGN SPACE EXPLORATION WHEN s1 = [1 −1]
    7. 18.7 DESIGN 2: DESIGN SPACE EXPLORATION WHEN s2 = [1 0]
    8. 18.8 DESIGN 3: DESIGN SPACE EXPLORATION WHEN s3 = [1 −0.5]
    9. 18.9 COMPARING THE THREE DESIGNS
  25. Chapter 19 The Fast Fourier Transform
    1. 19.1 INTRODUCTION
    2. 19.2 DECIMATION-IN-TIME FFT
    3. 19.3 PIPELINE RADIX-2 DECIMATION-IN-TIME FFT PROCESSOR
    4. 19.4 DECIMATION-IN-FREQUENCY FFT
    5. 19.5 PIPELINE RADIX-2 DECIMATION-IN-FREQUENCY FFT PROCESSOR
  26. Chapter 20 Solving Systems of Linear Equations
    1. 20.1 INTRODUCTION
    2. 20.2 SPECIAL MATRIX STRUCTURES
    3. 20.3 FORWARD SUBSTITUTION (DIRECT TECHNIQUE)
    4. 20.4 BACK SUBSTITUTION
    5. 20.5 MATRIX TRIANGULARIZATION ALGORITHM
    6. 20.6 SUCCESSIVE OVER RELAXATION (SOR) (ITERATIVE TECHNIQUE)
  27. Chapter 21 Solving Partial Differential Equations Using Finite Difference Method
    1. 21.1 INTRODUCTION
    2. 21.2 FDM FOR 1-D SYSTEMS
  28. References
  29. Index

Product information

  • Title: Algorithms and Parallel Computing
  • Author(s): Fayez Gebali
  • Release date: April 2011
  • Publisher(s): Wiley
  • ISBN: 9780470934630