Customizable Embedded Processors: Design Technologies and Applications

Book description

Customizable processors have been described as the next natural step in the evolution of the microprocessor business: a step in the life of a new technology where top performance alone is no longer sufficient to guarantee market success. Other factors become fundamental, such as time to market, convenience, energy efficiency, and ease of customization. This book is the first to explore comprehensively one of the most fundamental trends to emerge in the last decade: treating processors not as rigid, fixed entities that designers include "as is" in their products, but instead building sound methodologies to tailor-fit processors to the specific needs of those products. It addresses the goal of maintaining a very large family of processors, with a wide range of features, at a cost comparable to that of maintaining a single processor.

Table of contents

  1. Copyright
    1. Dedication
  2. In Praise of Customizable Embedded Processors
  3. The Morgan Kaufmann Series in Systems on Silicon
  4. List of Contributors
  5. About the Editors
  6. I. Opportunities and Challenges
    1. 1. From Prêt-à-Porter to Tailor-Made
      1. 1.1. The Call for Flexibility
      2. 1.2. Cool Chips for Shallow Pockets
      3. 1.3. A Million Processors for the Price of One?
      4. 1.4. Processors Coming of Age
      5. 1.5. This Book
      6. 1.6. Travel Broadens the Mind
    2. 2. Opportunities for Application-Specific Processors: The Case of Wireless Communications
      1. 2.1. Future Mobile Communication Systems
      2. 2.2. Heterogeneous MPSoC for Digital Receivers
        1. 2.2.1. The Fundamental Tradeoff between Energy Efficiency and Flexibility
        2. 2.2.2. How to Exploit the Huge Design Space?
        3. 2.2.3. Canonical Receiver Structure
        4. 2.2.4. Analyzing and Classifying the Functions of a Digital Receiver
          1. Step 1: Identify structural and temporal properties of the task
          2. Example: DVB-S Receiver
            1. Step 2: From function to algorithm
          3. Example: Cordic Algorithm
            1. Step 3: Define algorithmic descriptors
            2. Step 4: Classification of the algorithm
          4. Example: Baseband Processing for a 384-Kbps UMTS Receiver
          5. Example: UWB Receiver
          6. Example 3: Multimedia Signal Processing
        5. 2.2.5. Exploiting Parallelism
      3. 2.3. ASIP Design
        1. 2.3.1. Processor Design Flow
        2. 2.3.2. Architecture Description Language Based Design
        3. 2.3.3. Too Much Automation Is Bad
        4. 2.3.4. Processor Design: The LISATek Approach
        5. 2.3.5. Design Competence Rules the World
          1. Example: FFT Implementation
        6. 2.3.6. Application-Specific or Domain-Specific Processors?
    3. 3. Customizing Processors: Lofty Ambitions, Stark Realities
      1. 3.1. The “CFP” Project at HP Labs
      2. 3.2. Searching for the Best Architecture Is Not a Machine-Only Endeavor
      3. 3.3. Designing a CPU Core Still Takes a Very Long Time
      4. 3.4. Don’t Underestimate Competitive Technologies
      5. 3.5. Software Developers Don’t Always Help You
      6. 3.6. The Embedded World Is Not Immune to Legacy Problems
      7. 3.7. Customization Can Be Trouble
      8. 3.8. Conclusions
  7. II. Aspects of Processor Customization
    1. 4. Architecture Description Languages
      1. 4.1. ADLs and Other Languages
      2. 4.2. Survey of Contemporary ADLs
        1. 4.2.1. Content-Oriented Classification of ADLs
          1. Structural ADLs
          2. MIMOLA
          3. Behavioral ADLs
          4. nML
          5. ISDL
          6. Mixed ADLs
          7. HMDES
          8. Expression
          9. LISA
        2. 4.2.2. Objective-Based Classification of ADLs
          1. Compilation-Oriented ADLs
          2. Simulation-Oriented ADLs
          3. Synthesis-Oriented ADLs
          4. Validation-Oriented ADLs
      3. 4.3. Conclusions
    2. 5. C Compiler Retargeting
      1. 5.1. Compiler Construction Background
        1. 5.1.1. Source Language Frontend
        2. 5.1.2. Intermediate Representation and Optimization
        3. 5.1.3. Machine Code Generation
          1. Code Selection
          2. Register Allocation
          3. Instruction Scheduling
          4. Other Backend Optimizations
      2. 5.2. Approaches to Retargetable Compilation
        1. 5.2.1. MIMOLA
        2. 5.2.2. GNU C Compiler
        3. 5.2.3. Little C Compiler
        4. 5.2.4. CoSy
      3. 5.3. Processor Architecture Exploration
        1. 5.3.1. Methodology and Tools for ASIP Design
        2. 5.3.2. ADL-Based Approach
      4. 5.4. C Compiler Retargeting in the LISATek Platform
        1. 5.4.1. Concept
        2. 5.4.2. Register Allocator and Scheduler
        3. 5.4.3. Code Selector
        4. 5.4.4. Results
      5. 5.5. Summary and Outlook
      6. Acknowledgments
    3. 6. Automated Processor Configuration and Instruction Extension
      1. 6.1. Automation Is Essential for ASIP Proliferation
      2. 6.2. The Tensilica Xtensa LX Configurable Processor
      3. 6.3. Generating ASIPs Using Xtensa
      4. 6.4. Automatic Generation of ASIP Specifications
      5. 6.5. Coding an Application for Automatic ASIP Generation
      6. 6.6. XPRES Benchmarking Results
      7. 6.7. Techniques for ASIP Generation
        1. 6.7.1. Reference Examples for Evaluating XPRES
        2. 6.7.2. VLIW-FLIX: Exploiting Instruction Parallelism
        3. 6.7.3. SIMD (Vectorization): Exploiting Data Parallelism
        4. 6.7.4. Operator Fusion: Exploiting Pipeline Parallelism
        5. 6.7.5. Combining Techniques
      8. 6.8. Exploring the Design Space
      9. 6.9. Evaluating XPRES Estimation Methods
        1. 6.9.1. Application Performance Estimation
        2. 6.9.2. ASIP Area Estimation
        3. 6.9.3. Characterization Benchmarks
        4. 6.9.4. Performance and Area Estimation
      10. 6.10. Conclusions and Future of the Technology
    4. 7. Automatic Instruction-Set Extensions
      1. 7.1. Beyond Traditional Compilers
        1. 7.1.1. Structure of the Chapter
      2. 7.2. Building Block for Instruction Set Extension
        1. 7.2.1. Motivation
        2. 7.2.2. Problem Statement: Identification and Selection
        3. 7.2.3. Identification Algorithm
        4. 7.2.4. Results
      3. 7.3. Heuristics
        1. 7.3.1. Motivation
        2. 7.3.2. Types of Heuristic Algorithms
        3. 7.3.3. A Partitioning-Based Heuristic Algorithm
        4. 7.3.4. A Clustering Heuristic Algorithm
      4. 7.4. State-Holding Instruction-Set Extensions
        1. 7.4.1. Motivation
        2. 7.4.2. Local-Memory Identification Algorithm
        3. 7.4.3. Results
      5. 7.5. Exploiting Pipelining to Relax I/O Constraints
        1. 7.5.1. Motivation
        2. 7.5.2. Reuse of the Basic Identification Algorithm
        3. 7.5.3. Problem Statement: Pipelining
        4. 7.5.4. I/O Constrained Scheduling Algorithm
        5. 7.5.5. Results
      6. 7.6. Conclusions and Further Challenges
    5. 8. Challenges to Automatic Customization
      1. 8.1. The ARCompact™ Instruction Set Architecture
        1. 8.1.1. Mechanisms for Architecture Extension
        2. 8.1.2. ARCompact Implementations
      2. 8.2. Microarchitecture Challenges
      3. 8.3. Case Study—Entropy Decoding
        1. 8.3.1. Customizing VLD Extensions
      4. 8.4. Limitations of Automated Extension
      5. 8.5. The Benefits of Architecture Extension
        1. 8.5.1. Customization Enables CoDesign
        2. 8.5.2. Customization Offers Performance Headroom
        3. 8.5.3. Customization Enables Platform IP
        4. 8.5.4. Customization Enables Differentiation
      6. 8.6. Conclusions
    6. 9. Coprocessor Generation from Executable Code
      1. 9.1. Introduction
      2. 9.2. User Level Flow
      3. 9.3. Integration with Embedded Software
      4. 9.4. Coprocessor Architecture
      5. 9.5. ILP Extraction Challenges
      6. 9.6. Internal Tool Flow
      7. 9.7. Code Mapping Approach
      8. 9.8. Synthesizing Coprocessor Architectures
      9. 9.9. A Real-World Example
      10. 9.10. Summary
    7. 10. Datapath Synthesis
      1. 10.1. Introduction
      2. 10.2. Custom Instruction Selection
      3. 10.3. Theoretical Preliminaries
        1. 10.3.1. The Minimum Area-Cost Acyclic Common Supergraph Problem
        2. 10.3.2. Subsequence and Substring Matching Techniques
      4. 10.4. Minimum Area-Cost Acyclic Common Supergraph Heuristic
        1. 10.4.1. Path-Based Resource Sharing
        2. 10.4.2. Example
        3. 10.4.3. Pseudocode
      5. 10.5. Multiplexer Insertion
        1. 10.5.1. Unary and Binary Noncommutative Operators
        2. 10.5.2. Binary Commutative Operators
      6. 10.6. Datapath Synthesis
        1. 10.6.1. Pipelined Datapath Synthesis
        2. 10.6.2. High-Level Synthesis
      7. 10.7. Experimental Results
      8. 10.8. Conclusion
    8. 11. Instruction Matching and Modeling
      1. 11.1. Matching Instructions
        1. 11.1.1. Introduction to Binary Decision Diagrams
        2. 11.1.2. The Translator
        3. 11.1.3. Filtering Algorithm
        4. 11.1.4. Combinational Equivalence Checking Model
        5. 11.1.5. Results
      2. 11.2. Modeling
        1. 11.2.1. Overview
        2. 11.2.2. Customization Parameters
        3. 11.2.3. Characterization for Various Constraints
          1. Area Overhead Characterization
          2. Latency Characterization
          3. Power Consumption Characterization
        4. 11.2.4. Equations for Estimating Area, Latency, and Power Consumption
        5. 11.2.5. Evaluation Results
      3. 11.3. Conclusions
      4. Appendix: Estimating Area, Latency, and Power Consumption
        1. A.1 Area Overhead Estimation
        2. A.2 Latency Estimation
        3. A.3 Power Consumption Estimation
    9. 12. Processor Verification
      1. 12.1. Motivation
      2. 12.2. Overview of Verification Approaches
        1. 12.2.1. Simulation
        2. 12.2.2. Semiformal Techniques
        3. 12.2.3. Proof Techniques
        4. 12.2.4. Coverage
      3. 12.3. Formal Verification of a RISC CPU
        1. 12.3.1. Verification Approach
        2. 12.3.2. Specification
        3. 12.3.3. SystemC Model
        4. 12.3.4. Formal Verification
          1. Hardware
          2. Interface
          3. Program
      4. 12.4. Verification Challenges in Customizable and Configurable Embedded Processors
      5. 12.5. Verification of Processor Peripherals
        1. 12.5.1. Coverage-Driven Verification Based on Constrained-Random Stimulation
        2. 12.5.2. Assertion-Based Verification of Corner Cases
        3. 12.5.3. Case Study: Verification of an On-Chip Bus Bridge
      6. 12.6. Conclusions
    10. 13. Sub-RISC Processors
      1. 13.1. Concurrent Architectures, Concurrent Applications
      2. 13.2. Motivating Sub-RISC PEs
        1. 13.2.1. RISC PEs
          1. Datatype-Level Concurrency
          2. Data-Level Concurrency
          3. Process-Level Concurrency
        2. 13.2.2. Customizable Datapaths
        3. 13.2.3. Synthesis Approaches
        4. 13.2.4. Architecture Description Languages
          1. Generating the Architecture from the Instruction Set
          2. Extracting the Instruction Set from the Architecture
          3. TIPI Processing Elements
      3. 13.3. Designing TIPI Processing Elements
        1. 13.3.1. Building Datapath Models
        2. 13.3.2. Operation Extraction
        3. 13.3.3. Single PE Simulator Generation
        4. 13.3.4. TIPI Multiprocessors
        5. 13.3.5. Multiprocessor Simulation and RTL Code Generation
      4. 13.4. Deploying Applications with Cairn
        1. 13.4.1. The Cairn Application Abstraction
        2. 13.4.2. Model Transforms
        3. 13.4.3. Mapping Models
        4. 13.4.4. Code Generation
      5. 13.5. IPv4 Forwarding Design Example
        1. 13.5.1. Designing a PE for Click
          1. Application Knowledge
          2. Architectural Knowledge
          3. Prior Experience
        2. 13.5.2. ClickPE Architecture
        3. 13.5.3. ClickPE Control Logic
        4. 13.5.4. LuleaPE Architecture
      6. 13.6. Performance Results
        1. 13.6.1. ClickPE Performance
        2. 13.6.2. LuleaPE Performance
        3. 13.6.3. Performance Comparison
        4. 13.6.4. Potentials for Improvement
      7. 13.7. Conclusion
      8. Acknowledgments
  8. III. Case Studies
    1. 14. Application Specific Instruction Set Processor for UMTS-FDD Cell Search
      1. 14.1. ASIP on Wireless Modem Design
        1. 14.1.1. The Role of ASIP
        2. 14.1.2. ASIP Challenges for a System House
        3. 14.1.3. Potential ASIP Use Cases in Wireless Receivers
      2. 14.2. Functionality of Cell Search ASIP
        1. 14.2.1. Cell Search–Related Channels and Codes
        2. 14.2.2. Cell Search Functions
        3. 14.2.3. Requirements for the ASIP
      3. 14.3. Cell Search ASIP Design and Verification
        1. 14.3.1. Microarchitecture
        2. 14.3.2. Special Function Units
        3. 14.3.3. Instruction Set
        4. 14.3.4. HDL Generation
        5. 14.3.5. Verification
      4. 14.4. Results
        1. 14.4.1. Performance
        2. 14.4.2. Synthesis Results
      5. 14.5. Summary and Conclusions
    2. 15. Hardware/Software Tradeoffs for Advanced 3G Channel Decoding
      1. 15.1. Channel Decoding for 3G Systems and Beyond
        1. 15.1.1. Turbo-Codes
      2. 15.2. Design Space
      3. 15.3. Programmable Solutions
        1. 15.3.1. VLIW Architectures
        2. 15.3.2. Customizable Processors
      4. 15.4. Multiprocessor Architectures
      5. 15.5. Conclusion
    3. 16. Application Code Profiling and ISA Synthesis on MIPS32
      1. 16.1. Profiling of Application Source Code
        1. 16.1.1. Assembly and Source Level Profiling
        2. 16.1.2. Microprofiling Approach
        3. 16.1.3. Memory Access Microprofiling
        4. 16.1.4. Experimental Results
      2. 16.2. Semiautomatic ISA Extension Synthesis
        1. 16.2.1. Sample Platform: MIPS CorExtend
        2. 16.2.2. CoWare CorXpert Tool
        3. 16.2.3. ISA Extension Synthesis Problem
        4. 16.2.4. Synthesis Core Algorithm
        5. 16.2.5. ISA Synthesis–Based Design Flow
        6. 16.2.6. Speedup Estimation
        7. 16.2.7. Exploring the Design Space
        8. 16.2.8. SW Tools Retargeting and Architecture Implementation
        9. 16.2.9. Case Study: Instruction Set Customization for Blowfish Encryption
          1. Automatic CI Identification for Blowfish
          2. Use of Scratch-Pad Memories
      3. 16.3. Summary and Outlook
      4. Acknowledgements
    4. 17. Designing Soft Processors for FPGAs
      1. 17.1. Overview
        1. 17.1.1. FPGA Architecture Overview
        2. 17.1.2. Soft Processors in FPGAs
        3. 17.1.3. Overview of Processor Acceleration
      2. 17.2. MicroBlaze Soft Processor Architecture
        1. 17.2.1. Short Description of MicroBlaze
        2. 17.2.2. Highlights of Architectural Features
      3. 17.3. Discussion of Architectural Design Tradeoffs in MicroBlaze
        1. 17.3.1. Architectural Building-Blocks and Their FPGA Implementation
          1. FPGA Logic and Memory Structures
        2. 17.3.2. Examples of Architectural Decisions in MicroBlaze
          1. Instruction Set Design
          2. Datapath Design
          3. Implementation-Specific Details
          4. Performance of Soft Processors
          5. Custom Instructions and Function Acceleration
        3. 17.3.3. Tool Support
      4. 17.4. Conclusions
      5. Acknowledgements
  9. Chapter References
    1. Chapter 1
    2. Chapter 2
    3. Chapter 3
    4. Chapter 4
    5. Chapter 5
    6. Chapter 6
    7. Chapter 7
    8. Chapter 8
    9. Chapter 9
    10. Chapter 10
    11. Chapter 11
    12. Chapter 12
    13. Chapter 13
    14. Chapter 14
    15. Chapter 15
    16. Chapter 16
    17. Chapter 17
  10. Bibliography

Product information

  • Title: Customizable Embedded Processors: Design Technologies and Applications
  • Author(s): Rainer Leupers, Paolo Ienne
  • Release date: July 2006
  • Publisher(s): Morgan Kaufmann
  • ISBN: 9780080490984