The Unabridged Pentium 4 IA32 Processor Genealogy

Book description

“In this monumental new book, Tom Shanley pulls together 15 years of history of Intel’s mainline microprocessors, the most popular and important computer architecture in history. Shanley has a keen eye for the salient facts, and an outstanding sense for how to organize and display the material for easy accessibility by the reader. If you want to know what does this bit control, what does that feature do, and how did those instructions evolve through several generations of x86, this is the reference book for you. This is the book Intel should have written, but now they don’t have to.”

         —Bob Colwell, Intel Fellow

The Unabridged Pentium 4 offers unparalleled coverage of Intel’s IA32 family of processors, from the 386 through the Pentium 4 and Pentium M processors. Unlike other texts, which address either a hardware or a software audience, this book serves as a comprehensive technical reference for both. Inside, Tom Shanley covers not only the hardware design and software enhancements of Intel’s latest processors but also the relationship between these hardware and software characteristics. As a result, readers will come away with a complete understanding of the processor’s internal architecture, the Front Side Bus (FSB), the processor’s relationship to the system, and the processor’s software architecture.

Essential topics covered include:

  • Goals of single-task and multi-task operating systems

  • The 386 processor—the baseline ancestor of the IA32 processor family

  • The 486 processor, including a cache primer

  • The Pentium processor

  • The P6 roadmap, P6 processor core, and P6 FSB

  • The Pentium Pro processor, including the Microcode Update feature

  • The Pentium II and the Pentium II Xeon and Celeron processors

  • The Pentium III and the Pentium III Xeon and Celeron processors

  • The Pentium 4 processor family

  • The Pentium M processor

  • Processor identification, System Management Mode, and the IO and Local APICs

    An “at-a-glance” table of contents allows readers to quickly find topics ranging from 386 Demand Mode Paging to Pentium 4 CPU Arbitration.

    The accompanying CD-ROM contains 16 extra chapters.

    Whether you design software or hardware or are responsible for system maintenance or customer support, The Unabridged Pentium 4 will prove an invaluable reference to the world’s most widely used microprocessor chips.

    MindShare’s PC System Architecture Series is a crisply written and comprehensive set of guides to the most important PC hardware standards. Books in the series are intended for use by hardware and software designers, programmers, and support personnel.

    One of the leading technical training companies in the hardware industry, MindShare, Inc., provides innovative courses for dozens of companies, including HP, AMD, IBM, and Compaq. Through these classes and by writing the highly regarded PC System Architecture Series for Addison-Wesley, MindShare trainers emphasize the relationships of hardware subsystems to each other as well as the relationship between software and hardware.

    Table of contents

    1. Copyright
    2. At-a-Glance Table of Contents
    3. Figures
    4. Tables
    5. Acknowledgments
    6. About This Book
    7. Introduction
      1. Overview of the Processor Role
        1. The IA32 Specification
        2. IA32 Processors
        3. IA32 Instructions vs. μops
        4. Processor = Instruction Fetch/Decode/Execute Engine
        5. Some Instructions Result in FSB Transactions
        6. The Processor's Role in Today's Systems
        7. System Overview
    8. Single-/MultiTask OS Background
      1. Single-Task OS and Application
        1. Operating System Overview
        2. Direct IO Access
        3. Application Program Memory Usage
        4. Task Initiation, Execution and Termination
      2. Definition of Multitasking
        1. Concept
        2. An Example—Timeslicing
        3. Another Example—Awaiting an Event
      3. Multitasking Problems
        1. OS Protects Territorial Integrity
        2. Stay in Your Own Memory Area
        3. IO Port Anarchy
        4. Unauthorized Use of OS's Tools
        5. No Interrupts, Please!
        6. BIOS Calls
    9. The 386
      1. 386 Real Mode Operation
        1. Special Note
        2. An Overview of the 386 Internal Architecture
        3. An Overview of the 386DX FSB
        4. The 386 Register Set
        5. 386 Power-Up State
        6. Initial Memory Reads
        7. IO Port Addressing
        8. Memory Addressing
        9. Real Mode Instructions and Registers
        10. Real Mode Interrupt/Exception Handling
        11. Protection in Real Mode
      2. Protected Mode Introduction
        1. General
        2. Memory Protection
        3. IO Protection
        4. Privilege Levels
        5. Virtual 8086 Mode
        6. Task Switching
        7. Interrupt Handling
      3. Intro to Segmentation in Protected Mode
        1. Special Note
        2. Real Mode Limitations
        3. Segment Descriptor Describes a Memory Area in Detail
        4. Segment Register—Selects Descriptor Table and Entry
        5. Introduction to the Descriptor Tables
        6. General Segment Descriptor Format
      4. Code Segments
        1. Selecting the Code Segment to Execute
        2. Code Segment Descriptor Format
        3. Accessing the Code Segment
        4. Privilege Checking
        5. Calling a Procedure in the Current Task
        6. Call Gate
      5. Data and Stack Segments
        1. A Note Regarding Stack Segments
        2. The Data Segments
        3. Selecting and Accessing a Stack Segment
      6. Creating a Task
        1. What Is a Task?
        2. Basics of Task Creation and Startup
        3. TSS Structure
        4. TSS Descriptor
        5. How the OS Starts a Task
        6. What Happens When a Task Starts
        7. Use of the LTR and STR Instructions
      7. Mechanics of a Task Switch
        1. Events that Initiate a Task Switch
        2. Switch Via a TSS Descriptor
        3. Task Gate Descriptor
        4. Task Switch Details
        5. Linked Tasks
        6. Linkage Modification
        7. The Busy Bit
        8. Address Mapping
      8. 386 Demand Mode Paging
        1. Problem—Loading Entire Task into Memory is Wasteful
        2. Solution—Load Part and Keep Remainder on Disk
        3. Problem—Running Two (or more) DOS Programs
        4. Solution—Redirect Memory Accesses to Separate Memory Areas
        5. Global Solution—Map Linear Address to Disk Address or to a Different Physical Memory Address
        6. The Paging Unit Is the Translator
        7. Three Possible Page Lookup Methods
        8. IA32 Page Lookup Method
        9. Enabling Paging
        10. Page Directory and Page Tables
        11. Finding the Location of a Physical Page
        12. Eliminating the Directory Lookup
        13. Checking Page Access Permission
        14. Page Faults
        15. Usage of the Dirty and Accessed Bits
        16. Demand Mode Paging Evolution
      9. The Flat Model
        1. Segments Complicate Things
        2. Paging Can Do It All
        3. Eliminating Segmentation
        4. The Privilege Check
        5. The Read/Write Check
        6. Each Task (including the OS) Has Its Own TSS
      10. Interrupts and Exceptions
        1. Special Note
        2. General
        3. Hardware Interrupts
        4. Software-Generated Exceptions
        5. Interrupt/Exception Priority
        6. Real Mode Interrupt/Exception Handling
        7. Protected Mode Interrupt/Exception Handling
        8. Interrupt/Exception Handling in VM86 Mode
        9. Exception Error Codes
        10. The Resume Flag Prevents Multiple Debug Exceptions
        11. Special Case—Interrupts Disabled While Updating SS:ESP
        12. Detailed Description of the Software Exceptions
      11. Virtual 8086 Mode
        1. A Special Note
        2. DOS Application—Portrait of an Anarchist
        3. Solution—Set a Watchdog on the DOS Application
        4. The Virtual Machine Monitor (VMM)
        5. Entering or Reentering VM86 Mode
        6. An Interrupt or Exception Causes an Exit From VM86 Mode
        7. A Task Switch Causes an EFlags Update
        8. DOS Task's Memory Usage
        9. The Privilege Level of a VM86 Task
        10. Restricting IO Accesses
        11. IOPL-Sensitive Instructions
        12. Interrupt/Exception Generation and Handling
        13. Registers Accessible in Real/VM86 Mode
        14. Instructions Usable in Real/VM86 Mode
        15. VM86 Mode Evolution
      12. The Debug Registers
        1. The Debug Registers
    10. 486
      1. Caching Overview
        1. Definition of a Load and a Store
        2. The Cache's Purpose
        3. The Write-Through Cache
        4. The Write Back Cache
        5. Snooping
        6. The Overall Cache Architecture
        7. Cache Real Estate Management
        8. A Unified Cache
        9. Split Caches
        10. Non-Blocking Caches
      2. 486 Hardware Overview
        1. 486 Flavors
        2. An Overview of the 486 Internal Architecture
        3. An Overview of the 486 FSB
        4. A20 Mask
        5. On-Chip Cache Added
      3. 486 Software Enhancements
        1. FPU Added On-Die
        2. Alignment Checking Feature
        3. Paging-Related Changes
        4. Caching-Related Changes to the Programming Environment
        5. CR4 Was Added in the Later Models of the 486
        6. Test Registers Added
        7. Instruction Set Changes
        8. New/Altered Exceptions
        9. System Management Mode (SMM)
    11. Pentium®
      1. Pentium® Hardware Overview
        1. Pentium® Flavors
        2. An Overview of the Pentium® Internal Architecture
        3. An Overview of the Pentium® FSB
        4. The Caches
        5. Local APIC Added in the P54C
        6. Test Access Port (TAP)
        7. FRC Mode
        8. Soft Reset (INIT#)
      2. Pentium® Software Enhancements
        1. VM86 Extensions
        2. Protected Mode Virtual Interrupts
        3. Debug Extension
        4. Time Stamp Counter
        5. 4MB Pages
        6. Machine Check Architecture (MCA)
        7. Performance Monitoring
        8. Local APIC Register Set
        9. Test Registers Relocated
        10. MSRs Added
        11. Instruction Set Changes
        12. New/Altered Exceptions
    12. Intro to the P6 Core and FSB
      1. P6 Road Map
        1. The P6 Processor Family
        2. The Klamath Core
        3. The Deschutes Core
        4. The Katmai Core
      2. P6 Hardware Overview
        1. For More Detail
        2. Introduction
        3. The P6 Processor Core
        4. The FSB Interface Unit
        5. The Backside Bus (BSB) Interface Unit
        6. The Unified L2 Cache
        7. The L1 Data Cache
        8. The L1 Code Cache
        9. The Processor Core
        10. The Local APIC Unit
    13. Pentium® Pro Software Enhancements
      1. Pentium® Pro Software Enhancements
        1. Paging Enhancements
        2. APIC Enhancements
        3. MMX Not Implemented
        4. SMM Enhancement
        5. MTRRs Added
        6. MCA Enhanced
        7. The Performance Counters
        8. MSRs Added
        9. Instruction Set Changes
        10. New/Altered Exceptions
      2. MicroCode Update Feature
        1. The Problem
        2. The Solution
        3. The Microcode Update Image
        4. Matching the Image to a Processor
        5. The Microcode Update Loader
        6. Updates in a Multiprocessor System
        7. The Image Management BIOS
        8. When Must the Image Upload Take Place?
        9. Determining if a New Update Supersedes a Previously-Loaded Update
        10. Effect of RESET# Or INIT# on a Previously-Loaded Update
    14. Pentium® II
      1. Pentium® II Hardware Overview
        1. The Pentium® Pro and Pentium® II: Same CPU, Different Package
        2. Dual-Independent Bus Architecture (DIBA)
        3. IOQ Depth
        4. Pentium® Pro/Pentium® II Differences
        5. One Product Yields Three Product Lines
        6. The Pentium® II/Xeon/Celeron Roadmap
        7. The Cartridge
        8. The Core
        9. The FSB and BSB
        10. The Introduction of the Celeron
        11. Miscellaneous Hardware Stuff
      2. Pentium® II Power Management Features
        1. The Pentium® Pro's Power Conservation Modes
        2. The Pentium® II's Power Conservation Modes
        3. The Normal State
        4. The AutoHalt Power Down State
        5. The Stop Grant State
        6. The Halt/Grant Snoop State
        7. The Sleep State
        8. The Deep Sleep State
      3. Pentium® II Software Enhancements
        1. The Pentium® II and Pentium® III MSRs
        2. Instruction Set Changes
        3. New/Altered Exceptions
      4. Pentium® II Xeon Features
        1. Introduction
        2. To Avoid Confusion...
        3. Basic Characteristics
        4. Hardware Characteristics
        5. PSE-36 Mode
    15. Pentium® III
      1. Pentium® III Hardware Overview
        1. One Product = Three Product Lines
        2. Pentium® II/Pentium® III Differences
        3. The Pentium® III/Xeon/Celeron Roadmap
        4. IOQ Depth
        5. The L1 Caches
        6. The L2 Cache
        7. The Data Prefetcher
        8. SSE Introduced
        9. The WCBs Were Enhanced
        10. Additional Writeback Buffers
        11. SpeedStep Technology
      2. Pentium® III Software Enhancements
        1. The Streaming SIMD Extensions (SSE)
        2. CPUID Enhanced
      3. Pentium® III Xeon Features
        1. Basic Characteristics
        2. PAT Feature (Page Attribute Table)
    16. Pentium® 4
      1. Pentium® 4 Road Map
        1. The Roadmap
      2. Pentium® 4 System Overview
        1. General
        2. The Graphics Adapter
        3. Device Adapters
        4. Snooping
        5. Definition of a Cluster
        6. Definition of the Boot Strap Processor
        7. Starting up the Application Processors (the APs)
      3. Pentium® 4 Processor Overview
        1. The Pentium® 4 Processor Family
        2. Pentium® III/Pentium® 4 Differences
        3. Pentium® 4/Pentium® 4 Prescott Differences
        4. Pentium® 4 Processor Basic Organization
        5. The FSB is Tuned for Multiprocessing
        6. Intro to the FSB Enhancements
        7. IA Instructions Vary in Length and Are Complex
        8. The Trace Cache
        9. There Are Two Pipeline Sections
        10. The μop Pipeline
        11. The IA32 Data Register Set Was Small
        12. Speculative Execution
      4. Pentium® 4 PowerOn Configuration
        1. Configuration on Trailing-Edge of Reset
        2. Setup and Hold Time Requirements
        3. Built-In Self-Test (BIST) Trigger
        4. Assignment of IDs to the Processor
        5. Error Observation Options
        6. In-Order Queue Depth Selection
        7. Power-On Restart Address
        8. Tri-State Mode
        9. Processor Core Speed Selection
        10. Bus Parking Option
        11. Hyper-Threading Option
        12. Program-Accessible Startup Features
      5. Pentium® 4 Processor Startup
        1. Introduction
        2. The Processor's State After Reset
        3. EAX, EDX Content After Reset Removal
        4. The Core Is Starving and Caching is Disabled
        5. Boot Strap Processor (BSP) Selection
        6. How the APs are Discovered and Configured
      6. Pentium® 4 Core Description
        1. One μop Doesn't Necessarily = One IA32 Instruction
        2. Upstream vs. Downstream
        3. Introduction
        4. The Big Picture
        5. The Front-End Pipeline Stages
        6. Intro to the μop Pipeline
        7. The μop Pipeline's Major Elements
        8. Additional, Core-Specific Terms
      7. Hyper-Threading
        1. General
        2. Background
        3. The HT Approach
        4. Overview of HT Resource Usage
        5. HT and the Data TLB
        6. HT and the FSB
        7. The IOQ Depth Was Increased
        8. HT Performance Issues
        9. HT and Serializing Instructions
        10. HT and the Microcode Update Feature
        11. HT Cache-Related Issues
        12. HT and the TLBs
        13. HT and the Thermal Monitor Feature
        14. HT and External Pin Usage
      8. The Pentium® 4 Caches
        1. A Cache Primer
        2. The L0 Cache
        3. Upstream vs. Downstream
        4. Overview
        5. Determining the Processor's Cache Sizes and Structures
        6. Enabling/Disabling the Caches
        7. The L1 Data Cache
        8. The L2 ATC
        9. The L3 Cache
        10. FSB Transactions and the Caches
        11. The Cache Management Instructions
      9. Pentium® 4 Handling of Loads and Stores
        1. The Memory Type Defines Load/Store Characteristics
        2. Load μops
        3. Store-to-Load Forwarding
        4. Store μops
        5. The MFENCE Instruction
        6. Non-Temporal Stores
      10. The Pentium® 4 Prescott
        1. Introduction
        2. Increased Pipeline Depth
        3. Trace Cache Improvements
        4. Increased Number of WCBs
        5. L1 Data Cache Changes
        6. Increased L2 Cache Size
        7. Enhanced Branch Prediction
        8. Store Forwarding Improved
        9. SSE3 Instruction Set
        10. Increased Elimination of Dependencies
        11. Enhanced Shifter/Rotator
        12. Integer Multiply Enhanced
        13. Scheduler Enhancements
        14. Fixed the MXCSR Serialization Problem
        15. Data Prefetch Instruction Execution Enhanced
        16. Improved the Hardware Data Prefetcher
        17. Hyper-Threading Improved
      11. Pentium® 4 FSB Electrical Characteristics
        1. Introduction
        2. The Bus and Processor Clocks
        3. The Address and Data Strobes
        4. The Voltage ID
        5. Everything's Relative
        6. Signals that Can Be Driven by Multiple FSB Agents
        7. Minimum One BCLK Response Time
      12. Intro to the Pentium® 4 FSB
        1. Enhanced Mode Scaleable Bus
        2. FSB Agents
        3. Uniprocessor vs. Multiprocessor Bus
        4. The Request Agent
        5. The Transaction Phases
        6. Transaction Pipelining
        7. Transaction Tracking
      13. Pentium® 4 CPU Arbitration
        1. The Request Phase
        2. Logical versus Physical Processors
        3. The Discussion Assumes a Quad Xeon MP System
        4. Symmetric Agent Arbitration—Democracy at Work
      14. Pentium® 4 Priority Agent Arbitration
        1. Priority Agent Arbitration
      15. Pentium® 4 Locked Transaction Series
        1. Introduction
        2. The Shared Resource Concept
        3. Testing the Availability of and Gaining Ownership of Shared Resources
        4. A Race Condition Can Present a Problem
        5. Guaranteeing the Atomicity of a Read/Modify/Write
        6. Locking a Cache Line
      16. Pentium® 4 FSB Blocking
        1. Blocking New Requests—Stop! I'm Full!
        2. Assert BNR# When One Entry Remains
        3. BNR# Can Be Used by a Debug Tool
        4. Who Monitors BNR#?
        5. BNR# is a Shared Signal
        6. The Stalled/Throttled/Free Indicator
        7. BNR# Behavior at Powerup
        8. BNR# Behavior During Runtime
      17. Pentium® 4 FSB Request Phase
        1. Cautionary Note
        2. Introduction to the Request Phase
        3. The Source Synchronous Strobes
        4. The Request Phase Parity
        5. Request Phase Parity Checking
        6. The Request Phase Signal Group is Multiplexed
        7. Introduction to the Transaction Types
        8. The Contents of Request Packet A
        9. The Contents of Request Packet B
      18. Pentium® 4 FSB Snoop Phase
        1. Agents Involved in the Snoop Phase
        2. The Snoop Phase Has Two Purposes
        3. The Snoop Result Signals are Shared, DEFER# Isn't
        4. The Snoop Phase Duration Is Variable
        5. There Is No Snoop Stall Duration Limit
        6. Memory Transaction Snooping
        7. Non-Memory Transactions Have a Snoop Phase
      19. Pentium® 4 FSB Response and Data Phases
        1. A Note on Deferred Transactions
        2. The Purpose of the Response Phase
        3. The Response Phase Signal Group
        4. The Response Phase Start Point
        5. The Response Phase End Point
        6. The Response Types
        7. The Response Phase May Complete a Transaction
        8. The Data Phase Signal Group
        9. Five Example Scenarios
        10. Data Phase Wait States
        11. The Response Phase Parity
        12. Data Bus Parity
      20. Pentium® 4 FSB Transaction Deferral
        1. Example System Models
        2. Example Multi-Cluster Model
        3. The Problem
        4. Possible Solutions
        5. Example Read From a PCI Express Device
        6. Example Write To a PCI Express Device
        7. Pentium® 4 Support for Transaction Deferral
      21. Pentium® 4 FSB IO Transactions
        1. Introduction
        2. The IO Address Range
        3. The Data Transfer Length
      22. Pentium® 4 FSB Central Agent Transactions
        1. Point-to-Point vs. Broadcast
        2. The Interrupt Acknowledge Transaction
        3. The Special Transaction
        4. The BTM Transaction Is Used for Program Debug
      23. Pentium® 4 FSB Miscellaneous Signals
        1. The Signals
      24. Pentium® 4 Software Enhancements
        1. The Foundation
        2. Miscellaneous New Instructions
        3. Enhanced CPUID Instruction
        4. The SSE2 Instruction Set
        5. The SSE3 Instruction Set
        6. Local APIC Enhancements
        7. The Thermal Monitoring Facilities
        8. FPU Enhancement
        9. The MSRs
        10. The Machine Check Architecture
        11. Last Branch, Interrupt, and Exception Recording
        12. The Debug Store (DS) Mechanism
        13. New Exceptions
        14. The Performance Monitoring Facility
      25. Pentium® 4 Xeon Features
        1. General
        2. The Pentium® 4 Xeon DP
        3. The Pentium® 4 Xeon MP
    17. Pentium® M
      1. Pentium® M Processor
        1. Background
        2. The Pentium® M and Centrino
        3. Characteristics Overview
        4. The FSB Characteristics
        5. Enhanced Power Management Characteristics
        6. Three Different Packaging Models
        7. Improved Thermal Monitor Mode
        8. Enhanced Branch Prediction
        9. μop Fusion
        10. Advanced Stack Management
        11. Miscellaneous
        12. The Data Cache and Hyper-Threading
        13. The Next Pentium® M
    18. Additional Topics
      1. CPU Identification
        1. Prior to the Advent of the CPUID Instruction
        2. Determining if the CPUID Instruction Is Supported
        3. General
        4. Determining the Request Types Supported
        5. The Basic Request Types
        6. The Extended Request Types
        7. Enhanced Processor Signature
      2. System Management Mode (SMM)
        1. What Falls Under the Heading of System Management?
        2. The Genesis of SMM
        3. SMM Has Its Own Private Memory Space
        4. The Basic Elements of SMM
        5. A Very Simple Example Scenario
        6. How the Processor Knows the SM Memory Start Address
        7. Protected Mode, Paging and PAE-36 Mode Are Disabled
        8. The Organization of SM RAM
        9. Entering SMM
        10. Exiting SMM
        11. Caching from SM Memory
        12. Setting Up the SMI Handler in SM Memory
        13. Relocating the SM RAM Base Address
        14. SMM in an MP System
      3. The Local and IO APICs
        1. Before the Advent of the APIC
        2. MP Systems Need a Better Interrupt Distribution Mechanism
        3. A Short History of the APIC
        4. Detecting the Presence and Version of the Local APIC
        5. Enabling/Disabling the Local APIC
        6. Local Cluster and APIC ID Assignment
        7. An Introduction to the Interrupt Sources
        8. Introduction to Interrupt Priority
        9. An Intro to Edge-Triggered Interrupts
        10. An Intro to Level-Sensitive Interrupts
        11. The Local APIC Register Set
        12. Locally Generated Interrupts
        13. Task and Processor Priority
        14. Interrupt Messages
        15. The IO APIC
        16. Message Signaled Interrupts (MSI)
        17. Message Format
        18. The Spurious Interrupt Vector
        19. The Agents in an Interrupt Message Transaction
        20. BSP Selection Process
        21. The APIC, the MPS and ACPI
    19. Acronyms
    20. CD-ROM Warranty
    21. Index

    Product information

    • Title: The Unabridged Pentium 4 IA32 Processor Genealogy
    • Author(s): Tom Shanley, Bob Colwell
    • Release date: July 2004
    • Publisher(s): Addison-Wesley Professional
    • ISBN: 9780321246561