Nearly all PCs use either an Intel CPU or an Intel-compatible CPU made by AMD (K6/Athlon/Duron series). The dominance of Intel in CPUs and Microsoft in operating systems gave rise to the hybrid term Wintel, which refers to systems that run Windows on an Intel or compatible CPU. Intel processors are referred to generically as x86 processors, based on Intel’s early processor naming convention: 8086, 80186, 80286, etc. Intel has produced seven CPU generations, the first five of which are now obsolete and the sixth obsolescent.
- First generation
The 8086 was Intel’s first mainstream processor, and used 16 bits for both internal and external communications. The 8086 was first used in the late 1970s in dedicated word processors and minicomputers like the DisplayWriter and the System/23 DataMaster. When IBM shipped their first PC in 1981, they used the 8088, an 8086 variant that used 16 bits internally but only 8 bits externally, because 8-bit peripherals were at that time more readily available and less expensive than were 16-bit components. The 8086 achieved prominence much later when Compaq created the DeskPro as an improved clone of the IBM PC/XT. A few early PCs, notably Radio Shack models, were also built around the 80186 and 80188 CPUs, which were enhanced versions of the 8086 and 8088 respectively. The 8088 and 8086 CPUs did not include a floating-point unit (FPU), although an 8087 FPU, called a math coprocessor, was available as an optional upgrade chip. First-generation Intel CPUs (or their modern equivalents) are still used in some embedded applications, but they are long obsolete as general-purpose CPUs.
- Second generation
In 1982, Intel introduced the long-awaited follow-on to their first-generation processors. The 80286, based on the iAPX-32 core, provided a quantum leap in processor performance, executing instructions as much as five times faster than an 808x processor running at the same clock speed. The 80286 processed instructions as fast as many mainframe processors of the time. The 80286 also increased addressable memory from 1 MB to 16 MB, and introduced protected mode operations. The IBM PC/AT was the first commercial implementation of the 80286. The optional 80287 FPU chip added floating-point acceleration to 80286 systems. Although long obsolete as a general-purpose CPU, the 80286 is still used in embedded controllers.
- Third generation
Intel’s next generation debuted in 1985 as the 80386, later shortened to just 386. The 386 was Intel’s first 32-bit CPU, which communicated internally and externally with a 32-bit data bus and 32-bit address bus. The 386 was available in 16, 20, 25, and 33 MHz versions. Although 386 clock speeds were only slightly faster than those of the 80286, improved architecture resulted in significant performance increases. The optional 80387 FPU added floating-point acceleration to 386 systems. Intel later renamed the 386 to the 386DX and released a cheaper version called the 386SX, which used 32 bits internally but only 16 bits externally. The 386SX was notable as the first Intel processor that included an internal (L1) cache, although it was only 8 KB and relatively inefficient. The 386 is long obsolete as a general-purpose CPU, but is still commonly used in embedded controllers.
- Fourth generation
Intel’s next generation debuted in 1989 as the 486 (there never was an 80486). The 486 was a full 32-bit CPU with 8 KB of L1 cache, included a built-in FPU, and was available in speeds from 20 MHz to 50 MHz. Intel released 486DX and 486SX versions. The 486SX was in fact a 486DX with the FPU disabled. Intel also sold the 487SX as an FPU upgrade, although it was actually a full-blown 486DX: installing a 487SX in the coprocessor socket simply disabled the existing 486SX. The 486DX/2, introduced in 1992, was the first Intel processor that ran internally at a multiple of the memory bus speed. The 486DX/2 clock ran at twice bus speed, and was available in 25/50, 33/66, and 40/80 MHz versions. The 486DX/4, introduced in 1994, ran (despite its name) at three times bus speed, doubled L1 cache to 16 KB, and was available in 25/75, 33/100, and 40/120 MHz versions. The 486 is obsolete, but not for the reason you might think. A fast 486 with sufficient memory is still fast enough to run Windows 9X or Linux in undemanding applications, but these systems are so old that essentially none of them were Y2K compliant when manufactured, and BIOS updates to make them so are generally unavailable. The only practical way to upgrade a 486 system is to discard it and buy or build a new system.
- Fifth generation
The Intel Pentium CPU defines the fifth generation. It provides much better performance than its 486 ancestors by incorporating several architectural improvements, most notably an increase in data bus width from 32 bits to 64 bits and an increase in CPU-memory bus speed from 33 MHz to 60 and 66 MHz. Intel actually shipped several different versions of the Pentium, including:
- Pentium P54
The original Pentium shipped in 1993 in 50, 60, and 66 MHz versions using a 1X CPU multiplier, ran (hot) at 5.0 volts, contained a dual 8 KB + 8 KB L1 cache, and fit Socket 4 motherboards.
- Pentium P54C
The “Classic Pentium” first shipped in 1994, was available in speeds from 75 to 200 MHz using CPU multipliers from 1.5 to 3.0, used 3.3 volts, and contained the same dual L1 cache as the P54. P54C CPUs fit Socket 5 motherboards and most Socket 7 motherboards.
- Pentium P55C
The Pentium/MMX shipped in 1997, was available in speeds from 166 to 233 MHz using CPU multipliers from 2.5 to 3.5, used 3.3 volts, and contained a dual 16 KB + 16 KB L1 cache, twice the size of earlier Pentiums. The other major change from the P54C was the addition of the MMX instruction set, a set of additional instructions that greatly improved graphics processing speed. P55C CPUs fit Socket 7 motherboards, and were still commercially available as late as 2000.
- Sixth generation
This generation began with the 1995 introduction of the Pentium Pro, and includes recent Intel processors such as the Pentium II, Celeron, and Pentium III. Late-model sixth-generation Intel desktop processors are now relegated to entry-level systems, and will be gradually phased out during 2002, with only the Tualatin-core Celeron processors remaining as representatives of this generation by the end of 2002.
- Seventh generation
This is the current generation of Intel processors, and includes Intel’s flagship Pentium 4 and the P4-based Willamette128-core Celeron.
Intel currently manufactures several sixth-generation processors, including numerous variants and derivatives of the Celeron and Pentium III, and two seventh-generation processors, the Pentium 4 and the Willamette-core Celeron. The following sections describe current and recent Intel processors.
Tip
There are times when it is essential to identify the processor a system uses. For information about identifying Intel processors, see http://www.hardwareguys.com/supplement/cpu-id.html.
Intel originally designated their processors by number rather than by name: Intel 8086, 8088, 80186, 80286, and so on. Intel dropped the “80” prefix early in the life cycle of the 80386, relabeling it as the 386. (Intel never made an “80486” processor despite what some people believe.) By the time Intel shipped their fourth-generation processors, they were tired of other makers using similar names for their compatible processors. Intel believed that these similar names could lead to confusion among customers, and so tried to trademark their x86 naming scheme. When Intel learned that part numbers cannot be trademarked, they decided to drop the “86” naming scheme and create a made-up word to name their fifth-generation processors. They came up with Pentium.
Intel has produced the following three major subgenerations of Pentium:
- P54
These earliest Pentium CPUs, first shipped in March 1993, fit Socket 4 motherboards, use a 3.1 million transistor core, have 16 KB L1 cache, and use 5.0 volts for both core and I/O components. P54-based systems use a 50, 60, or 66 MHz memory bus and a fixed 1.0 CPU multiplier to yield processor speeds of 50, 60, or 66 MHz.
- P54C
The so-called Classic Pentium CPUs, first shipped in October 1994, fit Socket 5 and most Socket 7 motherboards, use a 3.3 million transistor core, have 16 KB L1 cache, and generally use 3.3 volts for both core and I/O components. P54C-based systems use a 50, 60, or 66 MHz memory bus and CPU multipliers of 1.5, 2.0, 2.5, and 3.0x to yield processor speeds of 75, 90, 100, 120, 133, 150, 166, and 200 MHz.
- P55C
The Pentium/MMX CPUs (shown in Figure 4-1), first shipped in January 1997, fit Socket 7 motherboards, use a 4.1 million transistor core, have a 32 KB L1 cache, improved branch prediction logic, and generally use a 2.8 volt core and 3.3 volt I/O components. P55C-based systems use a 60 or 66 MHz memory bus and CPU multipliers of 2.5, 3.0, 3.5, 4.0, 4.5, and 5.0x to yield processor speeds of 120, 133, 150, 166, 200, 233, 266, and 300 MHz.
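Every speed rating above comes from one multiplication: core clock = memory-bus clock × CPU multiplier. The sketch below checks the P54C ratings from the text. Two details are assumptions of the sketch rather than statements from the text: the specific bus/multiplier pair behind each rating (worked out arithmetically), and the fact that the nominal "66 MHz" bus actually runs at 66⅔ MHz, with ratings rounded down to a whole number.

```python
import math
from fractions import Fraction

# Nominal bus label -> actual bus clock in MHz. The "66 MHz" bus is
# assumed to run at 66 2/3 MHz (200/3), kept as an exact fraction.
BUS_MHZ = {50: Fraction(50), 60: Fraction(60), 66: Fraction(200, 3)}

# (rated MHz, nominal bus MHz, multiplier) for the P54C ratings in the text.
# The bus/multiplier pairings are derived arithmetically, not quoted from Intel.
P54C_RATINGS = [
    (75, 50, Fraction(3, 2)),
    (90, 60, Fraction(3, 2)),
    (100, 66, Fraction(3, 2)),
    (120, 60, Fraction(2)),
    (133, 66, Fraction(2)),
    (150, 60, Fraction(5, 2)),
    (166, 66, Fraction(5, 2)),
    (200, 66, Fraction(3)),
]


def core_clock_mhz(bus_nominal: int, multiplier: Fraction) -> Fraction:
    """Internal CPU clock: actual bus clock times the CPU multiplier."""
    return BUS_MHZ[bus_nominal] * multiplier


def marketed_rating(bus_nominal: int, multiplier: Fraction) -> int:
    """Round the exact clock down to a whole MHz, e.g. 166 2/3 -> 166."""
    return math.floor(core_clock_mhz(bus_nominal, multiplier))


for rated, bus, mult in P54C_RATINGS:
    actual = marketed_rating(bus, mult)
    print(f"{bus} MHz bus x {float(mult)} = {actual} MHz (sold as {rated} MHz)")
```

The same arithmetic covers the 486DX/2 and DX/4 and the P55C multipliers; only the table of pairs changes.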
The Pentium was a quantum leap from the 486 in complexity and architectural efficiency. It is a CISC (Complex Instruction Set Computer) processor, and was initially built on a 0.35 micron process (later 0.25 micron). Pentiums, like 486s, use 32-bit operations internally. Externally, however, the Pentium doubles the 32-bit 486 data bus to 64 bits, allowing it to access eight full bytes at a time from memory. With the Pentium, Intel also introduced new chipsets to support this wider data bus and other Pentium enhancements.
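Doubling the data bus width doubles the bytes moved per bus clock, and the Pentium also raised the bus clock. A back-of-the-envelope peak-bandwidth calculation shows the combined effect. It assumes one transfer per bus clock and ignores wait states and bus overhead, so the results are theoretical ceilings rather than measured throughput; the 33 MHz 486 bus is the comparison point given in the fifth-generation description earlier in this section.

```python
from fractions import Fraction


def peak_bandwidth_mb_s(bus_width_bits: int, bus_clock_mhz: Fraction) -> Fraction:
    """Theoretical peak: bytes per transfer x millions of transfers per second.

    Assumes exactly one transfer per bus clock (1 MB = 1,000,000 bytes).
    """
    bytes_per_transfer = bus_width_bits // 8
    return bytes_per_transfer * bus_clock_mhz


# Nominal 33 and 66 MHz buses are modeled as 33 1/3 and 66 2/3 MHz.
bw_486 = peak_bandwidth_mb_s(32, Fraction(100, 3))      # 4 bytes per clock
bw_pentium = peak_bandwidth_mb_s(64, Fraction(200, 3))  # 8 bytes per clock

print(f"486:     {float(bw_486):.0f} MB/s")
print(f"Pentium: {float(bw_pentium):.0f} MB/s")
print(f"ratio:   {bw_pentium / bw_486}x")
```

Twice the width at twice the clock gives four times the theoretical peak.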
The Pentium uses a dual-pipelined superscalar design, which, relative to the 486 and earlier CPUs, allows it to execute more instructions per clock cycle. The Pentium executes integer instructions using the same five stages as the 486 (Prefetch, Instruction Decode, Address Generate, Execute, and Write Back), but the Pentium has two parallel integer pipelines versus the 486’s one, which allows the Pentium to execute two integer operations simultaneously. This means that, for equal clock speeds, the Pentium processes integer instructions about twice as fast as a 486.
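The "about twice as fast" figure follows from a simple pipeline timing model. The sketch assumes an idealized stream of independent integer instructions, one instruction entering each pipeline per clock, and no stalls. Real Pentium code does not always pair (dependent instructions, and some instruction types, cannot issue together), which is why the speedup is only "about" two.

```python
import math

# The five integer stages shared by the 486 and the Pentium.
STAGES = ("Prefetch", "Instruction Decode", "Address Generate", "Execute", "Write Back")


def pipeline_cycles(n_instructions: int, pipelines: int) -> int:
    """Clocks to finish n independent instructions in an ideal pipeline.

    The first group of instructions needs one clock per stage to pass
    through the pipeline; every later group finishes one clock after the
    previous one. With two pipelines, each group holds two instructions.
    """
    if n_instructions <= 0:
        return 0
    groups = math.ceil(n_instructions / pipelines)
    return len(STAGES) + groups - 1


n = 1000
single = pipeline_cycles(n, pipelines=1)  # 486-style: one integer pipeline
dual = pipeline_cycles(n, pipelines=2)    # Pentium-style: two integer pipelines
print(single, dual, round(single / dual, 2))
```

For long instruction runs the ratio approaches 2; for very short runs the fixed five-clock fill time dominates.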
The Pentium includes an improved 80-bit FPU that is much more efficient than the 486 FPU. The Pentium also includes a branch target buffer to provide dynamic branch prediction, a process that greatly enhances instruction execution efficiency. Finally, the Pentium includes a system management module that can control power use by the processor and peripherals.
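The text does not describe how the Pentium's branch target buffer makes its decisions, so the sketch below uses a generic 2-bit saturating-counter scheme, a common textbook design, with an unbounded dictionary standing in for a real fixed-size buffer. It shows the basic idea: remember where a branch went last time and how consistently, so fetching can continue from the predicted target before the branch resolves.

```python
from __future__ import annotations


class BranchTargetBuffer:
    """Toy branch target buffer with a 2-bit saturating counter per entry.

    Maps a branch instruction's address to [target address, counter].
    Counter values 0-1 predict "not taken", 2-3 predict "taken"; each
    actual outcome moves the counter one step, saturating at 0 and 3.
    """

    def __init__(self) -> None:
        self.entries: dict[int, list[int]] = {}

    def predict(self, branch_addr: int) -> int | None:
        """Return the predicted target if the branch is predicted taken, else None."""
        entry = self.entries.get(branch_addr)
        if entry is not None and entry[1] >= 2:
            return entry[0]
        return None

    def update(self, branch_addr: int, taken: bool, target: int) -> None:
        """Record the real outcome once the branch has executed."""
        # New branches start "weakly not taken" (counter = 1).
        entry = self.entries.setdefault(branch_addr, [target, 1])
        entry[0] = target
        entry[1] = min(entry[1] + 1, 3) if taken else max(entry[1] - 1, 0)


def count_mispredictions(btb: BranchTargetBuffer, branch_addr: int,
                         target: int, outcomes: list[bool]) -> int:
    """Run a sequence of outcomes through the BTB and count wrong guesses."""
    misses = 0
    for taken in outcomes:
        predicted_taken = btb.predict(branch_addr) is not None
        if predicted_taken != taken:
            misses += 1
        btb.update(branch_addr, taken, target)
    return misses


# A loop-closing branch: taken 9 times, then falls through; the loop runs twice.
outcomes = ([True] * 9 + [False]) * 2
print(count_mispredictions(BranchTargetBuffer(), 0x1040, 0x1000, outcomes))
```

Here 3 of the 20 branches are mispredicted. The 2-bit counter supplies hysteresis: the single fall-through at the end of the first loop does not flip the prediction, so the second pass starts out predicted correctly.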
P54 Pentiums also improved upon 486 L1 caching. The 486 has one 8 KB L1 cache (16 KB for the 486DX/4) that uses the inefficient write-through algorithm. P54 and P54C Pentiums have dual 8 KB L1 caches (one for data and one for instructions) that use the much more efficient two-way set associative write-back algorithm. This doubling of L1 cache buffers and the improved caching algorithm combine to greatly enhance CPU performance. P55C Pentiums double each L1 cache to 16 KB (32 KB total), providing still more improvement.
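The difference between write-through and write-back is easiest to see by counting memory writes. The sketch models a two-way set associative write-back cache the size of one P54 L1 cache (8 KB). The 32-byte line size, least-recently-used replacement, and allocating a line on a write miss are assumptions of the sketch, and the model tracks only addresses, not data.

```python
class TwoWayWriteBackCache:
    """Toy two-way set associative, write-back cache that tracks addresses only.

    Assumptions: 8 KB capacity, 32-byte lines, LRU replacement within each
    set, and write-allocate (a write miss loads the line into the cache).
    """

    def __init__(self, size_bytes: int = 8 * 1024, line_bytes: int = 32, ways: int = 2):
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = size_bytes // (line_bytes * ways)  # 128 sets with the defaults
        # Each set holds up to `ways` entries of [tag, dirty], least recently used first.
        self.sets = [[] for _ in range(self.num_sets)]
        self.hits = self.misses = self.memory_writes = 0

    def access(self, addr: int, write: bool = False) -> bool:
        """Read or write one address; return True on a cache hit."""
        line = addr // self.line_bytes
        index, tag = line % self.num_sets, line // self.num_sets
        entries = self.sets[index]
        for entry in entries:
            if entry[0] == tag:
                self.hits += 1
                entries.remove(entry)
                entries.append(entry)           # now the most recently used
                entry[1] = entry[1] or write    # a write only marks the line dirty
                return True
        self.misses += 1
        if len(entries) == self.ways:
            _, dirty = entries.pop(0)           # evict the least recently used line
            if dirty:
                self.memory_writes += 1         # write-back: memory updated only now
        entries.append([tag, write])
        return False


cache = TwoWayWriteBackCache()
for _ in range(100):
    cache.access(0x0000, write=True)   # 100 writes to the same line
print(cache.memory_writes)             # no memory traffic yet
cache.access(0x1000)                   # same set, different tag: fills the second way
cache.access(0x2000)                   # same set again: evicts the dirty line
print(cache.hits, cache.misses, cache.memory_writes)
```

A write-through cache would have sent all 100 writes to memory; this write-back model sends one, when the dirty line is finally evicted.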
The changes from the P54 to the P54C were relatively minor. Higher voltages and faster CPU speeds generate more heat, so Intel reduced the core and I/O voltages from 5.0/5.0V in the P54 to 3.3/3.3V in the P54C, allowing them to run the CPUs faster without excessive heating. They also introduced support for CPU multipliers, which allow the CPU to run internally at some multiple of the memory bus speed.
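Why does a lower voltage leave room for faster clocks? A common first-order model of CMOS switching power is P ≈ C·V²·f, so power scales with the square of the voltage. The sketch applies that model with capacitance held constant; it ignores leakage current and any change in capacitance between the P54 and P54C, so the results are rough ratios, not Intel figures.

```python
def relative_dynamic_power(v_new: float, v_old: float,
                           f_new: float = 1.0, f_old: float = 1.0) -> float:
    """Power ratio new/old under P ~ C * V**2 * f with capacitance C unchanged."""
    return (v_new / v_old) ** 2 * (f_new / f_old)


same_clock = relative_dynamic_power(3.3, 5.0)            # 5.0 V -> 3.3 V at the same clock
faster = relative_dynamic_power(3.3, 5.0, 100.0, 66.0)   # 66 MHz at 5.0 V vs 100 MHz at 3.3 V
print(f"{same_clock:.2f} {faster:.2f}")
```

Under this model, the voltage drop alone cuts switching power by more than half at the same clock, and a 3.3 V part can run about 50% faster while still drawing only about two-thirds the power of the 5.0 V part.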
The changes from the P54C Classic to the P55C MMX were much more significant. In fact, had Intel not already introduced the Pentium Pro (their first sixth-generation CPU) before the P55C, the P55C might have been considered the first of a new CPU generation. In addition to doubling L1 cache size, the P55C incorporated two major architectural enhancements:
- MMX
Although sometimes described as MultiMedia eXtensions or Matrix Math eXtensions, Intel says officially that MMX stands for nothing. MMX is a set of 57 added instructions that are dedicated to manipulating audio, video, and graphics data more efficiently.
- SIMD
Single Instruction Multiple Data (SIMD) is an architectural enhancement that allows one instruction to operate simultaneously on multiple sets of similar data.
In conjunction, MMX and SIMD greatly extend the Pentium’s ability to perform parallel operations, processing eight bytes of data per clock cycle rather than one byte. This is particularly important for heavily graphics-oriented operations such as video, because it allows the P55C to retrieve and process eight one-byte pixels in one operation rather than manipulating those eight bytes as eight separate operations. Intel estimates that MMX and SIMD used with non-optimized software yields performance increases of as much as 20%, and can yield increases of 60% when used with MMX-aware applications.
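The "eight one-byte pixels in one operation" idea can be modeled in software. The sketch emulates the semantics of MMX's packed unsigned saturating byte add (PADDUSB): eight bytes are packed into one 64-bit value, and each byte lane is added independently and clamped at 255. The Python loop over lanes only describes what happens; the hardware handles all eight lanes in a single instruction. The helper names are made up for the example.

```python
def pack8(values: list[int]) -> int:
    """Pack eight 0-255 values into one 64-bit integer, lane 0 in the low byte."""
    if len(values) != 8:
        raise ValueError("expected exactly eight byte values")
    return int.from_bytes(bytes(values), "little")


def unpack8(word: int) -> list[int]:
    """Split a 64-bit integer back into its eight byte lanes."""
    return list(word.to_bytes(8, "little"))


def paddusb(a: int, b: int) -> int:
    """Emulate a packed add, unsigned with saturation, on eight byte lanes."""
    result = 0
    for lane in range(8):
        shift = 8 * lane
        x = (a >> shift) & 0xFF
        y = (b >> shift) & 0xFF
        result |= min(x + y, 0xFF) << shift   # clamp instead of wrapping around
    return result


pixels = pack8([0, 50, 100, 150, 200, 220, 240, 255])
brighter = paddusb(pixels, pack8([40] * 8))   # brighten all eight pixels at once
print(unpack8(brighter))
```

Saturation matters for pixels: plain wraparound addition would turn 220 + 40 into 4, making a bright pixel nearly black.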
For additional information about Pentium processors, including detailed identification tables, visit http://www.hardwareguys.com/supplement/pentium.html.
Intel’s first sixth-generation CPU, the Pentium Pro, was introduced in November 1995—along with the new 3.3 volt 387-pin Socket 8 motherboards required to accept it—and was discontinued in late 1998. Intel positioned the Pentium Pro for servers, a niche it never escaped, and where it continued to sell in shrinking numbers until its replacement, the Pentium II Xeon, shipped in mid-1998. The Pentium Pro pre-dated the P55C Pentium/MMX, and never shipped in an MMX version. The Pentium Pro never sold in large numbers for two reasons:
- Cost
The Pentium Pro was a very expensive processor to build. Its core logic comprised 5.5 million transistors (versus 4.1 million in the P55C), but the real problem was that the Pentium Pro also included a large L2 cache on the same substrate as the CPU. This L2 cache required millions of additional transistors, which in turn required a much larger die size and resulted in a much lower percentage yield of usable processors, both factors that kept Pentium Pro prices very high relative to other Intel CPUs.
- 32-bit optimization
The Pentium Pro was optimized to execute 32-bit operations efficiently at the expense of 16-bit performance. For servers, 32-bit optimization is ideal, but slow 16-bit operations meant that a Pentium Pro actually ran many Windows 95 client applications slower than a Pentium running at the same clock speed.
The Pentium Pro shipped in 133, 150, 166, 180, and 200 MHz versions with 256 KB, 512 KB, or 1 MB of L2 cache, and was never upgraded to a faster version. The Pentium Pro continued to sell long after the introduction of much faster Pentium II CPUs for only one reason: the first Pentium II chipsets supported only two-way Symmetric Multiprocessing (SMP) while Pentium Pro chipsets supported four-way SMP. In some server environments, four 200 MHz Pentium Pro CPUs outperformed two 450 MHz Pentium II CPUs. The introduction of the 450NX chipset, which supports four-way SMP, and the mid-1998 introduction of the Pentium II Xeon processor, which supports eight-way SMP, removed the raison d'être for the Pentium Pro, and it died a quick death.
Although the Pentium Pro is discontinued, it was the first Intel sixth-generation processor, and as such introduced many important architectural improvements. Understanding the Pentium Pro vis-à-vis the Pentium helps to understand current Intel CPU models. The two CPUs differ in the following major respects:
- Secondary (L2) cache
Pentium-based systems may optionally be equipped with an external L2 secondary cache of any size supported by the chipset. Typical Pentium systems have a 256 KB L2 cache, but high-performance motherboards may include a 512 KB, 1 MB, or larger L2 cache. But Pentium L2 caches use a narrow (32-bit), slow (60 or 66 MHz memory bus speed) link between the processor’s L1 cache and the L2 cache. The Pentium Pro L2 cache is internal, located on the CPU itself, and the Pentium Pro uses a 64-bit data path running at full processor speed to link L1 cache to L2 cache. The dedicated high-speed bus used to connect to cache is called the back-side bus (BSB), as opposed to the traditional CPU-to-memory bus, which is now designated the front-side bus (FSB). In conjunction, the BSB and FSB are called the dual independent bus (DIB) architecture. DIB architecture yields dramatically improved cache performance. In effect, 256 KB of Pentium Pro L2 cache provides about the same performance boost as 2 MB or more of Pentium L2 cache.
- Dynamic execution
The Pentium Pro uses a combination of techniques (including branch prediction, data flow analysis, and speculative execution) that collectively are referred to as dynamic execution. Using these techniques, the Pentium Pro productively uses clock cycles that would otherwise be wasted, as they are with the Pentium.
- Super-pipelining
Super-pipelining is a technique that allows the Pentium Pro to use out-of-order instruction execution, another method to avoid wasting clock cycles. The Pentium executes instructions on a first-come, first-served basis, which means that it waits for all required data to process an earlier instruction instead of processing a later instruction for which it already has all of the data. Because it uses linear instruction sequencing, or standard pipelining, the Pentium wastes what could otherwise be productive clock cycles executing no-op instructions. The Pentium Pro is the first Intel CPU to use super-pipelining. It has a 14-stage pipeline, divided into three sections. The first section, the in-order front end, comprises eight stages, and decodes and issues instructions. The second section, the out-of-order core, comprises three stages, and executes instructions in the most efficient order possible based on available data, regardless of the order in which it received the instructions. The third and final section, the in-order retirement section, receives and forwards the results of the second section.
- CISC versus RISC core
The most significant architectural difference between the Pentium and the sixth-generation processors is how they handle instructions internally. Pentiums use a Complex Instruction Set Computer (CISC) core. CISC means that the processor understands a large number of complicated instructions, each of which accomplishes a common task in just one instruction. The Pentium Pro was the first Intel CPU to use a Reduced Instruction Set Computer (RISC) core. RISC means that the processor understands only a few simple instructions. Complex operations are performed by stringing together multiple simple instructions. Although RISC CPUs must perform many simple instructions to accomplish the same task that CISC CPUs do with just one or a few complex instructions, the simple RISC instructions execute much faster than CISC instructions.
The Pentium Pro translates standard Intel x86 CISC instructions into RISC instructions that the Pentium Pro micro-code uses internally, and then passes those RISC instructions to the internal out-of-order execution core. This translation helps avoid limitations of the standard x86 CISC instruction set and supports the out-of-order execution that prevents pipeline stalls, but those benefits have a price. Although the time required is measured in nanoseconds, converting from CISC to RISC does take time, and that slows program execution. Also, 16-bit instructions convert inefficiently and frequently result in pipeline stalls in the out-of-order execution unit, which commonly result in CPU wait states of as many as seven clock cycles. The upshot is that, for pure 32-bit operations, the benefit of RISC conversion greatly outweighs the drawbacks, but for 16-bit operations, the converse is true.
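As a rough illustration of CISC-to-RISC translation, the sketch splits a simplified x86-style ALU instruction with a memory operand into load, operate, and store steps. The micro-op spellings and the temporary registers tmp0 and tmp1 are invented for the example; the text does not describe Intel's internal micro-op format.

```python
def to_micro_ops(instr: str) -> list[str]:
    """Split a simplified two-operand x86 ALU instruction (add, sub, and, or, xor)
    into RISC-like micro-ops. Memory operands are written in brackets, and
    at most one operand may be in memory, as in real x86.
    """
    mnemonic, operands = instr.split(maxsplit=1)
    dst, src = (part.strip() for part in operands.split(","))
    op = mnemonic.upper()
    uops = []
    if src.startswith("["):                       # memory source: load it first
        uops.append(f"LOAD tmp1, {src}")
        src = "tmp1"
    if dst.startswith("["):                       # read-modify-write on memory
        uops += [f"LOAD tmp0, {dst}",
                 f"{op} tmp0, tmp0, {src}",
                 f"STORE {dst}, tmp0"]
    else:                                         # register destination
        uops.append(f"{op} {dst}, {dst}, {src}")
    return uops


for line in ("add eax, ebx", "sub ecx, [0x2000]", "add [0x1000], eax"):
    print(f"{line:18} -> {to_micro_ops(line)}")
```

One complex instruction becomes up to three simple ones. The decoder spends time on that conversion, but each resulting micro-op is simple enough to be scheduled on its own.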
For additional information about Pentium Pro processors, including detailed identification tables, visit http://www.hardwareguys.com/supplement/pentium-pro.html.
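The in-order versus out-of-order distinction described under super-pipelining can be illustrated with a toy scheduler that issues at most one instruction per clock. It is not a model of the Pentium Pro's actual 14-stage, multi-issue core: the instruction mix and latencies are hypothetical, and the sketch assumes register renaming, so only true data dependencies (an instruction reading an earlier instruction's result) can delay issue.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Instr:
    text: str
    dest: str
    srcs: tuple[str, ...]
    latency: int  # clocks until the result can be used (hypothetical values)


def producers(program: list[Instr]) -> list[list[int]]:
    """For each instruction, indices of the earlier instructions it reads from."""
    last_writer: dict[str, int] = {}
    deps = []
    for i, ins in enumerate(program):
        deps.append([last_writer[r] for r in ins.srcs if r in last_writer])
        last_writer[ins.dest] = i
    return deps


def finish_time(program: list[Instr], out_of_order: bool) -> int:
    """Issue at most one instruction per clock; return when the last result is ready."""
    deps = producers(program)
    ready_at: dict[int, int] = {}      # instruction index -> clock its result is ready
    waiting = list(range(len(program)))
    clock = 0
    while waiting:
        for i in waiting:
            if all(d in ready_at and ready_at[d] <= clock for d in deps[i]):
                ready_at[i] = clock + program[i].latency
                waiting.remove(i)
                break                  # one issue per clock
            if not out_of_order:
                break                  # in order: a stalled oldest instruction blocks the rest
        clock += 1
    return max(ready_at.values(), default=0)


program = [
    Instr("load r1, [mem]", "r1", (), 3),            # slow memory load
    Instr("add r2, r1, r1", "r2", ("r1", "r1"), 1),  # must wait for the load
    Instr("mov r3, 5", "r3", (), 1),                 # independent work
    Instr("mov r4, 7", "r4", (), 1),
    Instr("add r5, r3, r4", "r5", ("r3", "r4"), 1),
]
print(finish_time(program, out_of_order=False), finish_time(program, out_of_order=True))
```

In order, the three independent instructions sit behind the stalled add; out of order, they fill the clocks the load would otherwise waste, and the sequence finishes two clocks sooner.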
Intel’s first mainstream sixth-generation CPU, the Pentium II, shipped in May 1997. Intel subsequently shipped many variants of the Pentium II, which differ chiefly in packaging, the type and amount of L2 cache they include, the processor core they use, and the FSB speeds they support. All members of the Pentium II family use the Dynamic Execution Technology and DIB architecture introduced with the Pentium Pro. Intel reduced the core voltage from the 3.3 volts used by Pentium Pro to 2.8 volts or less in Pentium II processors, which allows them to run much faster while using less power and producing less heat. In effect, you’re not far wrong if you think of Pentium II, Celeron, and Pentium III processors as faster versions of the Pentium Pro with MMX (or the enhanced SSE version of MMX) added, and with the following major changes:
- L2 cache
The Pentium Pro taught Intel the folly of embedding the L2 cache onto the CPU substrate itself, at least for the then-current state of the technology. Early Pentium II family processors use discrete L2 cache Static RAM (SRAM) chips that reside within the CPU package but are not a part of the CPU substrate. Advances in fabrication technology have allowed Intel again to place L2 cache directly on the processor substrate on later Pentium II family processor models. Some Pentium II family processors run L2 cache at full processor speed, while others run it at half processor speed. The least expensive Pentium II family processors have no L2 cache at all. The L2 cache in later members of the Pentium II family is improved not just in size and/or speed, but in functionality. The most recent Pentium III processors, for example, use an 8-way set associative cache, which is more efficient than the caching schemes used on earlier variants.
- Packaging
The Pentium Pro used the huge, complicated 387-pin Dual Pattern Staggered Pin Grid Array (DP-SPGA) Socket 8. The extra pins provide data and power lines for the on-board L2 cache. Intel developed simplified alternative packaging methods for various members of the Pentium II family processors, which are described below.
- Improved 16-bit performance
High cost aside, the major reason the Pentium Pro was never widely used other than in servers was its poor performance with 16-bit software. Although represented as a 32-bit operating system, Windows 95/98 still contains much 16-bit code. Users quickly discovered that Windows 95 actually ran slower on a Pentium Pro than on a Pentium of the same speed. Intel solved the 16-bit problem by using the Pentium segment descriptor cache in the Pentium II.
Members of the Pentium II family include the Pentium II, Pentium II Overdrive, Pentium II Xeon, Celeron, Pentium III, and Pentium III Xeon. Each of these processors is described in the following sections.
First-generation Pentium II processors shipped in 233, 266, 300, and 333 MHz versions with the Klamath core and a 66 MHz FSB. In mid-1998, Intel shipped second-generation Pentium II processors, based on the Deschutes core, that ran at 350, 400, and 450 MHz, and used a 100 MHz FSB. Pentium II processors have 512 KB of L2 cache that runs at half internal CPU speed, versus 256 KB to 1 MB of full CPU speed L2 cache in the Pentium Pro. Pentium II processors use a Single-Edge Contact Connector (SECC) or SECC2 cartridge, which contains the CPU and L2 cache. (The SECC cartridge is shown in Figure 4-2.) The SECC/SECC2 package mates with a 242-contact slot connector, formerly known as Slot 1, which resembles a standard expansion slot. Klamath-based processors run at 2.8 volts and were built on a 0.35μ fab. Deschutes-based processors, including all 100 MHz FSB processors and recent 66 MHz FSB processors, run at 2.0 volts and are built on a 0.25μ fab. Excepting FSB speed and fab process, all Slot 1 Pentium II processors are functionally identical. As of June 2002, Pentium II processors are still in limited distribution, but they are now considered obsolescent.
Figure 4-2. Intel Pentium II processor in the original SECC package (photo courtesy of Intel Corporation)
For additional information about Pentium II processors, including detailed identification tables, visit http://www.hardwareguys.com/supplement/pentium-ii.html. For information about the Pentium II Overdrive processor, see http://www.hardwareguys.com/supplement/pii-overdrive.html. For information about the Pentium II Xeon processor, see http://www.hardwareguys.com/supplement/pii-xeon.html.
The Celeron was initially an inexpensive variant of the Pentium II, and, in later models, an inexpensive variant of the Pentium III or Pentium 4. Klamath-based (Covington-core) Celerons shipped in April 1998 in 266 and 300 MHz versions without L2 cache. Performance was poor, so in fall 1998 Intel began shipping modified Deschutes-based (Mendocino-core) Celerons with 128 KB L2 cache. The smaller Celeron L2 cache runs at full CPU speed, and provides L2 cache performance similar to that of the larger but slower Pentium II L2 cache for most applications. Mendocino (0.25μ) Celerons have been manufactured in 300A (to differentiate it from the cacheless 300), 333, 366, 400, 433, 466, 500, and 533 MHz versions, all of which use the 66 MHz FSB.
With the introduction of the Coppermine-core Pentium III processor, Intel also introduced Celeron processors based on a variant of the Coppermine core called the Coppermine128 core. Celerons based on this 0.18μ, 1.6v core began shipping in 533A, 566, and 600 MHz versions soon after their announcement in May 2000 and were eventually produced in speeds as high as 1.1 GHz, which approaches the limit of the Coppermine core itself.
Coppermine128-core Celerons have half of the 256 KB on-die L2 cache disabled to bring L2 cache size to the Celeron-standard 128 KB, and use a 4-way set associative L2 cache rather than the 8-way version used by the Coppermine Pentium III. Coppermine128-core Celerons through the Celeron/766, shipped in November 2000, use the 66 MHz FSB speed. Coppermine128-core Celerons that use the 100 MHz FSB speed began shipping in March 2001, beginning with 800 MHz units and eventually reaching 1.1 GHz. Other than the differences in L2 cache size and type, processor bus speed differences, and official support for SMP, Coppermine128-core Celerons support the standard Coppermine-core Pentium III features, including SSE, described below.
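How can disabling half of the Coppermine's L2 cache leave a 4-way cache in place of the 8-way one? A quick geometry calculation suggests one consistent reading. Assuming 32-byte cache lines (a line size the text does not state) and power-of-two sizes, the sketch computes the number of sets and how an address splits into tag, set index, and byte offset.

```python
def cache_geometry(size_kb: int, ways: int, line_bytes: int = 32) -> tuple[int, int, int]:
    """Return (sets, offset_bits, index_bits) for a power-of-two cache."""
    sets = size_kb * 1024 // (line_bytes * ways)
    offset_bits = line_bytes.bit_length() - 1   # log2(line_bytes)
    index_bits = sets.bit_length() - 1          # log2(sets)
    return sets, offset_bits, index_bits


def split_address(addr: int, size_kb: int, ways: int, line_bytes: int = 32) -> tuple[int, int, int]:
    """Split an address into (tag, set index, byte offset within the line)."""
    sets, offset_bits, index_bits = cache_geometry(size_kb, ways, line_bytes)
    offset = addr & (line_bytes - 1)
    index = (addr >> offset_bits) & (sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


print(cache_geometry(256, 8))   # Coppermine Pentium III L2
print(cache_geometry(128, 4))   # Coppermine128 Celeron L2
print(split_address(0x12345678, 128, 4))
```

Under the 32-byte-line assumption, both caches have 1,024 sets, so every address maps to the same set in either one; the Celeron simply has four places per set to hold a line instead of eight.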
Tip
Because Coppermine128 Celerons effectively are Pentium IIIs, some may be easy to overclock. For example, a Celeron/600 (66 MHz FSB) is effectively a down-rated Pentium III/900 (100 MHz FSB). During the ramp-up to Coppermine128-core Celerons, we believe that Intel recycled Pentium III processors that tested as unreliable at 100 MHz or 133 MHz as 66 MHz Celerons, although Intel has never confirmed this. Many early Coppermine128-core Celerons were not good overclockers, although that changed as production ramped up. Note, however, that overclocking Coppermine128-core Celerons is viable only for the slower 66 MHz FSB models—the Celeron/566 and /600. Attempting to overclock a faster Celeron by running it with a 100 MHz FSB would cause it to run near or over 1.1 GHz, which appears to be the effective limit of the Coppermine core itself.
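The overclocking arithmetic in the tip above comes down to a locked multiplier: the core clock is the multiplier times whatever FSB the motherboard supplies. The sketch infers each Celeron's multiplier from its rating, assuming the nominal 66 MHz bus runs at 66⅔ MHz and that multipliers come in half steps, then recomputes the speed at 100 MHz. The function names are illustrative.

```python
import math
from fractions import Fraction

# Nominal FSB label -> actual clock in MHz (66 MHz assumed to be 200/3).
FSB_MHZ = {66: Fraction(200, 3), 100: Fraction(100)}


def locked_multiplier(rated_mhz: int, fsb_nominal: int) -> Fraction:
    """Infer the multiplier from the rating, rounded to the nearest half step."""
    return Fraction(round(rated_mhz / FSB_MHZ[fsb_nominal] * 2), 2)


def speed_at(rated_mhz: int, rated_fsb: int, new_fsb: int) -> int:
    """Core clock (whole MHz, rounded down) if the same chip runs on a different FSB."""
    return math.floor(locked_multiplier(rated_mhz, rated_fsb) * FSB_MHZ[new_fsb])


for rated in (566, 600, 766):
    print(rated, float(locked_multiplier(rated, 66)), speed_at(rated, 66, 100))
```

The Celeron/566 and /600 land at 850 and 900 MHz; a Celeron/766 would be pushed to 1,150 MHz, past the roughly 1.1 GHz limit the tip describes.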
In November 2001, Intel began shipping Celerons based on the latest Pentium III core, codenamed Tualatin. The first Tualatin-core Celerons ran at 1.2 GHz using the 100 MHz FSB. Intel subsequently shipped a 1.3 GHz Celeron, and finally in May 2002 shipped the 1.4 GHz Celeron, the final Tualatin-core model. These Celerons also differed from earlier models in that they include a full 256 KB L2 cache, the same as Coppermine-core Pentium III models.
Celerons have been produced in four form factors:
- Single-Edge Processor Package cartridge
All Celerons through 433 MHz were produced in Single-Edge Processor Package (SEPP) cartridge form, which resembles the Pentium II SECC and SECC2 package, and is compatible with the Pentium II 242-contact slot. In mid-1999 Intel largely abandoned SEPP in favor of PPGA, but they continue to sell SEPP Celerons in 400 and 433 MHz varieties. Figure 4-3 shows an SEPP Celeron.
- Plastic Pin Grid Array
As a cheaper alternative to SEPP, Intel developed the Plastic Pin Grid Array (PPGA). PPGA processors fit Socket 370, which resembles Socket 7 but accepts only PPGA Celeron processors. All Mendocino-core Celerons are manufactured in PPGA. The Celeron/466 was the first Celeron produced only in PPGA. PPGA processors can be used in most Socket 370 motherboards, although a few accept only Socket 370 Pentium III processors. Figure 4-4 shows a PPGA Celeron.
- Flip Chip Pin Grid Array
With the introduction of the Socket 370 version of the Pentium III, Intel introduced a modified version of PPGA called Flip Chip PGA (FC-PGA), which uses slightly different pinouts than PPGA. FC-PGA essentially reverses the position of the processor core from PPGA, placing the core on top (where it can make better contact with the heatsink) rather than on the bottom side with the pins. All Socket 370 Pentium III and Coppermine128-core Celerons (the 533A, 566, 600, and faster) require an FC-PGA compliant motherboard. FC-PGA processors physically fit older PPGA motherboards, but if you install an FC-PGA processor in a PPGA-only Socket 370 motherboard, the processor doesn’t work, although no harm is done. Figure 4-5 shows an FC-PGA Celeron.
- Flip Chip Pin Grid Array 2
Tualatin-core Celerons use FC-PGA2 packaging, which is essentially FC-PGA with the addition of a flat metal plate, called an integrated heat spreader, which covers the processor chip itself. Although these processors physically fit any Socket 370 motherboard, only very recent Socket 370 chipsets support the Tualatin core. Intel designates their own motherboard models that support Tualatin as “Universal” models. Other manufacturers use other terminology, but the important thing to remember is that the motherboard must explicitly support Tualatin if it is to run these processors. Figure 4-6 shows an FC-PGA2 Celeron.
Intel has produced five major variants of the sixth-generation Celeron, using four packages, four cores, two bus speeds, four fab sizes, and more than 20 clock speeds. Table 4-1 summarizes the major differences between these variants.
Table 4-1. Comparison of Celeron variants
| Core | Covington | Mendocino | Coppermine128 (66 MHz FSB) | Coppermine128 (100 MHz FSB) | Tualatin |
|---|---|---|---|---|---|
| Package | SECC | SECC-2, PPGA | FC-PGA | FC-PGA | FC-PGA2 |
| Production dates | 1998 | 1998-2000 | 2000 onward | 2001 onward | 2001 onward |
| Clock speeds (MHz) | 266, 300 | 300A, 333, 366, 400, 433, 466, 500, 533 | 500A, 533A, 566, 600, 633, 667, 700, 733, 766 | 800, 850, 900, 950, 1,000, 1,100 | 1,200, 1,300, 1,400 |
| L2 cache size | none | 128 KB | 128 KB | 128 KB | 256 KB |
| L2 cache bus width | n/a | 64 bits | 256 bits | 256 bits | 256 bits |
| System bus speed | 66 MHz | 66 MHz | 66 MHz | 100 MHz | 100 MHz |
| SSE instructions | No | No | Yes | Yes | Yes |
| Dual CPU capable | Yes (unofficial) | Yes (unofficial) | No | No | No |
| Fabrication process | 0.35 μ | 0.25 μ | 0.18 μ | 0.18 μ | 0.13 μ |
Dual-CPU capability deserves an explanation. Although Intel never officially supported Celerons for SMP operation, the two earliest Celeron variants did in fact support dual-CPU operation. For Covington-core and SECC2 Mendocino-core Celerons, dual-CPU operation was impractical, because enabling SMP required physical surgery on the processor package—literally drilling holes in the package and soldering wires. With PPGA Mendocino-core Celerons, dual-CPU operation was eminently practical, because many dual Socket 370 motherboards were designed specifically to accept two Celerons, and no changes to the processors themselves were necessary. Beginning with the 66 MHz Coppermine128 Celerons, Intel physically disabled SMP operation in the core itself, so it is impossible to operate Coppermine- or Tualatin-core Celerons in SMP mode.
For additional information about Celeron processors, including detailed identification tables, visit http://www.hardwareguys.com/supplement/celeron.html.
The Pentium III, Intel’s final sixth-generation processor, began shipping in February 1999. The Pentium III has been manufactured in numerous variants, including speeds from 450 MHz to 1.33 GHz (Intel defines 1 GHz as 1,000 MHz), two bus speeds (100 MHz and 133 MHz), four packages (SECC, SECC2, FC-PGA, and FC-PGA2), and the following three cores:
- Pentium III (Katmai core)
Initial Pentium III variants use the Katmai core, essentially an enhanced Deschutes with the addition of 70 new instructions called Streaming SIMD Extensions (SSE; formerly called Katmai New Instructions, or KNI, and known colloquially as MMX/2) that improve 3D graphics rendering and speech processing. They use the 0.25μ process, operate at 2.0v core voltage (with some versions requiring marginally higher voltage), use a 100 MHz FSB (133 MHz for the 533B and 600B), incorporate 512 KB L2 cache running at half CPU speed, and have glueless support for two-way SMP. Katmai-core processors are available in SECC and SECC2 (Slot 1/SC242) packaging.
- Pentium III (Coppermine core)
Later Pentium III variants use the Coppermine core, which is essentially a refined version of the Katmai core. Coppermine processors use the 0.18μ process, which reduces die size, heat production, and cost. They operate at nominal 1.6v core voltage (with faster versions requiring marginally higher voltage) and (in most variants) support SMP. Coppermine-core processors are available in SECC2 (Slot 1/SC242) and FC-PGA (Socket 370) packaging, in both 100 MHz and 133 MHz FSB variants. Coppermine also incorporates the following significant improvements in L2 cache implementation and buffering:
- Advanced Transfer Cache
Advanced Transfer Cache (ATC) is Intel’s collective name for several important improvements in L2 cache implementation from Katmai to Coppermine. Although L2 cache size is reduced from 512 KB to 256 KB, the cache is now on-die (rather than on discrete SRAM chips) and, like the L2 cache of the Mendocino-core Celeron, operates at full CPU speed rather than half. The L2 bus is also four times as wide, growing from the 64-bit bus used on Katmai and Mendocino-core Celeron processors to 256 bits. Finally, Coppermine uses an 8-way set associative cache, rather than the 4-way set associative cache used by earlier Pentium III and Celeron processors. Moving the L2 cache on-die increased transistor count from just under 10 million for the Katmai to nearly 30 million for Coppermine, which may account for the reported early yield problems with the Coppermine.
- Advanced System Buffering
Advanced System Buffering (ASB) is Intel’s name for the larger set of buffers in the Coppermine relative to the Katmai and earlier processors: six fill buffers instead of four, eight queue entry buffers instead of four, and four writeback buffers instead of one. The additional buffers were primarily intended to prevent bottlenecks on 133 MHz FSB Coppermines, but they also benefit those running at 100 MHz.
- Pentium III (Tualatin core)
The most recent Pentium III variants use the Tualatin core, which is the last Pentium III core Intel will ever produce. Tualatin processors use the 0.13μ process, which reduces die size, heat production, and cost, and allows considerably higher clock speeds than the Coppermine core. Had it not been for Intel’s rapid transition to the Pentium 4, Tualatin-core Pentium IIIs could have been Intel’s flagship processor through 2002 and into 2003. Intel could have shipped Tualatins at ever-increasing clock speeds, beating the 0.18μ Palomino-core AMD Athlon on both clock speed and actual performance. Instead, Intel opted to compete using the Pentium 4. Through their pricing, Intel has effectively exiled Tualatin-core Pentium IIIs to niche status by selling fast Pentium 4 processors for much less than comparable Tualatin Pentium IIIs.
Tualatins use the 133 MHz FSB and are available in two major variants, both of which use FC-PGA2 packaging (with Integrated Heat Spreader). The first variant, intended for desktop systems, has the standard 256 KB L2 cache. The second variant, intended for entry-level servers and workstations, has 512 KB L2 cache. Both variants are SMP-capable. Finally, Intel removed the much-hated Processor Serial Number from all Tualatin-core processors.
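To see what the move from 4-way to 8-way set associativity in the Advanced Transfer Cache description means in practice, the following Python sketch splits an address into tag, set index, and line offset. It assumes a 32-byte cache line, the P6-family line size, which this chapter does not state; the function and constant names are illustrative only.

```python
def num_sets(size_bytes: int, ways: int, line_bytes: int = 32) -> int:
    """Sets in a set-associative cache: total lines divided by lines per set.

    line_bytes=32 is an assumed line size, not a figure from the text.
    """
    return (size_bytes // line_bytes) // ways


def split_address(addr: int, size_bytes: int, ways: int, line_bytes: int = 32):
    """Split an address into (tag, set_index, offset).

    Assumes the line size and set count are powers of two, as they are here.
    """
    sets = num_sets(size_bytes, ways, line_bytes)
    offset_bits = line_bytes.bit_length() - 1  # log2(line size)
    index_bits = sets.bit_length() - 1         # log2(number of sets)
    offset = addr & (line_bytes - 1)
    set_index = (addr >> offset_bits) & (sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, set_index, offset


KATMAI_L2 = (512 * 1024, 4)      # 512 KB, 4-way set associative
COPPERMINE_L2 = (256 * 1024, 8)  # 256 KB, 8-way set associative

print(num_sets(*KATMAI_L2), num_sets(*COPPERMINE_L2))  # 4096 1024

# Addresses 32 KB apart (1024 sets x 32-byte lines) land in the same Coppermine
# set with different tags; with 8 ways, up to eight such lines can be resident.
for addr in (0x0000, 0x8000, 0x10000):
    print(hex(addr), split_address(addr, *COPPERMINE_L2))
```

Halving the size while doubling the associativity leaves the Coppermine with a quarter as many sets, but twice as many lines can share each set before one must be evicted.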
Table 4-2 summarizes the important differences between Pentium III variants available as of June 2002. When necessary to differentiate processors of the same speed, Intel uses the E suffix to indicate support for ATC and ASB, the B suffix to indicate 133 MHz FSB, and the EB suffix to indicate both. An A suffix designates 0.13μ Tualatin-core processors. All processors faster than 600 MHz include both ATC and ASB. Note that A-step FC-PGA processors do not support SMP. B-step and higher FC-PGA and FC-PGA2 processors support SMP, except the 1B GHz processor, which is not SMP-capable in any stepping.
Table 4-2. Intel Pentium III variants as of June 2002
| | 1.40, 1.26, 1.13 GHz | 1.33, 1.20, 1.13A, 1A GHz | 1B GHz, 866, 800EB, 733, 667, 600EB, 533EB | 850, 800, 750, 700, 650, 600E | 850, 800, 750, 700, 650, 600, 550E, 500E | 866, 800, 733, 667, 600EB, 533EB | 600B, 533B | 600, 550, 500, 450 |
|---|---|---|---|---|---|---|---|---|
| Package | FC-PGA2 | FC-PGA2 | SECC2 | SECC2 | FC-PGA | FC-PGA | SECC2 | SECC2 |
| Process size | 0.13μ | 0.13μ | 0.18μ | 0.18μ | 0.18μ | 0.18μ | 0.25μ | 0.25μ |
| FSB speed | 133 MHz | 133 MHz | 133 MHz | 100 MHz | 100 MHz | 133 MHz | 133 MHz | 100 MHz |
| L2 cache size | 512 KB | 256 KB | 256 KB | 256 KB | 256 KB | 256 KB | 512 KB | 512 KB |
| L2 cache speed | CPU | CPU | CPU | CPU | CPU | CPU | 1/2 CPU | 1/2 CPU |
| SMP support | Yes | Yes | Yes (not 1B GHz) | Yes | B-step and later | B-step and later | Yes | Yes |
| Processor S/N | No | No | Yes | Yes | Yes | Yes | Yes | Yes |
Warning
When Intel introduced the Pentium III in FC-PGA form, they changed Socket 370 pinouts. Those changes mean that, although an FC-PGA processor physically fits any Socket 370 motherboard, it will not run in motherboards designed for the Celeron/PPGA. Motherboards designed for FC-PGA processors are nearly all backward compatible with PPGA Celeron processors. Similarly, as with Tualatin-core Celerons, Tualatin-core Pentium IIIs operate only in late-model Socket 370 motherboards that use chipsets with explicit Tualatin support. Most motherboards designed to use PPGA Celerons or FC-PGA Coppermine-core Pentium IIIs are not compatible with Tualatin-core Pentium IIIs.
Figure 4-7 shows a Pentium III processor in the Single Edge Contact Cartridge 2 (SECC2) package. Some early Pentium III models were produced in the original SECC package, which closely resembles the Pentium II SECC package shown in Figure 4-2.
Figure 4-8 shows a Pentium III processor in the Flip Chip Pin Grid Array (FC-PGA) package.
Other than labeling, the Pentium III processor in the FC-PGA2 package
closely resembles the FC-PGA2 Celeron processor shown in Figure 4-6.
For additional information about Pentium III processors, including detailed identification tables, visit http://www.hardwareguys.com/supplement/pentium-iii.html. For information about Pentium III Xeon processors, visit http://www.hardwareguys.com/supplement/piii-xeon.html.
By late 2000, Intel found themselves in a conundrum. In March of that year, AMD had forced Intel’s hand by releasing an Athlon running at 1 GHz. Intel planned to release a 1.0 GHz version of their flagship processor, the Coppermine-core Pentium III, but not until much later. The Athlon/1.0G introduction was a wakeup call for Intel. They had to ship a Pentium III/1.0G immediately if they were to remain competitive on clock speed with the Athlon. One week after the Athlon/1.0G shipped, Intel shipped a Pentium III running at the magic 1.0 GHz.
The problem was that the Pentium III Coppermine core effectively topped out at about 1.0 GHz, while the Athlon Thunderbird core had plenty of headroom. For the next several months, AMD shipped faster and faster Athlons, while Intel remained stuck at 1.0 GHz. And to make matters worse, AMD could ship fast Athlons in volume, while Intel had very low yields on the fast Pentium III parts. Although 1.0 GHz Pentium IIIs were theoretically available, in reality even the 933 MHz parts were hard to come by. So Intel had to make the best of things, shipping mostly sub-900 MHz Pentium IIIs while AMD claimed the high end. Intel must have been gritting their collective teeth.
Intel then tried, unsuccessfully, to ship a faster Pentium III, the ill-fated Pentium III/1.13G. These processors were available in such small volumes that many observers believed they must have been almost hand-made. Adding to Intel’s embarrassment, popular enthusiast web sites including Tom’s Hardware (http://www.tomshardware.com) and AnandTech (http://www.anandtech.com) reported that the 1.13 GHz parts did not function reliably. Intel was forced to admit this was true and withdraw the 1.13 GHz part, although they later reintroduced it successfully.
Intel had two possible responses to the growing clock speed gap. They could expedite the release of 0.13μ Tualatin-core Pentium IIIs, which have clock speed headroom at least equivalent to the Thunderbird-core and later Palomino-core Athlons, or they could introduce their seventh-generation Pentium 4 processor sooner than planned. Intel wasn’t anywhere near ready to convert their fabs to 0.13μ Tualatin-core Pentium III production, so their only real choice was to get the Pentium 4 to market quickly.
There were several problems with that course, not least of which were that the 0.18μ Willamette-core Pentium 4 was not really ready for release and that the only Pentium 4 chipsets Intel had available supported only Rambus RDRAM, which was hideously expensive at the time. But in November 2000, Intel was finally able, if only just, to ship the Pentium 4 processor running at 1.3, 1.4, and 1.5 GHz. Many observers noted that this version of the Pentium 4 was a dead-end processor, because it used Socket 423, which was due to be replaced by Socket 478 only months after the initial release, and that despite its higher clock speed it was outperformed by Athlons running at lower clock speeds. Still, the Pentium 4 at least allowed Intel to regain the clock speed crown, an inestimable marketing advantage.
Despite all that, the seventh-generation Pentium 4 (shown in Figure 4-9) is the most significant new Intel processor since the original Pentium Pro, which kicked off the sixth generation. The Pentium 4 is significant not so much for what it is now as for what it will become. Just as Intel scaled the clock speeds of sixth-generation cores from the 150 MHz of the first Pentium Pro to the 1.4 GHz of the fastest Pentium III, we expect that they will scale the clock speed of the Pentium 4 by an order of magnitude or more, eventually reaching 10 GHz to 15 GHz before (we presume) introducing the Pentium 5.
With the Pentium 4, Intel has launched the fastest ramp-up in their history. In earlier generations, new processors coexisted with older processors for quite some time. Intel continued to derive substantial revenues from the 386 long after the 486 shipped, from the 486 long after the Pentium shipped, and from the Pentium long after the Pentium II shipped. With the Pentium 4, they’ve abandoned that sequence. Intel wants to kill their sixth-generation processors as quickly as possible, leaving the Pentium 4 and its derivatives as the only mainstream Intel processors.
Relative to sixth-generation processors, the Pentium 4 incorporates the following architectural improvements, which together define the seventh generation and which Intel collectively calls NetBurst Micro-architecture:
- Hyper Pipelined Technology
Hyper-pipelining doubles the pipeline depth compared to the Pentium III micro-architecture. The branch prediction/recovery pipeline, for example, is implemented in 20 stages in the Pentium 4, as compared to 10 stages in the Pentium III. Deep pipelines are a double-edged sword. Using a very deep pipeline makes it possible to achieve very high clock speeds, but a deep pipeline also means that fewer instructions can be completed per clock cycle. That means the Pentium 4 can run at much higher clock speeds than the Pentium III (or Athlon), but that it needs those higher clock speeds to do the same amount of work.
Early Pentium 4 processors were roundly condemned by many observers because they were outperformed by Pentium III and Athlon processors running at much lower clock speeds, which is largely attributable to the relative inefficiency of the Pentium 4 in terms of instructions per cycle (IPC). Ultimately, the low IPC efficiency of the Pentium 4 won’t matter, because Intel can easily boost the clock speed until the Pentium 4 greatly outperforms the fastest Pentium III or Athlon that can be produced. What superficially appears to be a weakness of the Pentium 4 is in fact its greatest strength.
- Improved Branch Prediction
The deep pipeline of the Pentium 4 made it mandatory to use a superior Branch Prediction Unit (BPU), because a deep pipeline with anything less than excellent branch prediction would bring the processor to its knees. When the pipeline is very deep, a pipeline clog wastes massive numbers of clock ticks, and the function of a BPU is to prevent that from happening. The Pentium 4 BPU is the most advanced available, about 33% more effective at avoiding mispredictions than the Pentium III BPU or the comparable Athlon BPU. To achieve these results, the Pentium 4 BPU uses both a more effective branch prediction algorithm and a dedicated 4K-entry branch target buffer that stores details about branching history. The improved BPU is one component of the Advanced Dynamic Execution (ADE) engine, Intel’s name for their very deep, out-of-order speculative execution engine.
- Level 1 Execution Trace Cache
In addition to the standard 8 KB Level 1 data cache, the Pentium 4 includes an L1 Execution Trace Cache that holds up to 12K decoded micro-ops. This cache stores decoded micro-op instructions in the order they will be executed, optimizing storage efficiency and performance by removing the instruction decoder from the main execution loop and storing only those micro-op instructions that will be needed. By caching micro-op instructions before they are needed, the Execution Trace Cache ensures that the processor execution units seldom have to wait for instructions, and that the effects of branch mispredictions are minimized.
- Rapid Execution Engine
Even with an excellent BPU, integer code is more likely than floating-point code to be mispredicted, and such mispredictions have a catastrophic effect on throughput. To minimize their effect, the Pentium 4 includes two Arithmetic Logic Units (ALUs) that operate at twice the processor core frequency. For example, the Rapid Execution Engine on a 2 GHz Pentium 4 actually runs at 4 GHz. That allows a basic integer operation (e.g., Add, Subtract, AND, OR) to execute in half a clock cycle.
- 400 or 533 MHz System Bus
One Achilles’ heel of the Pentium III (and, to a lesser extent, the Athlon) is the relatively slow link between the processor and memory. For example, using PC133 SDR-SDRAM, the Pentium III achieves peak data transfer rates of only 1,067 MB/s (133 MHz times 8 bytes/transfer). In practice, sustained data transfer rates are lower still, because SDRAM is not 100% efficient and the SDRAM interface uses only minimal buffering. By contrast, the Pentium 4 has the fastest system bus available on any desktop processor. Although the bus actually operates at only 100 or 133 MHz, data transfers are quad-pumped for an effective bus speed of 400 or 533 MHz. Also, Intel uses elaborate buffering that ensures sustained true 400 or 533 MHz data transfers when using Rambus RDRAM memory. Sustained data transfer rates using SDR-SDRAM or DDR-SDRAM are lower than peak transfer rates, but are still superior to the data transfer rates of the Pentium III or Athlon using similar memory.
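The pipeline-depth and branch-prediction trade-offs described in the list above can be put in rough numbers with the standard textbook model, in which throughput equals clock speed times IPC and each mispredicted branch costs about one pipeline refill. In the following Python sketch, the 10- and 20-stage depths and the one-third reduction in mispredictions come from the text; the branch frequency, misprediction rates, and equal base CPI are hypothetical values chosen only for illustration, not measurements of any real processor.

```python
def effective_cpi(base_cpi: float, branch_fraction: float,
                  mispredict_rate: float, penalty_cycles: int) -> float:
    """Average cycles per instruction, including branch-misprediction stalls.

    Simplified model: every mispredicted branch stalls the core for
    `penalty_cycles`, taken here as roughly the branch-recovery pipeline depth.
    """
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


def instructions_per_second(clock_hz: float, cpi: float) -> float:
    """Throughput = clock speed x IPC, where IPC = 1 / CPI."""
    return clock_hz / cpi


# Hypothetical workload: 20% of instructions are branches, same base CPI on both.
p3_cpi = effective_cpi(1.0, 0.20, 0.06, 10)  # 10-stage recovery pipeline
p4_cpi = effective_cpi(1.0, 0.20, 0.04, 20)  # 20 stages, one third fewer mispredicts

# Clock the deeper-pipelined core needs just to match a 1 GHz shallow core.
break_even_hz = 1.0e9 * p4_cpi / p3_cpi

print(f"CPI: {p3_cpi:.2f} vs {p4_cpi:.2f}")                # CPI: 1.12 vs 1.16
print(f"break-even clock: {break_even_hz / 1e9:.3f} GHz")  # break-even clock: 1.036 GHz
```

With these made-up inputs, the deeper pipeline loses slightly more cycles to mispredictions despite mispredicting a third less often, so it needs roughly 3.6% more clock speed merely to break even. Real workloads differ in many other ways that this model holds constant.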
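The bus figures in the 400 or 533 MHz System Bus item follow from peak bandwidth = base clock × transfers per clock × bus width. This Python sketch reproduces them, assuming a 64-bit (8-byte) data bus for both processors and treating a "133 MHz" bus as its actual 133⅓ MHz; the function name is illustrative.

```python
from fractions import Fraction


def peak_mb_per_s(base_clock_mhz: Fraction, transfers_per_clock: int,
                  bus_bytes: int = 8) -> Fraction:
    """Peak bus bandwidth in MB/s (decimal megabytes, as bus ratings use).

    bus_bytes=8 assumes a 64-bit data bus.
    """
    return base_clock_mhz * transfers_per_clock * bus_bytes


MHZ_100 = Fraction(100)
MHZ_133 = Fraction(400, 3)  # a "133 MHz" bus actually runs at 133 1/3 MHz

buses = {
    "Pentium III, PC133 SDRAM": peak_mb_per_s(MHZ_133, 1),  # single-pumped
    "Pentium 4, 400 MHz FSB": peak_mb_per_s(MHZ_100, 4),    # quad-pumped
    "Pentium 4, 533 MHz FSB": peak_mb_per_s(MHZ_133, 4),    # quad-pumped
}
for name, mb_s in buses.items():
    print(f"{name}: {round(mb_s):,} MB/s")
# Pentium III, PC133 SDRAM: 1,067 MB/s
# Pentium 4, 400 MHz FSB: 3,200 MB/s
# Pentium 4, 533 MHz FSB: 4,267 MB/s
```

Quad-pumping alone quadruples peak bandwidth at the same base clock, which is why the 533 MHz Pentium 4 bus has exactly four times the peak bandwidth of PC133 on a Pentium III.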
In addition to its new features, the Pentium 4 also has two features that have been significantly enhanced relative to the Pentium III:
- Advanced Transfer Cache
Intel has enhanced the performance of the L2 Advanced Transfer Cache (ATC) that first appeared in the Pentium III. The Pentium 4 uses a non-blocking, 8-way set associative, inclusive, full-CPU-speed, on-die, L2 cache with a 256-bit interface that transfers data during each clock cycle. Because the Pentium 4 clock is faster than that of the Pentium III, L2 cache transfers also support a much higher data rate. For example, a Pentium III operating at 1 GHz transfers L2 cache data at 16 GB/s whereas a Pentium 4 at 1.5 GHz transfers L2 cache data at 48 GB/s (three times the transfer rate for a processor operating at 1.5 times the speed). The ATC also includes improved Data Prefetch Logic that anticipates what data will be needed by a program and loads it into cache before it is needed. Willamette-core Pentium 4 processors have a 256 KB L2 cache. Northwood-core Pentium 4 processors have a 512 KB L2 cache.
- Enhanced floating-point and SSE functionality
The Pentium 4 uses 128-bit floating-point registers and adds a dedicated register for data movement. These enhancements improve performance relative to the Pentium III on floating-point and multimedia applications. The Pentium 4 also includes SSE2, an updated version of the SSE that debuted with the Pentium III. SSE, which stands for Streaming SIMD Extensions, is an acronym within an acronym. SIMD, or Single Instruction Multiple Data, allows one instruction to operate on multiple data elements at once (e.g., several elements of an array), which greatly speeds performance in such applications as video/image processing, encryption, speech recognition, and heavy-duty scientific number crunching. SSE2 adds 144 new instructions to the SSE instruction set, including 128-bit SIMD integer arithmetic operations and 128-bit SIMD double-precision floating-point operations. These new instructions can greatly reduce the number of steps needed to execute some tasks, but the catch is that the application software must explicitly support SSE2. For example, an application that is not designed to use SSE2 might run at the same speed on a Pentium 4 and an Athlon, while an SSE2-capable version of that application might run literally twice as fast on the Pentium 4.
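The L2 figures in the Advanced Transfer Cache item come from the same arithmetic: clock × bytes per transfer × transfers per clock. The Python sketch below assumes, as the 16 GB/s figure for a 1 GHz Pentium III implies, that the Pentium III’s 256-bit L2 interface delivers data only every other clock, whereas the text says the Pentium 4’s delivers data every clock. That every-other-clock rate is inferred from the quoted numbers, not stated in the text.

```python
def l2_gb_per_s(clock_ghz: float, bus_bits: int = 256,
                transfers_per_clock: float = 1.0) -> float:
    """Peak L2 bandwidth in GB/s: clock x bytes per transfer x transfers per clock."""
    return clock_ghz * (bus_bits / 8) * transfers_per_clock


# Assumption: the Pentium III's 256-bit ATC delivers data every other clock,
# which is what the quoted 16 GB/s at 1 GHz implies.
piii_1ghz = l2_gb_per_s(1.0, transfers_per_clock=0.5)
p4_1_5ghz = l2_gb_per_s(1.5)  # Pentium 4 ATC: one 256-bit transfer per clock

print(piii_1ghz, p4_1_5ghz, p4_1_5ghz / piii_1ghz)  # 16.0 48.0 3.0
```

The 3× ratio decomposes into 1.5× from clock speed and 2× from transfers per clock, which matches the text’s "three times the transfer rate for a processor operating at 1.5 times the speed."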
Intel produces Pentium 4 processors using two cores, the 0.18μ Willamette core and the 0.13μ Northwood core, and two form factors, the 423-pin PGA-423 and the smaller 478-pin mPGA-478. Willamette-core processors were produced in both PGA-423 and mPGA-478 at core speeds of 1.30, 1.40, 1.50, 1.60, 1.70, 1.80, 1.90, and 2 GHz, all with 256 KB of L2 cache. Northwood-core Pentium 4 processors, which have 512 KB L2 cache, are produced only in mPGA-478, initially at 2 and 2.20 GHz core speeds, with faster versions planned. Intel also produces Northwood-core processors slower than 2 GHz.
The Willamette core and PGA-423 were stopgap solutions, released solely to combat AMD’s clock speed lead until the “real” Pentium 4 (the mPGA-478 Northwood-core processor) could be shipped. Intel originally intended to phase out PGA-423 as a mainstream technology by late 2001, relegating it to upgrade status only, but very strong demand for mPGA-478 motherboards and processors caused shortages throughout 2001 and into 2002, which kept PGA-423 in mainstream use longer than planned. We expect that PGA-423 and Willamette parts will continue to be sold in new systems until mid-2002. That does not change the fact that PGA-423 and Willamette are dead-end technologies and should be avoided. Do not buy any Pentium 4 system or components that do not use mPGA-478 parts and the Northwood core.
For additional information about Pentium 4 processors, including detailed identification tables, visit http://www.hardwareguys.com/supplement/pentium-4.html. For information about Xeon processors, visit http://www.hardwareguys.com/supplement/p4-xeon.html.
In May 2002, Intel shipped Celeron processors based on the Pentium 4. These processors, which we call the Celeron 4, use standard mPGA-478 packaging and fit Socket 478 motherboards. Not all older Socket 478 motherboards support the Celeron 4, and those that do require a BIOS upgrade. The first Celeron 4 models use a modified 0.18μ Pentium 4 Willamette core called the Willamette128 core, which has L2 cache halved to 128 KB. Intel shipped the 1.7 GHz Celeron 4 initially, with the 1.8 GHz model following in June 2002. We expect Intel to ship 1.9 and 2.0 GHz Willamette128 Celeron 4 models later in 2002, followed by faster models based on the Northwood core with L2 cache reduced from the 512 KB Northwood standard to 256 KB.
Initial testing shows that even the 1.7 GHz Celeron 4 outperforms the fastest available AMD Duron, so we expect forthcoming Celeron 4 models to be excellent choices for building fast, inexpensive entry-level systems on the Intel 845G and 845GL platforms.
Tip
Intel has manufactured mobile variants of many of their processors, including the Pentium, Pentium II, Celeron, and Pentium III. These mobile versions are used in notebook computers and are not user-replaceable, so for all intents and purposes a notebook computer will always use the processor that was originally installed. For that reason, we have chosen to devote our available space to issues that are more likely to be important to more of our readers. For additional information about Intel mobile processors, visit http://developer.intel.com/design/mobile/.