44 IBM eServer zSeries 900 Technical Guide
Figure 2-11 12-PU MultiChip Module
2.4.8 PU design
The z900 servers utilize two types of PU chips:
CMOS 8S (0.18 micron) with Copper Interconnect technology running at 769 MHz,
resulting in a cycle time of 1.3 nanoseconds.
CMOS 8SE (0.18 micron) with Copper Interconnect and Silicon-On-Insulator (SOI)
technologies running at 917 MHz, resulting in a cycle time of 1.09 nanoseconds.
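The cycle times quoted above follow from the clock frequencies, because the cycle time is the reciprocal of the frequency. A minimal sketch of that arithmetic (the function name `cycle_time_ns` is illustrative and does not come from the guide):

```python
def cycle_time_ns(frequency_mhz):
    """Return the clock cycle time in nanoseconds for a frequency in MHz."""
    # 1 MHz = 1e6 cycles per second and 1 s = 1e9 ns,
    # so the period in nanoseconds is 1000 / MHz.
    return 1000.0 / frequency_mhz


for chip, mhz in (("CMOS 8S", 769), ("CMOS 8SE", 917)):
    print(f"{chip}: {mhz} MHz -> {cycle_time_ns(mhz):.2f} ns")
```

Rounded to two decimal places, this gives 1.30 ns for 769 MHz and 1.09 ns for 917 MHz, matching the values stated for the two chip types.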
Each PU chip is 17.9 x 9.9 mm and has 47 million transistors (Figure 2-12).
The z900 turbo models (2C1 to 2C9 and 210 to 216) use the CMOS 8SE chips, and the z900
non-turbo models (100, 101 to 116 and 1C1 to 1C9) use the CMOS 8S chips.
The 64-bit z/Architecture is supported by 139 new opcodes, sixteen 64-bit General Purpose
Registers (GPRs), and translation. The z900 also adds 34 new opcodes for the ESA/390 architecture.
Chapter 2. zSeries 900 system structure 45
Figure 2-12 Processing Unit
Each PU has a 512 KB on-chip Cache Level 1 (L1), which is now split into a 256 KB L1 cache
for instructions and a 256 KB L1 cache for data, providing greater bandwidth.
Decimal performance improvements have been made through a 64-bit versus a 32-bit adder
along with hardware BCD Divide. Non-destructive 64-bit shifts have been designed to
Dual processor design
The z900 servers use the same dual processor design (Figure 2-13) introduced on 9672 G6
[Figure 2-12 labels: MultiChip Module (MCM) with PU12, PU13, PU9, PU8; Processing Unit (PU) chip (17.9 x 9.9 mm), 47 million transistors; CMOS 8S on standard models, Copper Interconnect technology, cycle time = 1.3 ns; CMOS 8SE on turbo models, Copper and SOI technologies, cycle time = 1.09 ns; 512 KB L1 cache: data = 256 KB, instructions = 256 KB]
Figure 2-13 Dual processor design
Each PU contains two processors, and each processor has its own Instruction Unit (I-Unit) and
Execution Unit (E-Unit), which includes the Floating Point function. Each instruction is
executed in parallel on both processors, and the results are compared after processing.
This design simplifies error detection during instruction execution, avoiding the additional
circuits and extra logic otherwise required for this checking. The z900 servers also contain
error checking circuits for data flow parity checking, address path parity checking, and L1
cache parity checking.
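The duplicate-and-compare idea described above can be sketched in software. This is only a toy model under stated assumptions: the names `alu`, `faulty_alu`, `execute_checked`, and `MiscompareError` are hypothetical, and the real z900 performs the comparison in hardware, not in code like this.

```python
MASK64 = (1 << 64) - 1  # keep results within 64 bits


class MiscompareError(Exception):
    """Raised when the two copies of the logic produce different results."""


def alu(operation, x, y):
    """A toy execution unit supporting two 64-bit operations."""
    if operation == "add":
        return (x + y) & MASK64
    if operation == "sub":
        return (x - y) & MASK64
    raise ValueError(f"unsupported operation: {operation}")


def faulty_alu(operation, x, y):
    """Same as alu, but flips the low-order result bit to mimic a fault."""
    return alu(operation, x, y) ^ 1


def execute_checked(operation, x, y, unit_a=alu, unit_b=alu):
    """Run the operation on both units and compare the two results."""
    result_a = unit_a(operation, x, y)
    result_b = unit_b(operation, x, y)
    if result_a != result_b:
        # A mismatch means at least one unit produced a wrong result.
        raise MiscompareError(f"{operation}: {result_a:#x} != {result_b:#x}")
    return result_a


print(execute_checked("add", 2, 3))
try:
    execute_checked("add", 2, 3, unit_b=faulty_alu)
except MiscompareError as err:
    print("error detected:", err)
```

The point of the sketch is that no operation-specific checking logic is needed: any single fault that changes one unit's result is caught by the comparison alone.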
Compression Unit on a chip
In the G5/G6 servers, hardware compression was performed by microcode. The z900 PUs have a
new unit, the Compression Unit, on the chip. This new implementation provides better
hardware compression performance, requiring 2 to 3 times fewer cycles than on the G6 servers.
Processor Branch History Table (BHT)
The Branch History Table (BHT) implementation on processors provides a key performance
improvement. It was part of the IBM ES/9000 9021 design, and its first CMOS
implementation was on 9672 G5 servers.
For 9672 G6 servers the hardware algorithm was further enhanced for very tight loops. On
z900 servers the BHT is 4 times larger than on G6 servers, having 4 entries of 2 KB each.
The z900's BHT is multiported and provides significant branch performance benefits. The BHT
allows each CP to take instruction branches based on stored branch history, which improves
processing times for calculation routines. Using a 100-iteration calculation routine as an example
(Figure 2-14), hardware preprocesses branch 99 times without the BHT. With BHT,
it preprocesses branch
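The effect of branch history on a 100-iteration loop can be illustrated with a toy model. The sketch below is an assumption-laden simplification, not the z900 algorithm: it assumes a single loop-closing branch, a default guess of "not taken" when no history is available, and a 1-bit history that remembers only the last outcome. The function name `count_mispredictions` is hypothetical.

```python
def count_mispredictions(iterations, use_history):
    """Count wrong guesses for the loop-closing branch in a toy predictor."""
    predicted_taken = False  # assumption: with no history, guess "not taken"
    mispredictions = 0
    for i in range(iterations):
        # The branch back to the loop top is taken on every pass but the last.
        actually_taken = i < iterations - 1
        if predicted_taken != actually_taken:
            mispredictions += 1
        if use_history:
            # 1-bit history: remember the most recent outcome of this branch.
            predicted_taken = actually_taken
    return mispredictions


without_history = count_mispredictions(100, use_history=False)
with_history = count_mispredictions(100, use_history=True)
print("mispredictions without history:", without_history)
print("mispredictions with history:   ", with_history)
```

In this model the predictor without history guesses wrong on each of the 99 taken branches, which appears to correspond to the count of 99 quoted in the text. With history it guesses wrong only on the first pass and on loop exit.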
[Figure 2-13 labels: Processing Unit (PU) with Floating Point function; simple yet complete error detection: data flow parity checked, address paths parity checked, L1 cache parity checked, processor logic (I - E - F) duplicated, then compared; error detection for mis-compare; paths: To B-Unit, From E-Unit, To L2 Cache, From L2 Cache]