Microsoft® Exchange Server 2003 Scalability with SP1 and SP2

1.3 RAMS

RAMS was an abbreviation used by my team when we delivered the very first Exchange 2000 Academy program in September 1999. After spending several months working with the Microsoft engineering team and developing the material, Donald Livengood of Hewlett Packard (HP) came up with the RAMS acronym, which is derived from reliability, availability, manageability, and scalability. These four key features are well implemented and represented by many key functions of the product.

In fact, since the release of Microsoft Exchange 2000 in late 2000, and later with Exchange 2003, many deployments have benefited from the RAMS features of the product, described further in this section. More importantly, Exchange 2003 builds on the features of the Windows Server 2003 environment and, with each service pack or major release, improves on reliability, availability, manageability, and scalability.

Has the journey ended yet? Definitely not. Microsoft has to deal with a legacy of Microsoft Exchange rollouts, and even in future versions of the product we will see improvement in each of these areas. Some improvements require fundamental changes at the operating system level, application level, or hardware components level. As each matures, the end solution and user experience improve.

1.3.1 Reliability

The goal of reliability in Microsoft Exchange 2003 is to perform service functions under stated conditions for a given period of time. Microsoft Exchange has often suffered from database corruption errors that could be caused by faulty hardware or software components, imposing on administrators a long and painful recovery process involving part or all of the server and, always, the entire corrupted database. When Exchange 5.5 introduced unlimited storage, it also left the door open to unlimited problems: restoring a 16-GB database file takes much less time than restoring a 250-GB database file, and during that time users do not have access to their mail service. Some deployments today have to deal with (suffer from?) Information Stores larger than 1.5 TB, which cannot be repaired rapidly and for which backup and recovery are painful.
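To make the size argument concrete, here is a rough back-of-the-envelope sketch. The restore throughput is a purely hypothetical figure chosen for illustration (the text gives none), and `restore_hours` is an illustrative helper, not part of any Exchange tool. Only the ratio between the results matters: at a fixed rate, restore time grows linearly with database size.

```python
def restore_hours(size_gb: float, throughput_mb_per_s: float) -> float:
    """Estimate the hours needed to stream size_gb back from backup."""
    seconds = size_gb * 1024 / throughput_mb_per_s
    return seconds / 3600


# Hypothetical sustained restore rate; NOT a figure from the text.
ASSUMED_THROUGHPUT_MB_PER_S = 30.0

# Database sizes mentioned in the text: 16 GB, 250 GB, and 1.5 TB (1536 GB).
for size_gb in (16, 250, 1536):
    hours = restore_hours(size_gb, ASSUMED_THROUGHPUT_MB_PER_S)
    print(f"{size_gb:>5} GB: about {hours:.1f} hours without mail service")
```

Whatever rate your backup infrastructure actually sustains, the 250-GB restore takes roughly 15 times longer than the 16-GB one, and the outage grows with it.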

To the extent possible, Microsoft improved the core database engine used in Microsoft Exchange, the Extensible Storage Engine (ESE), to prevent any malformed database pages from being stored on disk. Exchange 4.0 introduced the notion of database pages whose content could be validated by a simple checksum calculated over the 4-KB block making up a database page. If the checksum stored with the page differed from the checksum calculated after reading the page, the database was considered corrupted and was flagged as bad. With Exchange 2003 SP2, the checksum can recover information: a single-bit flip can be corrected by using an error-correcting checksum algorithm (instead of a simple error detection algorithm).
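The difference between detecting and correcting a single-bit flip can be sketched as follows. This is a minimal illustration of the general idea (a parity bit plus an XOR of the positions of all set bits), not the algorithm ESE actually uses; the function names and the page layout are assumptions made for the example.

```python
PAGE_SIZE = 4096  # 4-KB page, as described in the text


def ecc_checksum(page: bytes) -> tuple[int, int]:
    """Return (parity, locator) for a page.

    parity  : number of set bits modulo 2
    locator : XOR of the 1-based positions of all set bits
    """
    parity = 0
    locator = 0
    for byte_index, byte in enumerate(page):
        for bit in range(8):
            if (byte >> bit) & 1:
                parity ^= 1
                locator ^= byte_index * 8 + bit + 1
    return parity, locator


def verify_and_repair(page: bytes, stored: tuple[int, int]) -> tuple[str, bytes]:
    """Check a page against its stored checksum and fix a single-bit flip."""
    parity, locator = ecc_checksum(page)
    if (parity, locator) == stored:
        return "ok", page
    if parity != stored[0]:
        # An odd number of bits changed. If exactly one bit flipped, the XOR
        # of the two locators is that bit's 1-based position.
        position = (locator ^ stored[1]) - 1
        if 0 <= position < len(page) * 8:
            repaired = bytearray(page)
            repaired[position // 8] ^= 1 << (position % 8)
            return "corrected", bytes(repaired)
    # Parity unchanged but locator differs: at least two bits flipped.
    return "corrupt", page


page = bytes(range(256)) * (PAGE_SIZE // 256)
stored = ecc_checksum(page)

damaged = bytearray(page)
damaged[1234] ^= 0x10  # simulate a single-bit flip on disk
status, fixed = verify_and_repair(bytes(damaged), stored)
print(status, fixed == page)
```

A detection-only checksum can do no more than report the mismatch. The extra locator information is what allows the single flipped bit to be found and put back. Like any scheme of this kind, the sketch only guarantees correction of one flipped bit; a page with several flipped bits still has to be treated as corrupt.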

In a situation with page-level corruption, you have two choices:

1. Run the ESEUTIL tool to remove the invalid pages;

2. Restore the last known good database from backup and play back the intermediate transactions stored in the transaction log files.

In fact, neither of these two solutions is very satisfactory, especially the first, since it could lead to irreversible loss of data, which is impermissible in modern infrastructures.

Microsoft worked hard to reduce the likelihood of software-based corruption. Today, virtually all page corruptions are due to faulty hardware: the component at fault could be a disk, a controller, or an interconnect element such as a host bus adapter or Fibre Channel link (just like software, hardware and firmware have bugs, too!). Some of these components have their own built-in recovery mechanisms, and with Exchange 2003 SP2, very few page-level corruptions occur; this is one thing the Microsoft Exchange administrator does not have to worry about anymore!

In conjunction, hardware manufacturers have vastly improved the reliability of storage infrastructures, especially when these are put under stress load or abnormal activities (for example, a RAID 5 volume rebuild). They didn’t really wait for Microsoft to do this, but as RAID and multidisk volumes became more widely used, significant effort and investment went into ensuring that volume protection was actually effective. In addition, the notion of checksum has been extended to the transaction log records (preventing you from playing back a corrupted transaction into the database) and, with SP1, to the streaming store.
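The effect of per-record checksums on log replay can be sketched as follows. This is an illustrative model only: the `LogRecord` structure, the helper names, and the use of CRC-32 are assumptions made for the example and do not describe the actual Exchange log format or checksum.

```python
import zlib
from dataclasses import dataclass


@dataclass
class LogRecord:
    payload: bytes
    checksum: int  # computed when the record was written


def make_record(payload: bytes) -> LogRecord:
    """Build a record and stamp it with a CRC-32 of its payload."""
    return LogRecord(payload, zlib.crc32(payload))


def replay(records, apply) -> int:
    """Apply records in order and stop at the first failed checksum.

    Returns the number of records applied.
    """
    applied = 0
    for record in records:
        if zlib.crc32(record.payload) != record.checksum:
            break  # never play a corrupted transaction into the database
        apply(record.payload)
        applied += 1
    return applied


log = [make_record(b"deliver message 1"),
       make_record(b"deliver message 2"),
       make_record(b"deliver message 3")]
log[1].payload = b"deliver mess\x00ge 2"  # simulate corruption in the log

database = []
count = replay(log, database.append)
print(count, database)
```

The checksum protects the database from a bad transaction, but replay cannot continue past the damaged record: in this sketch the third record is intact and still never reaches the database.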

Unfortunately, we are still lacking the tools needed to recover data from corrupted transaction log files. This can be an issue because if, for some reason, your database needs to be recovered and transactions played back, and the transaction logs are corrupted halfway through the replay, you have essentially lost data. I will remind you several times in this book: protecting

Microsoft® Exchange Server 2003 Scalability with SP1 and SP2 by Pierre Bijaoui
