A CLI Implementation in Shared Source: Rotor

In the summer of 2001, a small team of developers in Redmond announced plans for a Microsoft rarity: a freely available software distribution containing modifiable, redistributable source code. This distribution, named the Shared Source CLI (SSCLI, also known affectionately by its code name, “Rotor”), was to contain a fully functional CLI execution engine, a C# compiler, essential programming libraries, and a number of relevant developer tools. It had been quietly under development alongside the commercial .NET Framework and represented an important facet of Microsoft’s developer tool strategy. In particular, the SSCLI had three goals to meet: to validate the portability of the CLI standard, to help people learn about and understand Microsoft’s commercial CLR offering, and to stimulate long-term academic interest in the CLI. Above all else, the SSCLI was to match the ECMA standard so that anyone who wished to understand or implement this standard would have a guide.

Although the SSCLI is nominally the subject of this book, the CLI standard is its heart. The SSCLI helps us illustrate how and why the CLI is such an interesting piece of work. The distribution itself is a large body of code, and as such, it can provide a significant leg up for researchers and experimenters working in the area of developer tools or systems design, as well as those teaching computer science. This book attempts to act as a top-level guide to the code for such people, giving information beyond the theory of the CLI to facilitate hacking and to explain the conventions of the code base. The CLI standard will be important for years to come, and there is no better way for you to understand it fully than by browsing, building, observing, and tweaking a running implementation.

While Rotor demonstrates one way to build a portable, programming language-independent version of the CLI standard, it is certainly not the only way. Alternate implementations exist at the time of writing, including two from Microsoft (the commercial .NET Framework and a version for small devices called the “Compact Framework”), and two third-party, open source implementations, one from Ximian (called Mono) and one from the DotGNU project (called Portable.NET). Rotor itself, to provide additional developer tools and facilities, implements more than just the standard. To clarify what is contained in the distribution, Figure 1-3 contains a pictorial representation of the differences between Microsoft’s commercial offering (.NET CLR), the CLI and C# specifications, and Rotor.

The SSCLI, as shown in Figure 1-3, is a superset of the CLI standard, and the Microsoft commercial offering is, in turn, a superset of the SSCLI.

Rotor is a large collection of code built by many people over a number of years, and because of this, it is complex and stylistically variable. In terms of scale, it is comparable to the largest familiar source code distributions such as XFree86, Mozilla, and OpenOffice. As with these distributions, getting started in the code can be an intimidating prospect. This book will help make this task easier, beginning with this brief tour of the distribution itself.

Figure 1-3. Components of the Shared Source CLI distribution

The SSCLI is built using a combination of C++ and C#, with a smattering of assembler for processor-specific details. The distribution is built in a three-step process. First, a platform-specific C++ compiler is used to build a Platform Adaptation Layer (PAL), which is a library that hides the differences between operating system APIs behind a single set of programming abstractions. After this, a set of build tools (including the C# compiler) that are needed to build the SSCLI are built and linked against the PAL library. Finally, the rest of the distribution is built using these tools and the PAL.
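
The ordering constraint behind these three steps can be pictured as a tiny dependency graph. The Python fragment below is an illustrative sketch only: the stage names are hypothetical labels for the three steps just described, not the names of real build targets or scripts in the distribution, and the actual Rotor build tools are not written this way.

```python
from graphlib import TopologicalSorter

# Hypothetical stage labels; each maps to the set of stages it depends on.
BUILD_STAGES = {
    "pal": set(),                                    # built first, with the platform's C++ compiler
    "build_tools": {"pal"},                          # includes the C# compiler; linked against the PAL
    "rest_of_distribution": {"pal", "build_tools"},  # everything else needs both
}

def build_order(stages):
    """Return the stages in an order that respects every dependency."""
    return list(TopologicalSorter(stages).static_order())

print(build_order(BUILD_STAGES))
```

Because each stage depends on everything before it, only one valid order exists, which is why the PAL is the natural starting point for any port.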

Table 1-1 lists some of the interesting subdirectories to visit in the SSCLI source code, which is on the CD that accompanies this book (it can also be downloaded from http://msdn.microsoft.com/net/sscli).

Table 1-1. Important subdirectories of the distribution and their contents

Subdirectory                  Contents
/build                        Contains built executables and libraries
/clr/src                      Home to many core subdirectories
    /bcl                      The base class libraries, written in C#
    /csharp                   A C# compiler, written in C++
    /classlibnative           Programming libraries implemented in C++
    /debug                    Support for managed debugging
    /dlls/mscorsn             Strongname crypto code
    /fjit                     The SSCLI JIT compiler
    /fusion                   Code for locating versioned files
    /ilasm                    A CIL assembler
    /ildasm                   A CIL disassembler
    /inc                      Shared include files
    /md                       Metadata facilities
    /toolbox/caspol           Source to the caspol security utility
    /tools                    Home to many utility programs
        /clix                 The SSCLI managed executable launcher
        /gac                  Source to the gacutil cache utility
        /peverify             The peverify CIL verification utility
        /sos                  The SOS debugging extension library
        /strongname           The sn code-signing utility
    /vm                       The CLI execution engine
/docs                         Documentation
/fx/src                       Home to additional managed libraries
    /net/system/net           The networking library
    /regex/system/text        The regular expressions library
/jscript                      A complete JScript compiler that compiles to CIL code, written in C# (a managed managed code compiler!)
/managedlibraries/remoting    Additional remoting support to what is found in the bcl directory
/pal                          Multiple operating system-specific implementations of the PAL
/palrt                        Low-level APIs that support the SSCLI implementation but are not operating system-specific
/samples                      Sample programs that use the CLI
/tests                        Extensive tests and test infrastructure
/tools                        Tools used to build the SSCLI distribution

The subdirectories can be divided into four distinct conceptual areas, as follows:

  • The CLI execution engine

  • Component frameworks that both wrap and extend the execution engine

  • A portability layer (the PAL) used to move from one operating system to another

  • Tools, tests, compilers, documentation, and utilities for working with managed code

Let’s examine each of these areas in turn, focusing on where to find their implementation.

The CLI execution engine

The execution engine is the heart of the CLI. It contains the component model, as well as runtime services, such as exception handling, and automatic heap and stack management. In many respects, this is the big kahuna; it is the code that we refer to when we speak of “the runtime” or “the virtual execution environment.” JIT compilation, memory management, assembly and class loading, type resolution, metadata parsing, stack walking, and other fundamental mechanisms are implemented here. This code can be found in sscli/clr/src and in the four directories vm, fjit, md, and fusion, in which the bulk of the execution engine resides.

Figure 1-4. Many libraries typically combine to run managed code

The execution engine, as shown in Figure 1-4, is built as a set of dynamically loadable libraries rather than as a standalone executable. The clix program launcher (or any program that wishes to use the services of the execution engine) loads the main shared library, sscoree, to create an instance of the CLI in process and then feeds this instance a start-up assembly to be executed. As a result, there is no main in the execution engine; it is packaged to be hosted by other programs. The execution engine depends on a number of other shared libraries, which include libraries that are broken out because they are replaceable, such as the crypto code necessary to load and build signed assemblies that is located in mscorsn, as well as libraries that are potentially useful in many different places, such as the PAL, which can be found in rotor_pal and rotor_palrt. Finally, code that may not always be needed is also packaged into separately loaded libraries, such as mscordbc, which implements debugger support.

Programming libraries in the CLI

The shared infrastructure of the CLI includes not only standardized, low-level capabilities such as metadata, the common intermediate language, and the common type system, but also high-level, productivity-oriented class libraries. The contents of these libraries are briefly summarized by functional area in Table 1-2.

Table 1-2. High-level elements included in CLI standard libraries

Category                      Facilities
Productivity libraries        Text formatting, regular expressions, collections, time, dates, file and network IO, configuration, diagnostics, globalization, isolated storage, XML
Execution engine libraries    Isolation domains, asynchronous callbacks, stackwalks, stack traces, garbage collector, handles, environment, threads, exceptions, monitor-based synchronization, security, verification, reflection, serialization, interop with native code
Type-related libraries        Primitive types, value types, delegates, strings, arrays
Extended numerics library     Decimal numbers, double and single precision floating point numbers, math
Programming language support  Compiler services, custom metadata attributes, resource reclamation

These libraries provide an interface to the facilities of the underlying operating system but in a way that has been tailored to exploit the services and conventions of the CLI, increasing programmer productivity through their consistency and quality.

These APIs also serve another, less obvious role: they facilitate component integration by exposing programming services and conventions that will promote good component hygiene through their use. Services that minimize the amount of bookkeeping that is necessary for component builders to implement, or that minimize the need for complex intercomponent management protocols, make for smoother and safer integration (and less code to write). The less a component needs to rely on other components and the fewer things that a component must do on behalf of other components, the more likely an application will be bug-free, simple to read, and robust. To realize the true promise of component-based software, components need to be built to rely on managed execution within an environment designed with these principles in mind.

One might think of the CLI libraries as a modern equivalent to the C runtime library. They do not attempt to provide all things to all programmers; instead, they are a core set of components for which nearly every programmer will find a use. Since the base libraries, found in sscli/clr/src/bcl, are specified to be part of any CLI implementation, they form a basis for portable application implementations. Additional libraries, found in the sscli/fx, sscli/clr/src/classlibnative, and sscli/managedlibraries directories, are either optional standard libraries or specific to the SSCLI. At this point in time, all of the libraries in the SSCLI are also found in the commercial Microsoft .NET Frameworks.

Note

Explorers of the programming libraries will find that, besides the documentation found in sscli/docs that is specific to the Rotor distribution and to its utilities, there is a separately downloadable file archive (which is also contained on the CD for this book) containing documentation for the class libraries. This documentation is derived from the documentation used in the Microsoft .NET Framework SDK, although it has been edited and converted to simple HTML files.

The Platform Adaptation Layer

The PAL is an interesting piece of software with more uses than might meet the eye at first glance. Of course, as is typical of any adaptation or driver layer in a large piece of code that is meant to run on many operating system platforms, the first goal of the PAL was to isolate implementers from the details of various operating systems. The choice in the case of the SSCLI was obvious: since it had started as Win32-specific code, the PAL was designed to present a subset of the Win32 API (which can be seen in sscli/pal/rotor_pal.h). This implementation is by no means complete, as it needs to provide only the calls that are actually made by the CLI. Do not attempt to use the PAL as a general Win32 emulation layer, because it is incomplete!
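
To make the idea of a deliberately partial adaptation layer concrete, here is a toy sketch in Python rather than the C++ of the real PAL. The class name MiniPal is hypothetical, and the three Win32-style call names were chosen only for illustration; nothing here asserts which calls rotor_pal.h actually declares or how the SSCLI implements them.

```python
import os
import time

class MiniPal:
    """Toy adaptation layer: Win32-style names implemented with portable calls.

    Only the calls that its one client needs are provided, which is why a
    layer like this should not be mistaken for a general Win32 emulation.
    """

    def GetCurrentProcessId(self):
        # Delegate to whatever the host operating system offers.
        return os.getpid()

    def GetTickCount(self):
        # Milliseconds from a monotonic clock, truncated to 32 bits
        # because the Win32 call returns a 32-bit value.
        return int(time.monotonic() * 1000) & 0xFFFFFFFF

    def Sleep(self, milliseconds):
        # Win32 Sleep takes milliseconds; time.sleep takes seconds.
        time.sleep(milliseconds / 1000.0)

pal = MiniPal()
print(pal.GetCurrentProcessId() == os.getpid())
# A call the client never needed is simply absent from the layer.
print(hasattr(pal, "CreateWindowExW"))
```

Code written against such a layer sees one API shape on every platform, while anything outside the supported subset fails immediately instead of being half-emulated.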

The PAL is, of course, the place where the work to bring Rotor to new platforms would begin, since the tools that are used to build Rotor depend on the PAL for their operating system resources. To see what is involved, examine the sscli/pal/unix directory. There is a significant amount of work having to do with providing a common exception-handling mechanism, common threading, a shared handle manager, IO, synchronization, debugging, and more. Specialized host processes, such as web servers or databases, might very well have their own similar runtime needs, which might need to take the semantics of the PAL into consideration. Because of this and because the PAL defines how operating system resources are used, understanding the various PAL implementations will be important for many people.

In addition to the PAL, there is a directory named sscli/palrt/src, which contains a library implementation of Win32 APIs that are needed by the SSCLI but are not dependent on the operating system for implementation. This library also includes a small number of PAL-specific APIs. It is a true hodgepodge of facilities, but to give it flavor, it contains decimal arithmetic, a stub implementation of some of the Microsoft COM component model, array-handling, memory management, and numerous other utility functions.

The most interesting aspect of the PAL has to do with execution engine control. The SSCLI is designed to run cooperatively with native code within native processes, which means that many operating system calls need to be caught to give the execution engine a chance to maintain bookkeeping information for the use of runtime systems, such as the garbage collector or the security system. This is a critical use of the PAL layer; the SSCLI implementation is built in terms of the abstractions that are presented by the PAL and without them, it could not maintain isolation, security, and control. For example, both threading and exception handling are implemented in the PAL and both of these are critical to the execution engine at runtime, since it uses exception frames to track managed code and the stacks associated with threads to store diffuse structures that hold the state of many of its services. Details of this aspect of the PAL will be covered at length in Chapter 6, while the PAL’s design itself is the topic of Chapter 9.
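
The bookkeeping role described here can be illustrated with a small sketch, again in Python and again purely hypothetical: ThreadRegistry and its methods are invented for this example and do not correspond to SSCLI or PAL names. The point is only that when thread creation is routed through one layer, that layer can record each thread before any user code runs and clean up when the thread exits.

```python
import threading

class ThreadRegistry:
    """Toy stand-in for runtime bookkeeping about threads created via one layer."""

    def __init__(self):
        self._lock = threading.Lock()
        self._live = {}      # thread name -> Thread object, while it is running
        self.history = []    # names of every thread ever registered

    def create_thread(self, name, target, *args):
        """Create and start a thread, recording it before user code runs."""
        def runner():
            try:
                target(*args)
            finally:
                # Bookkeeping is undone even if the user code raises.
                with self._lock:
                    self._live.pop(name, None)

        thread = threading.Thread(target=runner, name=name)
        with self._lock:
            self._live[name] = thread
            self.history.append(name)
        thread.start()
        return thread

    def live_threads(self):
        """Names of threads that are registered and have not yet finished."""
        with self._lock:
            return sorted(self._live)

registry = ThreadRegistry()
results = []
worker = registry.create_thread("worker-1", results.append, 42)
worker.join()
print(results, registry.history, registry.live_threads())
```

A service such as a garbage collector could consult a registry like this to find every thread it must account for, which is only reliable if no thread is created behind the layer's back.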

Tools, compilers, tests, documentation, and utilities

A significant percentage of the code in Rotor consists of support infrastructure that is used to build, test, and use its CLI implementation. The PAL, which we have just discussed, is such code. There are numerous additional developer tools, utilities, and test programs that can be found in various spots within the distribution. These fall into the broad categories of utilities for managed development and utilities for building the distribution.

As far as managed development goes, many of the tools in the Rotor distribution will be familiar to any programmer who has spent time with the SDK for the Microsoft .NET Framework because the two implementations share their basic set of utilities, such as linker, assembler, and disassembler. The sscli/clr/src, sscli/clr/src/tools, and sscli/clr/src/toolbox directories contain directories for these utilities, as well as for utilities that are unique to developing and running managed code with the SSCLI, such as clix.exe. Programmers should consult the documentation in sscli/docs to see whether features are shared between the Rotor version of a utility and its .NET Framework counterpart; not all features were ported.

The build system used to bootstrap Rotor can be found in sscli/tools. These tools are built against the PAL and are used to track dependencies, drive the build process, and assemble the libraries and executables, once built, into the sscli/build directory. Dependencies in Rotor are convoluted, as they are with most large projects, and so these tools are quite important. To understand how they are used and how developers should interact with them when modifying code, see the sscli/docs/buildtools directory.

Once the SSCLI is built, it can be tested by using the tests in the sscli/tests directory. Of particular note are the PAL tests, found in sscli/tests/palsuite, which can be used to verify new PAL implementations or changes to an existing PAL, and the developer Build Verification Tests (BVT) found in sscli/tests/bvt, which can be used to check work being done in the execution engine. There are also tests for other areas such as the base class libraries; most of these, along with the BVTs, use the test harness found in sscli/tests/harness and documented in sscli/docs/testing_overview.html.

Documentation and technical notes for Rotor can be found in sscli/docs. This directory contains material that is useful for browsing the sources, for modifying code, and for understanding both the architecture of the CLI and the specific implementation choices that were made when building the SSCLI. There is also a detailed specification included for the PAL that would be very useful to anyone porting Rotor to new platforms. It is well worth taking some time to browse this directory.

Scoping This Book

This book focuses on how the CLI component model and its underlying execution engine are implemented in the SSCLI. The requirements that the resulting mechanisms place on the operating system, and general porting issues, are briefly discussed. Compilers, languages, frameworks, and non-component-oriented uses of the CLI, however, are not covered here; fortunately, they are discussed in the numerous other books on the .NET Framework and the CLI. Size and scope, along with the fact that the authors wanted to actually see this book in print, dictated the more focused approach.

A disclaimer is also called for: the numerous C++ samples in this book taken from the SSCLI source code have been considerably cleaned up, becoming pseudo-code in the process. This was done to remove ugly macros, error-handling, and asserts that pepper the Real Code, and to make the code more readable. If you are planning to add to or modify the SSCLI code, you should be aware of the invariants that must be maintained and adopt the same programming conventions and error handling methods used by the developers of the SSCLI. See Appendix D for a short description of these requirements.

Summary

The CLI is the first virtual execution environment designed from the ground up to be shared by many different programming languages. Platform providers, framework builders, and programmers are not forced into all-or-nothing language decisions just to take advantage of the facilities that make component-based computing work, such as exceptions, garbage collection, reflection, code access security, and data-driven extensibility. Using the CLI, it is easy to incorporate preexisting code into component-based programming efforts, which results in increased interoperability and shared infrastructure.

The CLI’s standardized format for packaging, describing, and deploying components is tied to neither operating system nor implementation language. This is important because this format forms the foundation for the CLI’s data-driven architecture. Data-driven mechanisms increase programmer productivity because they enable diverse programs, libraries, and tools to interact seamlessly and to evolve over time. A data-driven component model is as future-proof as today’s technology allows.

The abstract instruction set and the type system that outline the CLI’s virtual execution model offer a tempting glimpse of the Holy Grail: software that runs everywhere. The designers of the CLI certainly anticipated a world in which multiple implementations and multiple versions of their standard would run both side-by-side and on many platforms. Yet in this world, each implementation is likely to expose unique frameworks, services, utilities, tools, or language features that augment the basic capabilities, using the CLI’s excellent support for interoperability. What will result is akin to C language development, in which one rarely finds significant applications built on top of the standard runtime alone. Instead, applications judiciously combine standard facilities with either platform-specific libraries or libraries designed specifically for cross-platform use. Most significant CLI programs will combine standard components with either platform-specific components or third-party components designed specifically for cross-platform use.

The CLI’s language-agnostic approach, its data-driven architecture, and its virtual execution model were developed to create an arena in which components could cooperate effectively without sacrificing their security and autonomy. Its unfolding chain of metadata creates an environment in which it is possible to reason about the behavior of components and inject safeguards into their code before running them. Each stage in the CLI’s execution model involves receiving data from the prior stage and transforming or augmenting it before passing it on to another stage. This book describes this entire chain of stages and the execution engine in which they are implemented, from its initial bootstrap sequence to the death of its last managed resource.
