Chapter 1. Why Rust?
In certain contexts—for example the context Rust is targeting—being 10x or even 2x faster than the competition is a make-or-break thing. It decides the fate of a system in the market, as much as it would in the hardware market.
All computers are now parallel... Parallel programming is programming.
Michael McCool et al., Structured Parallel Programming
TrueType parser flaw used by nation-state attacker for surveillance; all software is security-sensitive.
Systems programming languages have come a long way in the 50 years since we started using high-level languages to write operating systems, but two problems in particular have proven difficult to crack:
- It’s difficult to write secure code. It’s especially difficult to manage memory correctly in C and C++. Users have been suffering with the consequences for decades, in the form of security holes dating back at least as far as the 1988 Morris worm.
- It’s very difficult to write multithreaded code, which is the only way to exploit the abilities of modern machines. Even experienced programmers approach threaded code with caution: concurrency can introduce broad new classes of bugs and make ordinary bugs much harder to reproduce.
Enter Rust: a safe, concurrent language with the performance of C and C++.
Rust is a new systems programming language developed by Mozilla and a community of contributors. Like C and C++, Rust gives developers fine control over the use of memory, and maintains a close relationship between the primitive operations of the language and those of the machines it runs on, helping developers anticipate their code’s costs. Rust shares the ambitions Bjarne Stroustrup articulates for C++ in his paper “Abstraction and the C++ Machine Model”:
In general, C++ implementations obey the zero-overhead principle: What you don’t use, you don’t pay for. And further: What you do use, you couldn’t hand code any better.
To these Rust adds its own goals of memory safety and trustworthy concurrency.
The key to meeting all these promises is Rust’s novel system of ownership, moves, and borrows, checked at compile time and carefully designed to complement Rust’s flexible static type system. The ownership system establishes a clear lifetime for each value, making garbage collection unnecessary in the core language, and enabling sound but flexible interfaces for managing other sorts of resources like sockets and file handles. Moves transfer values from one owner to another, and borrowing lets code use a value temporarily without affecting its ownership. Since many programmers will have never encountered these features in this form before, we explain them in detail in Chapters 4 and 5.
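As a rough preview of what ownership, moves, and borrows look like in code, here is a minimal sketch. The function names total_len and consume are hypothetical, chosen only for illustration; the details are left to the later chapters.

```rust
// Hypothetical helper that borrows a slice of strings: it can read the
// elements, but ownership stays with the caller.
fn total_len(words: &[String]) -> usize {
    words.iter().map(|w| w.len()).sum()
}

// Hypothetical helper that takes ownership: the vector is moved in,
// and is dropped (its memory freed) when this function returns.
fn consume(words: Vec<String>) -> usize {
    words.len()
}

fn main() {
    let words = vec!["ownership".to_string(), "moves".to_string()];

    // Borrow: `words` is still usable after the call.
    let n = total_len(&words);
    println!("{} bytes in {} words", n, words.len());

    // Move: ownership of the vector transfers to `consume`.
    let count = consume(words);
    println!("consumed {} words", count);

    // Uncommenting the next line is a compile-time error,
    // because `words` was moved in the call above:
    // println!("{}", words.len());
}
```

No garbage collector is involved here: the vector is freed at a point the compiler can determine statically, when its last owner goes out of scope.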
These same ownership rules also form the foundation of Rust’s trustworthy concurrency model. Most languages leave the relationship between a mutex and the data it’s meant to protect to the comments; Rust can actually check at compile time that your code locks the mutex while it accesses the data. Most languages admonish you to be sure not to use a data structure yourself after you’ve given it to another thread; Rust checks that you don’t. Rust is able to prevent data races at compile time.
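To make the mutex claim concrete, here is a small sketch using the standard library's Arc and Mutex types. The function name count_in_parallel and its parameters are assumptions made for this illustration. The point to notice is that the counter lives inside the mutex, so the only way to reach it is through lock().

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical function: `workers` threads each add 1 to a shared counter
// `increments` times, and the final count is returned.
fn count_in_parallel(workers: usize, increments: u64) -> u64 {
    // The data is stored inside the mutex; it cannot be accessed
    // without first acquiring the lock.
    let counter = Arc::new(Mutex::new(0u64));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let counter = Arc::clone(&counter);
            // `move` transfers this clone of the Arc to the new thread.
            thread::spawn(move || {
                for _ in 0..increments {
                    let mut guard = counter.lock().unwrap();
                    *guard += 1;
                    // `guard` is dropped here, which releases the lock.
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let total = *counter.lock().unwrap();
    total
}

fn main() {
    println!("{}", count_in_parallel(4, 1000));
}
```

Sharing a plain mutable integer across these threads instead would be rejected by the compiler rather than left to a comment or a code review.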
Rust is not really an object-oriented language, although it has some object-oriented characteristics. Rust is not a functional language, although it does tend to make the influences on a computation’s result more explicit, as functional languages do. Rust resembles C and C++ to an extent, but many idioms from those languages don’t apply, so typical Rust code does not deeply resemble C or C++ code. It’s probably best to reserve judgement about what sort of language Rust is, and see what you think once you’ve become comfortable with the language.
To get feedback on the design in a real-world setting, Mozilla has developed Servo, a new web browser engine, in Rust. Servo’s needs and Rust’s goals are well matched: a browser must perform well and handle untrusted data securely. Servo uses Rust’s safe concurrency to put the full machine to work on tasks that would be impractical to parallelize in C or C++. In fact, Servo and Rust have grown up together, with Servo using the latest new language features, and Rust evolving based on feedback from Servo’s developers.
Type Safety
Rust is a type-safe language. But what do we mean by “type safety”? Safety sounds good, but what exactly are we being kept safe from?
Here’s the definition of undefined behavior from the 1999 standard for the C programming language, known as C99:
undefined behavior
behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements
Consider the following C program:
    int main(int argc, char **argv) {
        unsigned long a[1];
        a[3] = 0x7ffff7b36cebUL;
        return 0;
    }
According to C99, because this program accesses an element off the end of the array a, its behavior is undefined, meaning that it can do anything whatsoever. When we ran this program on Jim’s laptop, it produced the following output:
undef: Error: .netrc file is readable by others.
undef: Remove password or make file unreadable by others.
Then it crashed. Jim’s laptop doesn’t even have a .netrc file. If you try it yourself, it will probably do something entirely different.
The machine code the C compiler generated for this main function happens to place the array a on the stack three words before the return address, so storing 0x7ffff7b36cebUL in a[3] changes poor main’s return address to point into the midst of code in the C standard library that consults one’s .netrc file for a password. When main returns, execution resumes not in main’s caller, but at the machine code for these lines from the library:
    warnx(_("Error: .netrc file is readable by others."));
    warnx(_("Remove password or make file unreadable by others."));
    goto bad;
In allowing an array reference to affect the behavior of a subsequent return statement, the C compiler is fully standards-compliant. An undefined operation doesn’t just produce an unspecified result: it is allowed to cause the program to do anything at all.
The C99 standard grants the compiler this carte blanche to allow it to generate faster code. Rather than making the compiler responsible for detecting and handling odd behavior like running off the end of an array, the standard makes the programmer responsible for ensuring those conditions never arise in the first place.
Empirically speaking, we’re not very good at that. While a student at the University of Utah, researcher Peng Li modified C and C++ compilers to make the programs they translated report when they executed certain forms of undefined behavior. He found that nearly all programs do, including those from well-respected projects that hold their code to high standards. And undefined behavior often leads to exploitable security holes in practice. The Morris worm propagated itself from one machine to another using an elaboration of the technique shown before, and this kind of exploit remains in widespread use today.
In light of that example, let’s define some terms. If a program has been written so that no possible execution can exhibit undefined behavior, we say that program is well defined. If a language’s safety checks ensure that every program is well defined, we say that language is type safe.
A carefully written C or C++ program might be well defined, but C and C++ are not type safe: the program shown earlier has no type errors, yet exhibits undefined behavior. By contrast, Python is type safe. Python is willing to spend processor time to detect and handle out-of-range array indices in a friendlier fashion than C:
    >>> a = [0]
    >>> a[3] = 0x7ffff7b36ceb
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    IndexError: list assignment index out of range
    >>>
Python raised an exception, which is not undefined behavior: the Python documentation specifies that the assignment to a[3] should raise an IndexError exception, as we saw. Certainly, a module like ctypes that provides unconstrained access to the machine can introduce undefined behavior into Python, but the core language itself is type safe. Java, JavaScript, Ruby, and Haskell are similar in this way.
Note that being type safe is independent of whether a language checks types at compile time or at runtime: C checks at compile time, and is not type safe; Python checks at runtime, and is type safe.
It is ironic that the dominant systems programming languages, C and C++, are not type safe, while most other popular languages are. Given that C and C++ are meant to be used to implement the foundations of a system, entrusted with implementing security boundaries and placed in contact with untrusted data, type safety would seem like an especially valuable quality for them to have.
This is the decades-old tension Rust aims to resolve: it is both type safe and a systems programming language. Rust is designed for implementing those fundamental system layers that require performance and fine-grained control over resources, yet still guarantees the basic level of predictability that type safety provides. We’ll look at how Rust manages this unification in more detail in later parts of this book.
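For comparison with the C program shown earlier, here is a sketch of the same out-of-range store attempted in safe Rust. The helper name checked_store is hypothetical. It uses the standard slice method get_mut, which returns None for an index past the end instead of touching memory outside the array.

```rust
// Hypothetical helper: store `value` at `index` if the index is in range.
// Returns true if the store happened, false if the index was out of bounds.
fn checked_store(a: &mut [u64], index: usize, value: u64) -> bool {
    match a.get_mut(index) {
        Some(slot) => {
            *slot = value;
            true
        }
        None => false,
    }
}

fn main() {
    let mut a = [0u64; 1];

    // The same store the C program attempted: index 3 of a one-element array.
    let stored = checked_store(&mut a, 3, 0x7ffff7b36ceb);

    // The out-of-range store is refused, and the array is unchanged.
    println!("stored: {}, a = {:?}", stored, a);
}
```

Plain indexing such as a[i] is bounds-checked as well: with an out-of-range i, the program panics with an error message, which is defined behavior, rather than overwriting whatever happens to sit nearby on the stack.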
Rust’s particular form of type safety has surprising consequences for multithreaded programming. Concurrency is notoriously difficult to use correctly in C and C++; developers usually turn to concurrency only when single-threaded code has proven unable to achieve the performance they need. But Rust guarantees that concurrent code is free of data races, catching any misuse of mutexes or other synchronization primitives at compile time. In Rust, you can use concurrency without worrying that you’ve made your code impossible for any but the most accomplished programmers to work on.
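As one illustration of concurrency without data races, the following sketch splits a slice into disjoint chunks and sums each chunk on its own thread, using the standard library's scoped threads (std::thread::scope). The function name parallel_sum and the chunking strategy are assumptions made for this example, not a recommendation for production code.

```rust
use std::thread;

// Hypothetical helper: sum `data` using at most `workers` threads.
fn parallel_sum(data: &[u64], workers: usize) -> u64 {
    if data.is_empty() {
        return 0;
    }

    // Round up so that every element lands in some chunk.
    // `data` is non-empty here, so `chunk_len` is at least 1.
    let workers = workers.max(1);
    let chunk_len = (data.len() + workers - 1) / workers;

    thread::scope(|s| {
        // Each thread borrows its own chunk. The borrows are shared and
        // read-only, so no two threads can race on the same element.
        let handles: Vec<_> = data
            .chunks(chunk_len)
            .map(|chunk| s.spawn(move || chunk.iter().sum::<u64>()))
            .collect();

        // Wait for every thread and combine the partial sums.
        handles
            .into_iter()
            .map(|h| h.join().unwrap())
            .sum::<u64>()
    })
}

fn main() {
    let data: Vec<u64> = (1..=100).collect();
    println!("{}", parallel_sum(&data, 4));
}
```

If one of the closures tried to modify data while the others were reading it, the program would fail to compile.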
Rust has an escape valve from the safety rules, for when you absolutely have to use a raw pointer. This is called unsafe code, and while most Rust programs don’t need it, we’ll show how to use it and how it fits into Rust’s overall safety scheme in Chapter 21.
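For a first impression only, here is a tiny sketch of what unsafe code looks like. The function name read_via_raw_pointer is hypothetical. Creating a raw pointer is allowed anywhere, but dereferencing one is only permitted inside an unsafe block.

```rust
// Hypothetical helper: read a value through a raw pointer.
fn read_via_raw_pointer(value: &u32) -> u32 {
    // Creating a raw pointer from a reference is safe.
    let p: *const u32 = value;

    // Dereferencing it is not checked by the compiler, so it must be
    // wrapped in `unsafe`. This use is sound because `p` was just derived
    // from a valid reference that is still live.
    unsafe { *p }
}

fn main() {
    let x = 42;
    println!("{}", read_via_raw_pointer(&x));
}
```

The unsafe block marks exactly where the programmer, rather than the compiler, is responsible for upholding the safety rules.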
Like those of other statically typed languages, Rust’s types can do much more than simply prevent undefined behavior. An accomplished Rust programmer uses types to ensure values are used not just safely but meaningfully, in a way that’s consistent with the application’s intent. In particular, Rust’s traits and generics, described in Chapter 11, provide a succinct, flexible, and performant way to describe characteristics that a group of types has in common, and then take advantage of those commonalities.
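A small sketch may help show what describing a shared characteristic looks like in practice. The trait Shape, the types Rect and Square, and the function total_area are all hypothetical names invented for this example.

```rust
// A trait names a capability that several types have in common.
trait Shape {
    fn area(&self) -> u64;
}

struct Rect {
    w: u64,
    h: u64,
}

struct Square {
    side: u64,
}

impl Shape for Rect {
    fn area(&self) -> u64 {
        self.w * self.h
    }
}

impl Shape for Square {
    fn area(&self) -> u64 {
        self.side * self.side
    }
}

// A generic function works with any type that implements `Shape`.
// The compiler generates a specialized copy for each type it is used with.
fn total_area<S: Shape>(shapes: &[S]) -> u64 {
    shapes.iter().map(|s| s.area()).sum()
}

fn main() {
    let rects = [Rect { w: 2, h: 3 }, Rect { w: 1, h: 1 }];
    let squares = [Square { side: 4 }];
    println!("{} {}", total_area(&rects), total_area(&squares));
}
```

Calling total_area on a slice of some type that does not implement Shape is a compile-time error, so the commonality is checked rather than assumed.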
Our aim in this book is to give you the insights you need not just to write programs in Rust, but to put the language to work ensuring that those programs are both safe and correct, and to anticipate how they will perform. In our experience, Rust is a major step forward in systems programming, and we want to help you take advantage of it.