Chapter 17. Strings and Text
The string is a stark data structure and everywhere it is passed there is much duplication of process. It is a perfect vehicle for hiding information.
Alan Perlis, epigram #34
We’ve been using Rust’s main textual types, String, str, and char, throughout the book. In “String Types”, we described the syntax for character and string literals and showed how strings are represented in memory. In this chapter, we cover text handling in more detail.
In this chapter:
- We give you some background on Unicode that should help you make sense of the standard library’s design.
- We describe the char type, representing a single Unicode code point.
- We describe the String and str types, representing owned and borrowed sequences of Unicode characters. These have a broad variety of methods for building, searching, modifying, and iterating over their contents.
- We cover Rust’s string formatting facilities, like the println! and format! macros. You can write your own macros that work with formatting strings and extend them to support your own types.
- We give an overview of Rust’s regular expression support.
- Finally, we talk about why Unicode normalization matters and show how to do it in Rust.
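As a quick preview of how the three textual types relate, here is a minimal sketch. The helper name describe is our own invention for illustration, not part of the standard library; it only uses methods that str provides (chars and len) and the format! macro.

```rust
/// Hypothetical helper (not a standard library function): summarizes a
/// borrowed string slice by its code point count and its UTF-8 byte length.
fn describe(text: &str) -> String {
    // `format!` builds a new, owned `String` from a formatting string.
    format!(
        "{} has {} chars in {} bytes",
        text,
        text.chars().count(),
        text.len()
    )
}

fn main() {
    // A `char` holds a single Unicode code point.
    let c: char = '\u{e9}'; // 'é', LATIN SMALL LETTER E WITH ACUTE

    // A `String` owns its text and can grow.
    let mut owned = String::from("caf");
    owned.push(c);

    // A `&str` borrows text stored elsewhere, here the contents of `owned`.
    let borrowed: &str = &owned;

    // Prints "café has 4 chars in 5 bytes": the 'é' is one code point
    // but occupies two bytes in UTF-8.
    println!("{}", describe(borrowed));
}
```

The difference between the character count and the byte count is the kind of detail the Unicode background in the next section should help explain.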
Some Unicode Background
This book is about Rust, not Unicode, which has entire books devoted to it already. But Rust’s character and string types are designed around Unicode. Here are a few bits of Unicode that help explain Rust.
ASCII, Latin-1, and Unicode
Unicode