Chapter 17. Strings and Text

The string is a stark data structure and everywhere it is passed there is much duplication of process. It is a perfect vehicle for hiding information.

Alan Perlis, epigram #34

We’ve been using Rust’s main textual types, String, str, and char, throughout the book. In “String Types”, we described the syntax for character and string literals and showed how strings are represented in memory. In this chapter, we cover text handling in more detail.

Here is what we cover:

  • We give you some background on Unicode that should help you make sense of the standard library’s design.

  • We describe the char type, representing a single Unicode code point.

  • We describe the String and str types, representing owned and borrowed sequences of Unicode characters. These have a broad variety of methods for building, searching, modifying, and iterating over their contents.

  • We cover Rust’s string formatting facilities, like the println! and format! macros. You can write your own macros that work with formatting strings and extend them to support your own types.

  • We give an overview of Rust’s regular expression support.

  • Finally, we talk about why Unicode normalization matters and show how to do it in Rust.
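As a small taste of the first few items, here is a minimal sketch that uses only the standard library. The Point type and the helper functions byte_and_char_counts and shout are hypothetical names invented for this illustration; they are not part of Rust or of any example elsewhere in the book. Regular expressions and normalization are left out because they need external crates.

```rust
use std::fmt;

// Hypothetical type, used only to show how a user-defined type
// can take part in string formatting.
struct Point {
    x: i32,
    y: i32,
}

// Implementing Display lets Point be used with `{}` in format!,
// println!, and related macros.
impl fmt::Display for Point {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "({}, {})", self.x, self.y)
    }
}

// Hypothetical helper: returns (length in bytes, number of chars).
// A str is stored as UTF-8, so the two can differ.
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

// Hypothetical helper: borrows a str, builds an owned String from it,
// and modifies the String.
fn shout(s: &str) -> String {
    let mut owned = String::from(s);
    owned.push('!');
    owned.to_uppercase()
}

fn main() {
    // A char holds a single Unicode character.
    let c: char = '\u{e9}'; // LATIN SMALL LETTER E WITH ACUTE
    println!("{} is U+{:04X}, alphabetic: {}", c, c as u32, c.is_alphabetic());

    // Byte length and character count of a str containing non-ASCII text.
    let (bytes, chars) = byte_and_char_counts("caf\u{e9}");
    println!("{} bytes, {} chars", bytes, chars);

    // Owned String built from a borrowed str.
    println!("{}", shout("hello"));

    // Formatting a user-defined type.
    println!("{}", Point { x: 1, y: 2 });
}
```

The helpers take &str rather than String so that they accept both string literals and borrowed views of owned strings, which is the usual convention for read-only text parameters.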

Some Unicode Background

This book is about Rust, not Unicode, which has entire books devoted to it already. But Rust’s character and string types are designed around Unicode. Here are a few bits of Unicode that help explain Rust.

ASCII, Latin-1, and Unicode

