Chapter 17. Strings and Text

The string is a stark data structure and everywhere it is passed there is much duplication of process. It is a perfect vehicle for hiding information.

Alan Perlis, epigram #34

We’ve been using Rust’s main textual types, String, str, and char, throughout the book. In “String Types”, we described the syntax for character and string literals, and showed how strings are represented in memory. In this chapter, we cover text handling in more detail.

In this chapter:

  • We give you some background on Unicode that should help you make sense of the standard library’s design.

  • We describe the char type, representing a single Unicode code point.

  • We describe the String and str types, representing owned and borrowed sequences of Unicode characters. These have a broad variety of methods for building, searching, modifying, and iterating over their contents.

  • We cover Rust’s string formatting facilities, like the println! and format! macros. You can write your own macros that work with formatting strings, and extend the formatting machinery to support your own types.

  • We give an overview of Rust’s regular expression support.

  • Finally, we talk about why Unicode normalization matters, and show how to do it in Rust.
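Before we examine each of these in detail, here is a minimal sketch of several of them working together. It builds an owned String, appends a borrowed &str and a char, searches the result, and plugs a user-defined type into format! by implementing std::fmt::Display. The Point type and the describe function are hypothetical, invented only for this illustration. Regular expressions and normalization are left out because they rely on crates outside the standard library.

```rust
use std::fmt;

/// Hypothetical type, used only to show how user types plug into `format!`.
struct Point {
    x: i32,
    y: i32,
}

// Implementing `Display` lets `{}` in a format string print a `Point`.
impl fmt::Display for Point {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "({}, {})", self.x, self.y)
    }
}

/// Hypothetical helper: builds an owned `String` from a label and a point.
fn describe(p: &Point, label: &str) -> String {
    let mut s = String::new();
    s.push_str(label); // append a borrowed &str
    s.push(':'); // append a single char
    s.push(' ');
    let coords = format!("{}", p); // format! returns a new String
    s.push_str(&coords);
    s
}

fn main() {
    let text = describe(&Point { x: 3, y: -4 }, "origin offset");
    assert_eq!(text, "origin offset: (3, -4)");

    // Searching: `find` returns the byte offset of the first match.
    assert!(text.contains("(3"));
    assert_eq!(text.find(':'), Some(13));

    // Transforming produces a new String; prints "ORIGIN OFFSET: (3, -4)".
    println!("{}", text.to_uppercase());
}
```

Note that describe takes its label as &str rather than String, so callers can pass either a literal or a borrowed String without giving up ownership.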

Some Unicode Background

This book is about Rust, not Unicode, which has entire books devoted to it already. But Rust’s character and string types are designed around Unicode. Here are a few bits of Unicode that help explain Rust.
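As a small illustration of how Unicode shows up in Rust’s types, the following sketch contrasts a char, which holds a single Unicode code point and always occupies four bytes in memory, with a str, whose contents are UTF-8 and whose len() therefore counts bytes rather than characters. The helper byte_and_char_counts is a hypothetical name introduced only for this example; non-ASCII characters are written as \u{...} escapes so the exact code points are unambiguous.

```rust
/// Hypothetical helper: returns (UTF-8 byte length, number of chars).
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    // A `char` is one Unicode code point, stored as a 4-byte value.
    let e_acute = '\u{e9}'; // 'é'
    assert_eq!(e_acute as u32, 0xE9);
    assert_eq!(std::mem::size_of::<char>(), 4);

    // Encoded as UTF-8, a code point takes anywhere from 1 to 4 bytes.
    assert_eq!('a'.len_utf8(), 1);
    assert_eq!(e_acute.len_utf8(), 2);
    assert_eq!('\u{20ac}'.len_utf8(), 3); // '€'
    assert_eq!('\u{1f980}'.len_utf8(), 4); // '🦀'

    // "naïve": five characters, but 'ï' (U+00EF) needs two bytes.
    let (bytes, chars) = byte_and_char_counts("na\u{ef}ve");
    println!("{} bytes, {} chars", bytes, chars); // prints "6 bytes, 5 chars"
}
```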

ASCII, Latin-1, and Unicode

Unicode and ASCII ...
