Chapter 5. String and Text Processing

Someone will call
Something will fall
And smash on the floor
Without reading the text
Know what comes next
Seen it before
And it’s painful
Things must change
We must rearrange them
Or we’ll have to estrange them
All that I’m saying
The game’s not worth playing
Over and over again

Depeche Mode, “The Sun and the Rainfall”

5.0 Introduction

Users who come to Mathematica for its superior mathematical capabilities are pleasantly surprised to find strong abilities in programming areas outside of mathematics proper. This is certainly true in the area of textual and string processing. Mathematica’s rich library of functions for string and structured text manipulation rivals that of Java, Perl, or any other modern language you can tie a string around.

The sections in this introduction cover some of the basic tools for working with strings.

Characters and Character Encodings

Mathematica uses Unicode internally, but externally (e.g., when saving a notebook) it uses ASCII codes, encoding non-ASCII characters in a special form.

For example, lowercase Greek letters and other non-ASCII characters are encoded using backslash-bracketed character names (\[name]).

In[1]:=  alpha = "α"
Out[1]=  α
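
As a small sketch of the point above (the inputs are illustrative and not taken from the original text), the named form and the literal character denote the same one-character string, FullForm shows the ASCII-only spelling, and ToCharacterCode gives the underlying Unicode code point:

In[2]:=  "\[Alpha]" === "α"
Out[2]=  True

In[3]:=  StringLength["\[Alpha]"]
Out[3]=  1

In[4]:=  FullForm["α"]
Out[4]//FullForm=  "\[Alpha]"

In[5]:=  ToCharacterCode["α"]
Out[5]=  {945}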

The function ToString can render an expression as a string under a different encoding scheme, selected with its CharacterEncoding option.
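
One way this might look, assuming the CharacterEncoding option of ToString is the mechanism meant here (the inputs are illustrative, and the first output is omitted because how it displays can depend on the front end):

In[6]:=  ToString["α", CharacterEncoding -> "ASCII"]

In[7]:=  ToCharacterCode["α", "UTF8"]
Out[7]=  {206, 177}

The second input is a related check rather than a use of ToString: given an explicit encoding name, ToCharacterCode returns the two bytes that represent α in UTF-8.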

The default character encoding used by Mathematica is stored in $CharacterEncoding ...
