This article is about how I used regular expressions to turn ordinary English text into Braille, correctly applying all the shorthand-like contractions and abbreviations that Braille provides. It is basically a case study in how a messy problem got solved straightaway with Perl, but I’ll also mention some of the new regular-expression features in Perl 5.005 that I used along the way. In the end, I find a surprising commonality between regexes and natural-language writing systems.
When I was a little kid, I had a children’s book about Helen Keller. I don’t remember reading it; I just remember the back cover, which had the Braille alphabet printed on it—well, embossed, actually. They had the Roman letter “a” in ink, and then below that the embossed dot pattern for Braille “a”, and so on up to “z”. So I got the idea that Braille printing is just like a letter-for-letter substitution cipher for the Roman alphabet.
Then I started noticing on men’s room doors that below “MEN” in big Roman letters, there’d be the same word in Braille—but sometimes the word would have three Braille characters, and sometimes just two. And that I found perplexing. I couldn’t imagine how the word “men” could end up only two characters long.
So I asked my friend Sheri, who’s been reading and writing Braille since she was a kid, how that was possible. She ...