# 8.7. A Better Tokenizer

## Problem

A simple method of tokenizing—or breaking up a string into its discrete elements—was presented in Recipe 2.6. However, this is not powerful enough to handle all your string-tokenizing needs. You need a tokenizer—also referred to as a lexer—that can split up a string based on a well-defined set of characters.

## Solution

The `Split` method of the `Regex` class takes a regular expression that describes the separators, and therefore the tokens, that we are interested in gathering. This technique works especially well with equations, since the tokens of an equation are well-defined. For example, the code:

```csharp
using System;
using System.Text.RegularExpressions;

public static string[] Tokenize(string equation)
{
    // Split on any one of the characters + - * ( ) ^ \
    Regex RE = new Regex(@"([\+\-\*\(\)\^\\])");
    return (RE.Split(equation));
}
```

will divide up a string according to the regular expression specified in the `Regex` constructor. In other words, the string passed to the `Tokenize` method is split at each of the delimiters `+`, `-`, `*`, `(`, `)`, `^`, or `\`. Because the character class is wrapped in capturing parentheses, `Split` also returns each delimiter as a token of its own rather than discarding it. The following method calls `Tokenize` to tokenize the equation `(y - 3)(3111*x^21 + x + 320)`:

```csharp
public void TestTokenize()
{
    // Trim removes the whitespace left around operands such as "y " and " 3"
    foreach (string token in Tokenize("(y - 3)(3111*x^21 + x + 320)"))
        Console.WriteLine("String token = " + token.Trim());
}
```

which displays the following output:

    String token =
    String token = (
    String token = y
    String token = -
    String token = 3
    String token = )
    String token =
    String token = (
    String token = 3111
    String token = ...
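The empty tokens in this output appear because `Split` returns an empty string whenever the input starts with a delimiter or two delimiters sit side by side, as with `)(`. If those entries are not wanted, one option is to trim and filter the result. The following is a minimal sketch, not part of the recipe itself: the class `TokenizerSketch` and the method `TokenizeNonEmpty` are hypothetical names chosen for illustration, and the code assumes LINQ (`System.Linq`) is available.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TokenizerSketch
{
    // Same delimiter set as in Tokenize; the capturing parentheses make
    // Regex.Split return the delimiters themselves as tokens.
    private static readonly Regex Delimiters = new Regex(@"([\+\-\*\(\)\^\\])");

    // Hypothetical helper: trims every token and drops the empty ones.
    public static string[] TokenizeNonEmpty(string equation)
    {
        return Delimiters.Split(equation)
                         .Select(token => token.Trim())
                         .Where(token => token.Length > 0)
                         .ToArray();
    }

    public static void Main()
    {
        foreach (string token in TokenizeNonEmpty("(y - 3)(3111*x^21 + x + 320)"))
            Console.WriteLine("String token = " + token);
    }
}
```

Filtering after the split keeps the regular expression identical to the one in the recipe. Note that it also discards whitespace-only tokens, which may not be desirable if blank operands need to be detected as errors.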
