Almost every useful program involves some kind of text processing, whether it is parsing data or generating output. This chapter focuses on common problems involving text manipulation, such as pulling apart strings, searching, substitution, lexing, and parsing. Many of these tasks can be easily solved using built-in methods of strings. However, more complicated operations might require the use of regular expressions or the creation of a full-fledged parser. All of these topics are covered. In addition, a few tricky aspects of working with Unicode are addressed.
You need to split a string into fields, but the delimiters (and spacing around them) aren’t consistent throughout the string.
The split() method of string objects is really meant for very simple cases, and
does not allow for multiple delimiters or account for possible whitespace
around the delimiters. In cases when you need a bit more flexibility, use the
re.split() function, for example on a string such as
'asdf fjdk; afed, fjek,asdf, foo'.
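As a minimal sketch of the difference, the sample string can be split both ways. The variable names and the pattern `r'[;,\s]\s*'` are assumptions chosen for illustration: the pattern is one plausible way to express "a comma, semicolon, or whitespace, followed by any amount of extra whitespace".

```python
import re

# Sample string with inconsistent delimiters and spacing.
line = 'asdf fjdk; afed, fjek,asdf, foo'

# str.split() handles only one fixed delimiter and keeps stray whitespace.
naive_fields = line.split(',')
print(naive_fields)
# ['asdf fjdk; afed', ' fjek', 'asdf', ' foo']

# re.split() accepts a pattern: one of ';', ',' or whitespace,
# followed by zero or more additional whitespace characters.
fields = re.split(r'[;,\s]\s*', line)
print(fields)
# ['asdf', 'fjdk', 'afed', 'fjek', 'asdf', 'foo']
```

The regular-expression version yields clean fields because the surrounding whitespace is consumed as part of each delimiter match.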
The re.split() function is useful because you can specify multiple patterns for the separator. For example, as shown in the solution, the separator is either a comma (,), semicolon (;), or whitespace followed by any amount of extra whitespace. Whenever that pattern is found, the entire match becomes the delimiter between ...