Chapter 2. Strings and Text
Almost every useful program involves some kind of text processing, whether it is parsing data or generating output. This chapter focuses on common problems involving text manipulation, such as pulling apart strings, searching, substitution, lexing, and parsing. Many of these tasks can be easily solved using built-in methods of strings. However, more complicated operations might require the use of regular expressions or the creation of a full-fledged parser. All of these topics are covered. In addition, a few tricky aspects of working with Unicode are addressed.
2.1. Splitting Strings on Any of Multiple Delimiters
Problem
You need to split a string into fields, but the delimiters (and spacing around them) aren't consistent throughout the string.
Solution
The split() method of string objects is really meant for very simple cases, and does not allow for multiple delimiters or account for possible whitespace around the delimiters. In cases when you need a bit more flexibility, use the re.split() function:
>>> line = 'asdf fjdk; afed, fjek,asdf, foo'
>>> import re
>>> re.split(r'[;,\s]\s*', line)
['asdf', 'fjdk', 'afed', 'fjek', 'asdf', 'foo']
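For comparison, here is a minimal sketch of what the plain split() method does with the same input. The variable names whitespace_fields and comma_fields are illustrative only and do not come from the recipe.

```python
line = 'asdf fjdk; afed, fjek,asdf, foo'

# With no argument, str.split() splits on runs of whitespace only,
# so the punctuation stays attached to the fields.
whitespace_fields = line.split()

# With an argument, str.split() accepts exactly one literal separator,
# and surrounding whitespace is left in the fields.
comma_fields = line.split(',')

print(whitespace_fields)  # ['asdf', 'fjdk;', 'afed,', 'fjek,asdf,', 'foo']
print(comma_fields)       # ['asdf fjdk; afed', ' fjek', 'asdf', ' foo']
```

Neither call yields the six clean fields, which is why the recipe reaches for a regular expression.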
Discussion
The re.split() function is useful because you can specify multiple patterns for the separator. For example, as shown in the solution, the separator is either a comma (,), semicolon (;), or whitespace followed by any amount of extra whitespace. Whenever that pattern is found, the entire match becomes the delimiter ...
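To see what the pattern treats as a separator, one option is to list its matches with re.findall() alongside the result of re.split(). This is a small sketch using the solution's input; the names pattern, delimiters, and fields are illustrative only.

```python
import re

line = 'asdf fjdk; afed, fjek,asdf, foo'
pattern = r'[;,\s]\s*'

# Each match is one separator: a comma, semicolon, or whitespace
# character, plus any whitespace that immediately follows it.
delimiters = re.findall(pattern, line)

# re.split() cuts the string at every one of those matches.
fields = re.split(pattern, line)

print(delimiters)  # [' ', '; ', ', ', ',', ', ']
print(fields)      # ['asdf', 'fjdk', 'afed', 'fjek', 'asdf', 'foo']
```

Five matches separate six fields, and each match is consumed whole, including its trailing whitespace.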