Splitting Strings on Any of Multiple Delimiters
Problem
You need to split a string into fields, but the delimiters (and spacing around them) aren’t consistent throughout the string.
Solution
The split() method of string objects is really meant for very simple cases and
does not allow for multiple delimiters or account for possible whitespace
around the delimiters. In cases when you need a bit more flexibility, use the
re.split() method:
>>>line='asdf fjdk; afed, fjek,asdf, foo'>>>importre>>>re.split(r'[;,\s]\s*',line)['asdf', 'fjdk', 'afed', 'fjek', 'asdf', 'foo']
Discussion
The re.split() function is useful because you can specify multiple patterns
for the separator. For example, as shown in the solution, the separator is either a
comma (,), semicolon (;), or whitespace followed by any amount of extra
whitespace. Whenever that pattern is found, the entire match becomes the
delimiter between whatever fields lie on either side of the match.
The result is a list of fields, just as with str.split().
When using re.split(), you need to be a bit careful should the regular
expression pattern involve a capture group enclosed in parentheses.
If capture groups are used, then the matched text is also included
in the result. For example, watch what happens here:
>>>fields=re.split(r'(;|,|\s)\s*' ...
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Read now
Unlock full access