Accessing Substrings
Credit: Alex Martelli
Problem
You want to access portions of a string. For example, you’ve read a fixed-width record and want to extract the record’s fields.
Solution
Slicing is great, of course, but it only does one field at a time:
afield = theline[3:8]
If you need to think in terms of field length, struct.unpack may be appropriate. Here’s an example of getting a five-byte string, skipping three bytes, getting two eight-byte strings, and then getting the rest:
import struct
# Get a 5-byte string, skip 3, get two 8-byte strings, then all the rest:
baseformat = "5s 3x 8s 8s"
numremain = len(theline) - struct.calcsize(baseformat)
format = "%s %ds" % (baseformat, numremain)
leading, s1, s2, trailing = struct.unpack(format, theline)
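As a minimal sketch of the same technique under Python 3, where struct.unpack expects a bytes object rather than a text string, the steps can be wrapped in a function. The name split_record and the sample record are hypothetical and chosen only for illustration.

```python
import struct

def split_record(theline):
    """Unpack a 5-byte field, skip 3 bytes, take two 8-byte fields, then the rest.

    Assumes `theline` is a bytes object at least 24 bytes long.
    """
    baseformat = "5s 3x 8s 8s"
    # calcsize reports how many bytes the fixed part of the format consumes.
    numremain = len(theline) - struct.calcsize(baseformat)
    fmt = "%s %ds" % (baseformat, numremain)
    return struct.unpack(fmt, theline)

# Hypothetical 31-byte record, made up for this example.
record = b"ABCDE---12345678abcdefghTRAILER"
leading, s1, s2, trailing = split_record(record)
```

The format must account for every byte of the input, which is why the length of the trailing field is computed rather than hard-coded.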
If you need to split at five-byte boundaries, here’s how you could do it:
numfives, therest = divmod(len(theline), 5)
form5 = "%s %dx" % ("5s "*numfives, therest)
fivers = struct.unpack(form5, theline)
Chopping a string into individual characters is, of course, easier:
chars = list(theline)
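The five-byte split and the character split can be tried together in a short sketch. It assumes Python 3, so the struct version takes bytes; the helper names fives_struct and fives_slices are hypothetical, and the slicing variant is offered only as a comparison, not as part of the original recipe.

```python
import struct

def fives_struct(data):
    """Split bytes into 5-byte chunks with struct, discarding any leftover tail."""
    numfives, therest = divmod(len(data), 5)
    # "5s " repeated once per chunk, then "Nx" to skip the remaining N bytes.
    form5 = "%s %dx" % ("5s " * numfives, therest)
    return list(struct.unpack(form5, data))

def fives_slices(data):
    """Same result using plain slicing; works for str as well as bytes."""
    whole = len(data) - len(data) % 5
    return [data[i:i + 5] for i in range(0, whole, 5)]

sample = b"abcdefghijklm"        # 13 bytes: two full chunks plus 3 left over
fivers = fives_struct(sample)
chars = list("abc")              # a text string yields one-character strings
```

Note that in Python 3, calling list on a bytes object yields integers rather than one-byte strings, so the character split behaves as described only for text strings.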
If you prefer to think of your data as being cut up at specific columns, slicing within list comprehensions may be handier:
cuts = [8, 14, 20, 26, 30]
# None as the last upper bound means "through the end of the string"
pieces = [ theline[i:j] for i, j in zip([0]+cuts, cuts+[None]) ]
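To make the column-based approach concrete, here is a small sketch that wraps the list comprehension in a function and runs it on a made-up line. The name cut_at and the sample data are hypothetical. It uses None as the open-ended final bound, which slicing accepts on both Python 2 and Python 3.

```python
def cut_at(theline, cuts):
    """Return the pieces of `theline` split at the column positions in `cuts`.

    Pairs each start column with the next cut; the last piece runs to the end.
    """
    return [theline[i:j] for i, j in zip([0] + cuts, cuts + [None])]

# Hypothetical 40-character line, made up for this example.
theline = "0123456789" * 4
pieces = cut_at(theline, [8, 14, 20, 26, 30])
```

Because every character lands in exactly one piece, joining the pieces reproduces the original line.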
Discussion
This recipe was inspired by Recipe 1.1 in the Perl Cookbook. Python’s slicing takes the place of Perl’s substr. Perl’s built-in unpack and Python’s struct.unpack are similar. Perl’s is slightly handier, as it accepts a field length of ...