You need to be especially wary of implicit markup, often indicated by white space. For example, consider the simple case of a name:
<Name>Lenny Bruce</Name>
The name is sometimes treated as a single thing, but quite often you need to extract the first name and last name separately, most commonly to sort by last name. This seems easy enough to do: just split the string on the white space. The first name is everything before the space. The last name is everything after the space. Of course, this algorithm falls apart as soon as you add middle names:
<Name>Lenny Alfred Bruce</Name>
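To see where the split-on-white-space approach goes wrong, here is a minimal Python sketch of it. The function name naive_split is hypothetical, chosen only for illustration, and the sketch assumes the name is the sole text content of a Name element.

```python
import xml.etree.ElementTree as ET


def naive_split(name_xml):
    """Treat everything before the first space as the first name
    and everything after it as the last name (hypothetical helper)."""
    text = ET.fromstring(name_xml).text
    first, _, last = text.partition(" ")
    return first, last


# Works for a two-part name.
print(naive_split("<Name>Lenny Bruce</Name>"))

# With a middle name, the "last name" silently absorbs it.
print(naive_split("<Name>Lenny Alfred Bruce</Name>"))
```

For the two-part name the function returns Lenny and Bruce as expected. For the three-part name it returns Lenny and "Alfred Bruce", so a sort on the second value would file the entry under A rather than B.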
You may decide that you don't really care about middle names, that they can just be appended to the first name. You're just going to sort ...