Item 29. Always Use a Parser
XML documents are just too rich in syntactic sugar to be processed by anything short of a full-blown XML parser. I've seen many hackish systems, held together with string and baling wire, that are built on regular expressions, grep, sed, raw stream processing, and other tools. These are extremely brittle and rarely able to handle the full panoply of documents they encounter. Problems include:
Detecting the encoding, including handling multibyte character sets
Comments that contain tags
Processing instructions that contain tags
CDATA sections
Unexpected placement of spaces and line breaks within tags
Default attribute values applied from the internal DTD subset
Character references like &#xA0; and &#160;
Predefined entity references such ...
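
Several of the problems listed above can be seen in a small sketch. It assumes Python's standard xml.etree.ElementTree module as a stand-in for a full XML parser. The sample document, the element names order and item, and both function names are hypothetical, chosen only for illustration. The regular expression picks up a tag inside a comment, misses a start-tag that contains a line break, and is confused by a CDATA section, while the parser handles all three and also resolves the character reference.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample document exercising a few of the listed problems:
# a comment that contains tags, a line break inside a start-tag,
# a character reference, and a CDATA section that contains tags.
DOC = """<?xml version="1.0" encoding="UTF-8"?>
<order>
  <!-- <item>commented out</item> -->
  <item
     >caf&#xE9;</item>
  <item><![CDATA[<item>not a tag</item>]]></item>
</order>"""


def items_by_regex(text):
    """Naive extraction: match literal <item>...</item> pairs."""
    return re.findall(r"<item>(.*?)</item>", text)


def items_by_parser(text):
    """Parser-based extraction: let the XML parser resolve the syntax."""
    # Encode to bytes so the parser honors the encoding declaration itself.
    root = ET.fromstring(text.encode("utf-8"))
    return [item.text for item in root.findall("item")]


if __name__ == "__main__":
    print("regex :", items_by_regex(DOC))
    print("parser:", items_by_parser(DOC))
```

In this sketch the regular expression returns the commented-out item and a fragment of CDATA markup, and never sees the item whose start-tag spans two lines. The parser returns the two real items with their text decoded. A more elaborate pattern could patch each case individually, but each patch adds another special case to maintain, which is the brittleness described above.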