Item 29. Always Use a Parser

XML documents are just too rich in syntactic sugar to be processed by anything short of a full-blown XML parser. I've seen many hackish systems held together by string and baling wire based on regular expressions, grep, sed, raw stream processing, and other tools. These are extremely brittle and rarely able to handle the full panoply of documents they encounter. Problems include:

  • Detecting the encoding, including handling multibyte character sets

  • Comments that contain tags

  • Processing instructions that contain tags

  • CDATA sections

  • Unexpected placement of spaces and line breaks within tags

  • Default attribute values applied from the internal DTD subset

  • Character references like &#xA0; and &#160;

  • Predefined entity references such ...
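
A few of the problems listed above can be seen in a small experiment. The sketch below is only an illustration, not a method from the text: the sample document, the element name item, and the variable names are all hypothetical, and the parser used is the one in Python's standard library (xml.etree.ElementTree). It compares a plausible-looking regular expression with a real parser on a document that contains a commented-out tag, a line break inside a start-tag, a CDATA section, and character and entity references.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
DOC = """<?xml version="1.0"?>
<order>
  <!-- <item>old widget</item> -->
  <item
     sku = 'A1' >widget</item>
  <item sku="B2"><![CDATA[nuts & <bolts>]]></item>
  <item sku="C3">caf&#xE9; &amp; bar</item>
</order>"""

# Naive approach: a regular expression that treats the document as a string.
# It cannot tell a real element from one inside a comment, and it returns
# CDATA markers and references as raw, undecoded text.
naive = re.findall(r"<item[^>]*>(.*?)</item>", DOC, flags=re.S)

# Parser approach: the parser skips the comment, tolerates the odd spacing
# in the start-tag, unwraps the CDATA section, and resolves the references.
root = ET.fromstring(DOC)
parsed = [item.text for item in root.iter("item")]

print("regex: ", naive)
print("parser:", parsed)
```

Under these assumptions, the regular expression reports four items, including the commented-out one, and leaves the CDATA delimiters and the references &#xE9; and &amp; in its results. The parser reports the three real items with their text fully decoded. The regular expression could be patched for each case in turn, but every patch reimplements one more piece of a parser.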
