Item 29. Always Use a Parser

XML documents are just too rich in syntactic sugar to be processed by anything short of a full-blown XML parser. I've seen many hackish systems, held together by string and baling wire, that were built on regular expressions, grep, sed, raw stream processing, and other tools. These are extremely brittle and rarely able to handle the full panoply of documents they encounter. Problems include:

  • Detecting the encoding, including handling multibyte character sets

  • Comments that contain tags

  • Processing instructions that contain tags

  • CDATA sections

  • Unexpected placement of spaces and line breaks within tags

  • Default attribute values applied from the internal DTD subset

  • Character references like &#160; and &#xA0;

  • Predefined entity references such ...
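To see how several of these problems bite in practice, here is a minimal sketch in Python. It assumes Python 3.9+ and uses only the standard library's re module and xml.etree.ElementTree parser; the sample document and the function names items_by_regex and items_by_parser are hypothetical, made up for illustration. The document puts a tag inside a comment, a tag inside a CDATA section, a line break inside a start-tag, and character references in element content.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample document: a commented-out tag, a line break inside
# a start-tag, character references, and a CDATA section containing markup.
DOC = b"""<?xml version="1.0" encoding="UTF-8"?>
<catalog>
  <!-- <item>commented out</item> -->
  <item
     id="a">Caf&#233;&#xA0;Noir</item>
  <item id="b"><![CDATA[<item>not a tag</item>]]></item>
</catalog>
"""


def items_by_regex(data: bytes) -> list[str]:
    """Brittle approach: grab <item>...</item> with a regular expression."""
    text = data.decode("utf-8")  # assumes UTF-8 rather than detecting the encoding
    return re.findall(r"<item>(.*?)</item>", text)


def items_by_parser(data: bytes) -> list[str]:
    """Let a real XML parser handle encoding, comments, CDATA, and references."""
    root = ET.fromstring(data)
    # ElementTree's default parser drops comments, so only real <item> elements remain.
    return [item.text or "" for item in root.iter("item")]


if __name__ == "__main__":
    print(items_by_regex(DOC))   # ['commented out', 'not a tag']
    print(items_by_parser(DOC))  # ['Café\xa0Noir', '<item>not a tag</item>']
```

The regular expression finds two "items" that are not elements at all (one inside the comment, one inside the CDATA section) and misses both real ones, because one start-tag spans a line break and the other carries an attribute. The parser returns the two actual item elements, with the character references already resolved to é and a non-breaking space, and the CDATA content delivered as plain text.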
