White Space in Attributes
The final and trickiest case are attribute values. Depending on attribute type, the parser normalizes attribute values before reporting them to the client application. First it converts all tabs, carriage returns, and line feeds to one space each. This is done for all attributes. Next, if the attribute has a declared type and that type is anything other than CDATA, the parser also condenses all runs of space to a single space and finally trims all leading and trailing white space from the value. However, the parser does not perform this second step for attributes that have type CDATA or are undeclared.
For example, consider the following document. The parser will trim the leading and trailing white space from the year ...
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Read now
Unlock full access