Do not use regular expressions to parse XML / HTML data

Using regular expressions to parse XML or HTML text is probably the most frequently committed mistake with regular expressions. Although regular expressions are very useful, they have their limitations, and XML or HTML parsing is usually where those limits are reached. HTML and XML are not regular languages by nature: elements can be nested to arbitrary depth, and a regular expression has no way to keep count of how many tags are still open.
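To see where the limit bites, consider the following sketch. The class name `NaiveRegexPitfall` and the `<div>` patterns are illustrative only and do not come from the book. It extracts the content of the first `<div>` using a reluctant pattern and a greedy one, and neither respects element boundaries.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NaiveRegexPitfall {

    // Reluctant: stops at the first "</div>", wherever it occurs.
    static final Pattern RELUCTANT = Pattern.compile("<div>(.*?)</div>");
    // Greedy: runs to the last "</div>" in the input.
    static final Pattern GREEDY = Pattern.compile("<div>(.*)</div>");

    // Returns the first capture group of the first match, or null if there is none.
    static String firstMatch(Pattern p, String html) {
        Matcher m = p.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Nested elements: the reluctant pattern ends at the inner closing tag.
        System.out.println(firstMatch(RELUCTANT, "<div>outer <div>inner</div> tail</div>"));
        // prints "outer <div>inner"

        // Sibling elements: the greedy pattern swallows both of them.
        System.out.println(firstMatch(GREEDY, "<div>a</div><div>b</div>"));
        // prints "a</div><div>b"
    }
}
```

Switching between greedy and reluctant quantifiers only trades one wrong answer for another. Matching nested structure correctly requires tracking depth, and that is a parser's job.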

Luckily, Java offers better tools for this purpose. The JDK contains readily available classes that parse these formats and either convert them into a Document Object Model (DOM) tree or process them on the fly using the SAX (Simple API for XML) parsing model.
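Here is a minimal sketch of both models that uses only JDK classes from `javax.xml.parsers`, `org.w3c.dom` and `org.xml.sax`. The class name, the sample string and the depth-counting handler are hypothetical illustrations. These parsers expect well-formed XML, so on the HTML side the sketch only covers XHTML-like input. Typical real-world HTML usually needs a lenient HTML parser instead.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class XmlParsingDemo {

    // Hypothetical sample: a nested element that a simple regex cannot delimit correctly.
    static final String SAMPLE = "<div>outer <div>inner</div> tail</div>";

    private static InputStream toStream(String xml) {
        return new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
    }

    // DOM: parse the whole document into an in-memory tree, then query it.
    static Document parseDom(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(toStream(xml));
    }

    // SAX handler that only tracks how deeply elements are nested.
    private static final class DepthHandler extends DefaultHandler {
        private int depth;
        private int maxDepth;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            depth++;
            maxDepth = Math.max(maxDepth, depth);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            depth--;
        }
    }

    // SAX: the parser streams through the input and calls the handler; no tree is built.
    static int maxDepthSax(String xml) throws Exception {
        DepthHandler handler = new DepthHandler();
        SAXParserFactory.newInstance().newSAXParser().parse(toStream(xml), handler);
        return handler.maxDepth;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseDom(SAMPLE);
        // Text of the root element, including the text inside the nested <div>.
        System.out.println(doc.getDocumentElement().getTextContent()); // outer inner tail
        // Document.getElementsByTagName also returns the root element itself.
        System.out.println(doc.getElementsByTagName("div").getLength()); // 2
        System.out.println(maxDepthSax(SAMPLE)); // 2
    }
}
```

DOM is convenient when you need random access to the whole tree. SAX keeps memory use low because it never builds the tree, but you have to manage any state yourself, such as the current depth here.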

Do not use regular expressions for certain tasks when there are more specific parsers for the purpose. The fact that there are other readily available ...
