Skip to Content
Regular Expressions Cookbook, 2nd Edition
book

Regular Expressions Cookbook, 2nd Edition

by Jan Goyvaerts, Steven Levithan
August 2012
Intermediate to advanced
609 pages
19h 16m
English
O'Reilly Media, Inc.
Content preview from Regular Expressions Cookbook, 2nd Edition

9.4. Match XML Names

Problem

You want to check whether a string is a legitimate XML name (a common syntactic construct). XML provides precise rules for the characters that can occur in a name, and reuses those rules for element, attribute, and entity names, processing instruction targets, and more. Names must be composed of a letter, underscore, or colon as the first character, followed by any combination of letters, digits, underscores, colons, hyphens, and periods. That’s actually an approximate description, but it’s pretty close. The exact list of permitted characters depends on the version of XML in use.

Alternatively, you might want to splice a pattern for matching valid names into other XML-handling regexes, when the extra precision warrants the added complexity.

Following are some examples of valid names:

  • thing

  • _thing_2_

  • :Российские-Вещь

  • fantastic4:the.thing

  • 日本の物

Note that letters from non-Latin scripts are allowed, even including the ideographic characters in the last example. Likewise, any Unicode digit is allowed after the first character, not just the Arabic numerals 0–9.

For comparison, here are several examples of invalid names that should not be matched by the regex:

  • thing!

  • thing with spaces

  • .thing.with.a.dot.in.front

  • -thingamajig

  • 2nd_thing

Solution

Like identifiers in many programming languages, there is a set of characters that can occur in an XML name, and a subset that can be used as the first character. Those character lists are dramatically different for XML 1.0 Fourth Edition ...

Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Start your free trial

You might also like

Regular Expressions Cookbook

Regular Expressions Cookbook

Jan Goyvaerts, Steven Levithan

Publisher Resources

ISBN: 9781449327453Supplemental ContentErrata Page