You need to check the validity of an International Standard Book Number (ISBN), which can be in either the older ISBN-10 or the current ISBN-13 format. You want to allow a leading ISBN identifier, and ISBN parts can optionally be separated by hyphens or spaces. ISBN 978-0-596-52068-7, ISBN-13: 978-0-596-52068-7, 978 0 596 52068 7, 9780596520687, ISBN-10 0-596-52068-9, and 0-596-52068-9 are all examples of valid input.
You cannot validate an ISBN using a regex alone, because the last digit is computed using a checksum algorithm. The regular expressions in this section validate the format of an ISBN, whereas the subsequent code examples include a validity check for the final digit.
ISBN-10:
^(?:ISBN(?:-10)?:?●)?(?=[-0-9X●]{13}$|[0-9X]{10}$)[0-9]{1,5}[-●]?↵ (?:[0-9]+[-●]?){2}[0-9X]$
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
ISBN-13:
^(?:ISBN(?:-13)?:?●)?(?=[-0-9●]{17}$|[0-9]{13}$)97[89][-●]?[0-9]{1,5}↵ [-●]?(?:[0-9]+[-●]?){2}[0-9]$
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
ISBN-10 or ISBN-13:
^(?:ISBN(?:-1[03])?:?●)?(?=[-0-9●]{17}$|[-0-9X●]{13}$|[0-9X]{10}$)↵ (?:97[89][-●]?)?[0-9]{1,5}[-●]?(?:[0-9]+[-●]?){2}[0-9X]$
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
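As a rough illustration, the three format-only regexes above can be compiled in Python, with ● written as a literal space and the ↵ line continuation removed. The names `ISBN10_RE`, `ISBN13_RE`, `ISBN_ANY_RE`, and `has_isbn_format` are hypothetical, chosen only for this sketch. A match confirms the shape of the input, not its check digit.

```python
import re

# Format-only patterns from this recipe; the space marker is a literal space.
ISBN10_RE = re.compile(
    r"^(?:ISBN(?:-10)?:? )?(?=[-0-9X ]{13}$|[0-9X]{10}$)"
    r"[0-9]{1,5}[- ]?(?:[0-9]+[- ]?){2}[0-9X]$"
)
ISBN13_RE = re.compile(
    r"^(?:ISBN(?:-13)?:? )?(?=[-0-9 ]{17}$|[0-9]{13}$)"
    r"97[89][- ]?[0-9]{1,5}[- ]?(?:[0-9]+[- ]?){2}[0-9]$"
)
ISBN_ANY_RE = re.compile(
    r"^(?:ISBN(?:-1[03])?:? )?"
    r"(?=[-0-9 ]{17}$|[-0-9X ]{13}$|[0-9X]{10}$)"
    r"(?:97[89][- ]?)?[0-9]{1,5}[- ]?(?:[0-9]+[- ]?){2}[0-9X]$"
)


def has_isbn_format(subject, pattern=ISBN_ANY_RE):
    """Return True if `subject` is shaped like an ISBN (check digit not verified)."""
    return pattern.search(subject) is not None
```

With these definitions, `has_isbn_format("9780596520687")` is true, while `has_isbn_format("0-596-52068")` is false because the remaining text is neither 10, 13, nor 17 characters long.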
// `regex` checks for ISBN-10 or ISBN-13 format
var regex = /^(?:ISBN(?:-1[03])?:? )?(?=[-0-9 ]{17}$|[-0-9X ]{13}$|[0-9X]{10}$)(?:97[89][- ]?)?[0-9]{1,5}[- ]?(?:[0-9]+[- ]?){2}[0-9X]$/;

if (regex.test(subject)) {
    // Remove the ISBN identifier, hyphens, and spaces, then split into an
    // array. Stripping the identifier first keeps the "10" or "13" in
    // "ISBN-10" or "ISBN-13" from being counted as ISBN digits.
    var chars = subject.replace(/[- ]|^ISBN(?:-1[03])?:?/g, "").split("");
    // Remove the final ISBN digit from `chars`, and assign it to `last`
    var last = chars.pop();
    var sum = 0;
    var digit = 10;
    var check;

    if (chars.length == 9) {
        // Compute the ISBN-10 check digit
        for (var i = 0; i < chars.length; i++) {
            sum += digit * parseInt(chars[i], 10);
            digit -= 1;
        }
        check = 11 - (sum % 11);
        if (check == 10) {
            check = "X";
        } else if (check == 11) {
            check = "0";
        }
    } else {
        // Compute the ISBN-13 check digit
        for (var i = 0; i < chars.length; i++) {
            sum += (i % 2 * 2 + 1) * parseInt(chars[i], 10);
        }
        check = 10 - (sum % 10);
        if (check == 10) {
            check = "0";
        }
    }

    if (check == last) {
        alert("Valid ISBN");
    } else {
        alert("Invalid ISBN check digit");
    }
} else {
    alert("Invalid ISBN");
}
import re
import sys

# `regex` checks for ISBN-10 or ISBN-13 format
regex = re.compile(r"^(?:ISBN(?:-1[03])?:? )?(?=[-0-9 ]{17}$|"
                   r"[-0-9X ]{13}$|[0-9X]{10}$)(?:97[89][- ]?)?[0-9]{1,5}[- ]?"
                   r"(?:[0-9]+[- ]?){2}[0-9X]$")

subject = sys.argv[1]

if regex.search(subject):
    # Remove the ISBN identifier, hyphens, and spaces, then split into a
    # list. Stripping the identifier first keeps the "10" or "13" in
    # "ISBN-10" or "ISBN-13" from being counted as ISBN digits.
    chars = list(re.sub(r"[- ]|^ISBN(?:-1[03])?:?", "", subject))
    # Remove the final ISBN digit from `chars`, and assign it to `last`
    last = chars.pop()

    if len(chars) == 9:
        # Compute the ISBN-10 check digit
        val = sum((x + 2) * int(y) for x, y in enumerate(reversed(chars)))
        check = 11 - (val % 11)
        if check == 10:
            check = "X"
        elif check == 11:
            check = "0"
    else:
        # Compute the ISBN-13 check digit
        val = sum((x % 2 * 2 + 1) * int(y) for x, y in enumerate(chars))
        check = 10 - (val % 10)
        if check == 10:
            check = "0"

    if str(check) == last:
        print("Valid ISBN")
    else:
        print("Invalid ISBN check digit")
else:
    print("Invalid ISBN")
See Recipe 3.5 for help with implementing these regular expressions in other programming languages.
An ISBN is a unique identifier for commercial books and book-like products. The 10-digit ISBN format was published as an international standard, ISO 2108, in 1970. All ISBNs assigned since January 1, 2007 are 13 digits.
ISBN-10 and ISBN-13 numbers are divided into four or five elements, respectively. Three of the elements are of variable length; the remaining one or two elements are of fixed length. All five parts are usually separated with hyphens or spaces. A brief description of each element follows:
- 13-digit ISBNs start with the prefix 978 or 979.
- The group identifier identifies the language-sharing country group. It ranges from one to five digits long.
- The publisher identifier varies in length and is assigned by the national ISBN agency.
- The title identifier also varies in length and is selected by the publisher.
- The final character is called the check digit, and is computed using a checksum algorithm. An ISBN-10 check digit can be either a number from 0 to 9 or the letter X (Roman numeral for 10), while an ISBN-13 check digit ranges from 0 to 9. The allowed characters are different because the two ISBN types use different checksum algorithms.
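For an ISBN that is already separated, the elements described above can be pulled apart with a simple split. The following is a minimal Python sketch; `split_isbn_elements` and its dictionary keys are hypothetical names, and it assumes the input uses hyphens or spaces between elements, since the variable-length boundaries cannot be read from an unseparated number without the ISBN agency's range tables.

```python
def split_isbn_elements(isbn):
    """Split a hyphen- or space-separated ISBN into its named elements."""
    parts = isbn.replace(" ", "-").split("-")
    names = ["group", "publisher", "title", "check_digit"]
    if len(parts) == 5:
        # ISBN-13: the 978 or 979 prefix comes first.
        names = ["prefix"] + names
    elif len(parts) != 4:
        raise ValueError("expected 4 or 5 separated elements")
    return dict(zip(names, parts))
```

For example, `split_isbn_elements("978-0-596-52068-7")` yields the prefix `978`, group `0`, publisher `596`, title `52068`, and check digit `7`.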
The parts of the “ISBN-10 or ISBN-13” regex are shown in the following breakdown. Because this regex is written in free-spacing mode, the literal space characters in the regex have been escaped with backslashes. Java requires that even spaces within character classes be escaped in free-spacing mode:
^                  # Assert position at the beginning of the string.
(?:                # Group but don't capture...
  ISBN             # Match the text "ISBN".
  (?:-1[03])?      # Optionally match the text "-10" or "-13".
  :?               # Optionally match a literal ":".
  \                # Match a space character (escaped).
)?                 # Repeat the group between zero and one time.
(?=                # Assert that the following can be matched here...
  [-0-9\ ]{17}$    # Match 17 hyphens, digits, and spaces, then the end
 |                 # of the string. Or...
  [-0-9X\ ]{13}$   # Match 13 hyphens, digits, Xs, and spaces, then the
 |                 # end of the string. Or...
  [0-9X]{10}$      # Match 10 digits and Xs, then the end of the string.
)                  # End the positive lookahead.
(?:                # Group but don't capture...
  97[89]           # Match the text "978" or "979".
  [-\ ]?           # Optionally match a hyphen or space.
)?                 # Repeat the group between zero and one time.
[0-9]{1,5}         # Match a digit between one and five times.
[-\ ]?             # Optionally match a hyphen or space.
(?:                # Group but don't capture...
  [0-9]+           # Match a digit between one and unlimited times.
  [-\ ]?           # Optionally match a hyphen or space.
){2}               # Repeat the group exactly two times.
[0-9X]             # Match a digit or "X".
$                  # Assert position at the end of the string.
Regex options: Free-spacing
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby
The leading ‹(?:ISBN(?:-1[03])?:?●)?› has three optional elements, allowing it to match any one of the following seven strings (all except the empty-string option include a space character at the end):
- ISBN●
- ISBN-10●
- ISBN-13●
- ISBN:●
- ISBN-10:●
- ISBN-13:●
- The empty string (no prefix)
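The list above can be checked mechanically. This is a small Python sketch that tests the prefix subpattern in isolation with `re.fullmatch`; `PREFIX_RE` and the sample lists are hypothetical names introduced for the illustration, and ● is written as a literal space.

```python
import re

# The optional leading identifier, tested on its own.
PREFIX_RE = re.compile(r"(?:ISBN(?:-1[03])?:? )?")

candidates = ["ISBN ", "ISBN-10 ", "ISBN-13 ", "ISBN: ",
              "ISBN-10: ", "ISBN-13: ", ""]
accepted = [c for c in candidates if PREFIX_RE.fullmatch(c)]

# Near misses: no trailing space, an unsupported suffix, and lowercase.
near_misses = ["ISBN", "ISBN:", "ISBN-11 ", "isbn "]
wrongly_accepted = [c for c in near_misses if PREFIX_RE.fullmatch(c)]
```

All seven candidates are accepted, and none of the near misses are, which shows that the trailing space is required whenever the identifier is present.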
Next, the positive lookahead ‹(?=[-0-9●]{17}$|[-0-9X●]{13}$|[0-9X]{10}$)› enforces one of three options (separated by the ‹|› alternation operator) for the length and character set of the rest of the match. All three options (shown next) end with the ‹$› anchor, which ensures that there cannot be any trailing text that doesn’t fit into one of the patterns:
- ‹[-0-9●]{17}$› Allows an ISBN-13 with four separators (17 total characters)
- ‹[-0-9X●]{13}$› Allows an ISBN-13 with no separators or an ISBN-10 with three separators (13 total characters)
- ‹[0-9X]{10}$› Allows an ISBN-10 with no separators (10 total characters)
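To see what the lookahead does and does not guarantee, it can be run by itself. The sketch below is illustrative only; `LENGTH_CHECK` and `passes_length_check` are hypothetical names, and ● is written as a literal space.

```python
import re

# Only the lookahead from the combined regex: it consumes no characters
# and checks nothing but the length and character set of the remaining text.
LENGTH_CHECK = re.compile(r"(?=[-0-9 ]{17}$|[-0-9X ]{13}$|[0-9X]{10}$)")


def passes_length_check(text):
    """Return True if `text` is 10, 13, or 17 characters from the allowed sets."""
    return LENGTH_CHECK.match(text) is not None
```

Here `passes_length_check("978-0-596-52068-7")` and `passes_length_check("0596520689")` are both true, but so is `passes_length_check("-" * 13)`. The lookahead alone accepts any 13 allowed characters, which is why the rest of the regex still has to match the individual elements.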
After the positive lookahead validates the length and character set, we can match the individual elements of the ISBN without worrying about their combined length. ‹(?:97[89][-●]?)?› matches the “978” or “979” prefix required by an ISBN-13. The noncapturing group is optional because it will not match within an ISBN-10 subject string. ‹[0-9]{1,5}[-●]?› matches the one to five digit group identifier and an optional, following separator. ‹(?:[0-9]+[-●]?){2}› matches the variable-length publisher and title identifiers, along with their optional separators. Finally, ‹[0-9X]$› matches the check digit at the end of the string.
Although a regular expression can check that the final digit uses a valid character (a digit or X), it cannot determine whether it’s correct for the ISBN’s checksum. One of two checksum algorithms (determined by whether you’re working with an ISBN-10 or ISBN-13 number) is used to provide some level of assurance that the ISBN digits haven’t been accidentally transposed or otherwise entered incorrectly. The JavaScript and Python example code shown earlier implemented both algorithms. The following sections describe the checksum rules in order to help you implement these algorithms with other programming languages.
The check digit for an ISBN-10 number ranges from 0 to 10 (with the Roman numeral X used instead of 10). It is computed as follows:
1. Multiply each of the first 9 digits by a number in the descending sequence from 10 to 2, and sum the results.
2. Divide the sum by 11.
3. Subtract the remainder (not the quotient) from 11.
4. If the result is 11, use the number 0; if 10, use the letter X.
Here’s an example of how to derive the ISBN-10 check digit for 0-596-52068-?:
Step 1: sum = 10×0 + 9×5 + 8×9 + 7×6 + 6×5 + 5×2 + 4×0 + 3×6 + 2×8
            = 0 + 45 + 72 + 42 + 30 + 10 + 0 + 18 + 16
            = 233
Step 2: 233 ÷ 11 = 21, remainder 2
Step 3: 11 − 2 = 9
Step 4: 9 [no substitution required]
The check digit is 9, so the complete sequence is ISBN 0-596-52068-9.
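The four ISBN-10 steps can be written as a standalone function. This is a minimal Python sketch; `isbn10_check_digit` is a hypothetical name, and the function assumes it receives the first nine digits with any separators already removed.

```python
import re


def isbn10_check_digit(first_nine):
    """Compute the ISBN-10 check character for a string of nine digits."""
    if not re.fullmatch(r"[0-9]{9}", first_nine):
        raise ValueError("expected exactly nine digits")
    # Step 1: multiply the digits by the weights 10 down to 2 and sum.
    weights = range(10, 1, -1)
    total = sum(w * int(d) for w, d in zip(weights, first_nine))
    # Steps 2 and 3: subtract the remainder of division by 11 from 11.
    check = 11 - total % 11
    # Step 4: substitute 0 for 11 and X for 10.
    if check == 11:
        return "0"
    if check == 10:
        return "X"
    return str(check)
```

Calling `isbn10_check_digit("059652068")` returns `"9"`, which agrees with the worked example for 0-596-52068-9.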
An ISBN-13 check digit ranges from 0 to 9, and is computed using similar steps.
1. Multiply each of the first 12 digits by 1 or 3, alternating as you move from left to right, and sum the results.
2. Divide the sum by 10.
3. Subtract the remainder (not the quotient) from 10.
4. If the result is 10, use the number 0.
For example, the ISBN-13 check digit for 978-0-596-52068-? is calculated as follows:
Step 1: sum = 1×9 + 3×7 + 1×8 + 3×0 + 1×5 + 3×9 + 1×6 + 3×5 + 1×2 + 3×0 + 1×6 + 3×8
            = 9 + 21 + 8 + 0 + 5 + 27 + 6 + 15 + 2 + 0 + 6 + 24
            = 123
Step 2: 123 ÷ 10 = 12, remainder 3
Step 3: 10 − 3 = 7
Step 4: 7 [no substitution required]
The check digit is 7, and the complete sequence is ISBN 978-0-596-52068-7.
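The ISBN-13 steps translate just as directly. Again this is only a sketch in Python; `isbn13_check_digit` is a hypothetical name, and the input is assumed to be the first twelve digits with separators removed.

```python
import re


def isbn13_check_digit(first_twelve):
    """Compute the ISBN-13 check digit for a string of twelve digits."""
    if not re.fullmatch(r"[0-9]{12}", first_twelve):
        raise ValueError("expected exactly twelve digits")
    # Step 1: weights alternate 1, 3, 1, 3, ... from left to right.
    total = sum((3 if i % 2 else 1) * int(d)
                for i, d in enumerate(first_twelve))
    # Steps 2 and 3: subtract the remainder of division by 10 from 10.
    check = 10 - total % 10
    # Step 4: substitute 0 for 10.
    return "0" if check == 10 else str(check)
```

Calling `isbn13_check_digit("978059652068")` returns `"7"`, which agrees with the worked example for 978-0-596-52068-7.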
This version of the “ISBN-10 or ISBN-13” regex uses word boundaries instead of anchors to help you find ISBNs within longer text while ensuring that they stand on their own. The “ISBN” identifier has also been made a required string in this version, for two reasons. First, requiring it helps eliminate false positives (without it, the regex could potentially match any 10 or 13 digit number), and second, ISBNs are officially required to use this identifier when printed:
\bISBN(?:-1[03])?:?●(?=[-0-9●]{17}\b|[-0-9X●]{13}\b|[0-9X]{10}\b)↵ (?:97[89][-●]?)?[0-9]{1,5}[-●]?(?:[0-9]+[-●]?){2}[0-9X]\b
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
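A minimal Python sketch of searching a longer text follows. It writes ● as a literal space and ends each lookahead alternative with `\b`, so that a match does not need to reach the end of the subject; `ISBN_IN_TEXT_RE` and `find_isbn_candidates` are hypothetical names. Each hit is only a format match, so the check digit still has to be verified separately.

```python
import re

ISBN_IN_TEXT_RE = re.compile(
    r"\bISBN(?:-1[03])?:? "
    r"(?=[-0-9 ]{17}\b|[-0-9X ]{13}\b|[0-9X]{10}\b)"
    r"(?:97[89][- ]?)?[0-9]{1,5}[- ]?(?:[0-9]+[- ]?){2}[0-9X]\b"
)


def find_isbn_candidates(text):
    """Return every substring of `text` that is shaped like a labeled ISBN."""
    return [m.group(0) for m in ISBN_IN_TEXT_RE.finditer(text)]
```

For the text `"See ISBN 978-0-596-52068-7 and ISBN-10: 0-596-52068-9 for details."`, the function returns both labeled ISBNs, while a bare number such as `0596520689` with no identifier is ignored.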
A limitation of the previous regexes is that they allow matching an ISBN-10 number preceded by the “ISBN-13” identifier, and vice versa. The following regex uses regex conditionals (see Recipe 2.17) to ensure that an “ISBN-10” or “ISBN-13” identifier is followed by the appropriate ISBN type. It allows both ISBN-10 and ISBN-13 numbers when the type is not explicitly specified. This regex is overkill in most circumstances because the same result could be achieved more manageably using the ISBN-10 and ISBN-13 specific regexes that were shown earlier, one at a time. It’s included here merely to demonstrate an interesting use of regular expressions:
^
(?:ISBN(-1(?:(0)|3))?:?\ )?
(?(1)
  (?(2)
    (?=[-0-9X ]{13}$|[0-9X]{10}$)
    [0-9]{1,5}[- ]?(?:[0-9]+[- ]?){2}[0-9X]$
  |
    (?=[-0-9 ]{17}$|[0-9]{13}$)
    97[89][- ]?[0-9]{1,5}[- ]?(?:[0-9]+[- ]?){2}[0-9]$
  )
|
  (?=[-0-9 ]{17}$|[-0-9X ]{13}$|[0-9X]{10}$)
  (?:97[89][- ]?)?[0-9]{1,5}[- ]?(?:[0-9]+[- ]?){2}[0-9X]$
)
$
Regex options: Free-spacing
Regex flavors: .NET, PCRE, Perl, Python
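As a hedged illustration of the conditional version, the sketch below compiles it in Python with `re.VERBOSE`, which provides free-spacing mode. `STRICT_ISBN_RE` and `matches_declared_type` are hypothetical names. Python keeps spaces inside character classes even in verbose mode, so only the space outside a class needs the backslash.

```python
import re

STRICT_ISBN_RE = re.compile(
    r"""
    ^
    (?:ISBN(-1(?:(0)|3))?:?\ )?
    (?(1)
      (?(2)
        (?=[-0-9X ]{13}$|[0-9X]{10}$)
        [0-9]{1,5}[- ]?(?:[0-9]+[- ]?){2}[0-9X]$
      |
        (?=[-0-9 ]{17}$|[0-9]{13}$)
        97[89][- ]?[0-9]{1,5}[- ]?(?:[0-9]+[- ]?){2}[0-9]$
      )
    |
      (?=[-0-9 ]{17}$|[-0-9X ]{13}$|[0-9X]{10}$)
      (?:97[89][- ]?)?[0-9]{1,5}[- ]?(?:[0-9]+[- ]?){2}[0-9X]$
    )
    $
    """,
    re.VERBOSE,
)


def matches_declared_type(subject):
    """Return True if the format agrees with any "-10" or "-13" identifier."""
    return STRICT_ISBN_RE.search(subject) is not None
```

With this pattern, `matches_declared_type("ISBN-10 0-596-52068-9")` is true, whereas `matches_declared_type("ISBN-13 0-596-52068-9")` is false because the identifier and the number disagree.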
The most up-to-date version of the ISBN Users’ Manual can be found on the International ISBN Agency’s website at http://www.isbn-international.org.
The official Numerical List of Group Identifiers at http://www.isbn-international.org/en/identifiers/allidentifiers.html can help you identify a book’s originating country or area based on the first 1 to 5 digits of its ISBN.