If you simply need to ensure that a string follows the basic Social Security number format and that obviously invalid numbers are eliminated, the following regex provides an easy solution. If you need a more rigorous solution that checks with the Social Security Administration to determine whether the number belongs to a living person, refer to the links at the end of this recipe.
^(?!000|666)(?:[0-6][0-9]{2}|7(?:[0-6][0-9]|7[0-2]))-(?!00)[0-9]{2}-(?!0000)[0-9]{4}$
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
import re
import sys

# Validate the first command-line argument against the SSN regex.
if re.match(r"^(?!000|666)(?:[0-6][0-9]{2}|7(?:[0-6][0-9]|7[0-2]))-"
            r"(?!00)[0-9]{2}-(?!0000)[0-9]{4}$", sys.argv[1]):
    print("SSN is valid")
else:
    print("SSN is invalid")
See Recipe 3.5 for help with implementing this regular expression with other programming languages.
United States Social Security numbers are nine-digit numbers in the format AAA-GG-SSSS:
The first three digits are assigned by geographical region and are called the area number. The area number cannot be 000 or 666, and as of this writing, no valid Social Security number contains an area number above 772.
Digits four and five are called the group number and range from 01 to 99.
The last four digits are serial numbers from 0001 to 9999.
This recipe follows all of the rules just listed. Here’s the regular expression again, this time explained piece by piece:
^                  # Assert position at the beginning of the string.
(?!000|666)        # Assert that neither "000" nor "666" can be matched here.
(?:                # Group but don't capture...
  [0-6]            #   Match a character in the range between "0" and "6".
  [0-9]{2}         #   Match a digit, exactly two times.
 |                 #  or...
  7                #   Match a literal "7".
  (?:              #   Group but don't capture...
    [0-6]          #     Match a character in the range between "0" and "6".
    [0-9]          #     Match a digit.
   |               #    or...
    7              #     Match a literal "7".
    [0-2]          #     Match a character in the range between "0" and "2".
  )                #   End the noncapturing group.
)                  # End the noncapturing group.
-                  # Match a literal "-".
(?!00)             # Assert that "00" cannot be matched here.
[0-9]{2}           # Match a digit, exactly two times.
-                  # Match a literal "-".
(?!0000)           # Assert that "0000" cannot be matched here.
[0-9]{4}           # Match a digit, exactly four times.
$                  # Assert position at the end of the string.
Regex options: Free-spacing
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby
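In Python, the free-spacing option corresponds to the re.VERBOSE flag. The following is a minimal sketch of how the commented regex above might be compiled and used; the names SSN_VERBOSE and is_valid_ssn are hypothetical helpers chosen for this illustration, not part of the recipe.

```python
import re

# Free-spacing form of the recipe's regex. With re.VERBOSE, Python ignores
# whitespace and "#" comments inside the pattern.
# SSN_VERBOSE and is_valid_ssn are illustrative names (assumptions).
SSN_VERBOSE = re.compile(r"""
    ^
    (?!000|666)                  # area number cannot be 000 or 666
    (?:
        [0-6][0-9]{2}            # areas starting with 0 through 6
      |
        7(?:[0-6][0-9]|7[0-2])   # areas 700 through 772
    )
    -
    (?!00)[0-9]{2}               # group number, anything but 00
    -
    (?!0000)[0-9]{4}             # serial number, anything but 0000
    $
""", re.VERBOSE)


def is_valid_ssn(text):
    """Return True if text satisfies the format rules listed in this recipe."""
    return SSN_VERBOSE.match(text) is not None


if __name__ == "__main__":
    for candidate in ("123-45-6789", "666-45-6789", "773-45-6789", "123-00-6789"):
        print(candidate, is_valid_ssn(candidate))
```

Note that in Python, ‹$› also matches just before a trailing newline, so input read from a file or terminal may need to be stripped first if that matters for your application.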
Apart from the ‹^› and ‹$› tokens that assert position at the beginning and end of the string, this regex can be broken into three groups of digits separated by hyphens. The first group is the most complex. The second and third groups simply match any two- or four-digit number, respectively, but use a preceding negative lookahead to rule out the possibility of matching all zeros.
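To see the effect of the negative lookahead in isolation, the group and serial parts can be tried on their own. This is only a sketch: the ‹^› and ‹$› anchors are added here so each fragment can be tested standalone, and the variable names are made up for the illustration.

```python
import re

# Group-number fragment, anchored so it can be tested by itself.
group_part = re.compile(r"^(?!00)[0-9]{2}$")
# Serial-number fragment, anchored the same way.
serial_part = re.compile(r"^(?!0000)[0-9]{4}$")

# Keep only the candidates each fragment accepts.
group_matches = [s for s in ("00", "01", "10", "99") if group_part.match(s)]
serial_matches = [s for s in ("0000", "0001", "1000", "9999") if serial_part.match(s)]

print(group_matches)   # ['01', '10', '99']
print(serial_matches)  # ['0001', '1000', '9999']
```

Only the all-zero values are rejected; a value such as "10" or "1000" still matches because the lookahead tests the whole two- or four-character sequence, not individual digits.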
The first group of digits is much more complex and harder to read than the others because it matches a numeric range. First, it uses the negative lookahead ‹(?!000|666)› to rule out the specific values “000” and “666”. Next comes the task of eliminating any number higher than 772.
Since regular expressions deal with text rather than numbers, we have to break down the numeric range character by character. First, we know that we can match any three-digit number starting with 0 through 6, because the preceding negative lookahead already ruled out the invalid numbers 000 and 666. This first part is easily accomplished using a couple of character classes and a quantifier: ‹[0-6][0-9]{2}›. Since we need to offer an alternative for numbers starting with 7, the pattern we just built is put into a grouping as ‹(?:[0-6][0-9]{2}|7)› in order to limit the reach of the alternation operator.
Numbers starting with 7 are allowed only if they fall between 700 and 772, so the next step is to further divide any number that starts with 7 based on the second digit. If it’s between 0 and 6, any third digit is allowed. If the second digit is 7, the third digit must be between 0 and 2. Putting these rules for numbers starting with 7 together, we get ‹7(?:[0-6][0-9]|7[0-2])›, which matches the number 7 followed by one of two options for the second and third digit.
Finally, insert that into the outer grouping for the first set of digits, and you get ‹(?:[0-6][0-9]{2}|7(?:[0-6][0-9]|7[0-2]))›. That’s it. You’ve successfully created a regex that matches a three-digit number between 000 and 772.
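One way to check a numeric-range pattern like this is to run it against every three-digit string. The sketch below does that in Python; AREA and AREA_WITH_LOOKAHEAD are names invented for this example.

```python
import re

# The area-number group on its own, without the leading lookahead.
AREA = re.compile(r"(?:[0-6][0-9]{2}|7(?:[0-6][0-9]|7[0-2]))")
# The same group preceded by the lookahead that excludes 000 and 666.
AREA_WITH_LOOKAHEAD = re.compile(r"(?!000|666)" + AREA.pattern)

# Every three-digit string from "000" to "999".
all_three_digit = [f"{n:03d}" for n in range(1000)]

# fullmatch requires the whole string to match, so no anchors are needed.
matched = [s for s in all_three_digit if AREA.fullmatch(s)]
allowed = [s for s in all_three_digit if AREA_WITH_LOOKAHEAD.fullmatch(s)]

print(matched[0], matched[-1], len(matched))   # 000 772 773
print(len(allowed))                            # 771
print(sorted(set(matched) - set(allowed)))     # ['000', '666']
```

The group alone accepts exactly 000 through 772; adding the lookahead removes the two special cases, leaving 771 permitted area numbers under the rules described in this recipe.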
If you’re searching for Social Security numbers in a larger document or input string, replace the ‹^› and ‹$› anchors with word boundaries. Regular expression engines consider all alphanumeric characters and the underscore to be word characters.
\b(?!000|666)(?:[0-6][0-9]{2}|7(?:[0-6][0-9]|7[0-2]))-(?!00)[0-9]{2}-(?!0000)[0-9]{4}\b
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
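As a rough illustration of the word-boundary version, the following Python sketch extracts candidates from a longer string. The sample text and the names SSN_IN_TEXT, sample, and found are made up for this example.

```python
import re

# Word-boundary form of the regex, split across two raw strings for readability.
SSN_IN_TEXT = re.compile(
    r"\b(?!000|666)(?:[0-6][0-9]{2}|7(?:[0-6][0-9]|7[0-2]))-"
    r"(?!00)[0-9]{2}-(?!0000)[0-9]{4}\b"
)

# Invented sample text: one well-formed number, one with a forbidden area
# number, and one glued to a preceding word character (the underscore).
sample = (
    "Valid: 123-45-6789. Rejected area: 666-12-3456. "
    "Glued to a word: id_321-54-9876."
)

# The pattern has no capturing groups, so findall returns whole matches.
found = SSN_IN_TEXT.findall(sample)
print(found)  # ['123-45-6789']
```

The third candidate is skipped because the underscore before it is a word character, so there is no word boundary in front of the first digit.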
The Social Security Administration website at http://www.socialsecurity.gov provides answers to common questions as well as up-to-date lists of what area and group numbers have been assigned.
The Social Security Number Verification Service (SSNVS) at http://www.socialsecurity.gov/employer/ssnv.htm offers two ways to verify over the Internet that names and Social Security numbers match the Social Security Administration’s records.
A more thorough discussion of matching numeric ranges, including examples of matching ranges with a variable number of digits, can be found in Recipe 6.5.