4.12. Validate Social Security Numbers

Problem

You need to check whether someone entered text as a valid Social Security number.

Solution

If you simply need to ensure that a string follows the basic Social Security number format and that obviously invalid numbers are eliminated, the following regex provides an easy solution. If you need a more rigorous solution that checks with the Social Security Administration to determine whether the number belongs to a living person, refer to the links in the “See Also” section of this recipe.

Regular expression

^(?!000|666)(?:[0-6][0-9]{2}|7(?:[0-6][0-9]|7[0-2]))-↵
(?!00)[0-9]{2}-(?!0000)[0-9]{4}$
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Python

import re
import sys

# Check the first command-line argument against the SSN pattern.
if re.match(r"^(?!000|666)(?:[0-6][0-9]{2}|7(?:[0-6][0-9]|7[0-2]))-"
            r"(?!00)[0-9]{2}-(?!0000)[0-9]{4}$", sys.argv[1]):
    print("SSN is valid")
else:
    print("SSN is invalid")

Other programming languages

See Recipe 3.5 for help with implementing this regular expression with other programming languages.

Discussion

United States Social Security numbers are nine-digit numbers in the format AAA-GG-SSSS:

  • The first three digits are assigned by geographical region and are called the area number. The area number cannot be 000 or 666, and as of this writing, no valid Social Security number contains an area number above 772.

  • Digits four and five are called the group number and range from 01 to 99.

  • The last four digits are serial numbers from 0001 to 9999.

This recipe follows all of the rules just listed. Here’s the regular expression again, this time explained piece by piece:

^            # Assert position at the beginning of the string.
(?!000|666)  # Assert that neither "000" nor "666" can be matched here.
(?:          # Group but don't capture...
  [0-6]      #   Match a character in the range between "0" and "6".
  [0-9]{2}   #   Match a digit, exactly two times.
 |           #  or...
  7          #   Match a literal "7".
  (?:        #   Group but don't capture...
    [0-6]    #     Match a character in the range between "0" and "6".
    [0-9]    #     Match a digit.
   |         #    or...
    7        #     Match a literal "7".
    [0-2]    #     Match a character in the range between "0" and "2".
  )          #   End the noncapturing group.
)            # End the noncapturing group.
-            # Match a literal "-".
(?!00)       # Assert that "00" cannot be matched here.
[0-9]{2}     # Match a digit, exactly two times.
-            # Match a literal "-".
(?!0000)     # Assert that "0000" cannot be matched here.
[0-9]{4}     # Match a digit, exactly four times.
$            # Assert position at the end of the string.
Regex options: Free-spacing
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby
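
In Python, a free-spacing regex like this one can be used by passing the re.VERBOSE flag. The following is a minimal sketch rather than a definitive implementation; the names SSN_PATTERN and is_valid_ssn are made up for this example, and the comments inside the pattern are shortened versions of the ones above.

```python
import re

# Free-spacing form of the recipe's regex. re.VERBOSE makes Python ignore
# the layout whitespace and the "#" comments inside the pattern.
SSN_PATTERN = re.compile(r"""
    ^
    (?!000|666)                 # area number may not be 000 or 666
    (?: [0-6][0-9]{2}           # 000-699
      | 7(?:[0-6][0-9]|7[0-2])  # 700-772
    )
    -
    (?!00)[0-9]{2}              # group number 01-99
    -
    (?!0000)[0-9]{4}            # serial number 0001-9999
    $
""", re.VERBOSE)


def is_valid_ssn(text):
    """Return True if text has the format this recipe accepts."""
    return SSN_PATTERN.match(text) is not None


for sample in ["123-45-6789", "000-12-3456", "773-12-3456", "123-00-6789"]:
    print(sample, is_valid_ssn(sample))
```

Only the first sample is accepted. Note that in Python, $ also matches just before a trailing newline at the end of the string; if that matters for your input, \Z can be used in place of $.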

Apart from the ^ and $ tokens that assert position at the beginning and end of the string, this regex can be broken into three groups of digits separated by hyphens. The first group is the most complex. The second and third groups simply match any two-digit or four-digit number, respectively, but each uses a preceding negative lookahead to rule out the possibility of matching all zeros.
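
To see the effect of the negative lookaheads in isolation, the second and third groups can be tried on their own. This is a small sketch; GROUP and SERIAL are names chosen for the example, not part of the recipe.

```python
import re

# Sub-patterns copied from the recipe: two or four digits, but not all zeros.
GROUP = re.compile(r"(?!00)[0-9]{2}")
SERIAL = re.compile(r"(?!0000)[0-9]{4}")

# Only "00" is rejected; "01", "10", and "99" still match.
for text in ["00", "01", "10", "99"]:
    print(text, bool(GROUP.fullmatch(text)))

# Only "0000" is rejected; numbers that merely contain zeros still match.
for text in ["0000", "0001", "1000", "9999"]:
    print(text, bool(SERIAL.fullmatch(text)))
```

Because a lookahead consumes no characters, the digits it inspects are still available to the character class that follows it.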

The first group of digits is much more complex and harder to read than the others because it matches a numeric range. First, it uses the negative lookahead (?!000|666) to rule out the specific values “000” and “666”. Next comes the task of eliminating any number higher than 772.

Since regular expressions deal with text rather than numbers, we have to break down the numeric range character by character. First, we know that we can match any three-digit number starting with 0 through 6, because the preceding negative lookahead already ruled out the invalid numbers 000 and 666. This first part is easily accomplished using a couple of character classes and a quantifier: [0-6][0-9]{2}. Since we need to offer an alternative for numbers starting with 7, the pattern we just built is put into a grouping as (?:[0-6][0-9]{2}|7) in order to limit the reach of the alternation operator.

Numbers starting with 7 are allowed only if they fall between 700 and 772, so the next step is to further divide any number that starts with 7 based on the second digit. If it’s between 0 and 6, any third digit is allowed. If the second digit is 7, the third digit must be between 0 and 2. Putting these rules for numbers starting with 7 together, we get 7(?:[0-6][0-9]|7[0-2]), which matches the number 7 followed by one of two options for the second and third digit.

Finally, insert that into the outer grouping for the first set of digits, and you get (?:[0-6][0-9]{2}|7(?:[0-6][0-9]|7[0-2])). That’s it. You’ve successfully created a regex that matches a three-digit number between 000 and 772.
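
One way to gain confidence in a hand-built numeric range like this is to test it against every three-digit string. The sketch below does that in Python; AREA_RANGE, AREA_FULL, and matched_numbers are hypothetical names introduced for this check.

```python
import re

# The numeric-range group on its own, then with the lookahead in front of it.
AREA_RANGE = re.compile(r"(?:[0-6][0-9]{2}|7(?:[0-6][0-9]|7[0-2]))")
AREA_FULL = re.compile(r"(?!000|666)" + AREA_RANGE.pattern)


def matched_numbers(pattern):
    """Return the set of numbers 0-999 whose zero-padded form fully matches."""
    return {n for n in range(1000) if pattern.fullmatch(f"{n:03d}")}


range_only = matched_numbers(AREA_RANGE)
with_lookahead = matched_numbers(AREA_FULL)

# The group alone should accept exactly 000 through 772.
print(range_only == set(range(773)))

# The lookahead should remove exactly 000 and 666.
print(sorted(range_only - with_lookahead))
```

The first line printed is True and the second is [0, 666], which confirms that the group covers 000 through 772 and that the lookahead removes only the two excluded values.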

Variations

Find Social Security numbers in documents

If you’re searching for Social Security numbers in a larger document or input string, replace the ^ and $ anchors with word boundaries (\b). Regular expression engines consider all alphanumeric characters and the underscore to be word characters, so a number that is immediately preceded or followed by a letter, digit, or underscore will not be matched.

\b(?!000|666)(?:[0-6][0-9]{2}|7(?:[0-6][0-9]|7[0-2]))-↵
(?!00)[0-9]{2}-(?!0000)[0-9]{4}\b
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
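
As a rough illustration of this variation in Python, the sketch below collects all matches with findall. The sample text is invented for the example, and SSN_IN_TEXT is a hypothetical name.

```python
import re

# The word-boundary version of the regex, split across two string literals.
SSN_IN_TEXT = re.compile(
    r"\b(?!000|666)(?:[0-6][0-9]{2}|7(?:[0-6][0-9]|7[0-2]))-"
    r"(?!00)[0-9]{2}-(?!0000)[0-9]{4}\b"
)

# Made-up sample text, for illustration only.
text = ("Employee 123-45-6789 replaced ID_078-05-1120 and 000-12-3456; "
        "ref 9123-45-67890 and 772-99-9999.")

# The pattern has no capturing groups, so findall returns the full matches.
found = SSN_IN_TEXT.findall(text)
print(found)  # ['123-45-6789', '772-99-9999']
```

The number after "ID_" is skipped because the underscore is a word character, so there is no word boundary in front of it. The longer digit runs in "9123-45-67890" are skipped for the same reason.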

See Also

The Social Security Administration website at http://www.socialsecurity.gov provides answers to common questions as well as up-to-date lists of what area and group numbers have been assigned.

The Social Security Number Verification Service (SSNVS) at http://www.socialsecurity.gov/employer/ssnv.htm offers two ways to verify over the Internet that names and Social Security numbers match the Social Security Administration’s records.

A more thorough discussion of matching numeric ranges, including examples of matching ranges with a variable number of digits, can be found in Recipe 6.5.
