# 4.12. Validate Social Security Numbers

## Problem

You need to check whether a user has entered a valid Social Security number in your application or website form.

## Solution

If you simply need to ensure that a string follows the basic Social Security number format and that obvious, invalid numbers are eliminated, the following regex provides an easy solution. If you need a more rigorous solution that checks with the Social Security Administration to determine whether the number belongs to a living person, refer to the section of this recipe.

### Regular expression

`^(?!000|666)[0-8][0-9]{2}-(?!00)[0-9]{2}-(?!0000)[0-9]{4}$`

Regex options: None

Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

### Python example

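```python
import re
import sys

# Anchored pattern: rejects area 000, 666, and 900-999; group 00; serial 0000.
SSN_PATTERN = r"^(?!000|666)[0-8][0-9]{2}-(?!00)[0-9]{2}-(?!0000)[0-9]{4}$"

if re.match(SSN_PATTERN, sys.argv[1]):
    print("SSN is valid")
else:
    print("SSN is invalid")
```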

See Recipe 3.6 for help with implementing this regular expression with other programming languages.
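One Python-specific subtlety is worth a short illustration: without the multiline option, `$` in Python matches at the very end of the string and also just before a trailing newline, so the anchored pattern used with `re.match` accepts `"123-45-6789\n"`. The following is a minimal sketch of one way to avoid that, assuming Python 3.4 or later (where `re.fullmatch` is available). The helper name `is_valid_ssn` is illustrative and not part of the recipe.

```python
import re

# Same pattern as the recipe, minus the anchors: re.fullmatch() already
# requires the whole string to match, so ^ and $ are unnecessary.
SSN_RE = re.compile(r"(?!000|666)[0-8][0-9]{2}-(?!00)[0-9]{2}-(?!0000)[0-9]{4}")


def is_valid_ssn(text):
    """Return True if text is exactly one well-formed SSN (hypothetical helper)."""
    return SSN_RE.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["123-45-6789", "123-45-6789\n", "666-45-6789", "123-00-6789"]:
        print(repr(candidate), is_valid_ssn(candidate))
```

If the input comes from a form field that has already been stripped of surrounding whitespace, the difference may not matter in practice. It is mainly relevant when validating raw lines read from a file or stream.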

## Discussion

United States Social Security numbers are nine-digit numbers in the format `AAA-GG-SSSS`:

• The first three digits were historically (prior to mid-2011) assigned by geographical region, and are thus called the area number. The area number cannot be 000, 666, or between 900 and 999.

• Digits four and five are called the group number and range from 01 to 99.

• The last four digits are serial numbers from 0001 to 9999.

This recipe follows all of the rules just listed. Here’s the regular expression again, this time explained piece by piece:

```
^            # Assert position at the beginning of the string.
(?!000|666)  # Assert that neither "000" nor "666" can be matched here.
[0-8]        # Match a digit between 0 and 8.
[0-9]{2}     # Match a digit, exactly two times.
-            # Match a literal "-".
(?!00)       # Assert that "00" cannot be matched here.
[0-9]{2}     # Match a digit, exactly two times.
-            # Match a literal "-".
(?!0000)     # Assert that "0000" cannot be matched here.
[0-9]{4}     # Match a digit, exactly four times.
$            # Assert position at the end of the string.
```

Regex options: Free-spacing

Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby
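In Python, free-spacing mode is enabled with the `re.VERBOSE` flag. As a rough sketch of how the commented pattern above might be used directly in code, with the constant name `SSN_VERBOSE` and the sample inputs chosen purely for illustration:

```python
import re

# The free-spacing pattern, compiled with re.VERBOSE so that whitespace and
# "#" comments inside the pattern are ignored by the regex engine.
SSN_VERBOSE = re.compile(r"""
    ^            # Beginning of the string.
    (?!000|666)  # Area number cannot be 000 or 666 ...
    [0-8]        # ... and cannot start with 9.
    [0-9]{2}     # Remaining two digits of the area number.
    -            # Literal hyphen.
    (?!00)       # Group number cannot be 00.
    [0-9]{2}     # Two-digit group number.
    -            # Literal hyphen.
    (?!0000)     # Serial number cannot be 0000.
    [0-9]{4}     # Four-digit serial number.
    $            # End of the string.
""", re.VERBOSE)

# Illustrative inputs: one well-formed number and one violation of each rule.
samples = ["123-45-6789", "000-45-6789", "666-45-6789",
           "900-45-6789", "123-00-6789", "123-45-0000"]

for ssn in samples:
    print(ssn, bool(SSN_VERBOSE.match(ssn)))
```

Only the first sample should be reported as a match. Each of the others breaks exactly one of the rules listed earlier.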

Apart from the `^` and `$` tokens that assert position at the beginning and end of the string, this regex can be broken into three sets of digits separated by hyphens. The first set allows any number from 000 to 899, but uses the preceding negative lookahead `(?!000|666)` to rule out the specific values 000 and 666. This kind of restriction can be pulled off without lookahead, but having this tool in our arsenal dramatically simplifies the regex. If you wanted to remove 000 and 666 from the range of valid area numbers without using any sort of lookaround, you’d need to restructure `(?!000|666)[0-8][0-9]{2}` as `(?:00[1-9]|0[1-9][0-9]|[1-578][0-9]{2}|6[0-57-9][0-9]|66[0-57-9])`. This far less readable approach uses a series of numeric ranges, which you can read all about in Recipe 6.7.
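To see that the two area-number patterns really are interchangeable, one option is simply to try every three-digit string against both. The sketch below does that in Python. The names `WITH_LOOKAHEAD`, `WITHOUT_LOOKAHEAD`, and `accepted` are made up for this check.

```python
import re

# The two area-number subpatterns discussed above.
WITH_LOOKAHEAD = re.compile(r"(?!000|666)[0-8][0-9]{2}")
WITHOUT_LOOKAHEAD = re.compile(
    r"(?:00[1-9]|0[1-9][0-9]|[1-578][0-9]{2}|6[0-57-9][0-9]|66[0-57-9])"
)


def accepted(pattern):
    """Return the set of three-digit strings that the pattern fully matches."""
    return {f"{n:03d}" for n in range(1000) if pattern.fullmatch(f"{n:03d}")}


lookahead_set = accepted(WITH_LOOKAHEAD)
plain_set = accepted(WITHOUT_LOOKAHEAD)

print(lookahead_set == plain_set)  # True: both accept the same area numbers
print(len(lookahead_set))          # 898: 000-899 minus 000 and 666
```

An exhaustive comparison like this is practical here only because the input space is tiny (1,000 strings). It is a sanity check on the rewrite, not a general technique for proving two regexes equivalent.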

The second and third sets of digits in this pattern simply match any two- or four-digit number, respectively, but use a preceding negative lookahead to rule out the possibility of matching all zeros.

## Variations

### Find Social Security numbers in documents

If you’re searching for Social Security numbers in a larger document or input string, replace the `^` and `$` anchors with word boundaries. Regular expression engines consider all alphanumeric characters and the underscore to be word characters.

`\b(?!000|666)[0-8][0-9]{2}-(?!00)[0-9]{2}-(?!0000)[0-9]{4}\b`

Regex options: None

Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
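As a brief illustration of the word-boundary version, the following Python sketch pulls candidate numbers out of a longer string with `re.findall`. The sample text and the name `SSN_FINDER` are invented for the example.

```python
import re

# Word-boundary version of the pattern, for searching inside longer text.
SSN_FINDER = re.compile(
    r"\b(?!000|666)[0-8][0-9]{2}-(?!00)[0-9]{2}-(?!0000)[0-9]{4}\b"
)

# Invented sample text: two well-formed numbers, one with an invalid area
# number, one glued to a word character, and one with an extra trailing digit.
sample = (
    "Primary: 123-45-6789; secondary: 234-56-7890; rejected: 000-12-3456; "
    "embedded: ID_555-12-3456 and 555-12-34567."
)

matches = SSN_FINDER.findall(sample)
print(matches)  # ['123-45-6789', '234-56-7890']
```

Because the underscore counts as a word character, `ID_555-12-3456` has no word boundary before the first `5` and is skipped. The hyphen is not a word character, though, so a number preceded by a hyphen (as in `9-123-45-6789`) would still be found. Tighter surrounding checks may be needed if that matters for your data.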