Looking for patterns
Creating a good regular expression is a bit of a design process. A regular expression that is too rigid may not be able to match all of the potentially correct matches. On the other hand, a regular expression that is not specific enough may match a large number of strings incorrectly.
The key is to look for a well-defined pattern in the data that easily distinguishes the correct matches from otherwise incorrect matches. It is usually a helpful first step to look through the data itself. This allows you to get an intuitive sense for the existence and frequency of certain patterns.
The following python script uses pandas to read the dataset into a pandas dataframe, extract the address column, and print out a random sample ...
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Read now
Unlock full access