If you asked me to propose a toast to a newly minted class of data analysts, I’d probably raise my glass and say, “May your data always be free of errors and may it always arrive perfectly structured!” Life would be ideal if these sentiments were feasible. In reality, you’ll sometimes receive data in such a sorry state that it’s hard to analyze without modifying it in some way. Such data is called dirty data, a general label for data with errors, missing values, or poor organization that makes standard queries ineffective. When data is converted from one file type to another or when a column receives the wrong ...
