May 2018
Intermediate to advanced
392 pages
10h 22m
English

If you asked me to propose a toast to a newly minted class of data analysts, I’d probably raise my glass and say, “May your data always be free of errors and may it always arrive perfectly structured!” Life would be ideal if these sentiments were feasible. In reality, you’ll sometimes receive data in such a sorry state that it’s hard to analyze without modifying it in some way. Such data is called dirty data, a general label for data with errors, missing values, or poor organization that makes standard queries ineffective. When data is converted from one file type to another or when a column receives the wrong ...