Chapter 29. Dealing with Unwanted Characters
Unexpected characters in your data fields can cause significant issues when you clean data. Those issues can occur at multiple points during data preparation: loading, using, and outputting the data. This chapter therefore focuses on what unwanted characters are, the problems they introduce, and how to remove them.
What Is an Unwanted Character?
An unwanted character is simply a letter, number, or symbol within your data field that you do not need or that could introduce problems in your data output. Data software is often very precise about what it is processing, and rightly so; otherwise, it could easily produce erroneous output. For example, if a single data field contains more than one data type, the field may no longer be aggregated in any meaningful way. Let’s look at the three main types of data fields that can be affected by unwanted characters when they’re loaded into Prep Builder:
- Numeric fields
  - If a non-numeric character is loaded into a numeric field, the field will be imported as a string and thus can no longer be used in aggregations. For example, what should 10 + 1c3 equal? As you can see, that calculation isn’t possible, and that is why a numeric field must contain only numeric values.
- Dates
  - If a non-numeric character is found in a field expecting only date values, the date value with the unwanted character will appear as a null because the date will be in an invalid ...
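
The numeric and date cases described above can be illustrated outside Prep Builder with a short Python sketch. This is not how Prep Builder works internally; it only mimics the described outcomes under stated assumptions. The helper names (`parse_number`, `parse_date`, `column_is_numeric`), the sample values, and the `%Y-%m-%d` date format are all hypothetical and chosen for illustration.

```python
from datetime import datetime


def parse_number(text):
    """Return the value as a float, or None if the text is not a valid number."""
    try:
        return float(text)
    except ValueError:
        return None


def parse_date(text, fmt="%Y-%m-%d"):
    """Return a date, or None (a stand-in for a null) if the text is invalid."""
    try:
        return datetime.strptime(text, fmt).date()
    except ValueError:
        return None


def column_is_numeric(values):
    """A column can be treated as numeric only if every value parses as a number."""
    return all(parse_number(value) is not None for value in values)


# Hypothetical sample data: one stray letter in each column.
quantities = ["10", "1c3", "25"]
order_dates = ["2021-03-01", "2021-0x-02"]

# A single unwanted character means the whole column cannot be treated as numeric.
numeric_ok = column_is_numeric(quantities)

# The date containing the unwanted character becomes None; the valid one parses.
parsed_dates = [parse_date(value) for value in order_dates]

print(numeric_ok)
print(parsed_dates)
```

In this sketch, one bad value is enough to disqualify the entire quantity column, whereas in the date column only the affected value is lost, which mirrors the difference between the two cases described in the list.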