5 Operations on Dates, Strings, and Missing Values

Dates (with both a date and a time component) and alphanumeric strings are data types that, because of their peculiar nature, are better handled by specific functions tailored to their characteristics. Missing values, although not a data type, are literally the absence of data and represent a special case that requires specific treatment and, sometimes, specific functions.

Date and time are both special in their measurement units: base 60 for minutes and seconds, base 12 or 24 for hours, and something more complicated for dates, even if we consider only the Gregorian calendar. Units of time are at least regular: hours, minutes, and seconds always have the same duration. Dates, by contrast, are measured on an irregular scale: months have different lengths, and years are not always the same length either. These complications have become so familiar that we consider it perfectly normal to count a different number of days for different months and to add a day to February in leap years (roughly every four years), but for computational logic such irregularities mark the difference between trivial arithmetic and an irrational way of counting. For example, calculating the difference in days between two dates is, in practice, far more complicated than the triviality of the operation suggests, because the calendar is needed, meaning that every operation is a special case. For this reason, date and ...
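The contrast between regular time units and the irregular calendar can be illustrated with a minimal sketch in Python, using only the standard library `datetime` and `calendar` modules. The dates and variable names are arbitrary choices made for illustration and do not come from the text.

```python
import calendar
from datetime import date, timedelta

# Time units are regular: an hour is always 3600 seconds.
seconds_per_hour = timedelta(hours=1).total_seconds()

# Calendar units are not: the length of February depends on the year.
# monthrange() returns (weekday of the first day, number of days in the month).
feb_2023_days = calendar.monthrange(2023, 2)[1]
feb_2024_days = calendar.monthrange(2024, 2)[1]

# Leap years do not occur strictly every four years in the Gregorian calendar:
# century years are leap years only when divisible by 400.
leap_1900 = calendar.isleap(1900)
leap_2000 = calendar.isleap(2000)

# Subtracting two dates delegates the calendar bookkeeping to the library
# and returns a timedelta whose .days attribute is the difference in days.
start = date(2023, 1, 15)  # illustrative dates, chosen arbitrarily
end = date(2024, 3, 1)
days_between = (end - start).days

print(seconds_per_hour)              # 3600.0
print(feb_2023_days, feb_2024_days)  # 28 29
print(leap_1900, leap_2000)          # False True
print(days_between)                  # 411
```

The interval above crosses the leap day of 29 February 2024, which is why the result cannot be obtained by multiplying a fixed month or year length: the library has to consult the calendar for every month in the span.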
