2

Providing Structure to Unstructured Data

Abstract

Most of the data collected today is unstructured. For data managers, unstructured data is any stored information that comes in different sizes (e.g., a tweet, an email message, a book, a library corpus); that may express a single concept in many different ways (e.g., September 3, 2012; Sept 3, 2012; 09/03/12; 03/09/12; 3/9/12; Labor Day); that is not neatly packaged into spreadsheet cells (e.g., an audio file, a photograph of the Declaration of Independence, a tweet); that cannot be assigned a numeric value (e.g., yesterday, TRUE, nil); or that does not conform in any way to a specific data standard. Examples of unstructured data would include just about everything ...
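To make the date example concrete, the following is a minimal Python sketch of how the spellings listed above might be mapped to a single structured value. It is an illustration only, not a method taken from the chapter: the function name `normalize_date`, the `NAMED_DAYS` lookup table, and the `day_first` flag are all hypothetical, and the sketch assumes an English locale and two-digit years in the 2000s.

```python
from datetime import date, datetime

# Hypothetical lookup for named days; this entry holds for 2012 only.
NAMED_DAYS = {"labor day": date(2012, 9, 3)}


def normalize_date(text, day_first=False):
    """Return a date for a recognized spelling, or None if unrecognized.

    day_first is an assumption supplied by the caller: a string such as
    03/09/12 cannot be interpreted without knowing the source's convention.
    """
    cleaned = text.strip()

    # Named days carry no digits at all, so they need a lookup table.
    named = NAMED_DAYS.get(cleaned.lower())
    if named is not None:
        return named

    # strptime's %b expects "Sep", so rewrite the "Sept" abbreviation first.
    cleaned = cleaned.replace("Sept ", "Sep ")

    numeric = "%d/%m/%y" if day_first else "%m/%d/%y"
    for fmt in ("%B %d, %Y", "%b %d, %Y", numeric):
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue  # try the next candidate format
    return None
```

Under a month-first reading, `normalize_date("09/03/12")` and `normalize_date("Sept 3, 2012")` yield the same date, while `normalize_date("03/09/12")` resolves to September 3 only when called with `day_first=True`. The same string therefore maps to two different dates depending on a convention that the text itself does not record, which is the kind of ambiguity that makes such data unstructured.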

Get Principles and Practice of Big Data, 2nd Edition now with the O’Reilly learning platform.
