June 2014
Beginner to intermediate
334 pages
6h 8m
English
Information can be described as structured, unstructured, or sometimes a mix of the two, which is called semi-structured.
In a very general sense, structured data is anything that follows a well-defined format an algorithm can parse. Common examples include JSON, CSV, and XML. Given structured data, we can write code that dissects the underlying format and reliably produces useful results. Because mining structured data is a deterministic process, the parsing can be fully automated. This in turn lets us gather more input to feed our data analysis algorithms.
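To make the point about deterministic parsing concrete, here is a minimal sketch in Python using only the standard library (`csv`, `json`, and `xml.etree.ElementTree`). The sample records and the helper names `ages_from_csv`, `ages_from_json`, and `ages_from_xml` are hypothetical and exist only for illustration. The same three records are encoded in each of the three formats mentioned above, and each parser extracts the same machine-ready values without any guesswork.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET
from statistics import mean

# Hypothetical sample data: the same three records in three structured formats.
CSV_TEXT = """name,age
Alice,30
Bob,40
Carol,50
"""

JSON_TEXT = (
    '[{"name": "Alice", "age": 30},'
    ' {"name": "Bob", "age": 40},'
    ' {"name": "Carol", "age": 50}]'
)

XML_TEXT = (
    "<people>"
    '<person name="Alice" age="30"/>'
    '<person name="Bob" age="40"/>'
    '<person name="Carol" age="50"/>'
    "</people>"
)


def ages_from_csv(text: str) -> list:
    """Read CSV with a header row and return the 'age' column as ints."""
    reader = csv.DictReader(io.StringIO(text))
    return [int(row["age"]) for row in reader]


def ages_from_json(text: str) -> list:
    """Decode a JSON array of objects and return each object's 'age' field."""
    return [int(record["age"]) for record in json.loads(text)]


def ages_from_xml(text: str) -> list:
    """Parse XML and return the 'age' attribute of every <person> element."""
    root = ET.fromstring(text)
    return [int(person.get("age")) for person in root.iter("person")]


if __name__ == "__main__":
    # Each format is parsed by fixed rules, so all three give identical results
    # that can go straight into an analysis step such as computing a mean.
    for parse, text in [(ages_from_csv, CSV_TEXT),
                        (ages_from_json, JSON_TEXT),
                        (ages_from_xml, XML_TEXT)]:
        ages = parse(text)
        print(parse.__name__, ages, mean(ages))
```

Because no step involves interpretation, running the script any number of times on the same input yields the same output. That repeatability is what makes it safe to automate ingestion at scale.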
Unstructured data is everything else: data that is not organized according to a predefined format. Written languages such as English are often regarded as unstructured because of the difficulty in parsing ...