Chapter 2. Data Is Knowledge
Data has a broader definition than at any time in history,1 and it is growing on an almost incomprehensible scale.2 For instance, a Tesla automobile creates 10 GB of data per mile through its sensors.3
Knowledge is the aggregation of all this data. Resistance to the true nature of our data stands between us and the natural assimilation of document, sensor, semantic, geospatial, and binary data into information we can act on. With the right organizational and technical support, these data components can work together in a mutually supportive manner.
In the case of a Tesla automobile, one piece of sensor data might be needed for split-second, life-or-death decision making, and the same piece of data might also play a role in a machine-learning mission that collects data from millions of cars driving over millions of miles, spanning decades. Furthermore, that one piece of data could well have legal consequences and decision-making processes around it, and possibly a moral facet as well. With this potential impact and longevity, we need to take data modeling seriously from the very start.
Questions to Ask Before You Begin Data Modeling
- What form does the data take at its point of origin?
- If models are stored separately, how can we correlate them?
- How do you access the data from the models?
- Is there a common vocabulary across these models?
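To make the questions about correlation and common vocabulary concrete, here is a minimal sketch in Python. It assumes two separately stored models, a sensor model and a document model, that happen to share one term, `vehicle_id`. The class names, field names, and sample values are all hypothetical and chosen only for illustration; they do not come from any real Tesla system or from a particular database product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SensorReading:
    """Hypothetical sensor model: one reading from one vehicle."""
    vehicle_id: str   # assumed shared key (the "common vocabulary")
    timestamp: float  # seconds since some reference point
    speed_kph: float


@dataclass(frozen=True)
class ServiceDocument:
    """Hypothetical document model: one report about one vehicle."""
    vehicle_id: str   # same assumed shared key
    title: str


def correlate(readings, documents):
    """Group records from both models under their shared vehicle_id."""
    combined = {}
    for reading in readings:
        entry = combined.setdefault(
            reading.vehicle_id, {"readings": [], "documents": []}
        )
        entry["readings"].append(reading)
    for document in documents:
        entry = combined.setdefault(
            document.vehicle_id, {"readings": [], "documents": []}
        )
        entry["documents"].append(document)
    return combined


# Illustrative sample data only.
readings = [
    SensorReading("car-1", 0.0, 88.0),
    SensorReading("car-2", 1.0, 42.0),
]
documents = [ServiceDocument("car-1", "Brake inspection report")]

by_vehicle = correlate(readings, documents)
```

In this sketch, correlation is possible only because both models agree on what `vehicle_id` means. Without a shared term of that kind, the two models could not be joined without extra mapping work, which is why the vocabulary question is worth asking before modeling begins.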
Further, the subtler aspect of data is that it’s “out there”—everywhere, in the cloud, virtualized and on premises, ...