Chapter 3. How an External Data Platform Fits into Your Data Architecture

Many data analysts, data scientists, citizen data scientists, and inhabitants of the data blogosphere are familiar with Doug Laney’s “3 V’s” of big data:

Volume
Refers to the vast amounts of data generated by sources such as cell phones, social media, and photographs
Velocity
Measures the speed at which this vast amount of data is being generated, collected, and analyzed
Variety
Describes the different types of data: structured data (data that fits neatly into a table, such as names, phone numbers, and IDs) is blended with newer data that is mostly unstructured, such as images, audio, and social media updates

Laney recently discussed adding two more V's: veracity (correctness and accuracy) and value. To improve accuracy and deliver value, data must be processed in a timely fashion, cleaned and stored for analytical purposes, and monitored by proper governance, all while meeting compliance standards. This is a tall order, and it also requires a scalable processing infrastructure.

These new demands require new solutions, yet legacy business intelligence and data warehousing architectures, which ingest both internal and external data from various structured sources, are falling behind. Their capacity and scalability are limited, largely because they run on premises and were designed only for structured data use cases.

Organizations ...