CHAPTER 5 A Look Forward on Analytics: Not Everything Can Be a Nail

“Order doesn't come by itself.”

—Benoit Mandelbrot

The Fractalist: Memoir of a Scientific Maverick

This chapter reviews the use of a data topology for organizing and arranging data. A data topology can extend beyond a dedicated analytical environment, such as a data lake, to encompass the data needs of the whole organization and the broader enterprise ecosystem. It covers three primary areas: creating a zone map, defining the data flows, and establishing the data topography. The data topology is a critical part of developing an information architecture for artificial intelligence (AI) because the use of AI is now essential to the vitality of your organization.

A Need for Organization

Data can enter the data lake from anywhere: online transaction processing (OLTP) systems, operational data stores, data warehouses, logs and other machine data, cloud services, and so on. Data brought into a data lake from one or more of these source systems is likely to span different data technologies and data formats. Variations can also extend to the spoken language (for audio files) and the written language (for electronic documents), and to the character encoding used to store the data, such as Unicode and Extended Binary Coded Decimal Interchange Code (EBCDIC).
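To make the encoding point concrete, here is a minimal Python sketch of decoding the same text from two differently encoded sources into one common representation. The source labels, the `SOURCE_ENCODINGS` table, and the `to_text` helper are hypothetical names introduced for illustration, and `cp037` is assumed here as one representative EBCDIC code page; a real mainframe extract may use a different one.

```python
# Hypothetical mapping from a source-system label to the codec its files use.
# "cp037" is one EBCDIC code page shipped with Python; the actual code page
# of a given source system is an assumption that must be confirmed.
SOURCE_ENCODINGS = {
    "mainframe_extract": "cp037",
    "web_logs": "utf-8",
}


def to_text(raw: bytes, source: str) -> str:
    """Decode raw bytes from a named source into a Python (Unicode) string."""
    encoding = SOURCE_ENCODINGS[source]
    return raw.decode(encoding)


# The same logical value, as it might arrive from two source systems.
ebcdic_bytes = "ORDER 42".encode("cp037")
utf8_bytes = "ORDER 42".encode("utf-8")

# The byte sequences differ, but the decoded text is identical.
from_mainframe = to_text(ebcdic_bytes, "mainframe_extract")
from_logs = to_text(utf8_bytes, "web_logs")
print(ebcdic_bytes != utf8_bytes, from_mainframe == from_logs)
```

Recording the encoding alongside each source, rather than guessing it at read time, is the design choice this sketch leans on: bytes decoded with the wrong codec usually produce wrong text silently instead of raising an error.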

Regardless of format, data needs to be brought into the data lake by some means. However, the data ...
