
Architecting the Industrial Internet by Carla Romano, Robert Stackowiak, Shyam Nath


Data lakes

A data lake is a repository that holds data in its natural format and can contain data of all types and schemas: structured, semi-structured, and unstructured. Its purpose is to serve as a repository for all analyzable data. This includes raw and transformed structured data from applications and relational systems, as well as semi-structured data such as document collections (for example, email), logs, clickstreams, device data, geolocation trails, social media, and weather data, stored using HDFS (the Hadoop Distributed File System). Unstructured data such as images, video, and audio can also be included in a data lake. Data can simply be loaded into the data lake as-is, without first being integrated or transformed.

Data stored in its native format can later be parsed for analysis. The data lake can serve as a staging area for ...
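The idea of storing data in its native format and parsing it only at analysis time can be illustrated with a minimal sketch. It is not taken from the book: a local temporary directory stands in for HDFS, and the helper names `land_raw` and `read_records`, the file names, and the sample records are all hypothetical.

```python
import csv
import json
import tempfile
from pathlib import Path


def land_raw(lake_dir: Path, name: str, payload: bytes) -> Path:
    """Write the payload into the lake unchanged (no parsing, no schema)."""
    target = lake_dir / "raw" / name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)
    return target


def read_records(path: Path) -> list[dict]:
    """Parse a raw file at analysis time, choosing a parser by file extension."""
    text = path.read_text(encoding="utf-8")
    if path.suffix == ".jsonl":
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    if path.suffix == ".csv":
        return list(csv.DictReader(text.splitlines()))
    raise ValueError(f"no parser registered for {path.suffix}")


# A local temp directory stands in for the lake's file system (assumption).
lake = Path(tempfile.mkdtemp(prefix="lake_"))

# Ingest: files are dumped as-is, whatever their format.
land_raw(lake, "clicks.jsonl",
         b'{"user": "a", "page": "/home"}\n{"user": "b", "page": "/cart"}\n')
land_raw(lake, "sensors.csv",
         b"device,temp_c\npump-1,71.5\npump-2,68.0\n")

# Analysis: structure is applied only when the data is read.
clicks = read_records(lake / "raw" / "clicks.jsonl")
sensors = read_records(lake / "raw" / "sensors.csv")
```

In this sketch, ingestion never inspects the content, so new formats can land in the lake before any parser exists for them. The cost is that every reader must know how to interpret the raw files, and values such as `temp_c` arrive as strings until the analysis step converts them.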
