Big Data Tools and Pipelines

Ideas and resources related to data tools.

Lego-powered Kafka training

Jesse Anderson walks viewers through the path data can take from publishers through a Kafka cluster and on to consumers.
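
The publisher-to-consumer path described above can be sketched with a toy in-memory model. This is only an illustration under stated assumptions, not the Kafka client API and not the material from the video: the names ToyTopic and ToyConsumer are hypothetical, the partition count is arbitrary, and the CRC32 key hashing merely stands in for whatever partitioner a real producer uses.

```python
from collections import defaultdict
from zlib import crc32


class ToyTopic:
    """Hypothetical in-memory stand-in for one topic: a set of append-only partition logs."""

    def __init__(self, num_partitions=3):
        self.partitions = [[] for _ in range(num_partitions)]

    def publish(self, key, value):
        # Records with the same key always map to the same partition,
        # so their relative order is preserved within that partition.
        index = crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[index].append((key, value))
        return index


class ToyConsumer:
    """Hypothetical consumer that tracks its own read position (offset) per partition."""

    def __init__(self, topic):
        self.topic = topic
        self.offsets = defaultdict(int)

    def poll(self):
        # Return every record appended since the last poll, then advance the offsets.
        records = []
        for index, log in enumerate(self.topic.partitions):
            start = self.offsets[index]
            records.extend(log[start:])
            self.offsets[index] = len(log)
        return records


if __name__ == "__main__":
    topic = ToyTopic()
    topic.publish("sensor-a", "reading 1")
    topic.publish("sensor-a", "reading 2")
    consumer = ToyConsumer(topic)
    print(consumer.poll())  # both records, in publish order
    print(consumer.poll())  # nothing new since the last poll
```

Because each consumer keeps its own offsets, a second ToyConsumer created on the same topic would read the full log independently, which loosely mirrors how separate consumers can read the same published data.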

Data modeling constructs and terminology

Identification of data sources is the first step in warehouse development. In this video training segment, Michael Blaha provides a framework by reviewing data modeling constructs and terminology, including dependent and independent entity types. Using IE (information engineering) notation and the ERwin tool, Blaha walks through a sample operational data model.
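
As a rough illustration of the dependent and independent entity types mentioned above, the sketch below uses Python's built-in sqlite3 module. It assumes the common reading of the terms: an independent entity type is identified by its own key, while a dependent entity type borrows part of its identifier from a parent. The customer and customer_address tables are hypothetical and are not taken from Blaha's sample model.

```python
import sqlite3


def build_schema():
    """Create a hypothetical two-table model in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        -- Independent entity type: identified by its own key alone.
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL
        );

        -- Dependent entity type: its primary key includes the parent's key,
        -- so a row cannot be identified (or exist) without a customer.
        CREATE TABLE customer_address (
            customer_id INTEGER NOT NULL REFERENCES customer (customer_id),
            address_seq INTEGER NOT NULL,
            city        TEXT NOT NULL,
            PRIMARY KEY (customer_id, address_seq)
        );
        """
    )
    return conn


if __name__ == "__main__":
    conn = build_schema()
    conn.execute("INSERT INTO customer VALUES (1, 'Acme')")
    conn.execute("INSERT INTO customer_address VALUES (1, 1, 'Columbus')")
    try:
        # No customer 99 exists, so the dependent row is rejected.
        conn.execute("INSERT INTO customer_address VALUES (99, 1, 'Nowhere')")
    except sqlite3.IntegrityError as exc:
        print("rejected:", exc)
```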

Architecting Hadoop Applications

In this O'Reilly training video, the authors of "Hadoop Application Architectures" present an end-to-end case study of a clickstream analytics engine as a concrete example of how to architect and implement a complete solution with Hadoop. This segment gives an overview of the full architecture.

Presenters: Mark Grover, Gwen Shapira, Jonathan Seidman, Ted Malaska