Learning | Data

Our take on the ideas, information, and tools that make data work.

Solving the right problem

Max Shron and Sasha Laundy explore tactics for need-finding and problem scoping that make it possible to put investments in data to profitable use.

Hadoop: What you need to know

Learn the basics of how Hadoop works, why it's such an important technology, and how you should be using it, all without getting mired in the details.

Easy, reproducible reports with R

Garrett Grolemund demonstrates how to use R Markdown to combine code and text into a single .Rmd file to generate polished reports automatically in a variety of formats.

Best practices for streaming applications

Mark Grover and Ted Malaska offer an overview of projects for streaming applications, including Kafka, Flume, and Spark Streaming, and discuss the available architectures, such as Lambda and Kappa.

Running Spark on Alluxio with S3

Calvin Jia presents an in-depth overview of Alluxio and its role in the big data ecosystem. In this segment, he reviews examples that show how Alluxio complements Spark and S3 to enable fast data access.

Making Sense of Stream Processing

Stream processing is finally coming of age. This report shows you how stream processing can make data storage and processing systems more flexible and less complex.

Organizing big data with the crowd

Using real-world cases, Lukas Biewald describes microtasking, where it fits in the crowdsourcing landscape, and how data scientists and developers can tap into the crowd to collect and process data sets.

Securing Apache Kafka

Jun Rao explains the threats that Kafka's security features mitigate, the changes that were made to Kafka to enable security, and the steps required to secure an existing Kafka cluster.