Learning | Data

Our take on the ideas, information, and tools that make data work.

What is a resilient distributed dataset?

Alex Robbins guides you through an in-depth look at the Python API for Apache Spark. In this segment, he explores resilient distributed datasets (RDDs), the central abstraction in Spark and essential knowledge for anyone working with the system.

Dive into scikit-learn

With scikit-learn, you can deploy machine learning models in just a few lines of code. Andreas Mueller summarizes the classification, regression, and clustering algorithms in this powerful machine learning library.
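
As a rough illustration of the "few lines of code" claim, here is a minimal sketch that trains and scores a classifier. The dataset (iris), the model (logistic regression), and the split parameters are assumptions chosen for this example; they are not taken from the video.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load a small built-in dataset (assumed here purely for illustration).
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the rows so the score reflects unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit a classifier and measure accuracy on the held-out rows.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"Held-out accuracy: {accuracy:.2f}")
```

Other scikit-learn estimators expose the same `fit` and `score` methods, so swapping in a different classifier usually means changing only the line that constructs `model`.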

Building data science teams: Preparing your organization

How should you prepare when assembling and integrating a data science team into your organization? In this video training segment, Paco Nathan offers tips to consider in the early stages, including designating the right executive sponsor and encouraging basic hands-on data science training for management.

Architecting Hadoop Applications

In this O'Reilly training video, the "Hadoop Application Architectures" authors present an end-to-end case study of a clickstream analytics engine to provide a concrete example of how to architect and implement a complete solution with Hadoop. In this segment, they provide an overview of the complete architecture. Presenters: Mark Grover, Gwen Shapira, Jonathan Seidman, Ted Malaska

A/B Testing: a checklist

Lisa Qian lays out the process for a successful A/B test, from defining a goal and hypothesis to knowing when to end the test. Done right, A/B tests are the most rigorous form of data gathering, but they can't be run on guesswork or gut instinct.
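
To make the "no guesswork" point concrete, here is a minimal sketch of one common way to check whether two conversion rates differ: a two-sided, two-proportion z-test. The choice of test, the function name `two_proportion_z_test`, and the sample counts are all assumptions for illustration; the checklist itself may recommend a different procedure.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 200 of 5,000 control users converted, 250 of 5,000 in the variant.
z, p_value = two_proportion_z_test(200, 5000, 250, 5000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference is unlikely to be chance alone, but it is only meaningful if the sample size and stopping rule were fixed before the test started.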

Pivot Tables in Python

Pivot tables are an incredibly handy tool for exploring tabular data. This excerpt from the Python Data Science Handbook (Early Release) shows how to use the elegant pivot table features in Pandas to slice and dice your data.
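
For a sense of what that looks like in practice, here is a minimal sketch using `DataFrame.pivot_table`. The `sales` data and its column names are made up for this example and do not come from the excerpt.

```python
import pandas as pd

# Hypothetical sales records, invented for illustration.
sales = pd.DataFrame({
    "region": ["East", "East", "West", "West", "East", "West"],
    "product": ["A", "B", "A", "B", "A", "B"],
    "units": [10, 4, 7, 3, 5, 6],
})

# One row per region, one column per product, summing units in each cell.
table = sales.pivot_table(
    values="units", index="region", columns="product", aggfunc="sum"
)
print(table)
```

Changing `aggfunc` (for example to `"mean"`) or swapping `index` and `columns` gives a different slice of the same data without reshaping it by hand.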

Data visualization with Seaborn

Seaborn provides an API on top of matplotlib that uses sane plot and color defaults and offers simple functions for common statistical plot types.

Data Science at the Command Line

Whether you're entirely new to the command line or already dreaming in shell scripts, by the end of this webcast you will have a solid understanding of how to leverage the power of the command line.

Analyzing Data with Python

In this webcast led by Sarah Guido, you'll get a bird's-eye view of some of the best tools for data analysis and how you can apply them to your workflow.

Thinking with Data

This webcast examines a framework for incorporating ideas from other fields (such as design, argument studies, and consulting) into data science.