conference Strata + Hadoop World New York 2015: Video Compilation October 2015
Beginner to intermediate
148h 4m
English
Closed Captioning available in German, English, Spanish, French, Japanese, Korean, Portuguese (Portugal, Brazil), Chinese (Simplified), Chinese (Traditional)
Course outline
Business & Innovation conference sessions 3h
Data Innovations conference sessions 7h 18m
Data Science & Advanced Analytics conference sessions 12h 13m
Data science for Wall Street, Part 1 - Sean Owen (Cloudera), Juliet Hougland (Cloudera), Sandy Ryza (Cloudera)50m 5s
Data science for Wall Street, Part 2 - Sean Owen (Cloudera), Juliet Hougland (Cloudera), Sandy Ryza (Cloudera)41m 56s
Data science for Wall Street, Part 3 - Sean Owen (Cloudera), Juliet Hougland (Cloudera), Sandy Ryza (Cloudera)50m 12s
Machine Learning 101, Part 1 - Alice Zheng (Dato), Chris DuBois (Dato), Piotr Teterwak (Dato), Srikrishna Sridhar (Dato)51m 22s
Machine Learning 101, Part 2 - Alice Zheng (Dato), Chris DuBois (Dato), Piotr Teterwak (Dato), Srikrishna Sridhar (Dato)34m 45s
Machine Learning 101, Part 3 - Alice Zheng (Dato), Chris DuBois (Dato), Piotr Teterwak (Dato), Srikrishna Sridhar (Dato)51m 0s
Machine Learning 101, Part 4 - Alice Zheng (Dato), Chris DuBois (Dato), Piotr Teterwak (Dato), Srikrishna Sridhar (Dato)32m 3s
Machine Learning 101, Part 5 - Alice Zheng (Dato), Chris DuBois (Dato), Piotr Teterwak (Dato), Srikrishna Sridhar (Dato)43m 56s
Machine Learning 101, Part 6 - Alice Zheng (Dato), Chris DuBois (Dato), Piotr Teterwak (Dato), Srikrishna Sridhar (Dato)39m 28s
Machine Learning 101, Part 7 - Alice Zheng (Dato), Chris DuBois (Dato), Piotr Teterwak (Dato), Srikrishna Sridhar (Dato)35m 3s
Scaling Python Analytics on Impala - Wes McKinney (Cloudera)45m 31s
Mapping Big Data: A Data Driven Market Report - Russell Jurney (Relato)22m 58s
Queering Quant: How Having All the Data Isn’t Enough to Represent a Complex Social Phenomena - Lauralea Banks Edwards (Washington State University)21m 8s
Data Science in the Wall Street Journal - Juan Huerta (Dow Jones)54m 6s
Data modeling for data science: Simplify your workload with complex types - Marcel Kornacker (Cloudera, Inc.), Josh Wills (Cloudera), Alexander Behm (Cloudera)47m 28s
Running experiments with logged-out users: Solving the mixed group problem - Raphael Lee (Airbnb), Victor Vazquez (Airbnb)36m 40s
How data science helps prevent churn at Avira, a 100-million user company - Iulia Pasov (Avira), Calin-Andrei Burloiu (Avira)19m 35s
Probabilistic programming in data science - Thomas Wiecki (Quantopian)22m 17s
Tackling machine learning complexity for data curation - Ihab Ilyas (Tamr, Inc.)15m 36s
Learning to love Bayesian statistics - Allen Downey (Olin College of Engineering)18m 35s
Design, User Experience, & Visualization conference sessions 7h 34m
Introduction to visualizations using D3, Part 1 - Brian Suda ((optional.is))46m 7s
Introduction to visualizations using D3, Part 2 - Brian Suda ((optional.is))53m 28s
Introduction to visualizations using D3, Part 3 - Brian Suda ((optional.is))38m 26s
Introduction to visualizations using D3, Part 4 - Brian Suda ((optional.is))30m 55s
Value in the details - understanding data through visual exploration - Richard Brath (Uncharted Software), Rob Harper (Uncharted)37m 12s
Data inclusion for all - Alex Kelly (General Motors), Kim Le (General Motors)28m 51s
Visualising Music Services - Alan Hannaway (7digital)38m 9s
From profiling to analysis: Designing visualization tools for purpose - Jeffrey Heer (Trifacta | University of Washington), Jock Mackinlay (Tableau)38m 18s
What have you done!? How to visualize methods and models for decision makers - Michael Freeman (University of Washington)37m 21s
LIVE from New York: An introduction to Linked Immersive Visualization Environments - Margit Zwemer (LiquidLandscape)33m 46s
Data, Design, and Organizations: Design thinking and prototyping approaches to data challenges in orgs - Peter Olson (IDEO), David Boardman (IDEO)38m 58s
Designing happiness with data - Pamela Pavliscak (Change Sciences)33m 12s
Hadoop Internals & Development conference sessions 7h 45m
Hadoop application architectures: Fraud detection, Part 1 - Gwen Shapira (Confluent), Ted Malaska (Cloudera), Mark Grover (Cloudera), Jonathan Seidman (Cloudera)48m 4s
Hadoop application architectures: Fraud detection, Part 2 - Gwen Shapira (Confluent), Ted Malaska (Cloudera), Mark Grover (Cloudera), Jonathan Seidman (Cloudera)41m 33s
Hadoop application architectures: Fraud detection, Part 3 - Gwen Shapira (Confluent), Ted Malaska (Cloudera), Mark Grover (Cloudera), Jonathan Seidman (Cloudera)38m 13s
Hadoop application architectures: Fraud detection, Part 4 - Gwen Shapira (Confluent), Ted Malaska (Cloudera), Mark Grover (Cloudera), Jonathan Seidman (Cloudera)49m 54s
Simplifying Hadoop: RecordService, a secure and unified data access path for compute frameworks - Lenni Kuff (Cloudera), Nong Li (Cloudera), Stephen Romanoff (Capital One)39m 39s
Hadoop's storage gap: Resolving transactional access/analytic performance trade-offs with Kudu - Todd Lipcon (Cloudera) and Binglin Chang (Xiaomi)39m 32s
Native erasure coding support inside HDFS - Zhe Zhang (Cloudera), Weihua Jiang (Intel)40m 27s
Transaction processing with Apache Hive, HBase, and Phoenix - Alan Gates (Hortonworks)40m 59s
OLTP on Hadoop: Reviewing the first Hadoop-based TPC-C benchmarks - Monte Zweben (Splice Machine Inc.), John Leach (Splice Machine)1h 2m 30s
What does it mean to virtualize the Hadoop distributed file system? - Thomas Phelan (BlueData)41m 57s
HDFS operations made easy: Guide to the improved, full service HDFS File Browser - Ravi Prakash (Altiscale)22m 39s
IoT & Real-time conference sessions 8h 52m
Process, store, and analyze like a boss with Team Apache: Kafka, Spark, and Cassandra, Part 1 - Patrick McFadin (DataStax)46m 38s
Process, store, and analyze like a boss with Team Apache: Kafka, Spark, and Cassandra, Part 2 - Patrick McFadin (DataStax)35m 38s
Process, store, and analyze like a boss with Team Apache: Kafka, Spark, and Cassandra, Part 3 - Patrick McFadin (DataStax)32m 48s
Process, store, and analyze like a boss with Team Apache: Kafka, Spark, and Cassandra, Part 4 - Patrick McFadin (DataStax)27m 44s
When it absolutely, positively, has to be there: Reliability guarantees in Kafka - Gwen Shapira (Confluent), Jeff Holoman (Cloudera)42m 25s
What does your smart device know about you? - Charles Givre (Booz | Allen | Hamilton)41m 55s
Twitter Heron: Stream processing at scale - Karthik Ramasamy (Twitter)43m 8s
Streaming in the extreme - Jim Scott (MapR Technologies, Inc.)39m 1s
IoT with Spark Streaming: Practical lessons from real-world use cases - Hari Shreedharan (Cloudera), Anand Iyer (Cloudera)40m 22s
An open source approach to gathering and analyzing device-sourced health data - Ian Eslick (VitalLabs)38m 16s
Elastic stream processing without tears - Michael Hausenblas (Mesosphere)29m 33s
Modeling predictive maintenance applications in the IoT Era - Yan Zhang (Microsoft)36m 38s
Building a real-time analytics stack with Kafka, Samza, and Druid - Fangjin Yang (Imply), Gian Merlino (Stealth)42m 1s
Oulu Smart City pilot - Susanna Pirttikangas (University of Oulu)36m 16s
Production Ready Hadoop conference sessions 11h 48m
Apache Hadoop operations for production systems, Part 1 - Kathleen Ting (Cloudera), Miklos Christine (Databricks), Philip Zeyliger (Cloudera), Philip Langdale (Cloudera, Inc.)50m 42s
Apache Hadoop operations for production systems, Part 2 - Kathleen Ting (Cloudera), Miklos Christine (Databricks), Philip Zeyliger (Cloudera), Philip Langdale (Cloudera, Inc.)34m 31s
Apache Hadoop operations for production systems, Part 3 - Kathleen Ting (Cloudera), Miklos Christine (Databricks), Philip Zeyliger (Cloudera), Philip Langdale (Cloudera, Inc.)46m 2s
Apache Hadoop operations for production systems, Part 4 - Kathleen Ting (Cloudera), Miklos Christine (Databricks), Philip Zeyliger (Cloudera), Philip Langdale (Cloudera, Inc.)42m 46s
Apache Hadoop operations for production systems, Part 5 - Kathleen Ting (Cloudera), Miklos Christine (Databricks), Philip Zeyliger (Cloudera), Philip Langdale (Cloudera, Inc.)55m 11s
Apache Hadoop operations for production systems, Part 6 - Kathleen Ting (Cloudera), Miklos Christine (Databricks), Philip Zeyliger (Cloudera), Philip Langdale (Cloudera, Inc.)26m 47s
Apache Hadoop operations for production systems, Part 7 - Kathleen Ting (Cloudera), Miklos Christine (Databricks), Philip Zeyliger (Cloudera), Philip Langdale (Cloudera, Inc.)49m 31s
Apache Hadoop operations for production systems, Part 8 - Kathleen Ting (Cloudera), Miklos Christine (Databricks), Philip Zeyliger (Cloudera), Philip Langdale (Cloudera, Inc.)33m 12s
Building a Hadoop data application, Part 1 - Tom White (Cloudera), Ryan Blue (Cloudera)54m 26s
Building a Hadoop data application, Part 2 - Tom White (Cloudera), Ryan Blue (Cloudera)34m 31s
Building a Hadoop data application, Part 3 - Tom White (Cloudera), Ryan Blue (Cloudera)46m 41s
Hadoop in the cloud: An architectural how-to - Jairam Ranganathan (Cloudera)38m 10s
Multi-tenant, multi-cluster, and multi-container Apache HBase deployment - Jonathan Hsieh (Cloudera, Inc), Dima Spivak (Cloudera)39m 28s
The glue: Building the connectors and tools to manage big data warehouses - Siwei Zhu (Scribd), Kevin Perko (Scribd)36m 58s
Failing fast and falling often is no way to run a cluster! - Michael Segel (Segel & Associates)36m 50s
Real-world NoSQL schema design - Ted Dunning (MapR Technologies)42m 44s
Hadoop's storage gap: Resolving transactional access/analytic performance trade-offs with Kudu - Todd Lipcon (Cloudera, Inc.)39m 32s
Spark & Beyond conference sessions 12h 10m
Apache Drill bootcamp, Part 1 - Tomer Shiran (Dremio), Jacques Nadeau (Dremio)42m 37s
Apache Drill bootcamp, Part 2 - Tomer Shiran (Dremio), Jacques Nadeau (Dremio)30m 58s
Apache Drill bootcamp, Part 3 - Tomer Shiran (Dremio), Jacques Nadeau (Dremio)45m 40s
Apache Drill bootcamp, Part 4 - Tomer Shiran (Dremio), Jacques Nadeau (Dremio)41m 23s
Architecting a data platform, Part 1 - Stephen OSullivan (Silicon Valley Data Science), John Akred (Silicon Valley Data Science), Gary Dusbabek (Silicon Valley Data Science)46m 28s
Architecting a data platform, Part 2 - Stephen OSullivan (Silicon Valley Data Science), John Akred (Silicon Valley Data Science), Gary Dusbabek (Silicon Valley Data Science)46m 43s
Architecting a data platform, Part 3 - Stephen OSullivan (Silicon Valley Data Science), John Akred (Silicon Valley Data Science), Gary Dusbabek (Silicon Valley Data Science)44m 53s
Architecting a data platform, Part 4 - Stephen OSullivan (Silicon Valley Data Science), John Akred (Silicon Valley Data Science), Gary Dusbabek (Silicon Valley Data Science)36m 31s
What's coming for the Spark community - Patrick Wendell (Databricks)49m 26s
Supercharging R with Spark for end-to-end data science - Hossein Falaki (Databricks Inc.)40m 8s
Next-generation genomics analysis with Apache Spark - Timothy Danford (Tamr, Inc.)38m 50s
Lifelogging for insights - Håkan Jonsson (Sony Mobile Communications)39m 26s
Effective testing of Spark programs and jobs - Holden Karau (IBM)33m 13s
Estimating financial risk with Apache Spark - Sandy Ryza (Cloudera)35m 48s
Netflix: Integrating Spark at petabyte scale - Daniel Weeks (Netflix)38m 7s
First-ever scalable, distributed deep learning architecture using Spark and Tachyon - Christopher Nguyen (Adatao, Inc.), Vu Pham (Adatao, Inc), Michael Bui (Adatao, Inc.)37m 42s
Spark on Mesos - Dean Wampler (Typesafe)37m 29s
How Spark is working out at Comcast scale - Sridhar Alla (Comcast), Jan Neumann (Comcast)44m 58s
Financial Services conference sessions 4h 10m
Data-driven Business conference sessions 10h 2m
Welcome to data-driven business day - Alistair Croll (Solve For Interesting)14m 42s
Hacking the bias - Farrah Bostic (The Difference Engine)28m 46s
Beer, diapers, and correlation: A tale of ambiguity - Mark Madsen (Third Nature)31m 19s
How big data is creating a new breed of CFOs - Krish Venkataraman (Syncsort)19m 44s
Bringing the human dimension to data: A case study on transforming research at O’Reilly Media - Tricia Wang (Constellate Data), Matt LeMay (Constellate Data)18m 23s
Human-in-the-loop-computing-as-a-service - Adam Devine (WorkFusion)20m 21s
Farming in the 21st century and beyond - Gary Short (Duncodin Limited)20m 31s
One trillion streams and counting - Alexander White (Next Big Sound)17m 24s
Building an insight machine - Matthew Granade (Domino Data Lab)15m 8s
When AI joins the team: Onboarding the next generation of employees - Jana Eggers (Nara Logics)17m 34s
Developing a modern enterprise data strategy, Part 1 - Scott Kurth (Silicon Valley Data Science), Edd Dumbill (Silicon Valley Data Science)47m 28s
Developing a modern enterprise data strategy, Part 2 - Scott Kurth (Silicon Valley Data Science), Edd Dumbill (Silicon Valley Data Science)59m 7s
Developing a modern enterprise data strategy, Part 3 - Scott Kurth (Silicon Valley Data Science), Edd Dumbill (Silicon Valley Data Science)56m 34s
Big Data is Not Enough - Rahel Jhirad (Hearst)17m 54s
Death of the click: How big data is killing your favorite metrics - Claudia Perlich (Dstillery)35m 27s
How a global entertainment company successfully built a data lake for continued digital dominance - Joe Caserta (Caserta Concepts), Elliott Cordo (Caserta Concepts, LLC)31m 54s
How Hadoop is powering Walmart’s data-driven business - Jeremy King (Walmart Global eCommerce)39m 3s
What can Big Pharma teach us about Wall Street? What can Wall Street teach us about Big Pharma? - Joe Klobusicky (Geisinger Health System), Ali Habib (Northwestern Feinberg School of Medicine), Ekaterina Volkova (Cornell University)35m 27s
Your data is screaming at you. Learn to listen through customer choice modeling - Vivek Farias (Celect)41m 4s
Science fiction to product: Data-driven development - Micha Gorelick (Fast Forward Labs)34m 27s
How to Build a Company on Open Source - Travis Oliphant (Continuum Analytics, Inc.), Peter Wang (Continuum Analytics, Inc.)45m 42s
How to Build Publishing & On-Demand Learning Environments with IPython - Kyle Kelley (Rackspace), Andrew Odewahn (O'Reilly Media)35m 58s
How to Use Pandas for Data Analysis, Part 1 - Jeff Reback (Continuum Analytics, Inc.) 34m 4s
How to Use Pandas for Data Analysis, Part 2 - Jeff Reback (Continuum Analytics, Inc.) 36m 45s
How to Create Beautiful Visualizations with Bokeh, Part 1 - Sarah Bird (Aptivate), Bryan Van de Ven (Continuum Analytics)50m 18s
How to Create Beautiful Visualizations with Bokeh, Part 2 - Bryan Van de Ven (Continuum Analytics), Sarah Bird (Aptivate)24m 2s
How to Build Big Data Workflows - Andy Terrel (Fashion Metric), Ben Zaitlen (Continuum Analytics, Inc.)37m 9s
How to Solve Problems in Geophysics with Python - Paige Bailey (Chevron)44m 22s
How to Leverage the Blaze Ecosystem - Jim Crist (Continuum Analytics) & Phil Cloud (Continuum Analytics)43m 37s
How to Think About Python - James Powell (NumFOCUS)35m 6s
Introduction to Publication Quality Plotting with Matplotlib - Damon McDougall (UT Austin), Michael Droettboom (Space Telescope Science Institute)41m 40s
Interactive Computing in the Jupyter Notebook – Present and Future - Jason Grout (Bloomberg L.P.), Chris Colbert (Continuum Analytics)41m 55s
R Quickstart: Wrangle, transform, and visualize data, Part 1 - Garrett Grolemund (RStudio)43m 13s
R Quickstart: Wrangle, transform, and visualize data, Part 2 - Garrett Grolemund (RStudio)32m 17s
Work with Big Data in R - Nathan Stephens (RStudio, Inc.)57m 22s
Reproducible Reports with Big Data, Part 1 - Yihui Xie (RStudio, Inc.)35m 13s
Reproducible Reports with Big Data, Part 2 - Yihui Xie (RStudio, Inc.)27m 45s
Interactive Shiny Applications built on Big Data, Part 1 - Garrett Grolemund (RStudio)42m 18s
Interactive Shiny Applications built on Big Data, Part 2 - Garrett Grolemund (RStudio)28m 14s
Hardcore Data Science conference sessions 4h 15m
I+G conference sessions 4h 10m
Hadoop Use Cases conference sessions 5h 22m
Sponsored conference sessions 19h 17m
Putting Modern BI to Work: Innovative Use Cases - Ali Tore (ClearStory Data)42m 4s
Big data analytics in the cloud - Matt Winkler (Microsoft)43m 8s
Where Do You Go From Here? Lessons and Landmarks from Real-World Cisco USC - Robert Novak (Cisco)41m 28s
Expand your mind to fit the big data Data Center: the scale and cost of information management architectures - Robert Eve (Cisco), Robert Novak (Cisco), Nenshad Bardoliwalla (Paxata, Inc.)39m 11s
Delivering trusted data for analyst autonomy and operational agility with a unified big data fabric - Vishal Bamba (Transamerica), Murthy Mathiprakasam (Informatica)41m 23s
End User Panel on Real-Time Data Analytics - Eric Frenkiel (MemSQL), Ian Hanson (Digital Ocean), Noah Zucker (Novus Partners), Michael DePrizio (Akamai Technologies)36m 21s
How Riot Games uses Platfora to improve League of Legends' performance - Peter Schlampp (Platfora), Chris Kudelka (Riot Games)46m 35s
Hydrate a data lake in days with CDAP - Jonathan Gray (Cask)55m 1s
Machine learning in big data – look forward or be left behind - Bill Porto (RedPoint Global)43m 40s
Design patterns for real-time data analytics - Sheetal Dolas (Hortonworks)40m 56s
How Pepsi Wrangles the Diverse Data of Consumer Packaged Goods - Matthew Derda (Pepsi), Doug Stradley (Trifacta)37m 58s
Catalog, secure, and govern your Hadoop data lake - Alex Gorelik (Waterline Data), Jim Kaskade (CSC), David Tabacco (Merck & Co., Inc.), David Paige (Cox Automotive)45m 7s
Enter the snake pit for fast and easy Spark and Cassandra - Jon Haddad (DataStax)24m 22s
Combining open source software and cloud-native data processing services on Google Cloud Platform - Eric Brewer (Google)33m 21s
Think like a data scientist: Build your big data blueprint - Bill Schmarzo (EMC Consulting)45m 58s
Fast fish eat slow fish: How to move faster - Samuel Cozannet (Canonical)14m 28s
Requirements for Secure, Multi-Tenant Hadoop: It’s Much More than YARN - Anant Chintamaneni (BlueData)43m 15s
Patterns from the future - Paul Kent (SAS)46m 33s
Pentaho featuring Forrester: Delivering governed data for analytics at scale - Michele Goetz (Forrester Research), Chuck Yarbrough (Pentaho)31m 37s
Do you know where your data is? - Nidhi Aggarwal (Tamr, Inc.)36m 21s
Case study: How YP.com addresses real-world analytical challenges for SQL on Hadoop - William Theisinger (YP), Ignacio Hwang (HP)27m 56s
How Autodesk is using Tableau to visualize its Kafka-Splunk-Hadoop pipeline - Charlie Crocker (Autodesk)42m 33s
Eventual consistent systems a.k.a mostly inconsistent systems vs. strongly consistent systems in big data - Jagane Sundar (WANdisco)42m 26s
Faster time to insight using Spark, Tachyon, and Zeppelin - Nirmal Ranganathan (Rackspace Hosting)36m 4s
Big data modeling and analytic patterns – beyond schema on read - Ron Bodkin (Think Big, a Teradata Company)38m 43s
Commercializing IOT: What do you need to know? - Ashish Verma (Deloitte)46m 3s
Enable secure data sharing and analytics in Hadoop with 5 key steps - Reiner Kappenberger (HP Security Voltage)45m 23s
Apache Spark as a code-free data science workbench - Michal Iwanowski (DeepSense.io), Piotr Niedzwiedz (DeepSense.io)35m 6s
SAP HANA Vora to query Big Data with greater ease - Balalji Krishna (SAP)54m 19s
Law, Ethics, & Open Data conference sessions 2h 41m
Ask Me Anything conference sessions 2h 33m
Ask me anything: Hadoop application architectures - Gwen Shapira (Confluent), Jonathan Seidman (Cloudera), Ted Malaska (Cloudera), Mark Grover (Cloudera)38m 35s
Ask me anything: Apache Spark - Patrick Wendell (Databricks), Reynold Xin (Databricks)36m 27s
Ask me anything: Hadoop operations for production systems - Miklos Christine (Databricks), Kathleen Ting (Cloudera), Philip Zeyliger (Cloudera), Philip Langdale (Cloudera, Inc.)39m 14s
Ask me anything: Developing a modern enterprise data strategy - John Akred (Silicon Valley Data Science), Julie Steele (Silicon Valley Data Science), Scott Kurth (Silicon Valley Data Science)39m 19s
Security & Governance conference sessions 1h 50m
Solutions Showcase Theater 7h 33m