conference

Strata + Hadoop World 2017 - San Jose, California

by O'Reilly Media, Inc.

March 2017

Beginner to intermediate

151h 28m

English

Closed Captioning available in German, English, Spanish, French, Japanese, Korean, Portuguese (Portugal, Brazil), Chinese (Simplified), Chinese (Traditional)

Course outline

The machine-learning renaissance - Mike Olson (Cloudera)
13m 49s
Applying data and machine learning to scale education - Daphne Koller (Calico Labs | Coursera)
19m 31s
Turning the internet upside down: Driving big data right to the edge (sponsored by MapR) - Ted Dunning (MapR Technologies)
8m 55s
Launching Pokémon GO - Phil Keslin (Niantic, Inc.), Beau Cronin (Embedding.js)
18m 3s
Machines and the magic of fast learning (sponsored by MemSQL) - Eric Frenkiel (MemSQL)
5m 33s
Becoming smarter about credible news - Tom Reilly (Cloudera), Khalid Al-Kofahi (Thomson Reuters)
9m 5s
Making good robots - Andra Keay (Silicon Valley Robotics)
13m 20s
Big data, AI, the genome, and everything (sponsored by Microsoft) - Vijay Narayanan (Microsoft)
14m 12s
Ray: A Distributed Execution Framework for Emerging AI Applications - Michael Jordan (UC Berkeley)
12m 58s
Driving enterprise open source adoption, from data lake to AI (sponsored by Teradata) - Ron Bodkin (Think Big Analytics)
6m 3s
Data in disasters: Saving lives and innovating in real time - Desiree Matel-Anderson (The Field Innovation Team)
14m 8s
Machine learning is about your data and deployment, not just model development (sponsored by IBM) - Dinesh Nirmal (IBM)
6m 6s
Machine learning at Google (sponsored by Google) - Rob Craft (Google)
7m 17s
The business case for deep learning, Spark, and friends - Edd Wilder-James (Silicon Valley Data Science)
33m 32s
Why stream? The advantages of working with streaming data - Ellen Friedman (Independent)
27m 23s
Cloudy with a chance of on-prem - Jim Scott (MapR Technologies, Inc.)
30m 26s
Stats: What you need to know - Gabriela de Queiroz (R-Ladies)
28m 3s
What is AI? - Melanie Warrick (Skymind)
26m 13s
Visualization without guesswork - Aneesh Karve (Quilt Data, Inc)
31m 16s
Architecting and building enterprise-class Spark and Hadoop in cloud environments - James Malone (Google), John Mikula (Google Cloud) - Part 1
50m 11s
Architecting and building enterprise-class Spark and Hadoop in cloud environments - James Malone (Google), John Mikula (Google Cloud) - Part 2
51m 53s
Architecting and building enterprise-class Spark and Hadoop in cloud environments - James Malone (Google), John Mikula (Google Cloud) - Part 3
52m 12s
Architecting and building enterprise-class Spark and Hadoop in cloud environments - James Malone (Google), John Mikula (Google Cloud) - Part 4
54m 11s
Moving big data as a service to a multicloud world - Sriram Ganesan (Qubole), Prakhar Jain (Qubole)
38m 30s
BI and SQL analytics with Hadoop in the cloud - Henry Robinson (Cloudera), Alex Gutow (Cloudera)
40m 4s
Running a Cloudera cluster in production on Azure - Paige Liu (Microsoft), John Zhuge (Cloudera)
36m 13s
RubiX: A caching framework for big data engines in the cloud - Shubham Tagra (Qubole)
36m 45s
The enterprise geospatial platform: A perfect fusion of cloud and open source technologies - Naghman Waheed (Monsanto), Martin Mendez-Costabel (Monsanto)
32m 56s
Practical considerations for running Spark workloads in the cloud - Anand Iyer (Cloudera), Eugene Fratkin (Cloudera)
39m 20s
Alluxio (formerly Tachyon): The journey thus far and the road ahead - Haoyuan Li (Alluxio), Calvin Jia (Alluxio)
41m 12s
Getting started with TensorFlow - Amy Unruh (Google) and Yufeng Guo (Google) - Part 1
46m 39s
Getting started with TensorFlow - Amy Unruh (Google) and Yufeng Guo (Google) - Part 2
48m 14s
Getting started with TensorFlow - Amy Unruh (Google) and Yufeng Guo (Google) - Part 3
48m 9s
Getting started with TensorFlow - Amy Unruh (Google) and Yufeng Guo (Google) - Part 4
41m 4s
Guerrilla guide to Python and Apache Hadoop - Juliet Hougland (Cloudera) - Part 1
25m 18s
Guerrilla guide to Python and Apache Hadoop - Juliet Hougland (Cloudera) - Part 2
33m 37s
Guerrilla guide to Python and Apache Hadoop - Juliet Hougland (Cloudera) - Part 3
43m 59s
Modeling big data with R, sparklyr, and Apache Spark - John Mount (Win-Vector LLC) - Part 1
37m 27s
Modeling big data with R, sparklyr, and Apache Spark - John Mount (Win-Vector LLC) - Part 2
34m 51s
Modeling big data with R, sparklyr, and Apache Spark - John Mount (Win-Vector LLC) - Part 3
49m 38s
Modeling big data with R, sparklyr, and Apache Spark - John Mount (Win-Vector LLC) - Part 4
35m 26s
Scalable deep learning for the enterprise with DL4J - Tom Hanlon (Skymind), Dave Kale (Skymind), Susan Eraly (Skymind), and Josh Patterson (Skymind) - Part 1
39m 59s
Scalable deep learning for the enterprise with DL4J - Tom Hanlon (Skymind), Dave Kale (Skymind), Susan Eraly (Skymind), and Josh Patterson (Skymind) - Part 2
41m 19s
Scalable deep learning for the enterprise with DL4J - Tom Hanlon (Skymind), Dave Kale (Skymind), Susan Eraly (Skymind), and Josh Patterson (Skymind) - Part 3
15m 15s
Uber's data science workbench - Peng Du (Uber Inc.) and Randy Wei (Uber Inc.)
40m 30s
How Microsoft predicts churn of cloud customers using deep learning and explains those predictions in an interpretable way - Feng Zhu (Microsoft), Valentine Fontama (Microsoft)
46m 23s
Intelligent pattern profiling on semistructured data with machine learning - Sean Kandel (Trifacta), Karthik Sethuraman (Trifacta)
40m 51s
Squeezing deep learning onto mobile phones - Anirudh Koul (Microsoft)
43m 14s
Recommending 1+ billion items to 100+ million users in real time: Harnessing the structure of the user-to-object graph to extract ranking signals at scale - Jure Leskovec (Pinterest)
43m 23s
Semantic natural language understanding at scale using Spark, machine-learned annotators, and deep-learned ontologies - David Talby (Atigeo), Claudiu Branzan (G2 Web Services)
40m 1s
Leveraging deep learning to predict breast cancer proliferation scores with Apache Spark and Apache SystemML - Michael Dusenberry (IBM Spark Technology Center), Frederick Reiss (IBM Spark Technology Center)
41m 18s
PyTorch: A flexible and intuitive framework for deep learning - James Bradbury (Salesforce Research)
43m 1s
The dangers of statistical significance when studying weak effects in big data: From natural experiments to p-hacking - Robert Grossman (University of Chicago)
38m 9s
Tensor abuse in the workplace - Ted Dunning (MapR Technologies)
40m 26s
The frontiers of attention and memory in neural networks - Stephen Merity (Salesforce Research)
43m 25s
Automatic speaker segmentation: Using machine learning to identify who is speaking when - Matar Haller (Winton Capital)
29m 3s
Feature engineering for diverse data types - Alice Zheng (Amazon)
40m 34s
When is data science a house of cards? Replicating data science conclusions - June Andrews (Pinterest), Frances Haugen (Pinterest)
42m 12s
Distributed deep learning on AWS using MXNet - Anima Anandkumar (UC Irvine)
37m 20s
The state of TensorFlow today and where it is headed in 2017 - Rajat Monga (Google)
40m 43s
Clustering user sessions with NLP methods in complex internet applications - Dorna Bandari (Pinterest Inc.)
37m 3s
Weld: An optimizing runtime for high-performance data analytics - Shoumik Palkar (Stanford University)
32m 1s
Learning from incomplete, imperfect data with probabilistic programming - Michael Lee Williams (Fast Forward Labs)
37m 20s
The power of persuasion modeling - Michelangelo D'Agostino (Civis Analytics), Bill Lattner (Civis Analytics)
40m 46s
Making self-service data science a reality - Matt Brandwein (Cloudera), Tristan Zajonc (Cloudera)
40m 27s
The app trap: Why every mobile app needs anomaly detection - Ira Cohen (Anodot)
39m 40s
Predicting customer lifetime value for a subscription-based business - Chao Zhong (Microsoft)
37m 12s
Building a recommender from a big behavior graph over Cassandra - Gleicon Moraes (luc.id), Arthur Grava (Luizalabs)
37m 59s
Seven steps to high-velocity data analytics with DataOps - Christopher Bergh (DataKitchen), Gil Benghiat (DataKitchen)
39m 37s
Machine learning to automate localization with Apache Spark and other open source tools - Michelle Casbon (Qordoba)
39m 3s
Compressed linear algebra in Apache SystemML - Frederick Reiss (IBM Spark Technology Center), Arvind Surve (IBM)
43m 19s
Leveraging open source automated data science tools - Eduardo Arino de la Rubia (Domino Data Lab)
42m 38s
Executive Briefing: Doing data right—Legal best practices for making your data work - Alysa Z. Hutnik (Kelley Drye & Warren LLP), Crystal Skelton (Kelley Drye & Warren LLP)
38m 19s
Big data governance for the hybrid cloud: Best practices and how-to - Mark Donsky (Cloudera), Sudhanshu Arora (Cloudera)
38m 12s
Data at risk: Backing up the world's research data - Max Ogden (Independent)
39m 28s
Spark camp: Apache Spark 2.0 for analytics and text mining with Spark ML - Andy Konwinski (Databricks) - Part 1
30m 17s
Spark camp: Apache Spark 2.0 for analytics and text mining with Spark ML - Andy Konwinski (Databricks) - Part 2
33m 2s
Spark camp: Apache Spark 2.0 for analytics and text mining with Spark ML - Andy Konwinski (Databricks) - Part 3
29m 23s
Spark camp: Apache Spark 2.0 for analytics and text mining with Spark ML - Andy Konwinski (Databricks) - Part 4
40m 34s
Spark camp: Apache Spark 2.0 for analytics and text mining with Spark ML - Andy Konwinski (Databricks) - Part 5
36m 41s
Architecting a data platform - John Akred (Silicon Valley Data Science), Stephen O'Sullivan (Silicon Valley Data Science) - Part 1
45m 14s
Architecting a data platform - John Akred (Silicon Valley Data Science), Stephen O'Sullivan (Silicon Valley Data Science) - Part 2
44m 16s
Architecting a data platform - John Akred (Silicon Valley Data Science), Stephen O'Sullivan (Silicon Valley Data Science) - Part 3
46m 32s
Architecting a data platform - John Akred (Silicon Valley Data Science), Stephen O'Sullivan (Silicon Valley Data Science) - Part 4
47m 58s
Unraveling data with Spark using machine learning - Vartika Singh (Cloudera), Jayant Shekhar (Sparkflows Inc.), Jeffrey Shmain (Cloudera) - Part 1
37m 37s
Unraveling data with Spark using machine learning - Vartika Singh (Cloudera), Jayant Shekhar (Sparkflows Inc.), Jeffrey Shmain (Cloudera) - Part 2
27m 9s
Unraveling data with Spark using machine learning - Vartika Singh (Cloudera), Jayant Shekhar (Sparkflows Inc.), Jeffrey Shmain (Cloudera) - Part 3
36m 15s
Unraveling data with Spark using machine learning - Vartika Singh (Cloudera), Jayant Shekhar (Sparkflows Inc.), Jeffrey Shmain (Cloudera) - Part 4
47m 2s
Just enough Scala for Spark - Dean Wampler (Lightbend) - Part 1
42m 12s
Just enough Scala for Spark - Dean Wampler (Lightbend) - Part 2
47m 21s
Just enough Scala for Spark - Dean Wampler (Lightbend) - Part 3
30m 25s
Just enough Scala for Spark - Dean Wampler (Lightbend) - Part 4
51m 11s
Zillow: Transforming real estate through big data and machine learning - Jasjeet Thind (Zillow)
40m 42s
Spark Structured Streaming for machine learning - Holden Karau (IBM), Seth Hendrickson (IBM)
38m 13s
Sparklyr: An R interface for Apache Spark - Edgar Ruiz (RStudio)
39m 21s
Spark at scale in Bing: Use cases and lessons learned - Kaarthik Sivashanmugam (Microsoft)
41m 13s
Hoodie: Incremental processing on Hadoop at Uber - Vinoth Chandar (Uber), Prasanna Rajaperumal (Uber)
40m 5s
How Spark can fail or be confusing and what you can do about it - Yin Huai (Databricks)
39m 21s
Debugging Apache Spark - Holden Karau (IBM), Joey Echeverria (Rocana)
38m 47s
Effective Spark with Alluxio - Calvin Jia (Alluxio)
40m 33s
Exploration and visualization of large, complex datasets with R, Hadoop, and Spark - Stephen Elston (Quantia Analytics, LLC), Ryan Hafen (Hafen Consulting) - Part 1
34m 0s
Exploration and visualization of large, complex datasets with R, Hadoop, and Spark - Stephen Elston (Quantia Analytics, LLC), Ryan Hafen (Hafen Consulting) - Part 2
36m 13s
Exploration and visualization of large, complex datasets with R, Hadoop, and Spark - Stephen Elston (Quantia Analytics, LLC), Ryan Hafen (Hafen Consulting) - Part 3
46m 53s
Exploration and visualization of large, complex datasets with R, Hadoop, and Spark - Stephen Elston (Quantia Analytics, LLC), Ryan Hafen (Hafen Consulting) - Part 4
49m 30s
Introduction to visualizations using D3 - Brian Suda (optional.is) - Part 1
39m 12s
Introduction to visualizations using D3 - Brian Suda (optional.is) - Part 2
45m 6s
Introduction to visualizations using D3 - Brian Suda (optional.is) - Part 3
44m 47s
Introduction to visualizations using D3 - Brian Suda (optional.is) - Part 4
43m 13s
Data Science and Design: Or, on the unpredictability of the iterative design process - Rumman Chowdhury (Accenture)
36m 36s
Beyond polarization: Data UX for a diversity of workers - Joe Hellerstein (UC Berkeley), Giorgio Caviglia (Trifacta), Alon Bartur (Trifacta)
40m 39s
Bringing data into design: How to craft personalized user experiences - Ricky Hennessy (frog), Charlie Burgoyne (frog)
37m 36s
Why the next wave of data lineage is driven by automation, visualization, and interaction - Sean Kandel (Trifacta)
39m 59s
Building interactive data products for risk measurement and monitoring - Warren Reed (US Treasury’s Office of Financial Research)
34m 14s
A practitioner’s guide to securing your Hadoop cluster - Mark Donsky (Cloudera), Andre Araujo (Cloudera), Michael Yoder (Cloudera), Manish Ahluwalia (Cloudera) - Part 1
56m 36s
A practitioner’s guide to securing your Hadoop cluster - Mark Donsky (Cloudera), Andre Araujo (Cloudera), Michael Yoder (Cloudera), Manish Ahluwalia (Cloudera) - Part 2
48m 7s
A practitioner’s guide to securing your Hadoop cluster - Mark Donsky (Cloudera), Andre Araujo (Cloudera), Michael Yoder (Cloudera), Manish Ahluwalia (Cloudera) - Part 3
35m 1s
A practitioner’s guide to securing your Hadoop cluster - Mark Donsky (Cloudera), Andre Araujo (Cloudera), Michael Yoder (Cloudera), Manish Ahluwalia (Cloudera) - Part 4
48m 18s
Paint the landscape and secure your data center with Apache Spot - Cesar Berho (Intel), Alan Ross (Intel)
38m 33s
Cloudy with a chance of fraud: A look at cloud-hosted attack trends - Ting-Fang Yen (DataVisor)
33m 42s
Pluggable security in Hadoop - Yuliya Feldman (Dremio Corporation)
35m 56s
Don’t sleep on sleeper cells: Using big data to drive detection - Yinglian Xie (DataVisor)
37m 42s
Malicious site detection with large-scale belief propagation - Alexander Ulanov (Hewlett Packard Labs), Manish Marwah (Hewlett Packard Labs)
40m 59s
Big data for operational insights - Felix Gorodishter (GoDaddy)
39m 51s
Shifting left for continuous quality in an Agile data world - Avinash Padmanabhan (Intuit)
32m 56s
Mistakes were made, but not by us: Lessons from a year of supporting Apache Kafka - Ryan Pridgeon (Confluent), Dustin Cote (Confluent)
42m 20s
Achieving real-time ingestion and analysis of security events through Kafka and Metron - Kevin Mao (Capital One)
27m 45s
The Netflix data platform: Now and in the future - Kurt Brown (Netflix)
38m 3s
Making architecture choices for small and big data problems - Nischal HP (Unnati Data Labs), Raghotham Sripadraj (Unnati Data Labs)
30m 30s
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at LinkedIn - Shirshanka Das (LinkedIn), Yael Garten (LinkedIn)
42m 13s
The future of column-oriented data processing with Arrow and Parquet - Julien Le Dem (Dremio), Jacques Nadeau (Dremio)
41m 33s
DevOps for models: How to manage millions of models in production - Teresa Tung (Accenture Labs), Jurgen Weichenberger (Accenture Analytics), Ishmeet Grewal (Accenture Technology Labs)
40m 26s
One cluster does not fit all: Architecture patterns for multicluster Apache Kafka deployments - Gwen Shapira (Confluent)
38m 25s
Deep learning for IT operations intelligence using open source tools - Shivnath Babu (Duke University | Unravel Data Systems)
38m 28s
Real-time analytics at Uber scale (sponsored by MemSQL) - James Burkhart (Uber)
43m 27s
Ingredients to a successful data analytics project (sponsored by Dell EMC) - Erin Banks (Dell EMC)
39m 12s
Advanced data federation and cost-based optimization using Apache Calcite and Spark SQL (sponsored by DataScience) - Jason Slepicka (DataScience)
42m 24s
Big data analytics accelerating innovation in sports (sponsored by Intel) - Sasi Kuppannagari (Intel Corporation)
42m 14s
Fixing what’s broken: Big data in the enterprise (sponsored by Cask) - Jonathan Gray (Cask)
41m 47s
Machine learning and microservices: A framework for next-gen applications (sponsored by MapR Technologies) - Nitin Bandugula (MapR Technologies)
37m 1s
Building a modern data architecture (sponsored by Zaloni) - Ben Sharma (Zaloni)
41m 10s
Building an automation-driven Lambda architecture (sponsored by BMC) - Darren Chinen (Malwarebytes), Sujay Kulkarni (Malwarebytes), Manjunath Vasishta (Malwarebytes)
33m 39s
Get data lakes, data catalogs, and real-time streams in less time with fewer people and more machine learning (sponsored by Informatica) - Murthy Mathiprakasam (Informatica)
37m 0s
Continuous queries over high-velocity event streams using an in-memory database (sponsored by VoltDB) - Ethan Zhang (VoltDB)
37m 25s
Five steps to a killer data lake, from ingest to machine learning (sponsored by Pentaho) - Mark Burnette (Pentaho, a Hitachi Group Company)
32m 48s
When big data leads to big results (sponsored by Paxata) - Chandhu Yalla (Intel), Nenshad Bardoliwalla (Paxata)
41m 5s
Outsmarting insider threats: Safeguarding your most sensitive assets (sponsored by SAS) - Charlotte Crain (SAS), Tyler Freckman (SAS)
37m 12s
Exploiting Hadoop with artificial intelligence and machine learning (sponsored by DataRobot) - Greg Michaelson (DataRobot)
31m 27s
How Peak Games is building analytics infrastructure to improve user experience (sponsored by Snowflake) - Serdar Sahin (Peak Games)
31m 58s
Building data lakes in the cloud with self-service access (sponsored by Talend) - Eric Anderson (Beachbody), Shyam Konda (Beachbody)
40m 1s
Virtualizing Hadoop and Spark: Architecture, performance, and best practices (sponsored by VMware) - Justin Murray (VMware)
45m 32s
Fregata: TalkingData's lightweight, large-scale machine-learning library on Spark (sponsored by TalkingData) - Xiatian Zhang (TalkingData Ltd.)
23m 34s
Presto: Distributed SQL on anything (sponsored by Teradata) - Kamil Bajda-Pawlikowski (Teradata)
41m 34s
Using big data, the cloud, and AI to enable intelligence at scale (sponsored by Microsoft) - Wee Hyong Tok (Microsoft), Danielle Dean (Microsoft)
36m 45s
Modern big data service architecture: Evolving from cloud-native and serverless to intelligent data clouds (sponsored by Futurewei Technologies) - Luhui Hu (Futurewei Technologies)
28m 41s
Machine learning with Google Cloud Platform (sponsored by Google) - Rob Craft (Google)
34m 23s
Replication as a service (sponsored by WANdisco) - Jagane Sundar (WANdisco)
29m 52s

Publisher Resources

ISBN: 9781491976166