Conference: Strata + Hadoop World 2017 - San Jose, California
March 2017
Beginner to intermediate
151h 28m
English
Closed Captioning available in German, English, Spanish, French, Japanese, Korean, Portuguese (Portugal, Brazil), Chinese (Simplified), Chinese (Traditional)
Course outline
Big data & the Cloud 7h 53m
Architecting and building enterprise-class Spark and Hadoop in cloud environments - James Malone (Google), John Mikula (Google Cloud) - Part 1 50m 11s
Architecting and building enterprise-class Spark and Hadoop in cloud environments - James Malone (Google), John Mikula (Google Cloud) - Part 2 51m 53s
Architecting and building enterprise-class Spark and Hadoop in cloud environments - James Malone (Google), John Mikula (Google Cloud) - Part 3 52m 12s
Architecting and building enterprise-class Spark and Hadoop in cloud environments - James Malone (Google), John Mikula (Google Cloud) - Part 4 54m 11s
Moving big data as a service to a multicloud world - Sriram Ganesan (Qubole), Prakhar Jain (Qubole) 38m 30s
BI and SQL analytics with Hadoop in the cloud - Henry Robinson (Cloudera), Alex Gutow (Cloudera) 40m 4s
Running a Cloudera cluster in production on Azure - Paige Liu (Microsoft), John Zhuge (Cloudera) 36m 13s
RubiX: A caching framework for big data engines in the cloud - Shubham Tagra (Qubole) 36m 45s
The enterprise geospatial platform: A perfect fusion of cloud and open source technologies - Naghman Waheed (Monsanto), Martin Mendez-Costabel (Monsanto) 32m 56s
Practical considerations for running Spark workloads in the cloud - Anand Iyer (Cloudera), Eugene Fratkin (Cloudera) 39m 20s
Alluxio (formerly Tachyon): The journey thus far and the road ahead - Haoyuan Li (Alluxio), Calvin Jia (Alluxio) 41m 12s
Data science & advanced analytics 27h 38m
Getting started with TensorFlow - Amy Unruh (Google) and Yufeng Guo (Google) - Part 1 46m 39s
Getting started with TensorFlow - Amy Unruh (Google) and Yufeng Guo (Google) - Part 2 48m 14s
Getting started with TensorFlow - Amy Unruh (Google) and Yufeng Guo (Google) - Part 3 48m 9s
Getting started with TensorFlow - Amy Unruh (Google) and Yufeng Guo (Google) - Part 4 41m 4s
Guerrilla guide to Python and Apache Hadoop - Juliet Hougland (Cloudera) - Part 1 25m 18s
Guerrilla guide to Python and Apache Hadoop - Juliet Hougland (Cloudera) - Part 2 33m 37s
Guerrilla guide to Python and Apache Hadoop - Juliet Hougland (Cloudera) - Part 3 43m 59s
Modeling big data with R, sparklyr, and Apache Spark - John Mount (Win-Vector LLC) - Part 1 37m 27s
Modeling big data with R, sparklyr, and Apache Spark - John Mount (Win-Vector LLC) - Part 2 34m 51s
Modeling big data with R, sparklyr, and Apache Spark - John Mount (Win-Vector LLC) - Part 3 49m 38s
Modeling big data with R, sparklyr, and Apache Spark - John Mount (Win-Vector LLC) - Part 4 35m 26s
Scalable deep learning for the enterprise with DL4J - Tom Hanlon (Skymind), Dave Kale (Skymind), Susan Eraly (Skymind), and Josh Patterson (Skymind) - Part 1 39m 59s
Scalable deep learning for the enterprise with DL4J - Tom Hanlon (Skymind), Dave Kale (Skymind), Susan Eraly (Skymind), and Josh Patterson (Skymind) - Part 2 41m 19s
Scalable deep learning for the enterprise with DL4J - Tom Hanlon (Skymind), Dave Kale (Skymind), Susan Eraly (Skymind), and Josh Patterson (Skymind) - Part 3 15m 15s
Uber's data science workbench - Peng Du (Uber Inc.) and Randy Wei (Uber Inc.) 40m 30s
How Microsoft predicts churn of cloud customers using deep learning and explains those predictions in an interpretable way - Feng Zhu (Microsoft), Valentine Fontama (Microsoft) 46m 23s
Intelligent pattern profiling on semistructured data with machine learning - Sean Kandel (Trifacta), Karthik Sethuraman (Trifacta) 40m 51s
Squeezing deep learning onto mobile phones - Anirudh Koul (Microsoft) 43m 14s
Recommending 1+ billion items to 100+ million users in real time: Harnessing the structure of the user-to-object graph to extract ranking signals at scale - Jure Leskovec (Pinterest) 43m 23s
Semantic natural language understanding at scale using Spark, machine-learned annotators, and deep-learned ontologies - David Talby (Atigeo), Claudiu Branzan (G2 Web Services) 40m 1s
Leveraging deep learning to predict breast cancer proliferation scores with Apache Spark and Apache SystemML - Michael Dusenberry (IBM Spark Technology Center), Frederick Reiss (IBM Spark Technology Center) 41m 18s
PyTorch: A flexible and intuitive framework for deep learning - James Bradbury (Salesforce Research) 43m 1s
The dangers of statistical significance when studying weak effects in big data: From natural experiments to p-hacking - Robert Grossman (University of Chicago) 38m 9s
Tensor abuse in the workplace - Ted Dunning (MapR Technologies) 40m 26s
The frontiers of attention and memory in neural networks - Stephen Merity (Salesforce Research) 43m 25s
Automatic speaker segmentation: Using machine learning to identify who is speaking when - Matar Haller (Winton Capital) 29m 3s
Feature engineering for diverse data types - Alice Zheng (Amazon) 40m 34s
When is data science a house of cards? Replicating data science conclusions - June Andrews (Pinterest), Frances Haugen (Pinterest) 42m 12s
Distributed deep learning on AWS using MXNet - Anima Anandkumar (UC Irvine) 37m 20s
The state of TensorFlow today and where it is headed in 2017 - Rajat Monga (Google) 40m 43s
Clustering user sessions with NLP methods in complex internet applications - Dorna Bandari (Pinterest Inc.) 37m 3s
Weld: An optimizing runtime for high-performance data analytics - Shoumik Palkar (Stanford University) 32m 1s
Learning from incomplete, imperfect data with probabilistic programming - Michael Lee Williams (Fast Forward Labs) 37m 20s
The power of persuasion modeling - Michelangelo D'Agostino (Civis Analytics), Bill Lattner (Civis Analytics) 40m 46s
Making self-service data science a reality - Matt Brandwein (Cloudera), Tristan Zajonc (Cloudera) 40m 27s
The app trap: Why every mobile app needs anomaly detection - Ira Cohen (Anodot) 39m 40s
Predicting customer lifetime value for a subscription-based business - Chao Zhong (Microsoft) 37m 12s
Building a recommender from a big behavior graph over Cassandra - Gleicon Moraes (luc.id), Arthur Grava (Luizalabs) 37m 59s
Seven steps to high-velocity data analytics with DataOps - Christopher Bergh (DataKitchen), Gil Benghiat (DataKitchen) 39m 37s
Machine learning to automate localization with Apache Spark and other open source tools - Michelle Casbon (Qordoba) 39m 3s
Compressed linear algebra in Apache SystemML - Frederick Reiss (IBM Spark Technology Center), Arvind Surve (IBM) 43m 19s
Leveraging open source automated data science tools - Eduardo Arino de la Rubia (Domino Data Lab) 42m 38s
Law, ethics, governance 1h 55m
Spark camp: Apache Spark 2.0 for analytics and text mining with Spark ML - Andy Konwinski (Databricks) - Part 1 30m 17s
Spark camp: Apache Spark 2.0 for analytics and text mining with Spark ML - Andy Konwinski (Databricks) - Part 2 33m 2s
Spark camp: Apache Spark 2.0 for analytics and text mining with Spark ML - Andy Konwinski (Databricks) - Part 3 29m 23s
Spark camp: Apache Spark 2.0 for analytics and text mining with Spark ML - Andy Konwinski (Databricks) - Part 4 40m 34s
Spark camp: Apache Spark 2.0 for analytics and text mining with Spark ML - Andy Konwinski (Databricks) - Part 5 36m 41s
Architecting a data platform - John Akred (Silicon Valley Data Science), Stephen O'Sullivan (Silicon Valley Data Science) - Part 1 45m 14s
Architecting a data platform - John Akred (Silicon Valley Data Science), Stephen O'Sullivan (Silicon Valley Data Science) - Part 2 44m 16s
Architecting a data platform - John Akred (Silicon Valley Data Science), Stephen O'Sullivan (Silicon Valley Data Science) - Part 3 46m 32s
Architecting a data platform - John Akred (Silicon Valley Data Science), Stephen O'Sullivan (Silicon Valley Data Science) - Part 4 47m 58s
Unraveling data with Spark using machine learning - Vartika Singh (Cloudera), Jayant Shekhar (Sparkflows Inc.), Jeffrey Shmain (Cloudera) - Part 1 37m 37s
Unraveling data with Spark using machine learning - Vartika Singh (Cloudera), Jayant Shekhar (Sparkflows Inc.), Jeffrey Shmain (Cloudera) - Part 2 27m 9s
Unraveling data with Spark using machine learning - Vartika Singh (Cloudera), Jayant Shekhar (Sparkflows Inc.), Jeffrey Shmain (Cloudera) - Part 3 36m 15s
Unraveling data with Spark using machine learning - Vartika Singh (Cloudera), Jayant Shekhar (Sparkflows Inc.), Jeffrey Shmain (Cloudera) - Part 4 47m 2s
Just enough Scala for Spark - Dean Wampler (Lightbend) - Part 1 42m 12s
Just enough Scala for Spark - Dean Wampler (Lightbend) - Part 2 47m 21s
Just enough Scala for Spark - Dean Wampler (Lightbend) - Part 3 30m 25s
Just enough Scala for Spark - Dean Wampler (Lightbend) - Part 4 51m 11s
Zillow: Transforming real estate through big data and machine learning - Jasjeet Thind (Zillow) 40m 42s
Spark Structured Streaming for machine learning - Holden Karau (IBM), Seth Hendrickson (IBM) 38m 13s
Sparklyr: An R interface for Apache Spark - Edgar Ruiz (RStudio) 39m 21s
Spark at scale in Bing: Use cases and lessons learned - Kaarthik Sivashanmugam (Microsoft) 41m 13s
Hoodie: Incremental processing on Hadoop at Uber - Vinoth Chandar (Uber), Prasanna Rajaperumal (Uber) 40m 5s
How Spark can fail or be confusing and what you can do about it - Yin Huai (Databricks) 39m 21s
Debugging Apache Spark - Holden Karau (IBM), Joey Echeverria (Rocana) 38m 47s
Effective Spark with Alluxio - Calvin Jia (Alluxio) 40m 33s
Visualization & user experience 8h 47m
Exploration and visualization of large, complex datasets with R, Hadoop, and Spark - Stephen Elston (Quantia Analytics, LLC), Ryan Hafen (Hafen Consulting) - Part 1 34m 0s
Exploration and visualization of large, complex datasets with R, Hadoop, and Spark - Stephen Elston (Quantia Analytics, LLC), Ryan Hafen (Hafen Consulting) - Part 2 36m 13s
Exploration and visualization of large, complex datasets with R, Hadoop, and Spark - Stephen Elston (Quantia Analytics, LLC), Ryan Hafen (Hafen Consulting) - Part 3 46m 53s
Exploration and visualization of large, complex datasets with R, Hadoop, and Spark - Stephen Elston (Quantia Analytics, LLC), Ryan Hafen (Hafen Consulting) - Part 4 49m 30s
Introduction to visualizations using D3 - Brian Suda (optional.is) - Part 1 39m 12s
Introduction to visualizations using D3 - Brian Suda (optional.is) - Part 2 45m 6s
Introduction to visualizations using D3 - Brian Suda (optional.is) - Part 3 44m 47s
Introduction to visualizations using D3 - Brian Suda (optional.is) - Part 4 43m 13s
Data Science and Design Or, on the unpredictability of the iterative design process - Rumman Chowdhury (Accenture) 36m 36s
Beyond polarization: Data UX for a diversity of workers - Joe Hellerstein (UC Berkeley), Giorgio Caviglia (Trifacta), Alon Bartur (Trifacta) 40m 39s
Bringing data into design: How to craft personalized user experiences - Ricky Hennessy (frog), Charlie Burgoyne (frog) 37m 36s
Why the next wave of data lineage is driven by automation, visualization, and interaction - Sean Kandel (Trifacta) 39m 59s
Building interactive data products for risk measurement and monitoring - Warren Reed (US Treasury’s Office of Financial Research) 34m 14s
Platform security & cybersecurity 6h 14m
A practitioner’s guide to securing your Hadoop cluster - Mark Donsky (Cloudera), Andre Araujo (Cloudera), Michael Yoder (Cloudera), Manish Ahluwalia (Cloudera) - Part 1 56m 36s
A practitioner’s guide to securing your Hadoop cluster - Mark Donsky (Cloudera), Andre Araujo (Cloudera), Michael Yoder (Cloudera), Manish Ahluwalia (Cloudera) - Part 2 48m 7s
A practitioner’s guide to securing your Hadoop cluster - Mark Donsky (Cloudera), Andre Araujo (Cloudera), Michael Yoder (Cloudera), Manish Ahluwalia (Cloudera) - Part 3 35m 1s
A practitioner’s guide to securing your Hadoop cluster - Mark Donsky (Cloudera), Andre Araujo (Cloudera), Michael Yoder (Cloudera), Manish Ahluwalia (Cloudera) - Part 4 48m 18s
Paint the landscape and secure your data center with Apache Spot - Cesar Berho (Intel), Alan Ross (Intel) 38m 33s
Cloudy with a chance of fraud: A look at cloud-hosted attack trends - Ting-Fang Yen (DataVisor) 33m 42s
Pluggable security in Hadoop - Yuliya Feldman (Dremio Corporation) 35m 56s
Don’t sleep on sleeper cells: Using big data to drive detection - Yinglian Xie (DataVisor) 37m 42s
Malicious site detection with large-scale belief propagation - Alexander Ulanov (Hewlett Packard Labs), Manish Marwah (Hewlett Packard Labs) 40m 59s
Data engineering and architecture 6h 52m
Sponsored Sessions 14h 10m
Real-time analytics at Uber scale (sponsored by MemSQL) - James Burkhart (Uber) 43m 27s
Ingredients to a successful data analytics project (sponsored by Dell EMC) - Erin Banks (Dell EMC) 39m 12s
Advanced data federation and cost-based optimization using Apache Calcite and Spark SQL (sponsored by DataScience) - Jason Slepicka (DataScience) 42m 24s
Big data analytics accelerating innovation in sports (sponsored by Intel) - Sasi Kuppannagari (Intel Corporation) 42m 14s
Fixing what’s broken: Big data in the enterprise (sponsored by Cask) - Jonathan Gray (Cask) 41m 47s
Machine learning and microservices: A framework for next-gen applications (sponsored by MapR Technologies) - Nitin Bandugula (MapR Technologies) 37m 1s
Building a modern data architecture (sponsored by Zaloni) - Ben Sharma (Zaloni) 41m 10s
Building an automation-driven Lambda architecture (sponsored by BMC) - Darren Chinen (Malwarebytes), Sujay Kulkarni (Malwarebytes), Manjunath Vasishta (Malwarebytes) 33m 39s
Get data lakes, data catalogs, and real-time streams in less time with fewer people and more machine learning (sponsored by Informatica) - Murthy Mathiprakasam (Informatica) 37m 0s
Continuous queries over high-velocity event streams using an in-memory database (sponsored by VoltDB) - Ethan Zhang (VoltDB) 37m 25s
Five steps to a killer data lake, from ingest to machine learning (sponsored by Pentaho) - Mark Burnette (Pentaho, a Hitachi Group Company) 32m 48s
When big data leads to big results (sponsored by Paxata) - Chandhu Yalla (Intel), Nenshad Bardoliwalla (Paxata) 41m 5s
Outsmarting insider threats: Safeguarding your most sensitive assets (sponsored by SAS) - Charlotte Crain (SAS), Tyler Freckman (SAS) 37m 12s
Exploiting Hadoop with artificial intelligence and machine learning (sponsored by DataRobot) - Greg Michaelson (DataRobot) 31m 27s
How Peak Games is building analytics infrastructure to improve user experience (sponsored by Snowflake) - Serdar Sahin (Peak Games) 31m 58s
Building data lakes in the cloud with self-service access (sponsored by Talend) - Eric Anderson (Beachbody), Shyam Konda (Beachbody) 40m 1s
Virtualizing Hadoop and Spark: Architecture, performance, and best practices (sponsored by VMware) - Justin Murray (VMware) 45m 32s
Fregata: TalkingData's lightweight, large-scale machine-learning library on Spark (sponsored by TalkingData) - Xiatian Zhang (TalkingData Ltd.) 23m 34s
Presto: Distributed SQL on anything (sponsored by Teradata) - Kamil Bajda-Pawlikowski (Teradata) 41m 34s
Using big data, the cloud, and AI to enable intelligence at scale (sponsored by Microsoft) - Wee Hyong Tok (Microsoft), Danielle Dean (Microsoft) 36m 45s
Modern big data service architecture: Evolving from cloud-native and serverless to intelligent data clouds (sponsored by Futurewei Technologies) - Luhui Hu (Futurewei Technologies) 28m 41s
Machine learning with Google Cloud Platform (sponsored by Google) - Rob Craft (Google) 34m 23s
Replication as a service (sponsored by WANdisco) - Jagane Sundar (WANdisco) 29m 52s
Hadoop platform & applications 3h 30m
Data, transportation, and logistics 3h 41m
Stream processing & analytics 8h 48m
Sensors, IOT & Industrial Internet 1h 22m
Real-time applications 4h 28m
Data-driven business management 8h 35m
The Solutions Showcase Theater 4h 45m
Cloudera and SAS: Leaders Coming Together - Clark Bradley, Principal Technical Architect (Cloudera) 5m 54s
Streaming and Microservices for Fast Data - Dale Kim, Sr. Director, Industry Solutions (MapR) 9m 44s
Scoring Machine Learning Models at Scale - John Bowler, Software Engineer (MemSQL) 10m 13s
SAP Vehicles Network helps Hertz and Mojio improve customer experience - Steven Kim, Sr. Director, Connected Vehicles (SAP) 10m 50s
Thinking Data Lakes – From build to operate - Prakul Sharma, Senior Manager (Deloitte) 8m 28s
How Industry Leader Speeds & Blends Data for Daily Insights - Nicolas Morales, Sr. Director, Technical Sales & Solutions (Clearstory Data) 10m 7s
Water Mission’s solar-powered Living Water™ Treatment Systems bring clean, safe water to thousands of communities - Andrea Braida, Portfolio Marketing Manager (IBM) 10m 3s
Building and Shipping Models That Really Work! - Ali Marami, Chief Data Scientist (R-Brain) 6m 25s
Achieving Efficient Analytics and Management of Indexes - Munir Bondre, CTO (Fuzzy Logix) 9m 35s
How Spotify moved from one of Europe's largest on-prem Hadoop clusters to Google Cloud - William Vambenepe, Senior Product Manager (Google) 12m 55s
Moving complex retail analytics onto Hadoop - Sharon Kirkham, VP Analytics and Consultancy (Kognitio) 8m 50s
Hyper-Acceleration of Big Data Workloads with FPGAs - Roop Ganguly, Solution Architect (Bigstream Solutions) 8m 50s
The Self-Service Platform for Data Engineering & Data Science - Lovan Chetty, Director of Product Management (Cazena) 9m 25s
How to Automate Data Operations so You Can Build Machine Learning and Advanced Analytics - Saket Saurabh, Co-founder and CEO (Nexla) 10m 49s
Building the modern data platform - David Hsieh, CMO (Qubole) 8m 49s
Right-Sizing Your Big Data Infrastructure - Tom Lyon, Founder and Chief Scientist (Drivescale) 8m 11s
Deep Learning Solutions with BIG DL - Radhika Rangarajan, Senior Technical Program Manager, Big Data (Intel) 9m 10s
How the DataScience Cloud Helped Topix Optimize Advertising Decisions - William Merchan, CSO (DataScience) 9m 25s
How to easily maximize performance and minimize cost on the cloud - Kunal Agarwal, CEOCTO (Unravel) 6m 16s
Power BI in action - Sanjay Soni, Sr. Technical Product Marketing Manager (Microsoft) 10m 36s
In-Memory Computing for Real-Time Big Data - Nikita Ivanov, CTO (Gridgain) 6m 37s
Machine Learning Automation and Social Analytics - Siva Gopal and Devi Kondapi (MSR Cosmos) 9m 33s
User-enabled Data Lakes with Open Source Kylo - Scott Reisdorf, Principal R&D Software Engineer (Think Big Analytics) 4m 47s
User experience in data analytics platform - a design thinking approach - Naresh Agarwal, AVP, Brillio Data Practice (Brillio) 8m 56s
Super Simple Hybrid Data Access - Sumit Sarkar, Sr. Manager, Product Marketing (Progress Software) 10m 6s
Cloudera and SAS: Leaders Coming Together - Jesse Luebert, Solutions Architect (SAS) 8m 33s
Big Data Managed Services by CenturyLink - Avinash Gupta - VP Sales and Marketing, Andrew Clyne, VP Chief Data Officer, James Foppe, Sr. Engineer (CenturyLink) 11m 30s
Accelerating Data Science delivery with DevOps - Aziz Shamim, Solutions Engineering Manager, Americas Central (GitHub) 9m 40s
Converge Machine Learning, Streaming Analytics, and BI with a GPU-accelerated In-Memory Analytics Database - Manan Goel, Vice President of Products (Kinetica) 9m 35s
Scaling BI & analytics for Hadoop & cloud-based platforms - Priyank Patel, Co-founder & Chief Product Officer (Arcadia Data) 12m 12s
Strategies for organizations to survive and thrive with data science - Cameron Sim, CEO (Crewspark) 9m 45s
Business case studies 6h 36m
Strata Business Summit 1h 56m
Enterprise adoption 3h 55m