Strata + Hadoop World London 2015: Video Compilation

Video description

Explore solutions to your most challenging data problems

How are large businesses using data? What happens when data from the Internet of Things starts flowing in earnest? Find out with this complete video compilation of the Strata + Hadoop World conference, held in London in May 2015. You’ll be front-row center for every presentation, whether it’s a keynote, a tutorial, or a workshop.

In eight tracks, this year’s conference captured the most challenging problems and compelling opportunities in data today, including:

  • Business & Industry: Learn how Google, Goldman Sachs, Intel, eBay, CERN, BBC, Siemens AG, and other businesses use data, and how you can benefit from their hard-won experiences, in this expanded track.
  • Internet of Things: Prepare for one of data’s biggest challenges: the torrent of data from the IoT.
  • Hadoop Platform: Dive deep into the dominant big data stack, with practical lessons, integration tricks, and glimpses of the road ahead.
  • Hadoop & Beyond: Discover tools beyond Hadoop—like Cassandra, Storm, Accumulo, Kafka, and Spark—and how they fit in the data science toolkit.
  • Data Science: Master the latest algorithms and advances in machine learning.
  • Tools & Technology: Learn big data and analytics tools and techniques first-hand from developers at data’s cutting edge.
  • Design: Tackle vital design issues like user experience, experimental design, new interfaces, interactivity, and visualization.
  • Privacy, Law, and Ethics: Gain insight into the thorny issues of privacy, governance, ethics, and compliance.

Download these videos or stream them through our HD player, and gain a clear perspective on data, including all the analytics, architectures, techniques, tools, and technologies you need to use it successfully.

Table of contents

  1. Keynotes
    1. Hadoop 2015: What we’ve learned in 5 years - Rick Farnell (Think Big, A Teradata Company) 00:11:30
    2. Keynote with Cait O'Riordan (Shazam) 00:09:35
    3. Bigtable’s next big step - Cory O'Connor (Google) 00:05:24
    4. Keynote with Julie Meyer (Ariadne Capital) 00:12:37
    5. Ideas that Matter - Tim Harford (The Financial Times) 00:20:46
    6. British Telecom Featured Keynote - Phillip Radley (BT) 00:12:19
    7. Road to real-time digital business - Rod Smith (IBM Emerging Internet Technologies) 00:10:06
    8. Keynote with Christina Flounders (Bloomberg LP) 00:08:43
    9. Connected Car – World Record Race - Gareth Martin (HP Enterprise Services) 00:05:26
    10. Bringing life to design: Data science in 3D - Mike Haley (Autodesk, Inc.) 00:10:26
    11. Hadoop: It’s as easy as riding a bike - Tamara Dull (SAS Institute Inc.) 00:06:47
    12. Is Privacy Becoming a Luxury Good? - Julia Angwin (ProPublica) 00:13:22
  2. Business & Industry
    1. Where the rubber meets the road: Decision-making based on data - Christine Foster (ShopKeep) 00:42:24
    2. It Ain’t What You Do To Data, It’s What You Do With It - Edd Dumbill (Silicon Valley Data Science) 00:39:27
    3. Bridging big data with big health - Leslie McIntosh (Washington University School of Medicine) 00:39:51
    4. Automating decision-making with big data: How to make it work - Lars Trieloff (Blue Yonder) 00:44:11
    5. Data-driven retailing in the modern world - Jason Foster (Marks and Spencer) 00:37:41
    6. The data strategy revolution: building an in-house data insights lab - Nathan Shetterley (Accenture), Hallie Benjamin (Accenture), and John Miller (Accenture) 00:40:02
    7. The curiosity advantage: the most important skill for data science - Oana Calugar (AliveShoes) 00:22:07
    8. Data Strategy and the CDO - Scott Kurth (Silicon Valley Data Science) and Julie Steele (Silicon Valley Data Science) 00:37:48
    9. Using data science to transform OpenTable into your local dining expert - Sudeep Das (OpenTable) 00:37:38
    10. Sex, drugs and data: Using web data to add £3bn to the UK economy - Andrew Fogg (import.io) 00:34:36
  3. Data Science
    1. Introduction to Machine Learning with IPython and scikit-learn - Olivier Grisel - Part 1 1:40:56
    2. Introduction to Machine Learning with IPython and scikit-learn - Olivier Grisel - Part 2 1:22:40
    3. Reproducible research with R and Shiny - Garrett Grolemund (RStudio) - Part 1 1:30:04
    4. Reproducible research with R and Shiny - Garrett Grolemund (RStudio) - Part 2 1:02:19
    5. Ideas that Matter - Tim Harford (The Financial Times) 00:47:07
    6. Improving feature engineering in the lab and production with Ivory - Ben Lever (Ambiata) 00:40:51
    7. Deploying machine learning in production - Alice Zheng (Dato) 00:21:34
    8. What we are made of: Analyzing the human genome with SQL - Felipe Hoffa (Google) 00:14:34
    9. What's there to know about A/B testing? - Noel Welsh (Underscore Consulting) 00:18:10
    10. Measuring the benefit effect for customers with Bayesian predictive modeling - JeongMin Kwon (SK Planet) 00:14:57
    11. Forecasting Space-time Events - Jeremy Heffner (Azavea) 00:44:04
    12. Deep learning made doubly easy with reusable deep features - Carlos Guestrin (Dato) 00:46:27
    13. A taste of random decision forests on Apache Spark - Sean Owen (Cloudera) 00:44:29
    14. Scalable machine learning - Mikio Braun (TU Berlin) 00:36:03
    15. Hunting criminals with hybrid analytics, semi-supervised learning, and agent feedback - David Talby and Claudiu Branzan (Atigeo) 00:22:07
    16. Big data 2.0 democratizes machine learning technology for Wall Street - Divanny Lamas (Context Relevant) 00:15:28
    17. Poor man's parallel pipelines - Jeroen Janssens (Elsevier) 00:25:06
    18. Fast > Perfect: Practical approximation examples for mobile app analytics using Spark Streaming - Kevin Schmidt and Luis Angel Vicente Sanchez (Mind Candy Ltd.) 00:39:01
    19. Untangling influence and desire: Visual analysis of massive graph data - David Jonker and Scott Langevin (Uncharted Software Inc.) 00:36:06
  4. Design
    1. D3.js Tutorial - D3 and interactive visualizations for everyone! - Sebastian Gutierrez - Part 1 1:31:52
    2. D3.js Tutorial - D3 and interactive visualizations for everyone! - Sebastian Gutierrez - Part 2 1:50:04
  5. Hadoop & Beyond
    1. Apache Spark: The faster new execution engine for Apache Hive - Xuefu Zhang (Cloudera) and Rui Li (Intel) 00:39:10
    2. Apache Spark: What's new; what's coming - Patrick Wendell (Databricks) 00:49:20
    3. Systems that enable data agility: Lessons from LinkedIn - Martin Kleppmann (Independent) 00:40:59
    4. Using the Zeta Architecture: To become a hero - Jim Scott (MapR Technologies, Inc.) 00:56:16
    5. From Bigtable to HBase and back again - history and future - Cory O'Connor (Google) and Emre Baran (Qubit) 00:42:48
    6. Big JSON, baffling performance - Jacques Nadeau (Apache Foundation/MapR) 00:45:11
    7. Search evolved: Unraveling your data - Costin Leau (Elastic) 00:45:03
    8. Taming the firehose: Build analytics over 45 billion tweets using Elasticsearch and Spark - Shashank Singh (Microsoft) and Anirudh Koul (Microsoft) 00:46:07
    9. SPARKTA: A real-time analytics platform based on Apache Spark - Oscar Méndez (STRATIO) and David Morales (STRATIO) 00:41:17
    10. Spark on Mesos - Dean Wampler (Typesafe) 00:38:06
    11. Say goodbye to batch - Tyler Akidau (Google) 00:42:35
    12. Introducing Apache Flink: Fast and reliable data analytics in clusters - Stephan Ewen (Data Artisans) 00:47:08
  6. Hadoop Platform
    1. Hadoop Application Architectures - Ask Us Anything - Moderated by: Mark Grover - Panelists: Jonathan Seidman, Gwen Shapira, and Ted Malaska - Part 1 1:27:10
    2. Hadoop Application Architectures - Ask Us Anything - Moderated by: Mark Grover - Panelists: Jonathan Seidman, Gwen Shapira, and Ted Malaska - Part 2 1:22:18
    3. Building an Apache Hadoop Data Application - Tom White, Joey Echeverria, and Ryan Blue - Part 1 1:23:54
    4. Building an Apache Hadoop Data Application - Tom White, Joey Echeverria, and Ryan Blue - Part 2 00:50:39
    5. Friction-free ETL: Automating data transformation with Impala - Marcel Kornacker (Cloudera, Inc.) 00:39:46
    6. Apache Kylin - Extreme OLAP engine for Hadoop - Luke Han (eBay) and Yang Li (eBay) 00:46:27
    7. Scaling SQL-on-Hadoop for BI - Yanpei Chen (Cloudera) and Dileep Kumar (Cloudera) 00:45:06
    8. The year in review - key changes in the Hadoop platform in the past 12 months - Jairam Ranganathan (Cloudera) 00:38:50
    9. Information architecture for Apache Hadoop - Mark Samson (Cloudera) 00:38:08
    10. The Future of Apache Hadoop Security - Joey Echeverria (Rocana) 00:39:52
    11. Transparent encryption in HDFS - Charles Lamb (Cloudera) and Andrew Wang (Cloudera) 00:39:56
    12. Adding insert, update, and delete to Hive - Alan Gates (Hortonworks) 00:48:25
  7. IoT/Machine Data
    1. The Internet of Trains - Gerhard Kress (Siemens AG) 00:45:06
    2. Multi-model databases and the art of aircraft maintenance - Max Neunhöffer (ArangoDB GmbH) 00:50:51
    3. How to talk to a house - Simon Elliston Ball (Hortonworks) 00:52:27
    4. How (the Internet of) Things are turning the Internet upside down - Ted Dunning (MapR Technologies) 00:43:19
    5. Smart cars of tomorrow: real-time driving patterns - Ellie Dobson (Pivotal), Michael Minella (Pivotal), and Ronert Obst (Pivotal) 00:44:07
  8. Privacy, Law, & Ethics
    1. Algorithm ethics: The inevitable subjective judgments in analytics - Majken Sander and Joerg Blumtritt (Datarella™) 00:45:11
    2. Using data for EVIL - Francine Bennett (Mastodon C) and Duncan Ross (TES Global) 00:48:55
    3. Sharing humanitarian data at the United Nations - Francis Irving (ScraperWiki Ltd.) 00:37:00
    4. Steady UX: Balancing personalisation and privacy to create understanding and trust - Ann Wuyts (Sentiance) 00:46:02
    5. Being a good data citizen - Phil Harvey (DataShaka) 00:26:19
    6. Visualizing the world's largest democratic exercise - Anand Subramanian (Gramener) 00:44:16
  9. Tools & Technology
    1. Getting started with Apache Cassandra - Christopher Batey (DataStax) - Part 1 1:05:48
    2. Getting started with Apache Cassandra - Christopher Batey (DataStax) - Part 2 1:47:52
    3. Big Data and IoT solutions in minutes - Maarten Ectors (Canonical) 00:24:56
  10. Sponsored
    1. Modernize Your Data Management by Optimizing Your Data Warehousing Environments - Paul Davies (Cisco) and Dimitris Papavassiliou (Cisco) 00:41:05
    2. Lowering the entry point to getting going with Hadoop and obtaining business value - Mark Torr (SAS) 00:40:00
    3. Purpose Built Analytics Infrastructure, Intel and HP Powering Performance at Scale - Brandon Draeger (Intel) and Joseph George (Hewlett-Packard (HP)) 00:38:36
    4. Oozie or easy: Managing Hadoop workflows the EASY way - Tom Geva (BMC Software) 00:44:16
    5. The age of agile analytics has arrived! - Frank Saeuberlich (Teradata) 00:46:05
    6. Road to real-time digital business - Rod Smith (IBM Emerging Internet Technologies) 00:51:49
    7. Repeatedly Deliver Trusted and Timely Data for Big Data Analytics - Scott Hedrick (Informatica) and Mathieu Lagrange (Informatica) 00:41:07
    8. A modern, flexible approach to Hadoop implementation, incorporating innovations from HP Vertica & IDOL - Gilles Noisette (HP) 00:37:21
  11. Data-Driven Business Day
    1. DDBD session with Julie Meyer - Julie Meyer (Ariadne Capital) 00:35:45
    2. The Collision of the Internet of Things and the Industrial Internet - Alasdair Allan (Babilim Light Industries) 00:23:16
    3. Data + Insight: Combining UX research with advanced data analytics to build better products at Schibsted Media Group - Valerie Coulton (Schibsted Media Group) 00:22:14
    4. Situational awareness: This is not the data you're looking for - Simon Wardley (Leading Edge Forum (CSC)) 00:24:42
    5. Big data and self-sufficiency - Mark Dijksman (BigData.Company) 00:18:43
    6. Big data stories: Decisions that drive successful projects - Ellen Friedman (Independent) 00:17:37
    7. Measuring and understanding the value of 'social' at SoundCloud - Cory Levinson (SoundCloud) 00:19:47
    8. Big data and the Internet of Things: Two sides of the same coin? - Tamara Dull (SAS Institute Inc.) 00:20:07
    9. Enable breakthroughs in Parkinson's research through big data analytics - Shahar Cohen (Intel Parkinson Project) 00:15:32
    10. Designing a million genomes: Machine learning, automation, and biotech - Aaron Kimball (Zymergen, Inc.) 00:19:24
    11. Data science as art: Visual storytelling with smartphone tracking data - Benedikt Koehler (DataLion) 00:20:04
    12. The myBBC revolution: data innovation at the BBC - Phil Fearnley (BBC) 00:18:57
    13. Large-Scale Emotion Analytics - Daniel McDuff (Affectiva) 00:19:24
    14. The future of machine intelligence and why it matters - Shivon Zilis (Bloomberg Beta) 00:20:24
    15. How to datafy your business - Carme Artigas (Synergic Partners) 00:19:04
    16. Getting the public sector to act data-driven - Siim Sikkut (Government Office of Estonia) 00:20:42
    17. DDBD closing remarks - Alistair Croll (Founder, Solve For Interesting) 00:01:32

Product information

  • Title: Strata + Hadoop World London 2015: Video Compilation
  • Author(s):
  • Release date: May 2015
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781491927960