Strata + Hadoop World 2016 - London, United Kingdom: Video Compilation

Video Description

The sold-out Strata + Hadoop World London 2016 is a tour through the giant city of data, led by expert guides who know just where to go. This video compilation shows you every bit of it: 211 speakers, 108 sessions, 20 keynotes, and 14 tutorials. Start your trip with a long-form tutorial exploring data territory such as an 8-hour deep dive into all phases of managing Hadoop clusters; an 8-hour excursion through the hardcore data science world of data management, machine learning, natural language processing, crowdsourcing, and algorithm design; an 8-hour Spark camp on all things Apache; or 3½-hour tours of D3 data visualizations, artificial intelligence, optimizing workflow in R, and more. Want something shorter? Try a mind-blowing conference session (30-40 minutes each) on topics ranging from H2O and TensorFlow to e-commerce A/B testing, predictive analysis, and natural language processing. Want something else? How about streaming analytics at 300 billion events per day with Kafka, Samza, and Druid, or using Spark and Hadoop in high-speed trading environments? It’s a travelogue of data wonders with something for everyone.

  • Gain front-row access to all 211 speakers, 108 sessions, 20 keynotes, and 14 tutorials
  • Download the videos or view them through O'Reilly's HD player
  • Hear from big data experts at Intel, deepsense.io, IBM, Google, Teradata, and more
  • Watch Cloudera’s Doug Cutting and Tom White predict the future of Apache Hadoop
  • Learn about Spark, Kafka Streams, Kudu, Kappa, Drill, Heron, Flink, Eagle, and NiFi
  • Be inspired by data innovations in cancer research, epilepsy monitoring, and minefield clearing
  • Explore Scotland's Data Lab, the Danish Agency for Digitisation, and the ethics of data processing
  • Hear about big data use at LinkedIn, Intuit, Uber, Etsy, HPE, Docker, Facebook, and Microsoft

Table of Contents

  1. Keynotes
    1. Modern data strategy and CERN - Mike Olson (Cloudera) and Manuel Martin Marquez (CERN) 00:15:08
    2. The Internet of Things: It’s the (sensor) data, stupid - Martin Willcox (Teradata International) 00:11:11
    3. Data relativism and the rise of context services - Joe Hellerstein (UC Berkeley) 00:15:09
    4. Saving whales with deep learning - Piotr Niedzwiedz (deepsense.io) 00:05:15
    5. Data wants to be shareable - Mona Vernon (Thomson Reuters Labs) 00:13:21
    6. Analytics innovation in cancer research - Gilad Olswang (Intel) 00:05:53
    7. The future of (artificial) intelligence - Stuart Russell (UC Berkeley) 00:20:19
    8. The curious case of the data scientist - David Selby (IBM) 00:11:05
    9. Drawing insights from imperfection: A year of Dear Data - Stefanie Posavec (NA) 00:14:41
    10. Big data at Google: Solving problems at scale - Jordan Tigani (Google) 00:05:03
    11. The other half of big data - Tricia Wang (Constellate Data) 00:17:13
    12. Bringing big data and design to policy making - Cat Drew (UK Policy Lab and Government Data Science Partnership) 00:13:45
    13. Machine learning for human rights advocacy: Big benefits, serious consequences - Megan Price (Human Rights Data Analysis Group) 00:15:13
  2. Data innovations
    1. A hands-on introduction to Apache Kafka - Ian Wrigley (Confluent) - Part 1 1:26:03
    2. A hands-on introduction to Apache Kafka - Ian Wrigley (Confluent) - Part 2 1:33:54
    3. AI for business: A hands-on introduction to what machine learning can do - Marc Warner (ASI) - Part 1 1:23:07
    4. AI for business: A hands-on introduction to what machine learning can do - Marc Warner (ASI) - Part 2 1:08:30
    5. Experiments in The Data Lab: Creating a national hub for data science in Scotland - Brian Hills (The Data Lab) 00:36:29
    6. The innards of H2O - Cliff Click (0xdata) 00:40:55
    7. TensorFlow: Machine learning for everyone - Sherry Moore (Google) 00:38:57
    8. The future of column-oriented data processing with Arrow and Parquet - Julien Le Dem (Dremio) 00:43:41
    9. 90% of the world's trade is transported by sea, but what data do we have about ship activity worldwide? - Tal Guttman (Windward) 00:39:50
    10. The evolution of massive-scale data processing - Tyler Akidau (Google) 00:41:36
    11. Streaming analytics at 300 billion events per day with Kafka, Samza, and Druid - Xavier Léauté (Metamarkets) 00:43:50
    12. Triggers in Apache Beam (incubating): User-controlled balance of completeness, latency, and cost in streaming big data pipelines - Kenneth Knowles (Google) 00:44:32
    13. Introducing Kafka Streams, Apache Kafka's new stream processing library - Neha Narkhede (Confluent) 00:47:05
  3. Data science & advanced analytics
    1. R and reproducible reporting for big data - Aimee Gott (Mango Solutions), Mark Sellors (Mango Solutions), and Richard Pugh (Mango Solutions) - Part 1 1:19:29
    2. R and reproducible reporting for big data - Aimee Gott (Mango Solutions), Mark Sellors (Mango Solutions), and Richard Pugh (Mango Solutions) - Part 2 1:17:56
    3. Deep learning and natural language processing with Spark - Andy Petrella (Data Fellas) and Melanie Warrick (Skymind) 00:40:29
    4. Semantic natural language understanding with Spark Streaming, UIMA, and machine-learned ontologies - David Talby (Atigeo) and Claudiu Branzan (Atigeo) 00:45:30
    5. Sightseeing, venues, and friends: Predictive analytics with Spark ML and Cassandra - Natalino Busa (Teradata) 00:38:44
    6. Introduction to generalized low-rank models and missing values - Jo-fai Chow (H2O.ai) 00:29:12
    7. Petascale genomics - Tom White (Cloudera) 00:40:13
    8. Panel: The future of intelligence - Marc Warner (ASI), Stuart Russell (UC Berkeley), and Jaan Tallinn (CSER) 00:39:20
    9. The polyglot data scientist - Jeroen Janssens (Tilburg University) 00:25:11
    10. Beyond guide dogs: How advances in deep learning can empower the blind community - Anirudh Koul (Microsoft) and Saqib Shaikh (Microsoft) 00:37:52
    11. Predicting out-of-sample performance of a large cohort of trading algorithms with machine learning - Thomas Wiecki (Quantopian) 00:38:30
    12. Scala: The unpredicted lingua franca for data science - Andy Petrella (Data Fellas) and Dean Wampler (Lightbend) 00:42:56
    13. Land mine or Coke can: Machine learning from GPR data - Dirk Gorissen (Skycap | World Bank) 00:33:39
    14. Data modeling for data science: Simplify your workload with complex types - Marcel Kornacker (Cloudera) 00:40:28
    15. Applications of natural language understanding: Tools and technologies - Alyona Medelyan (Entopix) 00:39:31
  4. Data-driven business
    1. Developing a modern enterprise data strategy - Scott Kurth (Silicon Valley Data Science) and John Akred (Silicon Valley Data Science) - Part 1 1:28:28
    2. Developing a modern enterprise data strategy - Scott Kurth (Silicon Valley Data Science) and John Akred (Silicon Valley Data Science) - Part 2 1:24:42
    3. The Bag of Little Bootstraps: A/B experimenting with big data made small - Emily Sommer (Etsy) 00:38:23
    4. Beyond the hunch: Communicating uncertainty for effective data-driven business - Abigail Lebrecht (uSwitch) 00:40:31
    5. What’s next for music services? The answer is in the data - Paul Shannon (7digital Group Plc) and Alan Hannaway (7digital) 00:42:48
    6. Intuit, Uber, and Etsy: Scaling innovation with A/B testing - Lucian Lita (Intuit), Mita Mahadevan (Intuit Inc.), Shalin Mantri (Uber), and Gabrielle Gianelli (Etsy) 00:43:48
    7. How AI revolutionizes business strategy - Kenneth Cukier (The Economist) 00:43:32
    8. The best university in the world - Duncan Ross (TES Global) and Francine Bennett (Mastodon C) 00:44:50
    9. 20 percent blissful, 80 percent ignorance - Phil Harvey (DataShaka) 00:24:40
    10. Data gravity and complex systems - Dave McCrory (Basho Technologies) 00:28:21
    11. Analytics: A first-class architectural concern in a SaaS platform - Calum Murray (Intuit) 00:35:08
    12. Situational awareness: On the importance of mapping - Simon Wardley (Leading Edge Forum (CSC)) 00:42:31
    13. Data-driven businesses: Disrupting business models with big data - Carme Artigas (Synergic Partners) 00:24:35
    14. Building better cross-team communication - Ellen Friedman (Independent) 00:23:46
    15. What Esperanto can teach us about collaboration in the big data environment - Anne Sophie Roessler (Dataiku) 00:19:53
    16. What should I eat: The road map to better food and smarter nutrition science - Taryn Fixel (ingredient1) 00:22:35
    17. Your TOS is not informed consent: Ethical experimentation for the Web - Rachel Shadoan (Akashic Labs) 00:22:19
    18. How to ask good questions - Farrah Bostic (The Difference Engine) 00:30:27
    19. Every business is a data business - Mona Vernon (Thomson Reuters Labs) 00:27:36
    20. Data scientists everywhere - Kim Nilsson (Pivigo) 00:21:09
    21. Harnessing big data to transform the energy sector - Erik Nygard (Limejump Ltd) 00:13:57
    22. Data science as catalyst of Autodesk's business model transformation - Laurent Gaubert (Autodesk) 00:19:24
    23. My AlgorithmicMe knows me better than Google or my mum - Majken Sander (BusinessAnalyst.dk) 00:22:49
    24. Otto’s little army of real-time bots: How online retailers can defend shopping carts and retarget customers in real time - Rupert Steffner (Otto GmbH & Co. KG) 00:21:42
    25. My AlgorithmicMe: The "Who is. . .?" of the future - Majken Sander (BusinessAnalyst.dk) and Joerg Blumtritt (Datarella) 00:38:28
    26. Demonstrating the art of the possible with Spark and Hadoop - Joy Spohn (IBM) and Adrian Houselander (IBM) 00:34:48
  5. Enterprise adoption
    1. Apache Hadoop operations for production systems - Jayesh Seshadri (Cloudera), Justin Hancock (Cloudera), Mark Samson (Cloudera), and Wellington Chevreuil (Cloudera) - Part 1 1:25:46
    2. Apache Hadoop operations for production systems - Jayesh Seshadri (Cloudera), Justin Hancock (Cloudera), Mark Samson (Cloudera), and Wellington Chevreuil (Cloudera) - Part 2 1:34:04
    3. Apache Hadoop operations for production systems - Jayesh Seshadri (Cloudera), Justin Hancock (Cloudera), Mark Samson (Cloudera), and Wellington Chevreuil (Cloudera) - Part 3 1:23:20
    4. Apache Hadoop operations for production systems - Jayesh Seshadri (Cloudera), Justin Hancock (Cloudera), Mark Samson (Cloudera), and Wellington Chevreuil (Cloudera) - Part 4 1:19:21
    5. Architecting a data platform - John Akred (Silicon Valley Data Science) and Stephen O'Sullivan (Silicon Valley Data Science) - Part 1 1:24:41
    6. Architecting a data platform - John Akred (Silicon Valley Data Science) and Stephen O'Sullivan (Silicon Valley Data Science) - Part 2 1:36:03
    7. Big SQL: The future of in-cluster analytics and enterprise adoption - Moderated by: Surya Mukherjee (Ovum) - Panelists: Lloyd Tabb (Looker Data Science), Nick Amabile (FullStack Analytics), Rex Gibson (Knewton), dp Suresh (Yahoo!) 00:39:16
    8. BI on Hadoop: What are your options? - Tomer Shiran (Dremio) 00:40:44
  6. Hadoop internals & development
    1. Hadoop application architectures: Fraud detection - Jonathan Seidman (Cloudera), Mark Grover (Cloudera), Gwen Shapira (Confluent), and Ted Malaska (Cloudera) - Part 1 1:29:48
    2. Hadoop application architectures: Fraud detection - Jonathan Seidman (Cloudera), Mark Grover (Cloudera), Gwen Shapira (Confluent), and Ted Malaska (Cloudera) - Part 2 1:23:17
    3. The next 10 years of Apache Hadoop - Doug Cutting (Cloudera), Tom White (Cloudera), and Ben Lorica (O'Reilly Media) 00:39:56
    4. Hadoop's storage gap: Resolving transactional access/analytic performance trade-offs with Apache Kudu (incubating) - Todd Lipcon (Cloudera, Inc.) 00:41:54
    5. Building real-time BI systems with HDFS and Kudu - Ruhollah Farchtchi (Zoomdata) 00:35:37
    6. Why is my Hadoop job slow? - Bikas Saha (Hortonworks Inc) 00:39:04
    7. Scaling out to 10 clusters, 1,000 users, and 10,000 flows: The Dali experience at LinkedIn - Carl Steinbach (LinkedIn) 00:35:43
    8. Floating elephants: Developing data wrangling systems on Docker - Chad Metcalf (Docker) and Seshadri Mahalingam (Trifacta) 00:29:07
  7. Data 101
    1. Developing data scientists: Breaking the skills gap - Yuelin Li (ASI) 00:28:42
    2. The business case for Spark, Kafka, and friends - John Akred (Silicon Valley Data Science) 00:31:12
    3. What is AI? - Melanie Warrick (Skymind) 00:28:01
  8. Hardcore data science
    1. Mobile advertising: The preclick experience - Mounia Lalmas (Yahoo) 00:26:40
    2. Analytics for large-scale time series and event data - Ira Cohen (Anodot) 00:29:31
    3. Recent trends in recommender systems - Danny Bickson (1972) 00:28:50
    4. Visual data analysis for intelligent machines - Francesca Odone (University of Genova) 00:33:09
    5. Deep learning for web-scale text - Piotr Mirowski (Google DeepMind) 00:27:54
    6. Detecting anomalies in the real world - Alessandra Staglianò (The ASI) 00:31:05
    7. Recent advances in deep learning research - Olivier Grisel (Inria & scikit-learn) 00:31:46
    8. Hardcore data science in practice - Mikio Braun (Zalando SE) 00:29:16
    9. Data science++: Improving data science by adding domain understanding - Matthew Smith (Microsoft Research) 00:28:31
    10. A methodology for taxonomy generation and maintenance from large collections of textual data - Roxana Danger (reed.co.uk) 00:27:58
    11. A functional data integration pipeline using Scala - Johannes Bauer (Cambridge Analytica) 00:40:11
  9. IoT & real-time
    1. An Introduction to time series with Team Apache - Patrick McFadin (DataStax) - Part 1 1:12:50
    2. An Introduction to time series with Team Apache - Patrick McFadin (DataStax) - Part 2 1:31:21
    3. What does your smart car know about you? - Charles Givre (Booz | Allen | Hamilton) 00:42:25
    4. When it absolutely, positively has to be there: Reliability guarantees in Kafka - Gwen Shapira (Confluent) and Jeff Holoman (Cloudera) 00:43:27
    5. Real-time epilepsy monitoring with smart clothing: A case study in time series, open source technology, and connected devices - Eric Kramer (Dataiku) 00:37:58
    6. Industrial big data and sensor time series data: Different but not difficult - Gopal GopalKrishnan (OSIsoft, LLC.) and Hoa Tram (OSIsoft) 00:50:56
    7. High-performance data flow with a GUI—and guts - Simon Elliston Ball (Hortonworks) 00:41:47
    8. Watermarks: Time and progress in streaming dataflow and beyond - Slava Chernyak (Google Inc.) 00:35:01
    9. Putting Kafka into overdrive - Gwen Shapira (Confluent) and Todd Palino (LinkedIn) 00:39:39
    10. Stream analytics in the enterprise: A look at Intel’s internal IoT implementation - Moty Fania (Intel) 00:39:42
    11. Legacy or Kafka? What an ideal messaging system should bring to Hadoop - Jim Scott (MapR Technologies, Inc.) 00:38:51
    12. Making sense of exactly-once semantics - Flavio Junqueira (Confluent) 00:39:45
    13. Processing billions of events in real time with Heron - Karthik Ramasamy (Twitter) 00:48:05
    14. Data privacy in the age of the Internet of Things - Alasdair Allan (Babilim Light Industries) 00:35:03
    15. Kappa architecture in the telecom industry - Ignacio Manuel Mulas Viela (Ericsson) and Nicolas Seyvet (Ericsson AB) 00:33:51
  10. Spark & beyond
    1. Building machine-learning apps with Spark: MLlib, ML Pipelines, and GraphX - Jayant Shekhar (Sparkflows Inc.), Vartika Singh (Cloudera), and Krishna Sankar (Volvo Cars) - Part 1 1:30:58
    2. Building machine-learning apps with Spark: MLlib, ML Pipelines, and GraphX - Jayant Shekhar (Sparkflows Inc.), Vartika Singh (Cloudera), and Krishna Sankar (Volvo Cars) - Part 2 1:28:39
    3. Spark 2.0: What’s next? - Tathagata Das (Databricks) 00:41:39
    4. Anomaly detection in telecom with Spark - Ted Dunning (MapR Technologies) 00:44:47
    5. Beyond shuffling: Tips and tricks for scaling Spark jobs - Holden Karau (IBM) 00:41:25
    6. Securing Apache Spark on production Hadoop clusters - Kostas Sakellis (Cloudera) 00:40:19
    7. The future of streaming in Spark: Structured streaming - Tathagata Das (Databricks) 00:41:57
    8. Introduction to Apache Spark for Java and Scala developers - Ted Malaska (Cloudera) 00:39:53
    9. Breaking Spark: Top five mistakes to avoid when using Apache Spark in production - Neelesh Srinivas Salian (Cloudera) 00:27:42
  11. Visualization & user experience
    1. Introduction to visualizations using D3 - Brian Suda ((optional.is)) - Part 1 1:19:09
    2. Introduction to visualizations using D3 - Brian Suda ((optional.is)) - Part 2 1:20:15
    3. Good city life - Daniele Quercia (Bell Labs) 00:39:31
    4. Pixels and place: What online experiences can borrow from offline spaces and vice versa - Kate O'Neill (KO Insights) 00:42:14
    5. Opportunities for hardware acceleration in big data analytics - Kanu Gulati (Zetta Venture Partners) 00:27:42
    6. The rise of the GPU: GPUs will change how you look at big data - Todd Mostak (MapD) 00:45:36
  12. Sponsored
    1. Which whale is it anyway? Face recognition for right whales using deep learning - Robert Bogucki (deepsense.io) and Maciej Klimek (deepsense.io) 00:33:28
    2. Realizing the value of combining the IoT and big data analytics - Frank Saeuberlich (Teradata) and Eliano Marques (Think Big Analytics) 00:42:01
    3. Federated analytics innovation in cancer research - Gilad Olswang (Intel) 00:43:22
    4. Best practices to extract value from Hadoop with predictive analytics - Zoltan Prekopcsak (RapidMiner) 00:33:13
    5. Building a modern data architecture - Ben Sharma (Zaloni) 00:36:20
    6. High-frequency decisioning, from big data to fast data - Tugdual Grall (MapR Technologies) 00:40:38
    7. Avoid big data becoming a big problem - Raghunath Nambiar (Cisco) 00:43:59
    8. Operating batch in the data-driven enterprise - Joe Goldberg (BMC Software Inc.) 00:40:11
    9. Developing a successful big data strategy - Seb Darrington (EMC) 00:38:36
    10. Business transformation and outcomes through big data - Louise Matthews (Hortonworks) 00:34:36
    11. The business bottom line of data lakes: Real-life experiences - Franz Aman (Informatica) 00:41:22
  13. Security
    1. Simplifying Hadoop with RecordService, a secure and unified data access path for compute frameworks - Alex Leblang (Cloudera) 00:33:26
    2. Best practices and solutions to manage and govern a multinational big data platform - Clara Fletcher (Accenture) 00:38:30
    3. HopsWorks: Multitenant Hadoop as a service - Jim Dowling (Swedish ICT - SICS) 00:39:12
  14. Hadoop use cases
    1. Improving the customer experience with big data wrangling on Hadoop - Dan Jermyn (Royal Bank of Scotland) and Connor Carreras (Trifacta) 00:35:55
    2. Simple, fast, and flexible risk aggregation in Hadoop - Deenar Toraskar (Think Reactive) 00:29:26
    3. Risk data aggregation and risk reporting for financial services - Ben Sharma (Zaloni) 00:33:20
    4. The future is now: Leveraging Hadoop for real-time, predictive insights - Steven Noels (NGDATA) 00:43:03
    5. Year 2025: Big data as enabler of fully automated vehicles - Dr. Thomas Beer (Continental) and Felix Werkmeister (Continental) 00:40:59
    6. Analyzing dynamic JSON with Apache Drill - Tomer Shiran (Dremio) 00:40:56
  15. Law, ethics, governance
    1. Denmark is data driven - Mads Hjorth (Danish Agency for Digitisation) 00:39:47
    2. Using data for evil IV: The journey home - Duncan Ross (TES Global) and Francine Bennett (Mastodon C) 00:39:53
    3. Protecting individual privacy in a data-driven world - Jason McFall (Privitar) 00:39:37
    4. Don't build a data swamp: Hadoop governance case studies for financial services - Mark Donsky (Cloudera) and Chang She (Cloudera) 00:39:02

Product Information

  • Title: Strata + Hadoop World 2016 - London, United Kingdom: Video Compilation
  • Author(s): O'Reilly Media, Inc.
  • Release date: June 2016
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781491944639