Strata Data Conference - London, UK 2018

Video Description

How are companies across the planet preparing for Europe's tough new GDPR rules? How do enterprises like Barclays, Deutsche Telekom, and Shell drive innovation and profitability with real-time analytics and machine learning? What is algorithm bias and how does it impact the social media recommendation systems at platforms like Facebook? These are just a few of the timely and fascinating topics discussed at the four-day Strata Data Conference held in London during May 2018. The conference featured more than 200 of the globe's most recognized big data strategists, scientists, and engineers speaking on the skills, technologies, and practices required to create successful data-driven projects and organizations. This video compilation gives you complete access to the conference's top keynote addresses, tutorials, and topical sessions—more than 100 hours of material to view at your own pace. Highlights include:

  • Keynote speeches, including Eva Kaili (European Parliament) on data protection and innovation; Christine Foster (Alan Turing Institute) on building the data-driven future we want to live in; and Pierre Romera (ICIJ) on the challenges of sifting through 1.4 TB of data to produce journalism's headline-generating "Paradise Papers"
  • Sessions on Data Science and Machine Learning, including Jaya Mathew and Francesca Lazzeri (Microsoft) on using deep learning models for fraud detection; Ted Dunning (MapR Technologies) on AI Rendezvous technology; Paco Nathan (O'Reilly Media) and Ihab Ilyas (University of Waterloo) on HITL/human-guided machine learning; Elisa Celis (EPFL) on fairness and diversity in online social systems; and David Talby (Pacific AI) on natural language understanding at scale with spaCy and Spark NLP
  • Sessions on Data Engineering and Architecture, including Jim Webber (Neo4j) on Neo4j's new causal clustering architecture; Dean Wampler (Lightbend) on streaming microservices with Akka Streams and Kafka Streams; Flavio Junqueira (Dell EMC) on building stream pipelines with Pravega; and Jacques Nadeau (Dremio) on using Apache Arrow for high-speed data access
  • The Strata Business Summit, including Executive Briefings on key issues such as predictive analytics and machine learning, cloud strategy, governance, security and privacy, and AI by luminaries such as Audrey Lobo-Pulo (The Australian Treasury), Kim Nilsson (Pivigo), Martin Goodson (Evolution AI), Simon Chan (Salesforce), and Ira Cohen (Anodot)
  • FinData Day, with sessions on using big data techniques to analyze risk, detect fraud, and improve customer service in the financial industry, delivered by heavyweights at Deutsche Börse, Nordea, OptiRisk Systems, TBC Bank, Arcadia Data, ORDIX AG, and more
  • High-value tutorials led by experts, such as Barbara Fusinska (Google) on the basics of NLP with Python and Eugene Fratkin (Cloudera) on how to run data analytic workloads in the cloud
  • Sessions on Law, Ethics, and Governance including Guillaume Chaslot (AlgoTransparency) on spotting bias in social media recommendations and Aurélie Pols (Mind Your Privacy) on how to navigate the GDPR privacy regulations
  • Sessions on Emerging Technologies, including Jorie Koster-Hale (Dataiku) on using big data methods to detect criminal activity and Aurélien Géron (Kiwisoft) on using state-of-the-art deep computer vision architectures in manufacturing

The Strata Data Conference London 2018 video compilation is the ideal reference for anyone who wants to tap into the opportunities big data presents. Get yours now.

Table of Contents

  1. Keynotes
    1. Charting a data journey to the cloud - Mick Hollison (Cloudera), Sven Löffler (Deutsche Telekom), Robert Neumann (Ultra Tendency) 00:18:31
    2. Journey to GDPR compliance - Alison Howard (Microsoft) 00:11:50
    3. Humans and the machine: Machine learning in context (sponsored by IBM) - Jean-François Puget (IBM Analytics) 00:10:10
    4. Building a stronger data ecosystem - Ben Lorica (O'Reilly Media) 00:09:04
    5. The Paradise Papers: Behind the scenes with the ICIJ - Pierre Romera (International Consortium of Investigative Journalists (ICIJ)) 00:14:21
    6. Data protection and innovation - Eva Kaili (European Parliament | The Science and Technology Options Assessment Panel) 00:16:28
    7. So, you want to be successful in the open future? - Louise Beaumont (Publicis Groupe | techUK | NPSO) 00:14:58
    8. Machine learning: Research and industry - Mikio Braun (Zalando SE) 00:14:15
    9. When to KISS - Zubin Siganporia (QED Analytics) 00:10:44
    10. Out of the lab and into real life - Christine Foster (The Alan Turing Institute) 00:09:12
    11. The good, the bad, and the internet? - Martha Lane Fox (CBE) 00:16:16
  2. Tutorials
    1. Architecting a data platform for enterprise use - Mark Madsen (Think Big Analytics), Todd Walter (Teradata) - Part 1 1:25:23
    2. Architecting a data platform for enterprise use - Mark Madsen (Think Big Analytics), Todd Walter (Teradata) - Part 2 1:39:53
    3. Architecting a next-generation data platform - Ted Malaska (Blizzard Entertainment), Jonathan Seidman (Cloudera) - Part 1 1:36:50
    4. Architecting a next-generation data platform - Ted Malaska (Blizzard Entertainment), Jonathan Seidman (Cloudera) - Part 2 1:30:26
    5. Kafka streaming microservices with Akka Streams and Kafka Streams - Dean Wampler (Lightbend), Boris Lublinsky (Lightbend) - Part 1 1:32:09
    6. Kafka streaming microservices with Akka Streams and Kafka Streams - Dean Wampler (Lightbend), Boris Lublinsky (Lightbend) - Part 2 1:17:44
    7. Modern real-time streaming architectures - Arun Kejariwal (MZ), Karthik Ramasamy (Streamlio), Ivan Kelly (Streamlio) - Part 1 1:28:17
    8. Modern real-time streaming architectures - Arun Kejariwal (MZ), Karthik Ramasamy (Streamlio), Ivan Kelly (Streamlio) - Part 2 1:32:38
    9. Leveraging Spark and deep learning frameworks to understand data at scale - Vartika Singh (Cloudera), Juan Yu (Cloudera), Marton Balassi (Cloudera), Steven Totman (Cloudera) - Part 1 1:36:44
    10. Leveraging Spark and deep learning frameworks to understand data at scale - Vartika Singh (Cloudera), Juan Yu (Cloudera), Marton Balassi (Cloudera), Steven Totman (Cloudera) - Part 2 1:21:24
    11. Introduction to natural language processing with Python - Barbara Fusinska (Google) - Part 1 1:13:55
    12. Introduction to natural language processing with Python - Barbara Fusinska (Google) - Part 2 1:22:39
  3. Sponsored
    1. The IoT and AI for good (sponsored by Hitachi Vantara) - Wael Elrifai (Hitachi Vantara) 00:40:23
    2. Fortune 100 lessons: Architecting data lakes for real-time analytics and AI (sponsored by Attunity) - Ted Orme (Attunity) 00:42:05
    3. A tale of two BI standards: Data warehouses and data lakes (sponsored by Arcadia Data) - Randy Lea (Arcadia Data) 00:38:33
    4. Building the bridge from big data to machine learning and artificial intelligence (sponsored by Google Cloud) - Ryan Lippert (Google Cloud) 00:38:06
    5. Enabling data-driven development for autonomous driving at BMW (sponsored by BMW) - Miha Pelko (BMW Group), Aleksandr Melkonyan (BMW AG) 00:39:32
    6. Cloud-native data science with Anaconda, Docker, and Kubernetes (sponsored by Anaconda) - Mathew Lodge (Anaconda) 00:46:10
    7. Incorporating data sources inside and outside of the data center (sponsored by Cisco) - Han Yang (Cisco Systems) 00:37:02
  4. Strata Business Summit
    1. General Data Protection Regulation (GDPR) tutorial and ePrivacy introduction - Part 1 - Aurélie Pols (Mind Your Privacy) 1:29:56
    2. General Data Protection Regulation (GDPR) tutorial and ePrivacy introduction - Part 2 - Aurélie Pols (Mind Your Privacy) 1:23:14
    3. Measure what matters: How your measurement strategy can reduce opex - Part 1 - Radhika Dutt (Radical Product), Geordie Kaytes (Fresh Tilled Soil), Nidhi Aggarwal (Radical Product) 1:10:28
    4. Measure what matters: How your measurement strategy can reduce opex - Part 2 - Radhika Dutt (Radical Product), Geordie Kaytes (Fresh Tilled Soil), Nidhi Aggarwal (Radical Product) 00:45:48
    5. Managing data science in the enterprise - Part 1 - Dan Enthoven (Domino Data Lab) 1:01:13
    6. Managing data science in the enterprise - Part 2 - Dan Enthoven (Domino Data Lab) 1:16:48
    7. Leveraging public-private partnerships using data analytics for economic insights - Audrey Lobo-Pulo (The Australian Treasury), Nick O'Donnell (LinkedIn) 00:45:43
    8. Executive Briefing: Becoming a data-driven enterprise—A maturity model - Teresa Tung (Accenture Labs), Jean-Luc Chatelain (Accenture) 00:39:31
    9. The app trap: Why every mobile app and mobile operator needs anomaly detection - Ira Cohen (Anodot) 00:45:25
    10. Executive Briefing: Lessons learned managing data science projects—Adopting a team data science process - Danielle Dean (Microsoft) 00:37:55
    11. Executive Briefing: What you need to know about fast data - Dean Wampler (Lightbend) 00:42:55
    12. Successful data cultures: Inclusivity, empathy, retention, and results - Kim Nilsson (Pivigo), Phil Harvey (Microsoft) 00:36:59
    13. Executive Briefing: BI on big data - Mark Madsen (Think Big Analytics), Shant Hovsepian (Arcadia Data) 00:38:09
    14. Executive Briefing: Why machine-learned models crash and burn in production and what to do about it - David Talby (Pacific AI) 00:39:46
    15. Executive Briefing: Machine learning—Why you need it, why it's hard, and what to do about it - Mick Hollison (Cloudera) 00:40:38
    16. Python for financial analysts - Saeed Amen (Cuemacro) 00:40:17
    17. On the limits of decision making with artificial intelligence - Martin Goodson (Evolution AI) 00:25:56
    18. The journey of machine learning platform adoption in enterprise - Simon Chan (Salesforce) 00:32:42
    19. Executive Briefing: The ROI of data-driven digital transformation - Kevin Sigliano (IE Business School) 00:37:42
    20. The artful science of metrics: Measurements that work - Ketan Gangatirkar (Indeed) 00:44:56
  5. Data-driven business management
    1. Delivering business value to Pepsico through mobility and retail data - Carme Artigas (Synergic Partners), Nuria Bombardo (Pepsico) 00:28:42
    2. Discovery through real-time monitoring: A case study from the automotive industry - Maria Assunta Palmieri (Data Reply) 00:20:32
    3. Driving better predictions in the oil and gas industry with modern data architecture - Jane McConnell (Teradata), Paul Ibberson (Teradata) 00:30:39
    4. How Typeform's data and analytics team managed to embed its data scientists into cross-functional teams while maintaining their cohesion - Viola Melis (Typeform), David Martín Borregón (Typeform) 00:24:28
    5. Extracting value from data: How Cox Automotive is using data to drive growth and transform the way the world buys, sells, and owns cars - Allison Nau (Cox Automotive UK) 00:30:21
    6. Analytics-driven insights are the new oil: How Shell is transforming data science - Dan Jeavons (Shell) 00:33:52
    7. Are we doing this wrong? Advertisement features A/B testing - Chen Salomon (Playbuzz) 00:38:24
    8. Predictive analytics for FMCG business strategies and tactics - Erik Elgersma (FrieslandCampina) 00:27:26
  6. Ask Me Anything
    1. Ask Me Anything: Streaming applications and architectures - Dean Wampler (Lightbend), Boris Lublinsky (Lightbend) 00:41:10
    2. Ask Me Anything: Architecting a data platform for enterprise use - Mark Madsen (Think Big Analytics), Shant Hovsepian (Arcadia Data), Mikio Braun (Zalando SE) 00:39:09
  7. Data Science & Machine Learning
    1. DataOps: Nine steps to transform your data science impact - Harvinder Atwal (Moneysupermarket) 00:43:10
    2. 50 reasons to learn the shell for doing data science - Jeroen Janssens (Data Science Workshops B.V.) 00:37:48
    3. Human in the loop: A design pattern for managing teams working with machine learning - Paco Nathan (O'Reilly Media) 00:43:20
    4. How Captricity manages 10,000 tiny deep learning models in production - Ramesh Sridharan (Captricity) 00:42:48
    5. Code Property Graph: A modern, queryable data storage for source code - Fabian Yamaguchi (ShiftLeft) 00:37:31
    6. Detecting small-scale mines in Ghana - Elena Terenzi (Microsoft), Michael Lanzetta (Microsoft) 00:39:05
    7. Rendezvous with AI - Ted Dunning (MapR Technologies) 00:44:38
    8. Machine learning for time series: What works and what doesn't - Mikio Braun (Zalando SE) 00:44:04
    9. Interpretable machine learning products - Mike Lee Williams (Cloudera Fast Forward Labs) 00:44:50
    10. Solving data cleaning and unification using human-guided machine learning - Ihab Ilyas (University of Waterloo | Tamr) 00:41:36
    11. Democratizing data within your organization - Mark Grover (Lyft), Deepak Tiwari (Lyft) 00:43:35
    12. Risk-sharing pools: Winning zero-sum games through machine learning - Baiju Devani (Aviva Canada), Étienne Chassé St-Laurent (Aviva Canada) 00:47:55
    13. Deep learning for recommender systems - Nick Pentreath (IBM) 00:42:58
    14. Operationalize deep learning models for fraud detection with Azure Machine Learning Workbench - Francesca Lazzeri (Microsoft), Jaya Mathew (Microsoft) 00:41:10
    15. Real-time motorcycle racing optimization - Fausto Morales (Arundo), Marty Cochrane (Arundo) 00:28:13
    16. Spark NLP in action: Intelligent, high-accuracy fact extraction from long financial documents - David Talby (Pacific AI), Saif Addin Ellafi (John Snow Labs), Paul Parau (UiPath) 00:41:28
    17. Modeling time series in R - Jared Lander (Lander Analytics) 00:38:17
    18. Big data meets renewable energy: Building a real-time asset management platform for renewable energy - Stamatis Stefanakos (D ONE AG) 00:31:37
  8. Data Engineering & Architecture
    1. Big data, big quality: Data quality at Spotify - Irene Gonzálvez (Spotify) 00:27:07
    2. Architecting data platforms for cybersecurity - Charaka Goonatilake (Panaseer) 00:41:54
    3. Scaling the AI hierarchy of needs with TensorFlow, Spark, and Hops - Jim Dowling (Logical Clocks) 00:37:11
    4. How BT delivers better broadband and TV using Spark and Kafka - Phillip Radley (BT) 00:47:00
    5. Why knowledge graphs are important to finance - Haikal Pribadi (GRAKN.AI) 00:39:11
    6. Audi's journey to an enterprise big data platform - Carsten Herbe (Audi Business Innovation GmbH), Matthias Graunitz (Audi AG) 00:33:24
    7. Web analytics at scale with Druid at Naver - Jason Heo (Naver), Dooyong Kim (Navercorp) 00:34:33
    8. Continuous delivery and machine learning - Guillaume Salou (OVH) 00:26:28
    9. Batch and real-time processing in LINE's log analysis platform - Wataru Yukawa (LINE) 00:24:37
    10. Mixing causal consistency and asynchronous replication for large Neo4j clusters - Jim Webber (Neo4j) 00:33:26
    11. Accelerating development velocity of production ML systems with Docker - Kinnary Jangla (Pinterest) 00:33:18
    12. Time for a new relation: Going from RDBMS to a graph database - Patrick McFadin (DataStax) 00:41:08
    13. How to protect big data in a containerized environment - Thomas Phelan (BlueData) 00:37:25
    14. Machine learning platform lifecycle management - Hope Wang (Intuit) 00:36:29
  9. FinData
    1. Real-time trade surveillance is not just about trade data - Paul Lashmet (Arcadia Data) 00:26:03
    2. How Deutsche Börse designed a world-class analytics lab - Konrad Sippel (Deutsche Börse) 00:25:43
    3. Macroeconomic news sentiment: Enhanced risk assessment for sovereign bond spreads - Christina Erlwein-Sayer (OptiRisk Systems) 00:30:00
    4. No stone unturned: Financial research as an intelligence organization - Robert Passarella (Alpha Features) 00:27:39
    5. Monetizing your data for financial markets - Saeed Amen (Cuemacro) 00:27:55
    6. Welcome to your open future - Louise Beaumont (Publicis Groupe | techUK | NPSO) 00:33:20
    7. Fast analytics on fast data: Kudu as a storage layer for banking applications - Olaf Hein (ORDIX AG) 00:25:20
    8. Using data flow and machine learning to measure real transformation in culture, capacity, and delivery - Angelique Mohring (GainX) 00:37:44
  10. Law, ethics, & governance
    1. How will the GDPR impact machine learning? - Steve Touw (Immuta) 00:39:43
    2. Finding bias in social media recommendations - Guillaume Chaslot (AlgoTransparency) 00:39:12
    3. Hadoop under attack: Securing data in a banking domain - Federico Leven (ReactoData) 00:40:29
    4. Designing ethical artificial intelligence - Jivan Virdee (Fjord), Hollie Lubbock (Fjord) 00:41:42
    5. Multi-data center and multitenant durable messaging with Apache Pulsar - Ivan Kelly (Streamlio) 00:36:42
    6. Rent, rain, and regulations: Leveraging structure in big data to predict criminal activity - Jorie Koster-Hale (Dataiku) 00:45:05
    7. Executive Briefing: Data privacy in the age of the Internet of Things - Alasdair Allan (Babilim Light Industries) 00:39:37
    8. Executive Briefing: Killer robots and how not to do data science - Kate Vang (DataKind UK), Christine Henry (DataKind UK) 00:40:11
  11. Emerging technologies & case studies
    1. Revolutionizing the newsroom with artificial intelligence - Daniel Gilbert (News UK), Jonathan Leslie (Pivigo) 00:38:33
    2. GPU-accelerated threat detection with GOAI - Joshua Patterson (NVIDIA), Mike Wendt (NVIDIA) 00:35:51
    3. Building a healthcare decision support system for ICD10/HCC coding through deep learning - Manas Ranjan Kar (Episource) 00:41:17
    4. The ultimate data scientist's playground: Building a multipetabyte analytic infrastructure for cyber defense - Lee Blum (Verint Systems) 00:37:02
    5. Kafka in jail: Running Kafka in container-orchestrated clusters - Sean Glover (Lightbend) 00:41:03
    6. Using LSTMs to aid professional translators - Darren Cook (QQ Trend) 00:39:22
    7. Scaling data science (teams and technologies) - David Asboth (Cox Automotive Data Solutions), Shaun McGirr (Cox Automotive Data Solutions) 00:32:59
    8. Unlocking the hidden potential of bad news: Using news-derived data to uncover and solve complex societal problems - Niranjan Thomas (Dow Jones) 00:20:51
  12. Visualization & user experience
    1. Data visualization in a big data world - Jeff Fletcher (Cloudera) 00:39:14
    2. The business leader’s guide to designing indispensable analytics solutions and data products - Brian O'Neill (Designing for Analytics) 00:40:21
    3. Architectural design for interactive visualization - Bargava Subramanian (Impel Labs), Amit Kapoor (narrativeVIZ Consulting) 00:40:21
    4. A heretical monitoring view: Using PostgreSQL to store Prometheus metrics and visualizing them in Grafana - Erik Nordström (Timescale) 00:40:22
    5. Deep learning in the browser: Explorable explanations, model inference, and rapid prototyping - Amit Kapoor (narrativeVIZ Consulting), Bargava Subramanian (Impel Labs) 00:37:56
    6. Human-in-the-loop data science with Jupyter widgets - Pascal Bugnion (ASI Data Science) 00:32:56
    7. How DHL is increasing efficiency and reducing distance traveled across the warehouse with the IoT - Michael Troughton (Conduce), Jonathan Genah (DHL Supply Chain) 00:28:50
    8. Humanizing data: How to find the why - Hollie Lubbock (Fjord), Jivan Virdee (Fjord) 00:30:26
  13. Big data & data science in the cloud
    1. The cloud is expensive, so build your own redundant Hadoop clusters. - Stuart Pook (Criteo) 00:40:15
    2. Analytics in the cloud: Building a modern cloud-based big data warehouse - Greg Rahn (Cloudera) 00:40:49
    3. Data science across data sources with Apache Arrow - Tomer Shiran (Dremio) 00:40:59
    4. Making stateless containers reliable and available even with stateful applications - Paul Curtis (MapR Technologies) 00:40:25
    5. Using Siamese CNNs for removing duplicate entries from real estate listing databases - Sergey Ermolin (Intel), Olga Ermolin (MLS Listings) 00:38:04
    6. Practical advice for driving down the cost of cloud big data platforms - Christopher Royles (Cloudera) 00:38:38
    7. Stream processing for the practitioner: Blueprints for common stream processing use cases with Apache Flink - Aljoscha Krettek (data Artisans) 00:36:55
    8. Improving ad hoc and production workflows at Stitch Fix - Neelesh Srinivas Salian (Stitch Fix) 00:39:21
    9. Setting up a lightweight distributed caching layer using Apache Arrow - Jacques Nadeau (Dremio) 00:42:08
    10. Deep learning with TensorFlow and Spark using GPUs and Docker containers - Nanda Vijaydev (BlueData), Thomas Phelan (BlueData) 00:40:36
    11. Autonomous ETL with materialized views - Adesh Rao (Qubole), Abhishek Somani (Qubole) 00:35:25
    12. The Data Intelligence Hub: On-demand Hadoop resource provisioning in Europe’s Industrial Data Space using Cloudera Altus - Sven Löffler (Deutsche Telekom) 00:41:04
    13. ClickFox: Customer journey analytics powered by OpenStack and Cloudera - Alvin Heib (Cloudera), Guy Leroux (Atos) 00:39:10
    14. Radically modular data ingestion APIs in Apache Beam - Eugene Kirpichov (Google) 00:45:33
    15. You call it data lake; we call it Data Historian. - Naghman Waheed (Monsanto), Brian Arnold (Monsanto) 00:47:17
  14. Streaming systems & real-time applications
    1. Processing fast data with Apache Spark: A tale of two APIs - Gerard Maas (Lightbend) 00:42:00
    2. Using a global data fabric to run a mixed cloud deployment - Jim Scott (MapR Technologies) 00:41:29
    3. Unlocking the world of stream processing with KSQL, the streaming SQL engine for Apache Kafka - Michael Noll (Confluent) 00:39:24
    4. StreamDM: Advanced data science with Spark Streaming - Heitor Murilo Gomes (Télécom ParisTech), Albert Bifet (Huawei) 00:33:48
    5. Real-time deep learning on video streams - Eran Avidan (Intel) 00:34:56
    6. Machine-learned model quality monitoring in fast data and streaming applications - Emre Velipasaoglu (Lightbend) 00:35:54
    7. Understanding Spark tuning with auto-tuning; or, Magical spells to stop your pager going off at 2:00am - Holden Karau (Google), Rachel Warren (Salesforce Einstein) 00:35:59
    8. You’re doing it wrong: How Zoomdata rearchitected streaming - Erin Recachinas (Zoomdata) 00:35:56
    9. Big data at speed - Mark Grover (Lyft), Ted Malaska (Blizzard Entertainment) 00:39:52
    10. Machine learning at Intuit: Five delightful use cases - Calum Murray (Intuit) 00:31:04
    11. A high-performance system for deep learning inference and visual inspection - Moty Fania (Intel) 00:37:27
    12. Complex event processing with Apache Flink - Kostas Kloudas (data Artisans) 00:19:40
    13. Learning how to design automatically updating AI with Apache Kafka and Deeplearning4j - Jason Bell (MastodonC) 00:36:19
    14. DevOps at ING Analytics: Combining data engineering with data operations - Giuseppe D'alessio (ING Group) 00:29:53