Strata Data Conference 2019 - London, United Kingdom

Video description

The Strata Data Conference, the world's largest gathering of the data community, came to London from April 29 to May 2, 2019. Many of Big Data's best practitioners, strategists, and decision makers spoke, providing expert guidance on the skills, technologies, and processes required to build successful data-driven projects and organizations. This video compilation gives you a front-row view of the best keynotes, tutorials, and technical sessions delivered at the conference.

It contains keynote speeches from data visionaries such as Shingai Manjengwa (CEO, Fireside Analytics Inc.), Michael Tidmarsh (CTO, Ogilvy), Cassie Kozyrkov (Chief Decision Scientist, Google), David Boyle (Customer Insights Director, Harrods), Cait O'Riordan (CIO, Financial Times), Sandra Wachter (University of Oxford), and from historian and futurist James Burke. It includes Sandeep Uttamchandani's (Intuit) talk on the three patterns to best manage data dictionaries; Simona Meriam's (Nielsen) review of the best techniques to manage Spark-Kafka consumer offsets in a relational database; Guoqiong Song's (Intel) talk on LSTM-based time series anomaly detection using Analytics Zoo for Spark and BigDL; Wolff Dobson's (Google) update on TensorFlow's best practices; and a host of other talks covering the latest in data engineering and architecture; data science, machine learning, and AI; data law and ethics; data streaming and IoT; and data visualization and UX.

Highlights include:

  • Hours of Strata London 2019's best keynotes, tutorials, and technical sessions to study and absorb on your own schedule.
  • The Strata Business Summit: total access to all of the Summit's tutorials, tech talks, and exclusive Executive Briefings, including Pete Skomoroch (Workday) on the must-do requirements for creating future-proof businesses focused on AI and machine learning; Lidia Crespo (Santander UK) on how to use Hadoop to defend privacy; and Dean Wampler (Lightbend) on what it takes to use machine learning in fast data pipelines.
  • Findata Day: hours of data-meets-finance sessions delivered by some of the world's top bankers, analysts, entrepreneurs, financiers, and technologists, including Martin Leijen (Rabobank) on Rabobank's AI-enabling Data and Intelligence Lab; Charlotte Werger (Van Lanschot Kempen) on transforming a traditional wealth manager to a cutting-edge data-driven company; and Daniel First (QuantumBlack) on how to operationalize risk management with machine learning.
  • Data Case Studies Day: sessions describing data in action across a wide range of companies and verticals led by the directors of those efforts, such as Marc Rind (ADP), Juan Bengochea (Royal Caribbean Cruise Lines), Semih Kumluk (Turkcell), and Simon Moritz (Ericsson).
  • Deep-dive big data tutorials into must-know technologies, such as how to do time series forecasting with Azure ML; how to use AWS serverless technologies to analyze large datasets; how to design and build machine learning models using TensorFlow; how to do real-time SQL stream processing at scale with Apache Kafka and KSQL; and how to get ready for CCPA and GDPR regulations governing data security and governance.

Table of contents

  1. Keynotes
    1. The enterprise data cloud - Mick Hollison (Cloudera)
    2. Making data science useful - Cassie Kozyrkov (Google)
    3. Sustaining Machine Learning in the Enterprise - Ben Lorica (O'Reilly Media)
    4. Finding your North Star - Cait O'Riordan (Financial Times)
    5. Making the future - James Burke
    6. The Unstoppable Rise of White Box data - Chris Taggart (OpenCorporates)
    7. Building data science capacity in your organization - Shingai Manjengwa (Fireside Analytics Inc.)
    8. Combining creativity and analytics - David Boyle (Harrods)
    9. Rise of the (advertising) machines - Michael Tidmarsh (Ogilvy)
    10. Privacy, identity, and autonomy in the age of big data and AI - Sandra Wachter (University of Oxford)
  2. Sponsored
    1. Oracle's second-generation cloud: Optimized for the partner ecosystem (sponsored by Oracle Cloud Infrastructure) - Ben Lackey (Oracle)
    2. Augment your recommender system with transfer learning on images (sponsored by Dataiku) - Larry Orimoloye (Dataiku)
    3. Data catalogs are changing the nature of working with data (sponsored by Alation) - Debora Seys (Independent)
    4. How a LiveData strategy breaks down barriers to overcome data gravity (sponsored by WANdisco) - Joel Horwitz (WANdisco)
    5. How retailers can leverage data to stay competitive in an ever-changing digital landscape (sponsored by Data Reply) - Luca Piccolo (Data Reply), Michele Miraglia (Data Reply)
    6. Is it possible to regulate machine learning? Dream versus R (sponsored by AXA) - Marcin Detyniecki (AXA)
    7. Augmented OLAP for big data from on-premises to multicloud (sponsored by Kyligence) - Luke Han (Kyligence)
    8. Intelligent design patterns for cloud-based analytics and BI (sponsored by Arcadia Data) - Shant Hovsepian (Arcadia Data)
  3. Data Engineering and Architecture
    1. Model governance and model ops in the enterprise - Harish Doddi (Datatron Technologies), Jerry Xu (Datatron Technologies)
    2. The future of cloud native data warehousing: Emerging trends and technologies - Greg Rahn (Cloudera)
    3. Improving Spark downscaling; Or, Not throwing away all of our work - Holden Karau (Google), Mikayla Konst (Google), Ben Sidhom (Google)
    4. Picking Parquet: Improved performance for selective queries in Impala, Hive, and Spark - Anna Szonyi (Cloudera), Zoltán Borók-Nagy (Cloudera)
    5. Running SQL-based workloads in the cloud at 20x–200x lower cost using Apache Arrow - Jacques Nadeau (Dremio)
    6. Half-correct and half-wrong collective data wisdom: 3 patterns to sanity - Sandeep Uttamchandani (Intuit)
    7. Transforming a financial services data infrastructure for the modern era by building a PCI DSS-compliant data platform from the ground up, on AWS - Eoin O'Flanagan (NewDay), Darragh McConville (Kainos)
    8. Deep learning with TensorFlow and Spark using GPUs and Docker containers - Thomas Phelan (BlueData)
    9. Continuous intelligence: Keeping your AI application in production - Arif Wider (ThoughtWorks), Emily Gorcenski (ThoughtWorks)
    10. Application intelligence: Bridging the gap between human expertise and machine learning - Rebecca Simmonds (Red Hat), Michael McCune (Red Hat)
    11. Unlocking insights in AI by building a feature store - Willem Pienaar (GOJEK), Zhi Ling Chen (GOJEK)
    12. How do you evolve your data infrastructure? - Neelesh Salian (Stitch Fix)
    13. Disrupting data discovery - Mark Grover (Lyft)
    14. Scaling Impala: Common mistakes and best practices - Manish Maheshwari (Cloudera)
    15. Mutant tests too: The SQL - Jaydene Green, Elliot West
    16. Model serving via Pulsar functions - Arun Kejariwal (Independent), Karthik Ramasamy (Streamlio)
    17. Mastering data with Spark and machine learning - Sonal Goyal (Nube)
    18. Mass production of AI solutions - Nate Keating (Google)
    19. Leveraging metadata for automating delivery and operations of advanced data platforms - Peter Billen (Accenture)
  4. Data Science, Machine Learning, and AI
    1. Learning with limited labeled data - Shioulin Sam (Cloudera Fast Forward Labs)
    2. Solving data cleaning and unification using human-guided machine learning - Ihab Ilyas (University of Waterloo | Tamr)
    3. Using machine learning for stock picking - Alun Biffin (Van Lanschot Kempen), David Dogon (Van Lanschot Kempen)
    4. Improving infrastructure efficiency with unsupervised algorithms - Alexandre Hubert (Dataiku)
    5. A Magic 8 Ball for optimal cost and resource allocation for the big data stack - Shivnath Babu (Unravel Data Systems | Duke University), Alkis Simitsis (Micro Focus)
    6. Dealing with data scarcity in natural language processing - Yves Peirsman (NLP Town)
    7. How to mitigate mobile fraud risk by data analytics - Seonmin Kim (LINE)
    8. Explainable machine learning in fintech - Eitan Anzenberg (Flowcast AI)
    9. Reinforcement learning: A gentle introduction and an industrial application - Christian Hidber (bSquare)
    10. Inclusive design: Deep learning on audio in Azure, identifying sounds in real time - Xiaoyong Zhu (Microsoft), Swetha Machanavajhala (Microsoft)
    11. 8 prerequisites of a graph query language - Mingxi Wu (TigerGraph)
  5. Findata Day
    1. Data science transformation: Transforming a traditional wealth manager to a cutting-edge data-driven company - Charlotte Werger (Van Lanschot Kempen)
    2. Operationalizing risk management for machine learning - Daniel First (QuantumBlack)
    3. How NLP is helping a European financial institution enhance customer experience - Tal Doron (GigaSpaces)
    4. Insurance and the gig economy - Alistair Croll (Solve For Interesting)
    5. Designing the foundation for a data-driven future in financial services - Nicolette Bullivant (Santander UK Technology)
    6. On the accountability of black boxes: How we can control what we can’t exactly measure - Yiannis Kanellopoulos (Code4Thought)
    7. Real estate, real AI: Insights and decisions in the world's largest asset class - Romi Mahajan (Quantarium)
    8. Modeling the Tesla narrative - Rashed Iqbal (Investment and Development Office)
  6. Security and Privacy
    1. India's data dilemma with India Stack - Sundeep Reddy Mallu (Gramener)
    2. Building a secure and transparent ML pipeline using open source technologies - Nick Pentreath (IBM)
    3. Executive Briefing: Big data in the era of heavy worldwide privacy regulations - Mark Donsky (Okera), Nikki Rouda (Amazon Web Services)
    4. Fair, privacy-preserving, and secure ML - Mikio Braun (Zalando SE)
    5. The Lyft data platform: Now and in the future - Mark Grover (Lyft), Deepak Tiwari (Lyft)
    6. The vindication of big data: How Santander UK uses Hadoop to defend privacy - Maurício Lins (everis consultancy UK), Lidia Crespo (Santander UK)
    7. Opening the black box: Explainable AI (XAI) - Maren Eckhoff (QuantumBlack)
    8. Federated learning: Machine learning with privacy on the edge - Chris Wallace (Cloudera)
    9. Deep learning for speech synthesis: The good news, the bad news, and the fake news - Scott Stevenson (Faculty)
    10. Data science at Deutsche Telekom: Predicting global travel patterns and network demand - Vaclav Surovec (Deutsche Telekom), Gabor Kotalik (Deutsche Telekom)
    11. Executive Briefing: The intelligent edge and the demise of big data? - Alasdair Allan (Babilim Light Industries)
    12. The vegan data diet: How Wikipedia cuts down privacy issues while keeping data fit - Marcel Ruiz Forns (Wikimedia Foundation)
    13. Evaluating cybersecurity defenses with a data science approach - Brennan Lodge (Goldman Sachs), Jay Kesavan (Bowery Analytics LLC)
  7. Culture and organization
    1. An Innovation Architecture industrializes AI from PoCs to production - Teresa Tung (Accenture Labs), Jean-Luc Chatelain (Accenture)
    2. Data-driven digital transformation and jobs: The new software hierarchy and ML - Robert Cohen (Economic Strategy Institute)
  8. Executive Briefing and best practices
    1. Implementing enterprise data management in industrial and scientific organizations - Jane McConnell (Teradata), Sun Maria Lehmann (Equinor)
    2. Executive Briefing: From the edge to AI—Taking control of your data for fun and profit - Mick Hollison (Cloudera)
    3. Executive Briefing: Why managing machines is harder than you think - Pete Skomoroch (Workday)
    4. Executive Briefing: Overview of data governance - Paco Nathan
    5. Executive Briefing: 5 things every executive should NOT know - Ellen Friedman (MapR Technologies)
    6. Executive Briefing: Using a domain knowledge graph to manage AI at scale - Teresa Tung (Accenture Labs), Jean-Luc Chatelain (Accenture)
    7. Executive Briefing: Analytics for executives - Brandy Freitas (Pitney Bowes)
    8. Starting with the end in mind: Lessons learned from data strategies that work - Vidya Raman (Cloudera)
    9. Executive Briefing: The hidden data scientists lurking in your company - Jack Norris (MapR Technologies)
    10. Executive Briefing: AWS technology trends—Data lakes and analytics - Nikki Rouda (Amazon Web Services)
    11. Executive Briefing: What it takes to use machine learning in fast data pipelines - Dean Wampler (Lightbend)
  9. Data Case Studies
    1. AI for social good: Saving the planet through data science - Ganes Kesari (Gramener Inc)
    2. Data-intense profiling of points of consumption to increase sales and marketing effectiveness - Cecilia Marchi (Jakala)
    3. The power of merging multifunctional expertise to create innovative, data-driven products - Marc Rind (ADP)
    4. Building custom machine learning models for production, without ML expertise - Alicia Williams (Google)
    5. Data transformation of Turkcell - Semih Kumluk (Turkcell)
    6. When you don’t really know what to do with this huge pile of strategic data - Caroline Goulard (Dataveyes)
    7. The digital truth and the physical twin - Simon Moritz (Ericsson)
    8. Machine learning in aviation is finally taking off - Samuel Cristóbal (Innaxis)
    9. Using electronic health records to predict health risks associated with obesity - Volker Schnecke (Novo Nordisk)
    10. From data to data-driven to an AI-ready company: The culture change makes the difference - Julia Butter (Scout24)
    11. How easyJet transformed to create a listening enterprise data hub in the cloud - Aaronpal Dhanda (easyJet)
    12. Practicing data science: A collection of case studies - Rosaria Silipo (KNIME)
    13. Insightful health: Amplifying intelligence in healthcare patient flow execution - Fabio Ferraretto (Accenture), Claudia Regina Laselva (Albert Einstein Jewish Hospital)
    14. Insights from engineering Europe's largest marketing platform for fashion - Dirk Petzoldt (Zalando SE)
  10. Media, Marketing, and Advertising
    1. Stream, stream, stream: Different streaming methods with Spark and Kafka - Itai Yaffe (Nielsen)
    2. Spark NLP in action: How Indeed applies NLP to standardize résumé content at scale - Alexander Thomas (Indeed), Alexis Yelton (Indeed)
    3. Recommending and searching at Spotify - Mounia Lalmas (Spotify)
    4. The evolution of data science skill sets: An analysis using exponential family embeddings - Maryam Jahanshahi (TapRecruit)
    5. Nielsen presents: Fun with Kafka, Spark, and offset management - Simona Meriam (Nielsen)
    6. Synthetic video generation: Why seeing should not always be believing - Alexander Adam (Faculty)
    7. Learning "learning to rank" - Sophie Watson (Red Hat)
  11. Law and Ethics
    1. Responsible AI innovation - Laila Paszti (GTC Law Group PC Affiliates)
    2. Using data for evil V: The AI strikes back - Duncan Ross (Times Higher Education), Francine Bennett (Mastodon C)
    3. Integrated Business Intelligence Suite: How Uber built a platform to convert raw data into knowledge - Shailesh Chauhan (Uber)
    4. Why is it so hard to do AI for good? - Duncan Ross (Times Higher Education), Giselle Cory (DataKind UK)
  12. Text and Language processing and analysis
    1. Building a sales AI platform: Key principles and lessons learned - Moty Fania (Intel)
    2. Agile NLP workflows with spaCy and Prodigy - Matthew Honnibal (Explosion AI)
    3. The unreasonable effectiveness of transfer learning on NLP - David Low
    4. Reading China: Predicting policy change with machine learning - Weifeng Zhong (Mercatus Center at George Mason University)
    5. Fraud detection at a financial institution using unsupervised learning and text mining - David Dogon (Van Lanschot Kempen)
    6. NLP Architect by Intel's AI Lab - Moshe Wasserblat (Intel)
  13. Data Integration and Data Pipelines
    1. The changing face of ETL: Event-driven architectures for data engineers - Robin Moffatt (Confluent)
    2. Scalability-aware autoscaling of a Spark application - Anirudha Beria (Qubole), Rohit Karlupia (Qubole)
    3. Schema on read and the new logging way - David Josephsen (Sparkpost)
    4. AI for good at scale in real time: Challenges in machine learning and deep learning - Alex Jaimes (Dataminr)
    5. Learning how to perform ETL data migrations with open source tool Embulk - Jason Bell (DeskHoppa)
    6. Architecting a data platform to support analytic workflows for scientific data - Jane McConnell (Teradata), Sun Maria Lehmann (Equinor)
  14. Deep Learning
    1. Predicting real-time transaction fraud using supervised learning - Sami Niemi (Barclays)
    2. Sequence-to-sequence modeling for time series - Arun Kejariwal (Independent), Ira Cohen (Anodot)
    3. TensorFlow for everyone - Wolff Dobson (Google)
    4. Deep learning for recommender systems - Oliver Gindele (Datatonic)
    5. Deep learning for fonts - Raghotham Sripadraj (Ericsson), Nischal Harohalli Padmanabha (Omnius)
    6. A deep learning approach to automatic call routing - Tal Doron (GigaSpaces)
  15. AI and Data technologies in the cloud
    1. The Presto Cost-Based Optimizer for interactive SQL on anything - Wojciech Biela (Starburst), Piotr Findeisen (Starburst)
    2. Serverless for data and AI - Avner Braverman (Binaris)
    3. Processing 10M samples a second to drive smart maintenance in complex IIoT systems - Geir Engdahl (Cognite), Daniel Bergqvist (Google)
    4. Deploying your real-time apps on thousands of servers and still being able to breathe - Constantin Muraru (Adobe), Dan Popescu (Adobe)
    5. Unleashing Apache Kafka and TensorFlow in hybrid architectures - Kai Wähner (Confluent)
    6. Herding elephants: Seamless data access in a multicluster clouds - Pradeep Bhadani, Elliot West
    7. Autoscaling Spark on Kubernetes - Holden Karau (Google), Kris Nova (VMware)
    8. From legacy to cloud: An end-to-end data integration journey - Max Schultze (Zalando SE)
  16. Visualization, Design, and UX
    1. Visually communicating statistical and machine learning methods - Michael Freeman (University of Washington)
    2. Empathy: The secret ingredient in the design of engaging data products and analytics tools - Brian O'Neill (Designing for Analytics)
    3. Science-fictional user interfaces - Mars Geldard (University of Tasmania), Paris Buttfield-Addison (Secret Lab Pty. Ltd.)
  17. Streaming and realtime analytics
    1. Report card on streaming microservices - Ted Dunning (MapR)
    2. Streaming at Lyft - Thomas Weise (Lyft)
    3. Performant time series data management and analytics with PostgreSQL - Michael Freedman (TimescaleDB)
  18. Tutorials
    1. Learning Presto: SQL on anything - Matt Fuller (Starburst) - Part 1
    2. Learning Presto: SQL on anything - Matt Fuller (Starburst) - Part 2
    3. Architecting a data platform for enterprise use - Mark Madsen (Think Big Analytics), Todd Walter (Teradata) - Part 1
    4. Architecting a data platform for enterprise use - Mark Madsen (Think Big Analytics), Todd Walter (Teradata) - Part 2
    5. Serverless machine learning with TensorFlow: Part I - Melinda King (ROI Training)
    6. Serverless machine learning with TensorFlow: Part II - Melinda King (ROI Training)
    7. Getting ready for GDPR and CCPA: Securing and governing hybrid, cloud, and on-premises big data deployments - Mark Donsky (Okera), Ifigeneia Derekli (Cloudera), Lars George (Okera), Michael Ernest (Okera)
    8. Using AWS serverless technologies to analyze large datasets - Krishnan Saidapet (REAN Cloud, A Hitachi Vantara company) - Part 1
    9. Using AWS serverless technologies to analyze large datasets - Krishnan Saidapet (REAN Cloud, A Hitachi Vantara company) - Part 2
    10. Hands-on machine learning with Kafka-based streaming pipelines - Boris Lublinsky (Lightbend), Dean Wampler (Lightbend) - Part 1
    11. Hands-on machine learning with Kafka-based streaming pipelines - Boris Lublinsky (Lightbend), Dean Wampler (Lightbend) - Part 2
    12. Cross-cloud model training and serving with Kubeflow - Holden Karau (Google), Trevor Grant (IBM), Francesca Lazzeri (Microsoft)
    13. Continuous intelligence: Moving machine learning into production reliably - Danilo Sato (ThoughtWorks), Christoph Windheuser (ThoughtWorks) - Part 1
    14. Continuous intelligence: Moving machine learning into production reliably - Danilo Sato (ThoughtWorks), Christoph Windheuser (ThoughtWorks) - Part 2
    15. Your data strategy: It should be concise, actionable, and understandable by business and IT - Peter Aiken (Data BluePrint | DAMA International | Virginia Commonwealth University) - Part 1
    16. Your data strategy: It should be concise, actionable, and understandable by business and IT - Peter Aiken (Data BluePrint | DAMA International | Virginia Commonwealth University) - Part 2
    17. Architecture and algorithms for end-to-end streaming data processing - Arun Kejariwal (Independent), Karthik Ramasamy (Streamlio) - Part 1
    18. Architecture and algorithms for end-to-end streaming data processing - Arun Kejariwal (Independent), Karthik Ramasamy (Streamlio) - Part 2

Product information

  • Title: Strata Data Conference 2019 - London, United Kingdom
  • Author(s): O'Reilly Media Inc.
  • Release date: May 2019
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781492050551