Strata Data Conference - New York, NY 2018

Video Description

The chief data officer for Goldman Sachs, a cofounder of the blockchain computing platform Ethereum, Google Cloud's chief decision scientist, an expert in brain-based human-machine interfaces, and dozens of senior-level data engineers from companies like American Express, Netflix, Uber, Cloudera, Amazon Web Services, and Intel: these are just a few of the 300+ experts in data science, data technology, and data-driven business strategy who spoke at the Strata Data Conference New York 2018. Obtain this video compilation and you'll have the opportunity to see virtually everyone who presented. If you're looking for guidance on how to achieve demonstrable ROI for your data projects or architect data applications that achieve your specific goals, or you just want to solve those irritating data problems that interfere with your workflows, this video compilation is for you. It contains some of the best presentations delivered at the conference, offering you an insider's perspective on data's latest strategies, tools, and technologies.

Highlights include:

  • Deep dive tutorials from big data's most experienced practitioners, including Karthik Ramasamy (Streamlio) on how to build an end-to-end data processing pipeline; Vartika Singh (Cloudera) on how to leverage Spark and deep learning frameworks to understand data at scale; and Joshua Poduska (Domino Data Lab) on how to design and run data science organizations that have a sustained, scalable, and predictable impact on business outcomes.
  • Thought-provoking keynotes from data visionaries such as Jeffrey Wecker (Goldman Sachs), Joseph Lubin (Consensus Systems), Jacob Ward (CNN), Julia Angwin (ProPublica), Amber Case (MIT Media Lab), Ziya Ma (Intel), and Amanda Pustilnik (Center for Law, Brain & Behavior, Massachusetts General Hospital).
  • The Strata Business Summit: Sessions and executive briefings detailing how companies like Cerner, Fidelity Investments, Deere & Company, Munich Re, and Navistar built their successful data strategies. Includes talks by Erin Coffman (Airbnb) on how Airbnb drives data-driven decision making with its Data University; Jennifer Prendki on how data science managers use Agile techniques to make their teams more efficient; and others by Cassie Kozyrkov (Google), Kimberly Nevala (SAS Institute), Jonathan Seidman (Cloudera), and more.
  • Findata Day: Data-meets-finance sessions designed for bankers, analysts, entrepreneurs, financiers, and technologists. Includes talks such as Theresa Johnson (Airbnb) on Airbnb's revenue forecasting platform; Patrick Angeles (Cloudera) on the data platforms big banks use to comply with post-2008 financial crisis regulations; and Amro Alkhatib (Daman) on using real-time ML-based systems to automate and speed insurance claims processing.
  • Data engineering and architecture sessions, including Michael Freedman (TimescaleDB) on the time-series data management and analytics features of open-source TimescaleDB; Zhe Zhang (LinkedIn) on new frameworks that can run TensorFlow on managed clusters (Kubernetes, Mesos, Hadoop, etc.); and William Benton (Red Hat) on why data scientists should love Linux containers.
  • Data science and machine learning sessions, including Alberto Andreotti (John Snow Labs) on using AI and Spark NLP to extract facts from patient records; Archana Anandakrishnan (American Express) on DataQC Studio, AMEX's automated data quality assurance tool; and Joshua Patterson's (NVIDIA) overview of the GPU Open Analytics Initiative (GoAi).
  • Sessions devoted to streaming systems and real-time applications; sessions on big data and data science in the cloud; sessions on law, ethics, governance; sessions on visualization and user experience; and sessions on platform security and cybersecurity.
  • 100+ hours from the Strata Data Conference NY 2018 to view at your own pace.

Table of Contents

  1. Keynotes
    1. The future of data warehousing - Anupam Singh (Cloudera), Brian Coyne (PNC) 00:14:54
    2. Managing risk in machine learning - Ben Lorica (O'Reilly Media) 00:10:07
    3. The answer to life, the universe, and everything: But can you get that into production? (sponsored by MapR) - Ted Dunning (MapR) 00:08:02
    4. Von Neumann to Deep Learning: Data Revolutionizing the Future - Jeffrey Wecker (Goldman Sachs) 00:13:13
    5. AI, ML, and the IoT will destroy the data center and the cloud (just not in the way you think) (sponsored by Cisco) - DD Dasgupta (Cisco) 00:06:36
    6. The Missing Piece - Cassie Kozyrkov (Google) 00:20:31
    7. Leveraging the best of the past to power a better future (sponsored by MemSQL) - Drew Paroski (MemSQL) 00:12:20
    8. The power of Ethereum - Joseph Lubin (Consensus Systems) 00:15:39
    9. Sound design and the future of experience - Amber Case (MIT Media Lab) 00:13:50
    10. Wait. . .pizza is a vegetable? Decoding regulations using machine learning (sponsored by IBM) - Dinesh Nirmal (IBM) 00:04:55
    11. Practical ML today and tomorrow - Hilary Mason (Cloudera Fast Forward Labs) 00:09:37
    12. Derive value from analytics and AI at scale (sponsored by Intel) - Ziya Ma (Intel) 00:06:00
    13. Quantifying forgiveness - Julia Angwin (ProPublica) 00:14:16
    14. Smarter cities through Geotab with BigQuery ML and geospatial analytics (sponsored by Google Cloud) - Chad W. Jennings (Google) 00:07:15
    15. Brain-based human-machine interfaces: New developments, legal and ethical issues, and potential uses - Amanda Pustilnik (University of Maryland School of Law | Center for Law, Brain & Behavior, Mass. General Hospital) 00:13:47
    16. The data imperative - Ben Sharma (Zaloni) 00:06:33
    17. Black box: How AI will amplify the best and worst of humanity - Jacob Ward (CNN | Al Jazeera | PBS) 00:19:05
  2. Sponsored
    1. Feet on the ground, head in the clouds (sponsored by AtScale) - Mark Stange-Tregear (Ebates) 00:38:09
    2. The importance of experimental iteration: A data-centric approach to an AI project (sponsored by Globant) - Antonio Fragoso (Globant) 00:38:09
    3. Augmented data engineering: Leveraging machine learning in data profiling and discovery (sponsored by Io-Tahoe) - Arun Murugan (GE Digital), Jeff Miller (GE) 00:42:10
    4. On the road to digital transformation, AI is a team sport (sponsored by Oracle + DataScience.com) - Ian Swanson (Oracle) 00:30:37
    5. Interactive business intelligence and OLAP on big data lakes using a Spark-native fast data mart (sponsored by Oracle + DataScience.com) - Srikanth Desikan (Oracle) 00:35:22
    6. Quick, reliable, and cost-effective ways to operationalize big data apps (sponsored by Unravel) - Shivnath Babu (Unravel Data Systems, Duke University), Madhusudan Tumma (TIAA) 00:30:33
    7. A developer's guide to building AI applications (sponsored by Microsoft) - Wee Hyong Tok (Microsoft) 00:40:59
    8. From data lakes to the data fabric: Our vision for digital strategy (sponsored by Cambridge Semantics) - Ben Szekely (Cambridge Semantics) 00:35:02
    9. Best practices for migrating big data workloads to Amazon Web Services (sponsored by Amazon Web Services) - Bruno Faria (Amazon Web Services) 00:40:23
    10. Commercial software in an increasingly open source ecosystem (sponsored by SAS) - Paul Kent (SAS) 00:33:53
    11. How Bell Canada increased the scale of BI exponentially with OLAP on big data (sponsored by Kyvos Insights) - Mark Huang (Bell Canada) 00:31:43
    12. Guidebook to unwind the enterprise "data hairball" and get ready for AI (sponsored by IBM) - Tim Davis (IBM) 00:47:03
    13. Bringing together machine and human intelligence (sponsored by SAP) - Richard Mooney (SAP) 00:37:50
    14. Accelerate big data analytics and AI with NetApp hybrid cloud architecture (sponsored by NetApp) - Karthikeyan Nagalingam (NetApp) 00:40:04
    15. From two weeks in Python to two hours in Pentaho: Building modern big data pipelines for machine learning (sponsored by Hitachi Vantara) - David Huh and Kevin Haas (Hitachi Vantara) 00:31:13
    16. Refactor your data warehouse with mobile analytics products (sponsored by Kyligence) - Zhi Zhu (China Construction Bank), Luke Han (Kyligence) 00:40:29
    17. A tale of two BI standards: Data warehouses and data lakes (sponsored by Arcadia Data) - Randy Lea (Arcadia Data) 00:23:33
    18. Hadoop-compatible filesystems: The limits of "compatible" (sponsored by WANdisco) - Paul Scott-Murphy (WANdisco) 00:34:10
    19. How the blurring of memory and storage is revolutionizing the data era (sponsored by Intel) - Arakere Ramesh (Intel), Bharath Yadla (Aerospike) 00:35:17
    20. Kubernetes plays Cupid for data scientists and IT (sponsored by MapR) - Skyler Thomas (MapR) 00:36:19
    21. How to avoid drowning in logs: Streaming 80 billion events and batch processing 40 TB/hour (sponsored by Pure Storage) - Ivan Jibaja (Pure Storage) 00:44:01
    22. Speed, scale, smarts: GPU-powered analytics for the extreme data economy (sponsored by Kinetica) - Michael Mahoney (Kinetica) 00:39:19
    23. Assumptions, constraints, and risks: How the wrong assumptions can jeopardize any model (sponsored by IBM) - Jennifer Shin (8 Path Solutions | NYU Stern | IBM) 00:27:50
    24. Building the Bridge from Big Data to ML, featuring Geotab (sponsored by Google Cloud) - Bob Bradley (Geotab), Chad W. Jennings (Google) 00:38:18
    25. Data operations problems created by deep learning and how to fix them (sponsored by MapR) - Jim Scott (MapR Technologies) 00:41:33
    26. Redis for velocity and volume: Fast data ingest and probabilistic data structures (sponsored by Redis Labs) - Kyle Davis (Redis Labs) 00:35:47
    27. Enabling predictive maintenance using automated IoT data pipelines (sponsored by BMC) - Basil Faruqui (BMC Software) 00:37:40
    28. Getting the most out of advanced analytics with people (sponsored by Alteryx) - Patrick Nussbaumer (Alteryx) 00:35:34
    29. Conda, Docker, and Kubernetes: The cloud-native future of data science (sponsored by Anaconda) - Mathew Lodge (Anaconda) 00:35:44
    30. Simplifying AI infrastructure: Lessons in scaling a deep learning enterprise (sponsored by NVIDIA) - Darrin Johnson (NVIDIA) 00:36:51
    31. Deep learning: Assessing analytics project feasibility and requirements (sponsored by NVIDIA) - Ward Eldred (NVIDIA) 00:36:24
    32. Kubernetes on GPUs (sponsored by NVIDIA) - Michael Balint (NVIDIA) 00:29:43
  3. Data science and machine learning
    1. Programming by input-output examples - Sumit Gulwani (Microsoft) 00:38:37
    2. Breaking the rules: End-stage renal disease prediction - Olga Cuznetova (Optum), Manna Chang (Optum) 00:39:38
    3. A roadmap for open data science and AI for business: Panel discussion with State Street - Bethann Noble (Cloudera), Daniel Huss (State Street), Abhishek Kodi (State Street) 00:40:30
    4. Perverse incentives in metrics: Inequality in the like economy - Bonnie Barrilleaux (LinkedIn) 00:35:07
    5. Semantic recommendations - Shioulin Sam (Cloudera Fast Forward Labs) 00:39:39
    6. Deploying machine learning models in the enterprise - Diego Oppenheimer (Algorithmia) 00:48:45
    7. 50 reasons to learn the shell for doing data science - Jeroen Janssens (Data Science Workshops B.V.) 00:38:44
    8. Why data scientists should love Linux containers - William Benton (Red Hat) 00:37:23
    9. Diversification in recommender systems: Using topical variety to increase user satisfaction - Ahsan Ashraf (Pinterest) 00:40:24
    10. When Tiramisu meets online fashion retail - Patty Ryan (Microsoft), CY Yam (Microsoft), Elena Terenzi (Microsoft) 00:25:45
    11. Scalable machine learning for data cleaning - Ihab Ilyas (University of Waterloo | Tamr) 00:35:12
    12. Solving the cold start problem: Data and model aggregation using differential privacy - Chang Liu (Georgian Partners) 00:43:15
    13. Machine learning for time series: What works and what doesn't - Mikio Braun (Zalando SE) 00:39:55
    14. BlazeIt: An exploratory video analytics engine - Daniel Kang (Stanford University) 00:36:46
    15. Predicting residential occupancy and hot water usage from high-frequency, multivector utilities data - Cristobal Lowery (Baringa), Marc Warner (ASI) 00:45:21
    16. Achieving personalization with LSTMs - Ankit Jain (Uber) 00:44:19
    17. Harnessing and customizing state-of-the-art recommendation solutions with OpenRec - Longqi Yang (Cornell Tech, Cornell University) 00:31:50
    18. Democratizing deep learning with transfer learning - Lars Hulstaert (Microsoft) 00:40:22
  4. Emerging technologies & case studies
    1. What's the Hadoop-la about Kubernetes? - Anant Chintamaneni (BlueData), Nanda Vijaydev (BlueData) 00:41:32
    2. Progress for big data in Kubernetes - Ted Dunning (MapR) 00:38:32
    3. High-performance messaging with Apache Pulsar - Karthik Ramasamy (Streamlio), Matteo Merli (Streamlio) 00:39:50
    4. Driving predictive analytics for the IoT and connected vehicles - Steve Otto (Navistar) 00:21:40
    5. Sewers can talk: Understanding the language of sewers - Greg Quist (SmartCover Systems) 00:23:09
  5. Data engineering and architecture
    1. The move to a modern data platform in the cloud: Pitfalls to avoid and best practices to follow - Amandeep Khurana (Okera) 00:38:58
    2. Data governance: A big job that's getting bigger - Andrew J Brust (ZDNet | Blue Badge Insights) 00:40:30
    3. Machine learning for nonstationary streaming data using Structured Streaming and StreamDM - Heitor Murilo Gomes (Télécom ParisTech), Albert Bifet (Télécom ParisTech) 00:36:09
    4. Marmaray: A generic, scalable, and pluggable Hadoop data ingestion and dispersal framework - Danny Chen (Uber Technologies), Omkar Joshi (Uber Technologies), Eric Sayle (Uber Technologies) 00:39:13
    5. Setting up a lightweight distributed caching layer using Apache Arrow - Jacques Nadeau (Dremio) 00:37:01
    6. How Komatsu is improving mining efficiencies using the IoT and machine learning - Shawn Terry (Komatsu) 00:39:36
    7. From flat files to deconstructed database: The evolution and future of the big data ecosystem - Julien Le Dem (WeWork) 00:46:09
    8. Using machine learning to drive intelligence at the edge - Dave Shuman (Cloudera), Bryan Dean (Red Hat) 00:42:01
    9. Performant time series data management and analytics with Postgres - Michael Freedman (TimescaleDB) 00:42:15
    10. Case study: A Spark-based distributed simulation optimization architecture for portfolio optimization in retail banking - Kaushik Deka (Novantas), Ted Gibson (Novantas) 00:31:27
    11. The future of ETL isn’t what it used to be. - Gwen Shapira (Confluent) 00:37:54
    12. Tracking data lineage at Stitch Fix - Neelesh Srinivas Salian (Stitch Fix) 00:38:48
    13. Real-time analytics and BI with data lakes and data warehouses using Kudu, HBase, Spark, and Kafka: Lessons learned - Mauricio Aristizabal (Impact) 00:37:51
    14. Clouds and containers: Case studies for big data - Paul Curtis (MapR Technologies) 00:40:23
    15. Lessons learned building a scalable and extendable data pipeline for Call of Duty - Yaroslav Tkachenko (Activision) 00:45:05
    16. A comparative analysis of the fundamentals of AWS and Azure - Jason Wang (Cloudera), Suraj Acharya (Cloudera), Tony Wu (Cloudera) 00:37:18
    17. Apache Kafka and the four challenges of production machine learning systems - Jay Kreps (Confluent) 00:41:59
    18. TuneIn: How to get your jobs tuned while you are sleeping - Manoj Kumar (LinkedIn), Pralabh Kumar (LinkedIn), Arpan Agrawal (LinkedIn) 00:30:00
    19. Building a high-performance model serving engine from scratch using Kubernetes, GPUs, Docker, Istio, and TensorFlow - Chris Fregly (PipelineAI) 00:38:55
    20. Big data at speed - Ted Malaska (Capital One), Mark Grover (Lyft) 00:34:55
    21. IoT edge processing with Apache NiFi, Apache MiniFi, and multiple deep learning libraries - Timothy Spann (DZone) 00:39:52
    22. MLflow: An open platform to simplify the machine learning lifecycle - Mani Parkhe (Databricks), Andrew Chen (Databricks) 00:40:25
  6. Text and Language processing and analysis
    1. Document vectors in the wild: Building a content recommendation system for Reuters.com - James Dreiss (Reuters) 00:42:59
    2. Anxiety at scale: How Investopedia used readership data to track market volatility - Masha Westerlund (Investopedia) 00:30:30
    3. From emotion analysis and topic extraction to narrative modeling - Andreea Kremm (Netex Group), Mohammed Ibraaz Syed (UCLA) 00:39:41
    4. Automating business processes with large-scale knowledge graphs - Mike Tung (Diffbot) 00:53:25
    5. Applying petabyte-scale analytics and machine learning to billions of news reading sessions - Andrew Montalenti (Parse.ly) 00:40:19
  7. Data Platforms
    1. DIY versus designer approaches to deploying data center infrastructure for machine learning and analytics - Cory Minton (Dell EMC), Colm Moynihan (Cloudera) 00:47:26
    2. Bighead: Airbnb's end-to-end machine learning platform - Atul Kale (Airbnb), Xiaohan Zeng (Airbnb) 00:40:15
    3. Zipline: Airbnb's data management platform for machine learning - Varant Zanoyan (Airbnb) 01:03:05
    4. How to cost-effectively and reliably build infrastructure for machine learning - Osman Sarood (Mist Systems) 00:40:41
    5. TonY: Native support of TensorFlow on Hadoop - Jonathan Hung (LinkedIn), Keqiu Hu (LinkedIn), Zhe Zhang (LinkedIn) 00:36:04
    6. Data at Netflix: See what’s next - Michelle Ufford (Netflix) 00:37:49
    7. Deep learning on YARN: Running distributed TensorFlow, MXNet, Caffe, and XGBoost on Hadoop clusters - Wangda Tan (Hortonworks inc) 00:38:14
    8. A high-performance system for deep learning inference and visual inspection - Moty Fania (Intel), Sergei Kom (Intel) 00:39:30
    9. A/B testing at Uber: How we built a BYOM (bring your own metrics) platform - Milene Darnis (Uber) 00:40:29
    10. Using Alluxio as a fault-tolerant pluggable optimization component of JD.com's compute frameworks - Tao Huang (JD.com), Mang Zhang (JD.com), 冰 白 (JD.com) 00:20:40
    11. Aetna's advanced analytics platform, Data Fabric - Occhio Orsini (Aetna) 00:52:47
    12. Scaling data infrastructure in the fashion world; or, “What is this? Business intelligence for ants?” - Francesco Mucio (Zalando SE) 00:40:40
  8. Big data and data science in the cloud
    1. Circuit breakers to safeguard for garbage in, garbage out - Sandeep Uttamchandani (Intuit) 00:43:21
    2. Job recommendations leveraging Deep Learning using Analytics Zoo on Apache Spark and BigDL - Guoqiong Song (Intel), Wenjing Zhan (Talroo), Jacob Eisinger (Talroo) 00:48:10
    3. Cassandra versus cloud databases - Jonathan Ellis (DataStax) 00:40:14
    4. Optimizing Apache Impala for a cloud-based data warehouse - Greg Rahn (Cloudera) 00:41:37
    5. Deep learning on audio in Azure to detect sounds in real time - Swetha Machanavajhala (Microsoft), Xiaoyong Zhu (Microsoft) 00:41:02
    6. Building turnkey recommendations for 5% of internet video - Nir Yungster (JW Player), Kamil Sindi (JW Player) 00:42:51
    7. Best practices for developing an enterprise data hub to collect and analyze 1 TB of data a day from multiple services with Apache Kafka and Google Cloud Platform - Kenji Hayashida (Recruit Lifestyle co., ltd.), Toru Sasaki (NTT DATA Corporation) 00:38:19
  9. Strata Business Summit
    1. From data governance to AI governance: The CIO's new role - JF Gagne (Element AI) 00:41:45
    2. Agile for data science teams - Jennifer Prendki (Figure Eight) 00:41:00
    3. Executive Briefing: Profit from AI and machine learning—The best practices for people and process - Tony Baer (Ovum), Florian Douetteau (DATAIKU) 00:38:26
    4. Realizing the true value in your data: Data-drivenness assessment - Lawrence Cowan (Cicero Group) 00:34:31
    5. Executive Briefing: Managing successful data projects—Technology selection and team building - Ted Malaska (Blizzard Entertainment), Jonathan Seidman (Cloudera) 00:40:10
    6. The lure of "the one metric that matters" - Adil Aijaz (Split Software) 00:38:22
    7. Executive Briefing: Enhance your data lake with comprehensive data governance to improve adoption and meet compliance needs - Sanjeev Mohan (Gartner) 00:39:17
    8. Rationalizing risk in AI and ML - Kimberly Nevala (SAS Institute) 00:39:33
    9. The care and feeding of data scientists: Concrete tips for retaining your data science team - Michelangelo D'Agostino (ShopRunner) 00:42:06
    10. Data and privacy at scale at Wikipedia - Nuria Ruiz (Wikimedia) 00:34:10
    11. Executive Briefing: Why machine-learned models crash and burn in production and what to do about it - David Talby (Pacific AI) 00:38:22
    12. Executive Briefing: From Business to AI—The missing pieces in becoming "AI ready" - Mikio Braun (Zalando SE) 00:42:32
    13. Enacting Data Subject Access Rights for GDPR with data services and data management - Jean-Michel Franco (Talend) 00:36:02
    14. Executive Briefing: Analytics for executives—Building an approachable language to drive data science in your organization - Brandy Freitas (Pitney Bowes) 00:38:35
    15. Building it beautiful: Analyzing the effectiveness of platform products and marketing at scale - Joshua Laurito (Squarespace) 00:40:35
    16. Executive Briefing: Best practices for human in the loop—The business case for active learning - Paco Nathan (derwen.ai) 00:42:09
    17. Analytics maturity: Industry trends and financial impacts - Bill Franks (International Institute For Analytics) 00:34:31
    18. Real-time machine intelligence in IndyCar and Tour de France - Yasuyuki Kataoka (NTT Innovation Institute, Inc.) 00:37:51
    19. A day in the life of a data scientist: How do we train our teams to get started with AI? - Francesca Lazzeri (Microsoft), Jaya Mathew (Microsoft) 00:39:07
  10. Data-driven business management
    1. Visualize AI to spot new trading opportunities - Paul Lashmet (Arcadia Data) 00:28:26
    2. The business case for messy data - James Psota (Panjiva) 00:29:07
    3. Modernizing operational architecture with big data: Creating and implementing a modern data strategy - Jennifer Lim (Cerner) 00:30:30
    4. How to be aggressively tone-deaf using data; or, We should all be "for-benefits." - Ann Nguyen (Whole Whale) 00:32:17
    5. "Moneyballing" recruiting: A data-driven approach to battling bottlenecks and biases in hiring - Maryam Jahanshahi (TapRecruit) 00:28:07
    6. Real-time automated claim processing: The surprising utility of NLP methods on non-text data - Amro Alkhatib (National Health Insurance Company-Daman) 00:35:39
    7. From chaos to insight: Automatically derive value from your user-generated content - Stephanie Fischer (datanizing GmbH) 00:30:57
    8. Cataloging the data lake for distributed analytics innovation at Munich Re - Andreas Kohlmaier (Munich Re) 00:30:39
    9. Why the internet of things doesn’t exist but will still reshape your business - Ajay Kulkarni (TimescaleDB) 00:26:02
    10. Data science in an Agile environment: Methods and organization for success - Sam Helmich (Deere & Company) 00:30:08
    11. Too big data to fail: How banks use big data to prevent the next financial crisis - Patrick Angeles (Cloudera) 00:27:48
    12. Decision-centricity: Operationalizing analytics and data science in health systems - Mike Berger (Mount Sinai Health System) 00:28:30
    13. The revenue forecasting platform at Airbnb - Theresa Johnson (Airbnb) 00:26:11
    14. Self-reliant, secure, end-to-end data, activity, and revenue analytics: A roadmap for the airline industry - Katharina Warzel (EveryMundo) 00:27:45
    15. Improving patient screening by applying predictive analytics to electronic medical records. - Ian Brooks (Hortonworks) 00:25:11
    16. From strategy to implementation: Putting data to work at USA for UNHCR - Friederike Schuur (Cloudera), Rita Ko (USA for UNHCR) 00:36:38
    17. Executive Briefing: Most data-driven cultures aren’t - Cassie Kozyrkov (Google) 00:41:31
  11. Law, ethics, governance
    1. Beyond explainability: Regulating machine learning in practice - Andrew Burt (Immuta) 00:42:39
    2. If you thought politics was dirty, you should see the analytics behind it. - John Thuma (Arcadia Data) 00:45:50
    3. Explainable artificial intelligence (XAI): Why, when, and how? - Mridul Mishra (Fidelity Investments) 00:24:36
    4. Balancing stakeholder interests in personal data governance technology - LaVonne Reimer (Lumenous) 00:37:47
    5. Mapping India - Anand S (Gramener) 00:29:53
    6. Democratizing artificial intelligence: Lessons from the real world - Swatee Singh (American Express) 00:27:04
  12. Platform security and cybersecurity
    1. Protecting sensitive data in huge datasets: Cloud tools you can use - Felipe Hoffa (Google), Damien Desfontaines (Google / ETH Zürich) 00:38:52
    2. Privacy by design: Building in data privacy and protection versus bolting it on later - Les McMonagle (BlueTalon) 00:37:16
  13. Visualization and user experience
    1. The Vega project: Building an ecosystem of tools for interactive visualization - Jeffrey Heer (Trifacta | University of Washington) 00:42:20
    2. Augmented reality: Going beyond plots in 3D - Bob Levy (Virtual Cove, Inc.) 00:33:16
    3. Stories beat statistics: How to master the art and science of data storytelling - Brent Dykes (Domo) 00:45:15
    4. Data visualization in mixed reality with Python - Anna Nicanorova (Annalect) 00:22:55
    5. UX strategies for underperforming analytics services and data products - Brian O'Neill (Designing for Analytics) 00:45:23
  14. Streaming systems & real-time applications
    1. Processing fast data with Apache Spark: A tale of two APIs - Gerard Maas (Lightbend) 00:40:26
    2. Building Fabric Answers using Apache Heron - Karthik Ramasamy (Streamlio), Andrew Jorgensen (Google) 00:40:48
    3. Why and how to leverage the power and simplicity of SQL on Apache Flink - Fabian Hueske (Apache Flink project) 00:41:35
    4. A deep dive into Kafka controller - Jun Rao (Confluent) 00:42:02
    5. Streaming big data in the cloud: What to consider and why - William Chambers (Databricks) 00:56:02
    6. AppNexus's stream-based control system for automated buying of digital ads - Brian Wu (AppNexus) 00:41:47
    7. Architectural principles for building trusted, real-time, distributed IoT systems - Dan Harple (Context Labs) 00:38:03
    8. Hudi: Unifying storage and serving for batch and near-real-time analytics - Nishith Agarwal (Uber), Balaji Varadarajan (Uber), Vinoth Chandar (Uber) 00:37:40
    9. Near-real-time anomaly detection at Lyft - Thomas Weise (Lyft), Mark Grover (Lyft) 00:40:00
    10. Executive Briefing: What you need to know about fast data - Dean Wampler (Lightbend) 00:42:51
    11. Kafka at PayPal: Enabling 400 billion messages a day - Kevin Lu (PayPal), Maulin Vasavada (PayPal), Na Yang (PayPal) 00:39:03
  15. Financial Services
    1. Using the blockchain in the enterprise - Jim Scott (MapR Technologies) 00:38:19
    2. Accelerating financial data science workflows with GPUs - Joshua Patterson (NVIDIA), Onur Yilmaz (NVIDIA) 00:35:16
    3. Using big data to unlock the delivery of personalized, multilingual real-time chat services for global financial service organizations - Tim Walpole (BJSS) 00:36:46
    4. The balancing act: Building business-relevant data solutions for the front line - Jane Tran (Unqork) 00:15:31
    5. Stochastic field theory for time series - Revant Nayar (FMI Technologies LLC) 00:36:50
  16. Tutorials
    1. Deep learning-based search and recommendation systems using TensorFlow - Dr. Vijay Srinivas Agneeswaran (SapientRazorfish), Abhishek Kumar (SapientRazorfish) - Part 1 00:41:59
    2. Deep learning-based search and recommendation systems using TensorFlow - Dr. Vijay Srinivas Agneeswaran (SapientRazorfish), Abhishek Kumar (SapientRazorfish) - Part 2 00:47:51
    3. Deep learning-based search and recommendation systems using TensorFlow - Dr. Vijay Srinivas Agneeswaran (SapientRazorfish), Abhishek Kumar (SapientRazorfish) - Part 3 00:45:41
    4. Deep learning-based search and recommendation systems using TensorFlow - Dr. Vijay Srinivas Agneeswaran (SapientRazorfish), Abhishek Kumar (SapientRazorfish) - Part 4 00:46:20
    5. Model serving and management at scale using open source tools - Dan Crankshaw (UC Berkeley RISELab) - Part 1 00:24:24
    6. Model serving and management at scale using open source tools - Dan Crankshaw (UC Berkeley RISELab) - Part 2 00:49:40
    7. Architecting a data platform for enterprise use - Mark Madsen (Think Big Analytics), Todd Walter (Teradata) - Part 1 00:40:23
    8. Architecting a data platform for enterprise use - Mark Madsen (Think Big Analytics), Todd Walter (Teradata) - Part 2 00:48:39
    9. Architecting a data platform for enterprise use - Mark Madsen (Think Big Analytics), Todd Walter (Teradata) - Part 3 00:49:08
    10. Architecting a data platform for enterprise use - Mark Madsen (Think Big Analytics), Todd Walter (Teradata) - Part 4 00:47:05
    11. Running multidisciplinary big data workloads in the cloud - Sudhanshu Arora (Cloudera), Stefan Salandy (Cloudera), Suraj Acharya (Cloudera), Brandon Freeman (Cloudera), Jason Wang (Cloudera), Shravan Pabba (Cloudera) - Part 1 01:05:21
    12. Running multidisciplinary big data workloads in the cloud - Sudhanshu Arora (Cloudera), Stefan Salandy (Cloudera), Suraj Acharya (Cloudera), Brandon Freeman (Cloudera), Jason Wang (Cloudera), Shravan Pabba (Cloudera) - Part 2 00:35:11
    13. Designing modern streaming data applications - Arun Kejariwal (Independent), Karthik Ramasamy (Streamlio) - Part 1 00:49:37
    14. Designing modern streaming data applications - Arun Kejariwal (Independent), Karthik Ramasamy (Streamlio) - Part 2 00:54:50
    15. Designing modern streaming data applications - Arun Kejariwal (Independent), Karthik Ramasamy (Streamlio) - Part 3 00:31:25
    16. Designing modern streaming data applications - Arun Kejariwal (Independent), Karthik Ramasamy (Streamlio) - Part 4 01:04:29
    17. How to be fair: A tutorial for beginners - Aileen Nielsen (Skillman Consulting) - Part 1 00:51:26
    18. How to be fair: A tutorial for beginners - Aileen Nielsen (Skillman Consulting) - Part 2 00:35:33
    19. How to be fair: A tutorial for beginners - Aileen Nielsen (Skillman Consulting) - Part 3 00:40:14
    20. How to be fair: A tutorial for beginners - Aileen Nielsen (Skillman Consulting) - Part 4 00:37:00
    21. Apache Metron: Open source cybersecurity at scale - Carolyn Duby (Hortonworks) 01:05:23
    22. Making interactive browser-based visualizations easy in Python - James Bednar (Anaconda) - Part 1 01:05:10
    23. Making interactive browser-based visualizations easy in Python - James Bednar (Anaconda) - Part 2 01:15:59
    24. Making interactive browser-based visualizations easy in Python - James Bednar (Anaconda) - Part 3 00:43:36
    25. Data science with Unix power tools - Jeroen Janssens (Data Science Workshops B.V.) - Part 1 00:46:44
    26. Data science with Unix power tools - Jeroen Janssens (Data Science Workshops B.V.) - Part 2 00:40:32
    27. Data science with Unix power tools - Jeroen Janssens (Data Science Workshops B.V.) - Part 3 00:36:07
    28. Data science with Unix power tools - Jeroen Janssens (Data Science Workshops B.V.) - Part 4 00:44:32
    29. Learning machine learning using astronomy datasets - Viviana Acquaviva (CUNY New York City College of Technology) - Part 1 00:47:16
    30. Learning machine learning using astronomy datasets - Viviana Acquaviva (CUNY New York City College of Technology) - Part 2 00:59:52
    31. Learning machine learning using astronomy datasets - Viviana Acquaviva (CUNY New York City College of Technology) - Part 3 01:11:45
    32. From theory to data product: Applying data science methods to effect business change - Janet Forbes (T4G), Danielle Leighton (T4G), Lindsay Brin (T4G) - Part 1 00:50:06
    33. From theory to data product: Applying data science methods to effect business change - Janet Forbes (T4G), Danielle Leighton (T4G), Lindsay Brin (T4G) - Part 2 00:38:14
    34. From theory to data product: Applying data science methods to effect business change - Janet Forbes (T4G), Danielle Leighton (T4G), Lindsay Brin (T4G) - Part 3 01:21:48
    35. Recurrent neural networks for time series analysis - Bruno Gonçalves (New York University) - Part 1 00:31:50
    36. Recurrent neural networks for time series analysis - Bruno Gonçalves (New York University) - Part 2 00:40:40
    37. Recurrent neural networks for time series analysis - Bruno Gonçalves (New York University) - Part 3 00:39:16
    38. Recurrent neural networks for time series analysis - Bruno Gonçalves (New York University) - Part 4 00:29:22
    39. Architecting a next-generation data platform - Ted Malaska (Blizzard Entertainment), Jonathan Seidman (Cloudera) - Part 1 00:31:21
    40. Architecting a next-generation data platform - Ted Malaska (Blizzard Entertainment), Jonathan Seidman (Cloudera) - Part 2 00:56:15
    41. Architecting a next-generation data platform - Ted Malaska (Blizzard Entertainment), Jonathan Seidman (Cloudera) - Part 3 00:33:46
    42. Architecting a next-generation data platform - Ted Malaska (Blizzard Entertainment), Jonathan Seidman (Cloudera) - Part 4 00:49:20
    43. Hands-on Kafka streaming microservices with Akka Streams and Kafka Streams - Dean Wampler (Lightbend), Boris Lublinsky (Lightbend) - Part 1 00:44:02
    44. Hands-on Kafka streaming microservices with Akka Streams and Kafka Streams - Dean Wampler (Lightbend), Boris Lublinsky (Lightbend) - Part 2 00:41:23
    45. Hands-on Kafka streaming microservices with Akka Streams and Kafka Streams - Dean Wampler (Lightbend), Boris Lublinsky (Lightbend) - Part 3 00:56:30
    46. Managing data science in the enterprise - Joshua Poduska (Domino Data Lab), Patrick Harrison (S&P Global) - Part 1 00:43:58
    47. Managing data science in the enterprise - Joshua Poduska (Domino Data Lab), Patrick Harrison (S&P Global) - Part 2 00:51:09
    48. Managing data science in the enterprise - Joshua Poduska (Domino Data Lab), Patrick Harrison (S&P Global) - Part 3 00:34:15
    49. Managing data science in the enterprise - Joshua Poduska (Domino Data Lab), Patrick Harrison (S&P Global) - Part 4 00:30:25
    50. Building a large-scale machine learning application using Amazon SageMaker and Spark - David Arpin (Amazon Web Services) - Part 1 00:42:13
    51. Building a large-scale machine learning application using Amazon SageMaker and Spark - David Arpin (Amazon Web Services) - Part 2 00:46:40
    52. Natural language understanding at scale with Spark NLP - David Talby (Pacific AI), Claudiu Branzan (G2 Web Services), Alexander Thomas (Indeed) - Part 1 00:43:04
    53. Natural language understanding at scale with Spark NLP - David Talby (Pacific AI), Claudiu Branzan (G2 Web Services), Alexander Thomas (Indeed) - Part 2 00:44:39
    54. Natural language understanding at scale with Spark NLP - David Talby (Pacific AI), Claudiu Branzan (G2 Web Services), Alexander Thomas (Indeed) - Part 3 00:41:30
    55. Natural language understanding at scale with Spark NLP - David Talby (Pacific AI), Claudiu Branzan (G2 Web Services), Alexander Thomas (Indeed) - Part 4 00:33:33
    56. Leveraging Spark and deep learning frameworks to understand data at scale - Vartika Singh (Cloudera), Alan Silva (Cloudera), Alex Bleakley (Cloudera), Steven Totman (Cloudera), Mirko Kämpf (Cloudera), Syed Nasar (Cloudera) - Part 1 00:46:46
    57. Leveraging Spark and deep learning frameworks to understand data at scale - Vartika Singh (Cloudera), Alan Silva (Cloudera), Alex Bleakley (Cloudera), Steven Totman (Cloudera), Mirko Kämpf (Cloudera), Syed Nasar (Cloudera) - Part 2 00:43:21
    58. Leveraging Spark and deep learning frameworks to understand data at scale - Vartika Singh (Cloudera), Alan Silva (Cloudera), Alex Bleakley (Cloudera), Steven Totman (Cloudera), Mirko Kämpf (Cloudera), Syed Nasar (Cloudera) - Part 3 00:32:14
    59. Leveraging Spark and deep learning frameworks to understand data at scale - Vartika Singh (Cloudera), Alan Silva (Cloudera), Alex Bleakley (Cloudera), Steven Totman (Cloudera), Mirko Kämpf (Cloudera), Syed Nasar (Cloudera) - Part 4 00:51:07
    60. Deep learning methods for natural language processing - Garrett Hoffman (StockTwits) - Part 1 00:42:26
    61. Deep learning methods for natural language processing - Garrett Hoffman (StockTwits) - Part 2 00:50:09
    62. Deep learning methods for natural language processing - Garrett Hoffman (StockTwits) - Part 3 00:44:15
    63. Deep learning methods for natural language processing - Garrett Hoffman (StockTwits) - Part 4 00:47:10