Strata Data Conference - San Jose 2018

Video description

Strata San Jose 2018 offered thousands of top data scientists, analysts, engineers, and executives from across North America and around the world an opportunity to examine and absorb the best technologies and practices related to data engineering, architecture, machine learning, and AI. This video compilation provides a complete recording of the conference's keynote speeches, tutorials, and sessions, including unfettered access to the exclusive Strata Business Summit ("the missing MBA for data-driven business") and its Executive Briefings on how to turn data and algorithms into business advantage.

You'll learn from featured speakers such as Google Brain Team leader Jeff Dean, Pinterest Senior VP of Engineering Li Fan, Cloudera CSO Mike Olson, Streamlio cofounder Karthik Ramasamy, Atlassian Head of Data Science Jennifer Prendki, and Jetlore Director of Algorithms Dorna Bandari. You'll hear keynotes from MapR Technologies' Anoop Dawar, Amazon's Alex Smola, O'Reilly Media's Ben Lorica, IBM's Dinesh Nirmal, and Cloudera's Amr Awadallah. And you'll pick up real-world wisdom from big data engineering and analysis practitioners in the trenches at General Mills, Kaiser Permanente, Ryanair, Procter & Gamble, BMW, Disney, ING, GE Digital, and more.

Need more reasons to buy this compilation? Take a look at the breadth of topics covered at Strata (listed below) and remember: you can view it all (hundreds of speakers, 100+ hours of material) on your own schedule and at your own pace.

  • Data Engineering and Architecture: 50+ sessions, led by senior data engineers at Cloudera, AWS, Streamlio, Confluent, and others, that help you navigate the pitfalls of designing robust data pipelines, including tutorials on building big data applications on AWS, time series data architecture, and using Impala to fix performance issues.
  • Data Science and Machine Learning: 60+ sessions, delivered by data scientists from Teradata, UC Berkeley RISE Lab, Microsoft, and more, on the technologies that uncover hidden insights in your data, including tutorials on building PyTorch-based recommender systems; using R and Python for scalable data science, machine learning, and AI; and getting started with TensorFlow.
  • Big Data and Data Science in the Cloud: 30+ sessions, including a tutorial on running data analytic workloads in the cloud.
  • Streaming Systems and Real-Time Applications: 20+ sessions, including a tutorial on streaming applications as microservices using Kafka, Akka Streams, and Kafka Streams.
  • Data-Driven Business Management: 20+ sessions, including a Booz Allen Hamilton talk on using machine intelligence to drive strategy.
  • Law, Ethics, and Governance: 5+ sessions, including a tutorial on how to prepare for the European Union's General Data Protection Regulation (GDPR).
  • Visualization and User Experience: includes a tutorial on how to create interactive visualizations of billions of data points with just 30 lines of Python code.
  • Media, Entertainment, and Advertising: 5+ sessions, including talks on ad tech, measurement, automation, and audience engagement.
  • Platform Security and Cybersecurity: multiple sessions, including practical solutions for protecting big data in containerized environments and guidance on how best to debug a security data science system.

Table of contents

  1. Keynotes
    1. Merging human and machine learning for everyday solutions - Li Fan (Pinterest)
    2. To a hammer, everything is a nail: Choosing the right tool for your business problems (sponsored by Microsoft) - Tobias Ternstrom (Microsoft)
    3. Privacy in the age of machine learning - Ben Lorica (O'Reilly Media)
    4. Crisis Text Line data usage and insights - Nancy Lublin and Bob Filbin (Crisis Text Line)
    5. Defining responsible data practices: A community-driven approach - Natalie Evans Harris (BrightHive)
    6. Operationalizing machine learning (sponsored by IBM) - Dinesh Nirmal (IBM)
    7. Data Science in the Cloud - Alex Smola (Amazon)
    8. Sprouted clams and stanky bean: When machine learning makes mistakes - Janelle Shane (aiweirdness.com)
    9. The case for a deliberate data strategy in today’s attention-deficit economy (sponsored by MapR) - Anoop Dawar (MapR Technologies)
    10. Differentiating via data science - Eric Colson (Stitch Fix)
    11. Automating decisions with data in the cloud - Amr Awadallah (Cloudera)
    12. What separates the clouds? (sponsored by Google Cloud) - William Vambenepe (Google)
    13. Inclusivity for the greater good - Ajey Gore (GO-JEK)
    14. Lessons in Google Search data - Seth Stephens-Davidowitz (Everybody Lies | NY Times)
  2. Data science and machine learning
    1. Approaching the pricing problem at Lyft - Ashivni Shekhawat (Lyft)
    2. Graph analysis of 200,000 tweets from Russian Twitter trolls - Ryan Boyd (Neo4j)
    3. Small pieces, loosely joined: A skater's code - Rodney Mullen (Almost Skateboards)
    4. Lessons learned deploying machine learning and deep learning models in production at major tech companies - Harish Doddi (Datatron Technologies), Jerry Xu (Datatron Technologies)
    5. Spark NLP in action: Improving patient flow forecasting at Kaiser Permanente - David Talby (Pacific AI), Santosh Kulkarni (Kaiser Permanente)
    6. Deep credit risk ranking with LSTM - Kyle Grove (Teradata)
    7. Who are we? The largest-scale study of professional data scientists - Miryung Kim (UCLA), Muhammad Gulzar (UCLA)
    8. The current state of TensorFlow and where it's headed in 2018 - Rajat Monga (Google)
    9. Using deep learning to solve challenging problems - Jeff Dean (Google)
    10. Detecting time series anomalies at Uber scale with recurrent neural networks - Andrea Pasqua (Uber), Anny Chen (Uber)
    11. Data science at Slack - Josh Wills (Slack)
    12. Breaking up the block: Using heterogenous population modeling to drive growth - Daniel Lurie (Pinterest)
    13. Machine learning applications for the industrial internet - Joseph Richards (GE Digital)
    14. Explaining machine learning models - Evan Kriminger (ZestFinance)
    15. sparklyr, implyr, and more: dplyr interfaces to large-scale data - Ian Cook (Cloudera)
    16. Word embeddings under the hood: How neural networks learn from language - Patrick Harrison (S&P Global)
    17. Cataloging the visible universe through Bayesian inference at petascale in Julia - Keno Fischer (Julia Computing)
    18. Building career advisory tools for the tech sector using machine learning - Simon Hughes (Dice.com), Yuri Bykov (Dice.com)
    19. Enough data engineering for a data scientist; or, How I learned to stop worrying and love the data scientists - Stephen O'Sullivan (Data Whisperers)
    20. Fast and effective natural language understanding - Mike Conover (SkipFlag)
    21. Transforming a machine learning prototype to a deployable solution leveraging Spark in healthcare - Rachita Chandra (IBM Watson Health)
    22. Interpretable machine learning products - Mike Lee Williams (Cloudera Fast Forward Labs)
    23. Being smarter than dinosaurs: How NASA uses deep learning for planetary defense - Siddha Ganju (Deep Vision)
  3. Data case studies
    1. Supply chain evolution from horseless buggies to driverless cars - Valentin Bercovici (Pencil Data Inc.)
    2. Automation and analytics enablement in life insurance - Divya Ramachandran (Captricity)
    3. Building a flu predictor model for improved patient care - Jennie Shin (Kaiser Permanente)
    4. From the presidential campaign trail to the enterprise: Building effective data-driven teams - Katie Malone (Civis Analytics)
    5. Your enterprise AI is only as good as your data. - Joe Dumoulin (Next IT)
    6. Smart agriculture: Blending IoT sensor data with visual analytics - Mike Prorock (mesur.io)
    7. Automating business insights through artificial intelligence - Wayde Fleener (General Mills)
    8. Using ML to improve UX and literacy for young poets - Ann Nguyen (Whole Whale)
    9. Working with the data of sports - Thomas Miller (Northwestern University)
  4. Sponsored
    1. The changing role of the CDO: Three keys for success (sponsored by MapR) - Jim Scott (MapR Technologies)
    2. The four elements of modern analytics (sponsored by MicroStrategy) - Vijay Kotu (Oath)
    3. Speed up mission-critical analytics in the cloud (sponsored by Kyligence) - Billy Liu (Kyligence)
    4. Analytics in real time, the (Grey's) anatomy of event streaming (sponsored by MemSQL) - Adam Ahringer (Disney-ABC TV Digital Media)
    5. The Snowflake data warehouse: How Sharethrough analyzes petabytes of event data in a SQL database (sponsored by Snowflake) - Dave Abercrombie (Sharethrough)
    6. Building the bridge from big data to machine learning and artificial intelligence (sponsored by Google Cloud) - Ryan Lippert (Google Cloud)
    7. Harnessing the cloud to enable connected systems and self-service and accelerate business growth (sponsored by Talend) - Jeff Smits (RingCentral)
    8. Building machine learning systems for scale: Amazon insights and best practices (sponsored by Amazon Web Services) - Guy Ernest (Amazon Web Services)
    9. Focus on your business: Case studies on building data solutions that meet your needs (sponsored by Microsoft) - Tobias Ternstrom (Microsoft)
    10. Data at scale and speed: Real-world use cases (sponsored by MapR) - Ted Dunning (MapR Technologies)
    11. When tests cry wolf (sponsored by Pure Storage) - Ivan Jibaja (Pure Storage)
    12. Managing the intelligent data pipeline and the connected enterprise (sponsored by Hitachi Vantara) - Chuck Yarbrough (Hitachi Vantara)
    13. Journey to digital (sponsored by IBM) - Seth Dobrin, PhD (IBM)
    14. Architecting an edge-to-cloud data pipeline to unify multiple data sources and processing engines (sponsored by NetApp) - Santosh Rao (NetApp)
    15. Digital transformation demands faster, more productive data science (sponsored by DataScience.com) - Ian Swanson (DataScience.com)
    16. Get a farm-to-table view of your data: Track data lineage from source to analytics (sponsored by Syncsort) - Tendu Yogurtcu (Syncsort)
    17. Accelerating analytics and AI from the edge to the cloud (sponsored by Intel) - Kevin Huiskes (Intel), Radhika Rangarajan (Intel)
    18. On-device deep learning: Trends, technologies, and challenges (sponsored by TalkingData) - Andreas Pfadler (TalkingData)
  5. Data engineering and architecture
    1. The state of Postgres - Umur Cubukcu (Citus Data)
    2. Building a flexible ML pipeline at a B2B AI startup - Dorna Bandari (Jetlore)
    3. Accelerating development velocity of production ML systems with Docker - Kinnary Jangla (Pinterest)
    4. What's new in Hadoop 3.0 - Daniel Templeton (Cloudera), Andrew Wang (Cloudera)
    5. DataOps: An Agile methodology for data-driven organizations - Ellen Friedman (MapR Technologies)
    6. Better machine learning logistics with the rendezvous architecture - Ted Dunning (MapR Technologies)
    7. Taming deep learning - Evan Sparks (Determined AI)
    8. Metrics-driven tuning of Apache Spark at scale - Edwina Lu (LinkedIn), Ye Zhou (LinkedIn), Min Shen (LinkedIn)
    9. NoSQL no more: SQL on Druid with Apache Calcite - Gian Merlino (Imply)
    10. Operationalize deep learning: How to deploy and consume your LSTM networks for predictive maintenance scenarios - Francesca Lazzeri (Microsoft), Fidan Boylu Uz (Microsoft)
    11. TimescaleDB: Reengineering PostgreSQL as a time series database - Michael Freedman (TimescaleDB | Princeton)
    12. The secret sauce behind LinkedIn's self-managing Kafka clusters - Jiangjie Qin (LinkedIn)
    13. Building a contacts graph from activity data - Alexis Roos (Salesforce), Noah Burbank (Salesforce)
    14. Crafting data products for the augmented writing experience - Chris Harland (Textio)
    15. Classifying job execution using deep learning - Ash Munshi (Pepperdata)
    16. 20 Netflix-style principles and practices to get the most out of your data platform - Kurt Brown (Netflix)
  6. Strata Business Summit
    1. Workplace culture in the age of algorithmic management: The information networks Uber drivers built - Alex Rosenblat (Data & Society Research Institute)
    2. How to avoid pitfalls when reasoning with data - Derek Ruths (CAI)
    3. Executive Briefing: Artificial intelligence—The next digital frontier? - Michael Chui (McKinsey Global Institute)
    4. Executive Briefing: Machine learning—Why you need it, why it's hard, and what to do about it - Mike Olson (Cloudera)
    5. Executive Briefing: Legal best practices for making data work - Alysa Z. Hutnik (Kelley Drye Warren LLP), Crystal Skelton (Kelley Drye Warren LLP)
    6. Reinventing healthcare: Early detection of Alzheimer’s disease with deep learning - Ayin Vala (Foundation for Precision Medicine)
    7. Executive Briefing: BI on big data - Mark Madsen (Third Nature), Shant Hovsepian (Arcadia Data)
    8. Executive Briefing: The conversational AI revolution - Yishay Carmiel (IntelligentWire | Spoken Labs)
    9. Bladder cancer diagnosis using deep learning - Mauro Damo (Dell EMC), Wei Lin (Dell EMC)
  7. Streaming systems and real-time applications
    1. Using machine learning to simplify Kafka operations - Shivnath Babu (Duke University | Unravel Data Systems), Dhruv Goel (Microsoft)
    2. Deploying and monitoring interactive machine learning applications with Clipper - Dan Crankshaw (UC Berkeley RISELab)
    3. Approximation data structures in streaming data processing - Debasish Ghosh (Lightbend)
    4. How to build leakproof stream processing pipelines with Apache Kafka and Apache Spark - Jordan Hambleton (Cloudera), Guru Medasani (Domino Data Lab)
    5. Stream storage with Apache BookKeeper - Sijie Guo (Streamlio)
    6. Streaming SQL to unify batch and stream processing: Theory and practice with Apache Flink at Uber - Fabian Hueske (data Artisans), Shuyi Chen (Uber)
    7. Foundations of streaming SQL; or, How I learned to love stream and table theory - Tyler Akidau (Google)
    8. Kafka streaming applications with Akka Streams and Kafka Streams - Dean Wampler (Lightbend)
    9. Effectively once, exactly once, and more in Heron - Karthik Ramasamy (Streamlio), Sanjeev Kulkarni (Streamlio)
    10. Machine-learned model quality monitoring in fast data and streaming applications - Emre Velipasaoglu (Lightbend)
    11. Playing well together: Big data beyond the JVM with Spark and friends - Holden Karau (Google), Rachel Warren (Salesforce Einstein)
    12. The real-time journey from raw streaming data to AI-based analytics - Roy Ben-Alta (Amazon Web Services), Ira Cohen (Anodot)
    13. Unified and elastic batch and stream processing with Pravega and Apache Flink - Fabian Hueske (data Artisans), Flavio Junqueira (Dell EMC)
    14. HDFS on Kubernetes: Tech deep dive on locality and security - Kimoon Kim (Pepperdata), Ilan Filonenko (Bloomberg LP)
    15. Effectively once in Apache Pulsar, the next-generation messaging system - Matteo Merli (Streamlio)
  8. Media and ad tech
    1. The data hero’s journey - Amanda Gerdes (Blizzard Entertainment)
    2. The golden age of data and analytics - David Boyle (MasterClass)
    3. Improving the customer experience via clickthru analytics - Sridhar Alla (Comcast)
    4. Big data applicability to the gaming industry - Rizwan Patel (Caesars Entertainment)
    5. Cohort analysis at scale - Blake Irvine (Netflix)
    6. Show me the money: Understanding causality for ad attribution - April Chen (Civis Analytics)
    7. Marketing at future speed - Kevin Lyons (Nielsen Marketing Cloud)
    8. Data science in practice: Examining events in social media - Jennifer Webb (SuprFanz)
    9. What is the relationship between social influence and the NBA? - Noah Gift (UC Davis)
  9. Big data and data science in the cloud
    1. Cloud, multicloud, and the data refinery - Tom Fisher (MapR Technologies)
    2. How does a big data professional get started with AI? - Wee Hyong Tok (Microsoft), Danielle Dean (Microsoft)
    3. Powering robotics clouds with Alluxio - Bin Fan (Alluxio), Shaoshan Liu (PerceptIn)
    4. Machine learning versus machine learning in production - Manu Mukerji (8x8)
    5. Streaming big data in the cloud: What to consider and why - William Chambers (Databricks), Michael Armbrust (Databricks)
    6. Accelerating deep learning on Apache Spark using BigDL with coarse-grained scheduling - Shivaram Venkataraman (Microsoft Research), Sergey Ermolin (Intel)
    7. Big data, big problems: Predicting climate change - Ari Gesher (Kairos Aerospace)
    8. Radically modular data ingestion APIs in Apache Beam - Eugene Kirpichov (Google)
    9. Machine learning to tackle industrial data fusion - Alexandra Gunderson (Arundo Analytics)
    10. Vectorized query processing using Apache Arrow - Siddharth Teotia (Dremio)
    11. Moving the needle of the pin: Streaming hundreds of terabytes of pins from MySQL to S3/Hadoop continuously - Henry Cai (Pinterest), Yi Yin (Pinterest)
    12. Best practices for productionizing Apache Spark MLlib models - Joseph Bradley (Databricks)
    13. Code Property Graph: A modern, queryable data storage for source code - Vlad A Ionescu (ShiftLeft), Fabian Yamaguchi (ShiftLeft)
    14. Improving user-merchant propensity modeling using neural collaborative filtering and wide and deep models on Spark BigDL at scale - Sergey Ermolin (Intel), Suqiang Song (Mastercard)
    15. Pipeline testing with Great Expectations - Abe Gong (Superconductive Health), James Campbell (USG)
    16. Analytics in the cloud: Building a modern cloud-based big data warehouse - Greg Rahn (Cloudera)
    17. Distributed deep learning with containers on heterogeneous GPU clusters - Dong Meng (MapR)
    18. Hive as a service - Szehon Ho (Criteo), Pawel Szostek (Criteo)
    19. Magellan: Scalable and fast geospatial analytics - Ram Sriharsha (Databricks)
    20. Continuous delivery for NLP on Kubernetes: Lessons learned - Michelle Casbon (Google Cloud Platform Developer Relations)
    21. Data reflections: Making data fast and easy to use without making copies - Tomer Shiran (Dremio), Jacques Nadeau (Dremio)
    22. Building ML and AI pipelines with Spark and TensorFlow - Chris Fregly (PipelineAI)
    23. Lyft's analytics pipeline: From Redshift to Apache Hive and Presto - Shenghu Yang (Lyft)
    24. Not your parents' machine learning: How to ship an XGBoost churn prediction app in under four weeks - Goodman Gu (Atlassian)
    25. Cuttlefish: Lightweight primitives for online tuning - Tomer Kaftan (University of Washington)
  10. Data-driven business management
    1. Managing data science at scale - Matthew Granade (Domino Data Lab)
    2. Executive Briefing: Building effective heterogeneous data communities—Driving organizational outcomes with broad-based data science - Frances Haugen (Pinterest), Patrick Phelps (Pinterest)
    3. Executive Briefing: Managing successful data projects—Technology selection and team building - Ted Malaska (Blizzard Entertainment), Jonathan Seidman (Cloudera)
    4. If you can’t measure it, you can’t improve it: How reporting and experimentation fuel product innovation at LinkedIn - Kapil Surlaker (LinkedIn), Ya Xu (LinkedIn)
    5. The limits of inference: What data scientists can learn from the reproducibility crisis in science - Clare Gollnick (Terbium Labs)
    6. Understanding metadata - Michael Schrenk (Self-Employed)
    7. Executive Briefing: Why machine-learned models crash and burn in production and what to do about it - David Talby (Pacific AI)
    8. Human in the loop: A design pattern for managing teams working with machine learning - Paco Nathan (O'Reilly Media)
    9. Architecting an open source enterprise data lake - Sagar Kewalramani (Meijer)
    10. Humans versus the machines: Using human-based computation to improve machine learning - Veronica Mapes (Pinterest), Garner Chung (Pinterest)
    11. Lessons on driving data science and analytics transformation - Chris Chapo (Gap Inc.)
    12. Trapped by the present: Estimating long-term impact from A/B experiments - Brian Karfunkel (Pinterest)
    13. Detecting retail fraud with data wrangling and machine learning - Matt Derda (Trifacta), Harrison Lynch (Consensus Corporation)
    14. Executive Briefing: What does an exec need to know about architecture and why - Jesse Anderson (Big Data Institute)
    15. Big data insights equal big money: Stories from the trenches at GoDaddy - Felix Gorodishter (GoDaddy)
    16. Data-driven fuel management at Ryanair - Marcin Pilarczyk (Ryanair)
  11. Law, ethics, and governance
    1. Progressive data governance for emerging technologies - Anne Buff (SAS Institute)
    2. The rise of big data governance: Insight on this emerging trend from active open source initiatives - John Mertic (The Linux Foundation), Maryna Strelchuk (ING)
    3. The science of patchy data - Jennifer Prendki (Atlassian)
    4. Achieving GDPR compliance and data privacy using blockchain technology - Ajay Mothukuri (Sapient), Dr. Vijay Srinivas Agneeswaran (SapientRazorfish)
    5. Human in the loop: Bayesian rules enabling explainable AI - Pramit Choudhary (DataScience.com)
    6. Data and ethics: Brainstorming session - Natalie Evans Harris (BrightHive)
  12. Visualization and user experience
    1. Spark on Kubernetes: A case study from JD.com - Zhen Fan (JD.com), Wei Ting Chen (Intel)
    2. Why nobody cares about your anomaly detection - Baron Schwartz (VividCortex)
    3. Semi-automated analytic pipeline creation and validation using active learning - Sean Ma (Trifacta)
    4. Personalization at scale: Mastering the challenges of personalization to create compelling user experiences - Rahim Daya (Pinterest)
  13. Platform security and cybersecurity
    1. Distributed clinical models: Inference without sharing patient data - Balasubramanian Narasimhan (Stanford University), John-Mark Agosta (Microsoft), Philip Lavori (Stanford University)
  14. Ask Me Anything
    1. Ask Me Anything: Streaming architectures and applications (Kafka, Spark, Akka, and microservices) - Dean Wampler (Lightbend), Boris Lublinsky (Lightbend)
    2. Ask Me Anything: Big data and machine learning techniques to drive and grow business - Burcu Baran (LinkedIn), Wei Di (LinkedIn)
    3. Ask Me Anything: Deep learning-based search and recommendation systems using TensorFlow - Dr. Vijay Srinivas Agneeswaran (SapientRazorfish), Abhishek Kumar (SapientRazorfish)
    4. Ask Me Anything: Managing data science in the enterprise - Nick Elprin (Domino Data Lab)
  15. Tutorials
    1. Big data analytics and machine learning techniques to drive and grow business - Burcu Baran (LinkedIn), Wei Di (LinkedIn), Michael Li (LinkedIn), Chi-Yi Kuan (LinkedIn) - Part 1
    2. Big data analytics and machine learning techniques to drive and grow business - Burcu Baran (LinkedIn), Wei Di (LinkedIn), Michael Li (LinkedIn), Chi-Yi Kuan (LinkedIn) - Part 2
    3. Big data analytics and machine learning techniques to drive and grow business - Burcu Baran (LinkedIn), Wei Di (LinkedIn), Michael Li (LinkedIn), Chi-Yi Kuan (LinkedIn) - Part 3
    4. Big data analytics and machine learning techniques to drive and grow business - Burcu Baran (LinkedIn), Wei Di (LinkedIn), Michael Li (LinkedIn), Chi-Yi Kuan (LinkedIn) - Part 4
    5. Getting ready for GDPR: Securing and governing hybrid, cloud, and on-premises big data deployments - Mark Donsky (Cloudera), Andre Araujo (Cloudera), Syed Rafice (Cloudera), Mubashir Kazia (Cloudera) - Part 1
    6. Getting ready for GDPR: Securing and governing hybrid, cloud, and on-premises big data deployments - Mark Donsky (Cloudera), Andre Araujo (Cloudera), Syed Rafice (Cloudera), Mubashir Kazia (Cloudera) - Part 2
    7. Getting ready for GDPR: Securing and governing hybrid, cloud, and on-premises big data deployments - Mark Donsky (Cloudera), Andre Araujo (Cloudera), Syed Rafice (Cloudera), Mubashir Kazia (Cloudera) - Part 3
    8. Getting ready for GDPR: Securing and governing hybrid, cloud, and on-premises big data deployments - Mark Donsky (Cloudera), Andre Araujo (Cloudera), Syed Rafice (Cloudera), Mubashir Kazia (Cloudera) - Part 4
    9. Spark camp: Apache Spark 2.0 for analytics and text mining with Spark ML - Joseph Kambourakis (Databricks) - Part 1
    10. Spark camp: Apache Spark 2.0 for analytics and text mining with Spark ML - Joseph Kambourakis (Databricks) - Part 2
    11. Spark camp: Apache Spark 2.0 for analytics and text mining with Spark ML - Joseph Kambourakis (Databricks) - Part 3
    12. Spark camp: Apache Spark 2.0 for analytics and text mining with Spark ML - Joseph Kambourakis (Databricks) - Part 4
    13. Learning PyTorch by building a recommender system - Mo Patel (Independent), Neejole Patel (Virginia Tech) - Part 1
    14. Learning PyTorch by building a recommender system - Mo Patel (Independent), Neejole Patel (Virginia Tech) - Part 2
    15. Learning PyTorch by building a recommender system - Mo Patel (Independent), Neejole Patel (Virginia Tech) - Part 3
    16. Learning PyTorch by building a recommender system - Mo Patel (Independent), Neejole Patel (Virginia Tech) - Part 4
    17. Managing data science in the enterprise - Nick Elprin (Domino Data Lab) - Part 1
    18. Managing data science in the enterprise - Nick Elprin (Domino Data Lab) - Part 2
    19. Managing data science in the enterprise - Nick Elprin (Domino Data Lab) - Part 3
    20. Managing data science in the enterprise - Nick Elprin (Domino Data Lab) - Part 4
    21. How to use Impala's query plan and profile to fix performance issues - Juan Yu (Cloudera) - Part 1
    22. How to use Impala's query plan and profile to fix performance issues - Juan Yu (Cloudera) - Part 2
    23. How to use Impala's query plan and profile to fix performance issues - Juan Yu (Cloudera) - Part 3
    24. How to use Impala's query plan and profile to fix performance issues - Juan Yu (Cloudera) - Part 4
    25. Modern real-time streaming architectures - Karthik Ramasamy (Streamlio), Sanjeev Kulkarni (Streamlio), Sijie Guo (Streamlio), Arun Kejariwal (MZ) - Part 1
    26. Modern real-time streaming architectures - Karthik Ramasamy (Streamlio), Sanjeev Kulkarni (Streamlio), Sijie Guo (Streamlio), Arun Kejariwal (MZ) - Part 2
    27. Modern real-time streaming architectures - Karthik Ramasamy (Streamlio), Sanjeev Kulkarni (Streamlio), Sijie Guo (Streamlio), Arun Kejariwal (MZ) - Part 3
    28. Modern real-time streaming architectures - Karthik Ramasamy (Streamlio), Sanjeev Kulkarni (Streamlio), Sijie Guo (Streamlio), Arun Kejariwal (MZ) - Part 4
    29. Using R and Python for scalable data science, machine learning, and AI - Mario Inchiosa, Vanja Paunic, Robert Horton, Debraj GuhaThakurta, Ali Zaidi, Tomas Singliar, and John-Mark Agosta (Microsoft) - Part 1
    30. Using R and Python for scalable data science, machine learning, and AI - Mario Inchiosa, Vanja Paunic, Robert Horton, Debraj GuhaThakurta, Ali Zaidi, Tomas Singliar, and John-Mark Agosta (Microsoft) - Part 2
    31. Using R and Python for scalable data science, machine learning, and AI - Mario Inchiosa, Vanja Paunic, Robert Horton, Debraj GuhaThakurta, Ali Zaidi, Tomas Singliar, and John-Mark Agosta (Microsoft) - Part 3
    32. Using R and Python for scalable data science, machine learning, and AI - Mario Inchiosa, Vanja Paunic, Robert Horton, Debraj GuhaThakurta, Ali Zaidi, Tomas Singliar, and John-Mark Agosta (Microsoft) - Part 4
    33. Streaming applications as microservices using Kafka, Akka Streams, and Kafka Streams - Dean Wampler (Lightbend), Boris Lublinsky (Lightbend) - Part 1
    34. Streaming applications as microservices using Kafka, Akka Streams, and Kafka Streams - Dean Wampler (Lightbend), Boris Lublinsky (Lightbend) - Part 2
    35. Streaming applications as microservices using Kafka, Akka Streams, and Kafka Streams - Dean Wampler (Lightbend), Boris Lublinsky (Lightbend) - Part 3
    36. Streaming applications as microservices using Kafka, Akka Streams, and Kafka Streams - Dean Wampler (Lightbend), Boris Lublinsky (Lightbend) - Part 4
    37. Deep learning-based search and recommendation systems using TensorFlow - Abhishek Kumar (SapientRazorfish), Dr. Vijay Srinivas Agneeswaran (SapientRazorfish) - Part 1
    38. Deep learning-based search and recommendation systems using TensorFlow - Abhishek Kumar (SapientRazorfish), Dr. Vijay Srinivas Agneeswaran (SapientRazorfish) - Part 2
    39. Deep learning-based search and recommendation systems using TensorFlow - Abhishek Kumar (SapientRazorfish), Dr. Vijay Srinivas Agneeswaran (SapientRazorfish) - Part 3
    40. Deep learning-based search and recommendation systems using TensorFlow - Abhishek Kumar (SapientRazorfish), Dr. Vijay Srinivas Agneeswaran (SapientRazorfish) - Part 4
    41. Natural language understanding at scale with spaCy and Spark NLP - David Talby (Pacific AI), Claudiu Branzan (G2 Web Services), Alexander Thomas (Indeed) - Part 1
    42. Natural language understanding at scale with spaCy and Spark NLP - David Talby (Pacific AI), Claudiu Branzan (G2 Web Services), Alexander Thomas (Indeed) - Part 2
    43. Natural language understanding at scale with spaCy and Spark NLP - David Talby (Pacific AI), Claudiu Branzan (G2 Web Services), Alexander Thomas (Indeed) - Part 3
    44. Natural language understanding at scale with spaCy and Spark NLP - David Talby (Pacific AI), Claudiu Branzan (G2 Web Services), Alexander Thomas (Indeed) - Part 4
    45. Time series data: Architecture and use cases - Ted Malaska (Blizzard Entertainment) - Part 1
    46. Time series data: Architecture and use cases - Ted Malaska (Blizzard Entertainment) - Part 2
    47. Time series data: Architecture and use cases - Ted Malaska (Blizzard Entertainment) - Part 3
    48. Time series data: Architecture and use cases - Ted Malaska (Blizzard Entertainment) - Part 4

Product information

  • Title: Strata Data Conference - San Jose 2018
  • Author(s): O'Reilly Media, Inc.
  • Release date: March 2018
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781492025948