Strata + Hadoop World 2016 - San Jose, California: Video Compilation

Video Description

Make data work, a simple phrase a mile deep, was the theme of Strata + Hadoop World San Jose 2016. The conference delivered on the theme by offering inspiration, guidance, and practical know-how from 363 experts on virtually every aspect of the big data compendium, including:

  • Long-form tutorials on Hadoop operations, machine learning, visualization, and data platform architecture taught by pros at Cloudera and Silicon Valley Data Science
  • Skill-building sessions on Python, R, Apache Spark, Kafka, Kudu, and Cassandra delivered by Hadoop specialists at Dato, Google, Continuum Analytics, and Confluent
  • Explorations of data innovations like Druid, NoLambda, Ground, Apache Drill, and the Azure Data Factory conducted by visionaries from Yahoo, Tuplejump, UC Berkeley AMP Lab, MapR Technologies, and Microsoft
  • MBA intensives on the approaches data-oriented startups and venture capitalists use to boost innovation and disrupt incumbent business models delivered by thought leaders at Zetta Venture Partners, 3D Robotics, and Orbital Insight

Download the videos or view them through our HD player. It’s a big river.

  • All access pass to 16 keynotes, 18 tutorials, and 174 individual sessions
  • The future of Hadoop by Doug Cutting and Mike Cafarella, cofounders of Apache Hadoop
  • Specialized content tracks in security, finance, media, retail, transportation, and health care
  • 51 sessions covering real time analytics; 20 on data innovations; 38 on machine learning and AI
  • Case studies from Cigna, LinkedIn, Eventbrite, Quora, Twitter, Google, and more
  • Plus a round of great jokes about data science by comedian Paula Poundstone

Table of Contents

  1. Keynotes
    1. Apache Hadoop at 10 - Doug Cutting (Cloudera) 00:15:09
    2. Driving the on-demand economy with predictive analytics - Eric Frenkiel (MemSQL) 00:05:16
    3. Machine learning for human rights advocacy: Big benefits, serious consequences - Megan Price (Human Rights Data Analysis Group) 00:11:39
    4. Let's get real: Acting on data in real time - Jack Norris (MapR Technologies) 00:09:12
    5. Delivering information in context - Ian Andrews (Pivotal) 00:06:41
    6. Using commerce data to fuel innovation - Bruce Andrews (US Department of Commerce) 00:13:35
    7. Summoning the demon: My perspective from the belly of the beast of AI - Jana Eggers (Nara Logics) 00:13:25
    8. Using computer vision to understand big visual data - Alyosha Efros (UC Berkeley) 00:13:23
    9. Apache Hadoop meets cybersecurity - Tom Reilly (Cloudera) and Alan Ross (Intel Corporation) 00:09:36
    10. Thinking like a Bayesian - Julia Galef (Center for Applied Rationality) 00:14:01
    11. Connected brains - Joseph Sirosh (Microsoft) 00:11:19
    12. Building practical AI systems - Adam Cheyer (Viv) 00:14:31
    13. Advanced analytics and the mystery of the missing jeans - Bob Rogers (Intel) 00:07:03
    14. What's next for BDAS (the Berkeley Data Analytics Stack)? - Michael Franklin (AMPLab/UC Berkeley) 00:10:45
    15. Open by design, open for data - Adam Kocoloski (IBM) 00:06:21
    16. Nonsense science - Paula Poundstone (Star of NPR's #1 radio show, "Wait Wait...Don't Tell Me") 00:23:57
  2. Cultivate
    1. The 21st century leader: Shaping the future - Eric McNulty 00:54:07
    2. Build to lead: Solve leadership challenges using the Lego Serious Play methodology - Dieter Reuther, Donna Denio (Team Dynamics Boston) 00:41:45
    3. Culture is your company's operating system - Dave Gray (XPLANE) 00:57:40
    4. Cross-functional leadership for high-performance product teams - Dan Olsen (The Lean Product Playbook) 00:40:30
    5. The proven ROI of designing culture - Kristi Woolsey (MAYA) 00:42:46
    6. There's nothing basic about the basics of leadership - Michael Lopp (Pinterest) 00:50:46
    7. The importance of technical onboarding, training, and mentoring - Kate Heddleston (Kate Heddleston LLC) 00:32:42
    8. Radical candor: Be a better boss - Kim Scott (Radical Candor, Inc.) 00:50:26
    9. Accomplish big goals with objectives and key results - Christina Wodtke (Wodtke Consulting) 00:32:35
    10. Hiring engineers shouldn't hurt - Erin Ptacek 00:41:14
    11. Scaling Teams - David Loftesness 00:35:23
    12. Ask the CTO: Hard questions, honest answers - Camille Fournier (Formerly Rent the Runway), Michael Lopp (Pinterest) 00:48:31
    13. How to eat change for breakfast: Building an experimental enterprise - Sanjay Mathur (Silicon Valley Data Science) 00:35:51
  3. Data Innovations
    1. Analyzing billions of users with Druid and Theta Sketches - Eric Tschetter (Yahoo) 00:37:22
    2. Grounding big data: A meta-imperative - Joe Hellerstein (UC Berkeley), Vikram Sreekanti (Berkeley AMP Lab) 00:41:04
    3. Unified namespace and tiered storage in Alluxio - Calvin Jia (Alluxio), Jiri Simsa (Alluxio) 00:40:13
    4. Building the data infrastructure of the future with persistent memory - Derrick Harris (Mesosphere), Rob Peglar (Micron Technology, Inc), Milind Bhandarkar (Ampool, Inc.), Anil Goel (SAP), Todd Lipcon (Cloudera, Inc.) 00:41:43
    5. Just-in-time optimizing a database - Ted Dunning (MapR Technologies) 00:37:15
    6. Putting Kafka into overdrive - Todd Palino (LinkedIn), Gwen Shapira (Confluent) 00:46:35
    7. Streaming architecture: Why flow instead of state? - Ted Dunning (MapR Technologies) 00:41:23
    8. Elasticsearch and Apache Lucene for Apache Spark and MLlib - Costin Leau (Elastic) 00:43:08
    9. Deploying Hadoop on user namespace containers - Abin Shahab (Altiscale) 00:41:10
    10. Netflix: Making big data small - Daniel Weeks (Netflix) 00:39:43
    11. Lessons learned building a scalable self-serve, real-time, multitenant monitoring service at Yahoo - Sumeet Singh (Yahoo), Mridul Jain (Yahoo) 00:41:36
    12. Data applications and infrastructure at Coursera - Roshan Sumbaly (Coursera Inc), Pierre Barthelemy (Coursera) 00:37:16
    13. When one data center is not enough: Building large-scale stream infrastructure across multiple data centers with Apache Kafka - Guozhang Wang (Confluent) 00:27:16
    14. Toppling the mainframe: Enterprise-grade streaming under 2 ms on Hadoop - Ilya Ganelin (Capital One Data Innovation Lab) 00:44:15
    15. Architecting immediacy: The design of a high-performance, portable wrangling engine - Joe Hellerstein (UC Berkeley), Seshadri Mahalingam (Trifacta) 00:43:47
    16. Building DistributedLog, a high-performance replicated log service - Sijie Guo (Twitter) 00:40:13
    17. Architecting distributed systems for failure: How Druid guarantees data availability - Fangjin Yang (Imply) 00:35:48
    18. Did you accidentally build a database? - Spencer Kimball (Cockroach Labs) 00:45:09
    19. Secrets of natural language UIs: Translating English into computer actions - Joseph Turian (Workday), Alex Nisnevich (Bayes Impact) 00:38:16
  4. Data Science & Advanced Analytics
    1. Data wrangling and intro to pandas - Part 1 - T.J. Alumbaugh (Continuum Analytics), James Powell (NumFOCUS) 00:57:37
    2. Data wrangling and intro to pandas - Part 2 - T.J. Alumbaugh (Continuum Analytics), James Powell (NumFOCUS) 00:54:47
    3. Intro to data visualization with Bokeh - Part 1 - Bryan Van de Ven (Continuum Analytics), Sarah Bird (Aptivate) 01:06:12
    4. Intro to data visualization with Bokeh - Part 2 - Bryan Van de Ven (Continuum Analytics), Sarah Bird (Aptivate) 00:46:22
    5. Intro to machine learning with scikit-learn - Part 1 - Jake Vanderplas (eScience Institute, University of Washington), Katrina Riehl (Continuum Analytics) 00:58:36
    6. Intro to machine learning with scikit-learn - Part 2 - Jake Vanderplas (eScience Institute, University of Washington), Katrina Riehl (Continuum Analytics) 00:54:49
    7. R quickstart: Transform and visualize data - Garrett Grolemund (RStudio, Inc.) 01:07:14
    8. Validating models in R - Part 1 - Nina Zumel (Win-Vector LLC), John Mount (Win-Vector LLC) 00:49:13
    9. Validating models in R - Part 2 - Nina Zumel (Win-Vector LLC), John Mount (Win-Vector LLC) 00:38:44
    10. Scaling R: Analytics for big data - Stephen Elston (Quantia Analytics, LLC) 01:04:29
    11. Reproducible reports with big data - Garrett Grolemund (RStudio, Inc.) 01:02:59
    12. A year of anomalies: Building shared infrastructure for anomaly detection - Chris Sanden (Netflix), Christopher Colburn (Netflix) 00:42:01
    13. Augmenting machine learning with human computation for better personalization - Eric Colson (Stitch Fix) 00:47:33
    14. Real-time fraud detection using process mining with Spark Streaming - Hylke Hendriksen (ING) 00:37:15
    15. Building a marketplace: Eventbrite's approach to search and recommendation - John Berryman (Eventbrite) 00:42:18
    16. Docker for data scientists - Michelangelo D'Agostino (Civis Analytics) 00:42:49
    17. How to make analytic operations look more like DevOps: Lessons learned moving machine-learning algorithms to production environments - Robert Grossman (University of Chicago) 00:41:29
    18. Analyzing time series data with Spark - Sandy Ryza (Cloudera) 00:31:38
    19. Faster conclusions using in-memory columnar SQL and machine learning - Wes McKinney (Cloudera), Jacques Nadeau (Dremio) 00:47:23
    20. Putting the “science” into data science: The importance of reproducibility and peer review for quantitative research - Erik Andrejko (The Climate Corporation) 00:38:27
    21. Can deep neural networks save your neural network? Artificial intelligence, sensors, and strokes - Brandon Ballinger (Cardiogram), Johnson Hsieh (Cardiogram) 00:44:30
    22. Deep learning and recurrent neural networks applied to electronic health records - Josh Patterson (Patterson Consulting), David Kale (University of Southern California), Zachary Lipton (University of California, San Diego) 00:45:34
    23. Data science teams: Hold out for the unicorn or build bands of steeds? - Michael Dauber (Amplify), Yael Garten (LinkedIn), Monica Rogati (Data Natives), Daniel Tunkelang (Various) 00:43:20
    24. How LinkedIn built a text analytics platform at scale - Chi-Yi Kuan (LinkedIn), Weidong Zhang (LinkedIn), Yongzheng Zhang (LinkedIn) 00:40:10
    25. Python scalability: A convenient truth - Travis Oliphant (Continuum Analytics) 00:41:28
    26. Data modeling for data science: Simplify your workload with complex types - Marcel Kornacker (Cloudera) 00:38:15
    27. Atom smashing using machine learning at CERN - Siddha Ganju (Carnegie Mellon University) 00:37:54
    28. Large-scale product classification via text and image-based signals using a fusion of discriminative and deep learning-based classifiers - Sreeni Iyer (Quad Analytix), Anurag Bhardwaj (Quad Analytix) 00:49:22
    29. Vowpal Wabbit: The essence of speed in machine learning - Jeroen Janssens (Tilburg University) 00:36:00
    30. The polyglot Beaker notebook - Scott Draves (Two Sigma Open Source) 00:40:26
  5. Data-driven Business
    1. What's gone horribly wrong...and how you can protect yourself - Farrah Bostic (The Difference Engine), Paul Soldera (Equation Research) 00:43:35
    2. The rise of the data selfie - Trina Chiasson (Tableau Software) 00:15:38
    3. The future of data and culture - Leah Hunter (Tech Journalist), Amber Case (Esri), Todd Harple (Intel), Claire Michell (Temboo) 00:27:35
    4. Big data sustainability: An environmental management systems analogy - Jonathan King (Ericsson) 00:24:20
    5. Kosher collection: Best practices in data handling - Charles Givre (Booz | Allen | Hamilton) 00:18:23
    6. Three rules every mobile product needs to follow to be successful - Sophie-Charlotte Moatti (Products That Count) 00:23:16
    7. Mapping the matrix: Open cartography with scientific and spatial data - Aurelia Moser (Mozilla Science) 00:24:23
    8. US EPA: A data-driven decision-making agency - Robin Thottungal (US Environmental Protection Agency) 00:16:44
    9. My AlgorithmicMe: Our representation in data - Joerg Blumtritt (Datarella), Majken Sander 00:17:40
    10. Stream science: Measuring the new currency of the music industry - Jonathan Gosier (AuDigent) 00:15:56
    11. Making on-demand grocery delivery profitable with data science - Jeremy Stanley (Instacart) 00:21:38
    12. Virtual reality for immersive data visualization - Bob Levy (Virtual Cove) 00:18:32
    13. You have more data than you think. Time to put it to work - Jana Eggers (Nara Logics) 00:13:51
    14. The power of personalization in the travel industry using big data - Sara Ahmadian (Seamless Planet) 00:06:50
    15. How cognitive computing is changing data science for the better - Michael Ludden (IBM Watson) 00:23:29
    16. Afraid of the future? You should be. Deep learning is eating your lunch—and mine. - Arno Candel 00:25:33
    17. From drop to deluge: The upcoming wave of enterprise drone data - Keith Bigelow (3D Robotics) 00:25:50
    18. Machine vision is making sense of the explosion of data from space - James Crawford (Orbital Insight) 00:30:34
    19. Opportunities for hardware acceleration in data analytics - Kanu Gulati (Zetta Venture Partners) 00:25:48
    20. Deploying deep learning at scale - Naveen Rao (Nervana) 00:29:21
    21. Virtual reality in 2016 and in the future - Timoni West (Unity Labs) 00:25:38
    22. Network intelligence at LinkedIn - Michael Conover (LinkedIn) 00:27:51
    23. Data science 3.0: Empowering common end users with integrated solutions in a world of tools for engineers and scientists - Faisal Farooq (IBM Watson Health), Balaji Krishnapuram (IBM Watson Health) 00:27:28
    24. Big science problems, big data solutions - Mr Prabhat (Berkeley Lab) 00:32:21
    25. Of market makers and middlemen: How technology is transforming global trade - Renee DiResta (Haven) 00:31:28
    26. Enabling smart consumer health decisions using prediction and personalization - Matt Butner (Stride Health) 00:31:17
    27. Engineering industrial biology with data - Joshua Hoffman (Zymergen) 00:27:40
    28. The business case for Spark, Kafka, and friends - Edd Dumbill (Silicon Valley Data Science) 00:30:17
    29. Distributed systems in one lesson - Tim Berglund (DataStax) 00:25:55
    30. How to use your data science team: Becoming a data-driven organization - Yael Garten (LinkedIn) 00:29:23
    31. Cloud computing and big data - Ben Sharma (Zaloni) 00:26:44
    32. Data visualizations decoded - Julie Rodriguez (Sapient Global Markets) 00:21:16
    33. Developing a modern enterprise data strategy - Part 1 - Edd Dumbill (Silicon Valley Data Science), John Akred (Silicon Valley Data Science) 00:56:19
    34. Developing a modern enterprise data strategy - Part 2 - Edd Dumbill (Silicon Valley Data Science), John Akred (Silicon Valley Data Science) 00:29:47
    35. Developing a modern enterprise data strategy - Part 3 - Edd Dumbill (Silicon Valley Data Science), John Akred (Silicon Valley Data Science) 00:44:28
    36. Developing a modern enterprise data strategy - Part 4 - Edd Dumbill (Silicon Valley Data Science), John Akred (Silicon Valley Data Science) 00:40:25
    37. Empowering business users to lead with data - Denise McInerney (Intuit) 00:40:53
    38. Why a data career is a great choice, now more than ever - Jin Zhang (CA Technologies), Jerry Overton (CSC), Michele Chambers (Continuum Analytics) 00:39:44
    39. Automating decision making with big data: How to make it work - Andreas Schmidt (Blue Yonder) 00:36:36
    40. Best practices for achieving customer 360 - Steven Totman (Cloudera), Nick Curcuru (MasterCard Advisors), Robert Bagley (ClickFox), Lori Bieda (Bank of Montreal) 00:44:57
    41. Working on the blockchain gang: Crunching and visualizing bitcoin data - Benedikt Koehler (DataLion) 00:38:58
    42. Adopting analytics: The Autodesk journey - Adam Sugano (Autodesk) 00:39:11
    43. Inside Cigna's big data journey - Jeffrey Shmain (Cloudera), Mohammad Quraishi (Cigna) 00:41:11
    44. Data scientists, you can help save lives - Jeremy Howard (Enlitic) 00:42:18
    45. How big data is helping to save babies around the world - Linus Liang (Embrace), Brad Allen (Silicon Valley Data Science) 00:39:11
    46. Publicly broadcasting data exhaust at a public broadcaster - Christopher Berry (Canadian Broadcasting Corporation) 00:30:10
    47. Transforming Telefónica - John Belchamber (Telefónica), Arturo Canales (Telefónica) 00:39:50
  6. Enterprise Adoption
    1. Apache Hadoop operations for production systems - Part 1 - Kathleen Ting (Cloudera), Vikram Srivastava (Cloudera, Inc.), Darren Lo (Cloudera), Jordan Hambleton (Cloudera, Inc.) 00:40:31
    2. Apache Hadoop operations for production systems - Part 2 - Kathleen Ting (Cloudera), Vikram Srivastava (Cloudera, Inc.), Darren Lo (Cloudera), Jordan Hambleton (Cloudera, Inc.) 00:48:35
    3. Apache Hadoop operations for production systems - Part 3 - Kathleen Ting (Cloudera), Vikram Srivastava (Cloudera, Inc.), Darren Lo (Cloudera), Jordan Hambleton (Cloudera, Inc.) 00:36:37
    4. Apache Hadoop operations for production systems - Part 4 - Kathleen Ting (Cloudera), Vikram Srivastava (Cloudera, Inc.), Darren Lo (Cloudera), Jordan Hambleton (Cloudera, Inc.) 00:33:41
    5. Apache Hadoop operations for production systems: Troubleshooting - Kathleen Ting (Cloudera), Vikram Srivastava (Cloudera, Inc.), Darren Lo (Cloudera), Jordan Hambleton (Cloudera, Inc.) 00:57:24
    6. Apache Hadoop operations for production systems: Enterprise Considerations Part 1 - Kathleen Ting (Cloudera), Vikram Srivastava (Cloudera, Inc.), Darren Lo (Cloudera), Jordan Hambleton (Cloudera, Inc.) 00:33:15
    7. Apache Hadoop operations for production systems: Enterprise Considerations Part 2 - Kathleen Ting (Cloudera), Vikram Srivastava (Cloudera, Inc.), Darren Lo (Cloudera), Jordan Hambleton (Cloudera, Inc.) 00:47:46
    8. Developing a big data business strategy - Bill Schmarzo (EMC) 00:39:05
    9. How to build a successful data lake - Alex Gorelik (Waterline Data) 00:30:18
    10. Bringing the Apache Hadoop ecosystem to the Google Cloud Platform - Jennifer Wu (Cloudera), James Malone (Google) 00:35:19
    11. eBay analysts and governed self-service analysis: Delivering “turn-by-turn” smart suggestions - Debora Seys (eBay) 00:37:22
    12. An introduction to Transamerica's product recommendation platform - Vishal Bamba (Transamerica), Nitin Prabhu (Transamerica) 00:34:12
    13. Not your father's database: How to use Apache Spark properly in your big data architecture - Vida Ha (Databricks) 00:38:20
    14. Amazon for information: Building a modern data catalog - Aaron Kalb (Alation) 00:35:07
    15. 10 concepts the enterprise decision maker needs to understand about Hadoop - Donald Miner (Miner & Kasch) 00:38:11
    16. Old industries, sexy data: How machine learning is reshaping the world's backbone industries - David Beyer (Amplify Partners) 00:36:19
    17. Best practices for enterprise adoption of big data in the cloud - Prat Moghe (Cazena) 00:47:09
    18. Self-service, interactive analytics at multipetabyte scale in capital markets regulation on the cloud - Scott Donaldson (FINRA), Matt Cardillo (FINRA) 00:44:03
    19. Netflix's big leap from Oracle to Cassandra - Roopa Tangirala (Netflix) 00:43:22
    20. Strategies for agile instrumentation, ingestion, and analytics across many platforms and products - Yann Landrin (Autodesk), Charlie Crocker (Autodesk) 00:38:10
    21. BI on Hadoop: What are your options? - Jacques Nadeau (Dremio) 00:40:50
    22. Analyzing drivers of Net Promoter Score and their impact on customer engagement in the OTA industry - Krishnan Venkata (LatentView Analytics), Jose Abelenda (Hotwire) 00:42:00
    23. Building a scalable, secure data platform: If I knew then what I know now - Bill Loconzolo (Intuit) 00:42:36
  7. Hadoop Internals & Development
    1. Hadoop application architectures: Fraud detection - Part 1 - Jonathan Seidman (Cloudera), Ted Malaska (Cloudera), Mark Grover (Cloudera), Gwen Shapira (Confluent) 00:42:52
    2. Hadoop application architectures: Fraud detection - Part 2 - Jonathan Seidman (Cloudera), Ted Malaska (Cloudera), Mark Grover (Cloudera), Gwen Shapira (Confluent) 00:43:53
    3. Hadoop application architectures: Fraud detection - Part 3 - Jonathan Seidman (Cloudera), Ted Malaska (Cloudera), Mark Grover (Cloudera), Gwen Shapira (Confluent) 00:53:19
    4. Hadoop application architectures: Fraud detection - Part 4 - Jonathan Seidman (Cloudera), Ted Malaska (Cloudera), Mark Grover (Cloudera), Gwen Shapira (Confluent) 00:37:21
    5. The next 10 years of Apache Hadoop - Ben Lorica (O'Reilly Media), Doug Cutting (Cloudera), Mike Cafarella (University of Michigan) 00:39:20
    6. Hadoop's storage gap: Resolving transactional-access and analytic-performance tradeoffs with Apache Kudu (incubating) - Todd Lipcon (Cloudera, Inc.) 00:42:25
    7. Format wars: From VHS and Beta to Avro and Parquet - Silvia Oliveros (Silicon Valley Data Science), Stephen O'Sullivan (Silicon Valley Data Science) 00:40:44
  8. Hadoop Use Cases
    1. Hadoop without borders: Building on-prem, cloud, and hybrid data flows - Hiren Shah (Microsoft), Anand Subbaraj (Microsoft) 00:44:22
    2. Uber, your Hadoop has arrived: Powering intelligence for Uber’s real-time marketplace - Vinoth Chandar (Uber) 00:50:15
    3. Hadoop in the cloud: Good fit or round peg in a square hole? 00:42:20
    4. Successful enterprise data hub design patterns at BT - Phillip Radley (BT) 00:39:48
    5. Subject-matter experts and access to rich data: A case study in protecting a network from the Brobot distributed denial of service attacks. - John Omernik (Secureworks) 00:39:58
    6. Architecting HBase in the field - Jean-Marc Spaggiari (Cloudera), Kevin O'Dell (Rocana) 00:56:01
    7. In search of database nirvana: The challenges of delivering HTAP - Rohit Jain (Esgyn) 00:39:31
    8. Big data for telcos: A trio of use cases - Amy O'Connor (Cloudera) 00:55:26
    9. Scalable schema management for Hadoop and Spark applications - Kelvin Chu (Uber), Evan Richards (Uber) 00:41:35
    10. How the oil and gas industry is igniting a spark with information fusion and metadata analytics - Brian Clark (Objectivity), Marco Ippolito (CGG GeoSoftware) 00:37:19
    11. High-performance clickstream analytics with Apache Phoenix and HBase - Arun Thangamani (CDK) 00:48:40
  9. Hardcore Data Science
    1. Lessons learned from building real-life machine-learning systems - Xavier Amatriain (Quora) 00:32:06
    2. The how and why of feature engineering - Alice Zheng (Dato) 00:26:22
    3. A scalable implementation of deep learning on Spark - Alexander Ulanov (Hewlett-Packard Labs) 00:26:39
    4. BIDMach on Spark: Machine learning at the outer limits - John Canny (UC Berkeley) 00:34:25
    5. Dynamic memory networks for visual and textual question answering - Stephen Merity (MetaMind) 00:25:59
    6. Scalable ensemble learning with H2O - Erin Ledell 00:27:14
    7. Detecting and scoring anomalies with calibrated probabilistic models - Alexander Gray (Skytree, Inc.) 00:27:41
    8. Scalable collective reasoning in graphs - Lise Getoor (University of California, Santa Cruz) 00:29:57
    9. Phase retrieval algorithms for gigapixel microscopy - Laura Waller (UC Berkeley) 00:29:52
    10. TensorFlow: Machine learning for everyone - Rajat Monga (Google) 00:24:54
    11. A deep dive into DeepDive - Mike Cafarella (University of Michigan) 00:27:55
  10. IoT & Real-time
    1. An introduction to time series with Team Apache - Part 1 - Patrick McFadin (DataStax) 00:31:12
    2. An introduction to time series with Team Apache - Part 2 - Patrick McFadin (DataStax) 00:30:03
    3. An introduction to time series with Team Apache - Part 3 - Patrick McFadin (DataStax) 00:32:18
    4. An introduction to time series with Team Apache - Part 4 - Patrick McFadin (DataStax) 00:44:39
    5. Distributed stream processing with Apache Kafka - Jay Kreps (Confluent) 00:39:54
    6. Real-time Hadoop: What an ideal messaging system should bring to Hadoop - Ted Dunning (MapR Technologies) 00:42:16
    7. How to turn your house into a robot: An adaptive-learning algorithm for the Internet of Things - Brandon Rohrer (Microsoft) 00:42:19
    8. IoT in the enterprise: A look at Intel (IoT) Inside - Moty Fania (Intel) 00:35:55
    9. Fast data made easy with Apache Kafka and Apache Kudu (incubating) - Ted Malaska (Cloudera), Jeff Holoman (Cloudera) 00:38:54
    10. Embeddable data transformation for real-time streams - Joey Echeverria (Rocana) 00:40:26
    11. Twitter Heron at scale - Karthik Ramasamy (Twitter) 00:43:47
    12. Transforming industrial enterprises with data science: From deterministic machines to probabilistic systems - Sean Murphy (PingThings) 00:39:19
    13. Apache Flink: Streaming done right - Kostas Tzoumas (data Artisans) 00:35:40
    14. Scaling your business with a messaging platform on the Zeta Architecture - Jim Scott (MapR Technologies, Inc.) 00:46:24
    15. Pulsar: Real-time analytics at scale leveraging Kafka, Kylin, and Druid - Tony Ng (eBay, Inc.) 00:36:38
    16. Overcoming the top 5 hurdles to real-time analytics - Pat McGarry (Ryft) 00:34:05
  11. Law, Ethics, Governance
    1. We enhance privilege with supervised machine learning - Michael Williams (Fast Forward Labs) 00:41:58
    2. Data ethics (not what you think) - Louis Suarez-Potts (Age of Peers, Inc.) 00:46:36
    3. Big data ethics and a future for privacy - Jonathan King (Ericsson) 00:42:33
    4. It’s a brave new world: Avoiding legal privacy and security snafus with big data and the IoT - Alysa Z. Hutnik (Kelley Drye & Warren LLP), Kristi Wolff (Kelley Drye) 00:33:30
  12. Security
    1. A practitioner’s guide to securing your Hadoop cluster - Mubashir Kazia (Cloudera), Ben Spivey (Cloudera), Sravya Tirukkovalur (Cloudera), Michael Yoder (Cloudera) 00:32:58
    2. A practitioner’s guide to securing your Hadoop cluster: Authorization - Sravya Tirukkovalur (Cloudera) 00:29:45
    3. A practitioner’s guide to securing your Hadoop cluster: Encryption of Data in Transit - Michael Yoder (Cloudera) 00:21:43
    4. A practitioner’s guide to securing your Hadoop cluster: Data Governance - Ben Spivey (Cloudera) 00:34:59
    5. A practitioner’s guide to securing your Hadoop cluster - HDFS Encryption at Rest - Mubashir Kazia (Cloudera) 00:50:43
    6. Attack graphs: Visually exploring 300M alerts per day - Leo Meyerovich (Graphistry), Joshua Patterson (Accenture Technology Labs), Michael Wendt (Accenture Technology Labs) 00:45:22
    7. Securing Apache Kafka - Jun Rao (Confluent) 00:44:55
    8. Simplifying Hadoop with RecordService, a secure and unified data access path for compute frameworks - Chao Sun (Cloudera), Alex Leblang (Cloudera) 00:39:35
    9. Leveraging Spark to analyze billions of user actions to reveal hidden fraudsters - Yinglian Xie (DataVisor, Inc.) 00:37:46
    10. Protecting enterprise data in Apache Hadoop - Don Bosco Durai (Hortonworks, Inc.) 00:38:17
    11. Governance for custom Hadoop applications via the enterprise (meta)data hub - Chang She (Cloudera) 00:32:00
  13. Spark & Beyond
    1. Guest talk: Choosing an optimal storage backend for your Spark use case - Sameer Farooqui and Vida Ha (Databricks) 00:14:01
    2. Architecting a data platform - Part 1 - John Akred (Silicon Valley Data Science), Stephen O'Sullivan (Silicon Valley Data Science), Gary Dusbabek (Silicon Valley Data Science) 00:48:02
    3. Architecting a data platform - Part 2 - John Akred (Silicon Valley Data Science), Stephen O'Sullivan (Silicon Valley Data Science), Gary Dusbabek (Silicon Valley Data Science) 00:46:32
    4. Architecting a data platform - Part 3 - John Akred (Silicon Valley Data Science), Stephen O'Sullivan (Silicon Valley Data Science), Gary Dusbabek (Silicon Valley Data Science) 00:51:38
    5. Architecting a data platform - Part 4 - John Akred (Silicon Valley Data Science), Stephen O'Sullivan (Silicon Valley Data Science), Gary Dusbabek (Silicon Valley Data Science) 00:21:54
    6. Building machine-learning apps with Spark: MLlib, ML Pipelines, and GraphX - Part 1 - Jayant Shekhar (Cloudera), Amandeep Khurana (Cloudera), Krishna Sankar (Volvo Cars), Vartika Singh (Cloudera) 00:53:03
    7. Building machine-learning apps with Spark: MLlib, ML Pipelines, and GraphX - Part 2 - Jayant Shekhar (Cloudera), Amandeep Khurana (Cloudera), Krishna Sankar (Volvo Cars), Vartika Singh (Cloudera) 00:49:18
    8. Building machine-learning apps with Spark: MLlib, ML Pipelines, and GraphX - Part 3 - Jayant Shekhar (Cloudera), Amandeep Khurana (Cloudera), Krishna Sankar (Volvo Cars), Vartika Singh (Cloudera) 00:48:52
    9. Building machine-learning apps with Spark: MLlib, ML Pipelines, and GraphX - Part 4 - Jayant Shekhar (Cloudera), Amandeep Khurana (Cloudera), Krishna Sankar (Volvo Cars), Vartika Singh (Cloudera) 00:29:02
    10. The state of Spark and where it is going in 2016 - Reynold Xin (Databricks) 00:39:02
    11. SparkNet: Training deep networks in Spark - Robert Nishihara (UC Berkeley) 00:44:51
    12. Fast big data analytics and machine learning using Alluxio and Spark in Baidu - Bin Fan (Alluxio), Haojun Wang (Baidu) 00:27:37
    13. Scala and the JVM as a big data platform: Lessons from Apache Spark - Dean Wampler (Lightbend) 00:39:44
    14. Designing a scalable real-time data platform using Akka, Spark Streaming, and Kafka - Alex Silva (Pluralsight) 00:39:32
    15. Testing and validating Spark programs - Holden Karau (IBM) 00:37:42
    16. Apache Spark and real-time analytics: From interactive queries to streaming - Michael Armbrust (Databricks) 00:39:24
    17. Taking Spark Streaming to the next level with DataFrames - Tathagata Das (Databricks) 00:33:33
    18. Breaking Spark: Top 5 mistakes to avoid when using Apache Spark in production - Neelesh Srinivas Salian (Cloudera) 00:34:03
    19. Cancer genomics analysis in the cloud with Spark and ADAM - Timothy Danford (Tamr, Inc.) 00:43:02
  14. Sponsored
    1. Dash forward: From descriptive to predictive analytics with Apache Spark + end-user feature with Kellogg's JR Cahill - Eric Frenkiel (MemSQL), JR Cahill (Kellogg) 00:35:19
    2. Globally distributed hybrid on-premises/cloud big data - Jagane Sundar (WANdisco) 00:33:02
    3. Building a scalable data science platform with R - Mario Inchiosa (Microsoft), Roni Burd (Microsoft) 00:39:44
    4. How Siemens handles complexity in streaming data from millions of sensors - Yvonne Quacken (Siemens), Allen Hoem (Teradata) 00:38:14
    5. The Internet of Things: How to do it. Seriously! - Chris Rawles (Pivotal) 00:33:40
    6. Tame that beast: How to bring operations, governance, and reliability to Hadoop - Keith Manthey (EMC) 00:32:08
    7. How GE created a pervasive culture of data-driven insights at scale - Don Perigo (GE Power) 00:34:02
    8. Transactional streaming: If you can compute it, you can probably stream it - John Hugg (VoltDB) 00:35:42
    9. Can you afford to drop ACID? Understanding real-world SQL requirements in the big data era - Emma McGrattan (Actian) 00:41:07
    10. From X-ray to MRI: New insights on data about data - Dave Wells (Paxata), Nenshad Bardoliwalla (Paxata), Travis Ringger (PwC), Conrad Mulcahy (K2 Intelligence) 00:40:25
    11. Creating intelligence: An applications-first approach to machine learning - Carlos Guestrin (Dato Inc.) 00:37:18
    12. The emerging data imperative - Wei Wang (Hortonworks), Scott Gnau (Hortonworks) 00:36:21
    13. Self-serve analytics journey at Celtra: Snowflake, Spark, and Databricks - Grega Kespret (Celtra Inc.) and Matthew J. Glickman (Snowflake) 00:39:40
    14. How TD Bank is using Hadoop to create IT 3.0 and launch the next-generation bank - Mok Choe (TD Bank Group), Paul Barth (Podium Data) 00:40:50
    15. What it takes to develop enterprise-grade Hadoop SQL Analytics - Bob Hansen (HPE) 00:26:29
    16. A survival guide for machine learning: Top 10 tips from a battle-tested solution - Patrick Hall (SAS), Paul Kent (SAS) 00:37:48
    17. Moving beyond the enterprise: Data sharing as the next big idea - Sandy Steier (1010data), Dennis Gleeson (1010data) 00:30:28
    18. Containers: The natural platform for data applications - Partha Seetala (Robin Systems) 00:19:41
    19. Remedying the accounts receivable reporting gap for a large multinational imaging and electronics company using a Hadoop-based open source platform - Ganesan Pandurangan (Infosys Limited) 00:13:08
    20. How we Hadoop: Inmar’s transformation from a business-services outsourcing company to a data-driven enterprise - Kevin Goode (Inmar) 00:28:01
    21. High-frequency decisioning - Steve Wooledge (MapR Technologies) 00:39:27
    22. Master the Internet of Things with integrated analytics - Bob Rogers & Bridget Karlin (Intel) 00:39:34
    23. Automated model selection and tuning at scale with Spark - Peter Prettenhofer (DataRobot), Owen Zhang (DataRobot) 00:45:02
    24. Building a modern data architecture - Ben Sharma (Zaloni) 00:40:06
    25. Solr as a SparkSQL datasource - Timothy Potter (Lucidworks) 00:40:09
    26. Delivering "DARPA hard" - Matthew Van Adelsberg (CACI) 00:36:20
    27. Big data-fueled feedback loops leveraging streaming data in SDN/NFV - Matt Olson (CenturyLink) 00:43:01
    28. TensorFlow: Large-scale analytics and distributed machine learning with TensorFlow, BigQuery, and Dataflow (Apache Beam) - Kazunori Sato (Google), Amy Unruh (Google) 00:38:32
    29. Virtualizing big data: Effective approaches from real-world deployments - Martin Yip (VMware), Justin Murray (VMware) 00:46:36
    30. Turn big data into big results - Jeff Pohlmann (Oracle) 00:37:51
    31. Transforming core business operations with SAP HANA Vora on Hadoop and Apache Spark - Amit Satoor (SAP), Balaji Krishna (SAP) 00:44:34
    32. Batch is back: Critical for agile application adoption - Joe Goldberg (BMC Software Inc.) 00:38:19
  15. Visualization & User Experience
    1. Introduction to visualizations using D3 - Part 1 - Brian Suda 00:54:17
    2. Introduction to visualizations using D3 - Part 2 - Brian Suda 00:31:58
    3. Introduction to visualizations using D3 - Part 3 - Brian Suda 00:36:16
    4. Introduction to visualizations using D3 - Part 4 - Brian Suda 00:51:09
    5. Panoramix: An open source data visualization platform - Maxime Beauchemin (Airbnb) 00:43:01
    6. Delivering big data insight at Markerstudy - Nick Turner (Markerstudy) 00:42:54
    7. Visualization as data and data as visualization: Building insights in a data-flow world - Christopher Nguyen (Arimo, Inc.), Anh Trinh (Arimo, Inc.) 00:30:13
    8. What can user-centered design do for visualizing your data? - Irene Ros (Bocoup) 00:40:40
    9. A few things engineers can learn from designers - Sébastien Pierre (FFunction) 00:38:59
    10. The state of visualization: Application and practice - Noah Iliinsky (Amazon Web Services) 00:39:47
    11. Visualization is distortion: How to lie less - Aneesh Karve (Quilt Data, Inc) 00:33:37
    12. Building responsive data visualization for the Web - Bill Hinderman (Expedia, Inc.) 00:37:21
  16. Ask Me Anything
    1. Ask me anything: Apache Hadoop operations for production systems - Kathleen Ting (Cloudera), Vikram Srivastava (Cloudera, Inc.), Jordan Hambleton (Cloudera, Inc.) 00:41:28
    2. Ask me anything: Hadoop application architectures - Mark Grover (Cloudera), Jonathan Seidman (Cloudera), Ted Malaska (Cloudera), Gwen Shapira 00:37:12
    3. Ask me anything: Apache Spark - Reynold Xin (Databricks), Tathagata Das (Databricks), Michael Armbrust (Databricks) 00:38:13
    4. Ask me anything: Developing a modern enterprise data strategy - John Akred (Silicon Valley Data Science), Colette Glaeser (Silicon Valley Data Science) 00:40:59
  17. Solutions Showcase Theater
    1. Empowering Self-Service Data Science at Autodesk - Daniel Rose (Qubole) 00:07:55
    2. Taking the Complexity Out of Big Data Visualization - Priyank Patel (Arcadia Data) 00:11:19
    3. When Results Matter - Darin Jones (CACI) 00:08:12
    4. R.E.A.L. Big Data -- Now with 1TB for Free in Every Box - Joey Echeverria (Rocana) 00:10:09
    5. Is Your Data Ready for In-Memory Analytics? Think Again. - Juthika Khargharia, Ph.D. (SAS) 00:11:41
    6. Managing the Data Lake: Creating Actionable Insights and Value - Suntosh Murthy (Zaloni) 00:09:09
    7. Data Marshalling: An approach to Optimizing Your Data Lake - Jennifer Reed (Novetta) 00:10:12
    8. Enterprise Transformation with Solix big data suite: data lake, analytics and archiving use cases - Vikram Gaitonde (Solix) 00:11:06
    9. How Big Data and IoT Are Helping to Feed the World - Ashley Stirrup (Talend) 00:10:05
    10. What you need to know about addressing Data Quality within Hadoop - Scott Arnett (Pitney Bowes) 00:10:31
    11. Streaming Analytics - Sean Baseman (FICO) 00:10:10
    12. Growing the sharing economy, by sharing data - Jeremy Sokolic (SiSense) 00:08:00
    13. Advanced Cyber Threat Detection with Securonix Snyper - Tanuj Gulati (Securonix) 00:11:01
    14. AI meets Cyber Security... - Greg Martin (Jask) 00:10:11
    15. Leveraging the power of automation to enable the creation of more accurate, easy to use machine-learning models in less time - Alexander Gray (Skytree) 00:12:18
    16. Data Deluge in Digital Advertising - Teddy Rusli (DataTorrent) 00:08:11
    17. Make Big Data and Enterprise Data Work Together in Retail, Healthcare, and Agriculture - Karen Sun (SAP) 00:08:18
    18. Network Reconnaissance Solution - Paul Hahn (Cray) 00:07:42
    19. Real Time Fraud Detection using IBM z/OS Platform for Apache Spark - Mythili Venkatakrishnan & Sreeram Nudurupati (DataFactZ) 00:09:14
    20. Why BI tools fail the Hadoop test, and how to become the BI Hadoop hero - Eric Sit (Quotient) 00:05:58
    21. Unravel the mystery of why your big data applications are slow to deliver business value - Kunal Agarwal (Unravel) and Jeff Magnusson (Stitchfix) 00:10:28
    22. Big Data, Behavioral Analysis, and Sears - Denise Hemke (Platfora) 00:11:12
    23. Delivering Advanced Analytics Capabilities in Banking - Rohit Balasubramanian (Deloitte) 00:11:02
    24. Using Behavior Analytics on Big Data to Drive New Revenue - John Morrell (Datameer) 00:11:46
    25. In-Memory Data Fabrics for Screaming Fast Big Data - Nikita Ivanov (GridGain) 00:08:05
    26. Cloud driving innovation - Goutham Belliappa and Keith Reid (Capgemini) 00:09:50
    27. Data, Insights, to Action: When Transactions and Analytics Converge - Ali Hodroj (GigaSpaces) 00:10:10
    28. Learn how you can solve big data operations problems through intelligent management - Shivnath Babu (Unravel), Charlie Cocker (Autodesk) 00:07:32
    29. SQL based in-database analytics on Hadoop - Raman Rajasekhar (Fuzzy Logix) 00:10:48
    30. Manulife + Microsoft + KPMG solving big data problems for the insurance industry - Nate Shea-Han (Microsoft) 00:10:59
    31. Lookalike audience among billion devices - Xiatian Zhang (TalkingData) 00:10:12
    32. Connecting SAP HANA and Apache Impala (incubating) - Sunita Sharma and Sreedhar Bolneni (Cloudera) 00:08:08
    33. Gaining Customer Satisfaction Insights using IBM z/OS Platform for Apache Spark - Mythili Venkatakrishnan (IBM) and Sreeram Nudurupati (DataFactZ) 00:08:48
    34. Lambda-B-Gone: Better Answers for Less Money - John Hugg (VoltDB) 00:11:08
    35. Leveraging the power of Hadoop to parallelize Real-Time PCR computations and enable acceleration of genetic discovery - Salil Kumar (Hadoolytics Inc.) 00:10:13
    36. Growing a community around the Trusted Analytics Platform - Chuck Freedman (Intel) 00:10:31
    37. Data Prep and Quality Together – How to Uncover the Truth About Your Customers - Mark Pierce (Trillium) 00:08:17
    38. Make Your Data Strategy Work Through Streaming - Steve Wilkes (Striim) 00:10:16
    39. Enterprise Data Lake Power Tips - Paul Barth (Podium Data) 00:10:20
    40. Advanced Threat Detection on Streaming Data - Carol McDonald (MapR) 00:06:52

Product Information

  • Title: Strata + Hadoop World 2016 - San Jose, California: Video Compilation
  • Author(s): O'Reilly Media, Inc.
  • Release date: April 2016
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781491944608