Strata Data Superstream Series: Data Science Fundamentals
Published by O'Reilly Media, Inc.
Feb. 9, 2021
5 - 9:45 p.m. Coordinated Universal Time
Big data has been with us now for over 10 years, and in that time the tools and techniques have evolved. If you’re new to working with data or looking to understand the latest and greatest, these sessions are the perfect way to become part of the conversation. You’ll learn how AI and the cloud have impacted how we grapple with ever-growing datasets and get better insight and products.
About the Strata Data Superstream Series: This four-part series of half-day online events gives you an overarching perspective on key topics that will help your organization maximize the business impact of its data.
What you’ll learn and how you can apply it
- Understand the problems the modern data stack helps solve today and get a glimpse of where we’re headed
- Get an overview of AI and machine learning and see how they can improve your data science work
- Discover how OmniSci and Intel AI are empowering data scientists with a fully integrated set of tools
- Learn how to present your data clearly and articulately
This live event is for you because…
- You want to learn how data science works and understand its impacts, whether you’re swimming in data or just dipping in a toe.
- You need to know the trends in data workflows, techniques, and tools.
- You’re interested in learning the role AI can play in your data analysis and want to find out where to start.
- You want to learn how to best present your data to outside stakeholders.
The timeframes are only estimates and may vary according to how the class is progressing.
EVENT 1: DATA SCIENCE FUNDAMENTALS - FEBRUARY 9, 9:00AM–1:30PM PT | 12:00PM–4:30PM ET | 5:00PM–9:30PM UTC/GMT
Alistair Croll: Introduction (5 minutes) - 9:00am PT | 12:00pm ET | 5:00pm UTC/GMT
- Alistair Croll welcomes you to the Strata Data Superstream.
David McRaney: Why Motivated Reasoning and Cognitive Bias Undermine Even the Best Data Projects—and What to Do About It (30 minutes) - 9:05am PT | 12:05pm ET | 5:05pm UTC/GMT
- David McRaney is a science journalist, author, podcaster, and lecturer. He created You Are Not So Smart, a blog about the psychology of reasoning, which became an internationally best-selling book, now available in 17 languages. His books You Are Now Less Dumb and How Minds Change (about how people do and don’t update their beliefs and attitudes as individuals and cultures) will be released in 2021. David hosts a biweekly top 100 podcast about human judgment and decision making, and travels around the planet giving lectures on the topics he covers in his books, blog, and podcast. In 2015, David appeared as himself in a national ad campaign for Reebok, which he cowrote. His writing has also been featured in campaigns for Heineken, Duck Tape, and others. He’s currently working on a documentary about IQ and genius and a television show about how to better predict the impact of technological disruption.
Jason Dai: Simplifying End-to-End Big Data AI (Sponsored by Intel) (30 minutes) - 9:35am PT | 12:35pm ET | 5:35pm UTC/GMT
Applying machine learning to distributed big data analytics plays a central role in today’s intelligent applications and systems. These problem settings have pushed the field to address issues of data scale that were almost inconceivable to AI researchers even a decade ago. To address these challenges, Intel has open-sourced Analytics Zoo, which helps users to build and productionize end-to-end big data AI pipelines. Jason Dai demonstrates how data engineers and scientists can simply build conventional Python notebooks on their laptops (with possible AutoML support), which can then automatically scale out to large clusters and process distributed data using Analytics Zoo.
Jason Dai is an Intel Fellow and Chief Architect of Big Data AI at Intel, responsible for leading the global engineering teams on the development of big data analytics and machine learning. He’s a founding committer and PMC member of Apache Spark, a mentor of Apache MXNet, a member of the Apache Software Foundation, and the creator of the BigDL and Analytics Zoo projects.
Break (5 minutes)
Tristan Handy: The Modern Data Stack—Past, Present, and Future (50 minutes) - 10:10am PT | 1:10pm ET | 6:10pm UTC/GMT
Data products have drawn a fantastic amount of attention, capital, and traction over the past decade. Big trends have played out during that time, including the shift toward horizontal tooling, the rise of SQL, and the empowerment of the data analyst. The net result of these trends has been increasingly empowered organizations staffed by technical-business hybrids, working with state-of-the-art horizontal tooling that all speaks SQL. Compared with where we were in 2010, it’s a great world to live in, but there are still huge problems to solve. Tristan Handy digs into the data problems he’s most fascinated with today, what he’s seeing that gets him excited, and where he thinks things might go from here.
Tristan Handy is the founder and CEO of Fishtown Analytics, a Philadelphia startup pioneering the practice of modern analytics engineering. Over 3,000 companies—including JetBlue, HubSpot, GitLab, and the ACLU—use Fishtown’s product, dbt, to organize, catalog, and distill knowledge from the data in their data warehouses. Tristan has been working in data for two decades in both in-house and consulting roles with both large enterprises and small startups.
Break (5 minutes)
Ayodele Odubela: Demystifying Machine Learning (50 minutes) - 11:05am PT | 2:05pm ET | 7:05pm UTC/GMT
For most, machine learning remains an enigma. But really it’s just a tool to predict events and understand patterns that exist around us. Ayodele Odubela walks you through the basics of machine learning and shows you how to get started using Python. You’ll learn the difference between machine learning and AI, how to apply ML to your projects, some of the math behind ML, and how to evaluate ML models to determine whether or not to use them for decision making.
Ayodele Odubela is a data science advocate for Comet ML. She combines her background in marketing and passion for data and analytics to educate data scientists on model reproducibility and experiment tracking. She earned her master's degree in data science from Regis University after working in various digital marketing roles. She's passionate about data justice, kayaking, and hockey.
Break (5 minutes)
Venkat Krishnamurthy and Alex Baden: Productivity at Scale—Data Science at the Speed of Curiosity (Sponsored by Intel) (30 minutes) - 12:00pm PT | 3:00pm ET | 8:00pm UTC/GMT
Data science is still fundamentally “curiosity driven.” In today's world, what sets apart a good data scientist is the number of insights they can generate in a given unit of time. The one inelastic quantity in all of this is the data scientist’s time. A curious data scientist ideally needs to be able to ask as many questions and identify as many useful insights within their day as possible. So what's preventing them? Unfortunately, most of their time is spent preparing to ask questions (assembling tools, data, etc.). Data science needs to focus on the experimentation loop: going from a question to an answer and on to the next question as quickly as possible. In order to do this, the entire workflow (not just the user-facing part) must be brought to the data scientist. This means empowering them with a fully integrated set of tools and the necessary supporting infrastructure at an individual level. Supporting productivity at scale requires raising the bar on the performance of analytic infrastructure at scale. Join Venkat Krishnamurthy and Alex Baden to learn how Intel AI and OmniSci are working together to give data scientists access to incredible, open tools and a variety of techniques, with no shortages.
Venkat Krishnamurthy heads up product management at OmniSci. He joined OmniSci from the CTO office at supercomputing pioneer Cray, where he was responsible for leading the company’s push into analytics and AI. Before that, he was senior director at YarcData, where he bootstrapped product and data science/engineering teams; was a director of product management at Oracle, where he led the launch of the Oracle Financial Services Data Platform; and spent several years at Goldman Sachs, where he led one of the earliest successful projects utilizing machine learning in operational risk incident classification. Venkat is a graduate of Carnegie Mellon University and the Indian Institute of Technology Chennai; he’s also a certified Financial Risk Manager.
Alex Baden is the technical lead for the database team at OmniSci. Previously, Alex worked with researchers at Johns Hopkins University and the Allen Institute for Brain Science, building systems to support the processing, analysis, and visualization of petabyte-scale neuroscience datasets. He has a master’s in computer science from Johns Hopkins University and a BS in mathematics from the University of Maryland.
Break (5 minutes)
Kristi Pelzel: The Art of Data Storytelling (50 minutes) - 12:35pm PT | 3:35pm ET | 8:35pm UTC/GMT
Data visualization is about communicating the substance of your metrics in a visual way. Storytelling with data differs from data visualization because it requires communicators to offer a larger, holistic view of their message. Join Kristi Pelzel to learn an approach to visualizing data that goes beyond statistics, gathering, cleaning, and analyzing to factor in the fundamental laws of human thinking, artistic design, and storytelling. You’ll discover how to best use “human thinking” and storytelling, choose the best visuals, remove clutter, design for attention, and think like a designer. Plus, you’ll practice drawing a three-part story on paper using an example data visualization and leave with a design cheat sheet that will help you connect with resources after the course is through.
Kristi Pelzel is the senior director of global communications and international correspondent for Today News Africa, based in Washington, DC. Her expertise spans broadcast, digital, and social media communications, nested with policy, research, and analysis. A member of the National Press Club, she holds a BA from the Academy of Art University, San Francisco, and an MA from Georgetown University.
Alistair Croll: Closing Remarks (5 minutes) - 1:25pm PT | 4:25pm ET | 9:25pm UTC/GMT
- Alistair Croll closes out today’s event.
Upcoming Strata Data Superstream events:
- Creating Data-Intensive Applications - May 4, 2021
- Data Warehouses, Data Lakes, and Data Lakehouses - August 10, 2021
- Business Analysis - November 9, 2021
Alistair Croll is an entrepreneur, author, and conference organizer. He's written four books on technology and society, including the best-selling Lean Analytics, which has been translated into eight languages. He's the cofounder of web performance startup Coradiant (acquired by BMC), the Year One Labs startup accelerator, and a number of other early-stage companies.
A prolific speaker, Alistair was a visiting executive at Harvard Business School, where he helped create a course on data science and critical thinking. He's founded and chaired a number of the world's leading technology events, including Cloud Connect, Strata, Startupfest, Scaletech, and the FWD50 Digital Government conference. He's currently working on Just Evil Enough, the subversive marketing playbook. Alistair lives in Montreal, Canada, and writes at acroll.substack.com.