Data Engineering

Data Superstream: Becoming a Data Engineer

Published by O'Reilly Media, Inc.

Content level: Beginner to advanced

Leveraging Data

Dive into the rich career of data engineering! In just four hours, you’ll learn the real-world insights, critical skills, and best practices that our lineup of seasoned experts has honed through years of industry experience. From understanding the nuances of data architecture to mastering efficient data management, you'll gain invaluable knowledge to propel your career forward. Whether you're intrigued by the prospect of designing robust data pipelines or excited by the potential of leveraging data for impactful insights, this event offers a comprehensive roadmap to success.

We’re still working on finalizing the schedule for this event. Please check back closer to the event date for more information.

About the Data Superstream Series: This two-part Superstream series is designed to help your organization maximize the business impact of your data. Each day covers different topics, with unique sessions lasting no more than four hours. And they’re packed with insights from key innovators and the latest tools and technologies to help you stay ahead of it all.

What you’ll learn and how you can apply it

  • Discover the key skills data engineers use to design, build, and maintain the infrastructure necessary for data generation, storage, and processing
  • Learn why data quality and data governance are necessary to ensure that data is clean and compliant
  • Find out how data engineers effectively collaborate with data scientists, analysts, and other stakeholders to understand data requirements and deliver solutions that meet business needs

This live event is for you because...

  • You’re a data professional who wants to identify skills gaps and upskill accordingly to move to the senior or staff level.
  • You want to effectively approach the data lifecycle from ingestion to labeling to solving problems with machine learning.
  • You want to better understand what work matters the most at every stage of your career and learn how to build the skills you need to support your journey.

Prerequisites

  • Come with your questions
  • Have a pen and paper handy to capture notes, insights, and inspiration


Schedule

The time frames are only estimates and may vary according to how the class is progressing.

Matt Housley: Introduction (5 minutes) - 8:00am PT | 11:00am ET | 3:00pm UTC/GMT

  • Matt Housley welcomes you to the Data Superstream.

Andy Petrella (Keynote): The Vital Role of Data Engineering in the Age of Generative AI (15 minutes) - 8:05am PT | 11:05am ET | 3:05pm UTC/GMT

  • While data engineering is frequently viewed as a backend process focused merely on moving data, Andy Petrella argues that it’s the backbone of today’s AI revolution. As AI becomes more integrated into everyday life, data engineers face increasing responsibility to uphold data quality and governance, feeding AI systems the high-quality, reliable, and timely data they need to reduce hallucinations and deliver meaningful value. He also addresses the broader implications of generative AI, emphasizing the importance of data integrity in preventing misinformation and manipulation. Join him to explore how essential data engineers are to shaping the future of AI.
  • Andy Petrella is a distinguished expert, entrepreneur, and thought leader in the world of big data and analytics. As the founder and CEO of Kensu, he has pioneered innovative approaches to data observability, helping organizations ensure the reliability and trustworthiness of their data pipelines. With a strong background in mathematics and software engineering, Andy has been instrumental in advancing the field of data management, particularly in addressing the challenges of data quality and governance in complex environments. A passionate advocate for data literacy, he frequently shares his expertise through speaking engagements, webinars, and publications, empowering professionals across industries to harness the power of data responsibly and effectively.

Eevamaija Virtanen: Becoming a Data Engineer (30 minutes) - 8:20am PT | 11:20am ET | 3:20pm UTC/GMT

  • Eevamaija Virtanen shares her journey into data engineering and provides authentic, real-world insights into creating your own path in the data field, from getting your first job to mastering essential technical skills. Whether you’re just starting out or refining your expertise, this down-to-earth talk reveals what it really takes to succeed and offers actionable steps and relatable advice to help you on your data engineering path.
  • Eevamaija Virtanen is a senior data engineer and partner at Invinite, founder of the DataTribe community, and cofounder of Helsinki Data Week. With a diverse background spanning data engineering, project management, business development, and photography, she brings an innovative approach to solving complex data and business challenges. Eevamaija is a passionate advocate for continuous learning, cross-pollination, and collaboration, making her a key voice in the Nordic data community.
  • Break (5 minutes)

Adi Polak: Stream All Things—Patterns of Data Stream Processing (30 minutes) - 8:55am PT | 11:55am ET | 3:55pm UTC/GMT

  • The industry has spent more than 10 years trying to solve the data streaming problem. Nevertheless, as much as 80% of the time on a typical project still goes to optimizing streaming data and windowed analysis. You want a service that is reliable, can handle all kinds of data, connects with all kinds of systems, and is easy to manage and scale as systems grow. And it should have super low latency, too. Is that too much to ask? Adi Polak discusses the basic challenges of data streaming, introduces a few design and architecture patterns that can help, and explores how to implement them using Apache Flink. While there is no silver bullet for the data streaming problem, Adi shares some pragmatic solutions that have helped many organizations build fast, scalable, and manageable data streaming pipelines.
  • Adi Polak is an experienced software engineer, people manager, and author of Scaling Machine Learning with Spark. For most of her professional life, she has dealt with data and machine learning for operations and analytics, developing algorithms to solve real-world problems using ML techniques and leveraging expertise in Apache Spark, Kafka, HDFS, and distributed large-scale systems. Adi has taught Spark to thousands of students and has recently begun a new adventure with data streaming—specifically Flink and ML inference—and is hooked.

Dunith Danushka: Toward a Composable Data Platform (Sponsored by Redpanda) (30 minutes) - 9:25am PT | 12:25pm ET | 4:25pm UTC/GMT

  • Dunith Danushka explains what composable data platform architecture is, how this approach can transform your data infrastructure, and why standardization is crucial for both technical efficiency and business agility. By adopting open standards, engineers can reduce vendor coupling and create more flexible, future-proof data platforms. You’ll explore standards like the Kafka protocol for streaming data, the PostgreSQL wire protocol for OLTP and streaming databases, streaming SQL for real-time ETL, dbt for batch ELT, Apache Iceberg as a table format, and the Amazon S3 API for object storage. You’ll see real-world examples of these standards in action and learn how to transform a conventional data platform into a composable architecture.
  • Dunith Danushka is a senior developer advocate at Redpanda, where he spends most of his time educating developers on how to build event-driven applications. He has a passion for designing, building, and operating large-scale, real-time event-driven architectures and enjoys sharing his knowledge through blogging, videos, and public speaking.
  • This session will be followed by a 30-minute Q&A in a breakout room. Stop by if you have more questions for Dunith.
  • Break (5 minutes)

Colleen Tartow: Transitioning to a Career in Data Engineering (30 minutes) - 10:00am PT | 1:00pm ET | 5:00pm UTC/GMT

  • There are many valid, nonstandard paths to becoming a data engineer, including academia, consulting, software engineering, and more. Colleen Tartow frames the career journey as an accumulation of emerging skills and shares her own nontraditional path to data leadership.
  • Colleen Tartow is field CTO and head of strategy at VAST Data. She’s been obsessed with data her entire life and has over 20 years of experience in data, advanced analytics, engineering, and consulting. Her work on data, engineering, analytics, and diversity issues has led to her speaking and mentoring in a variety of venues. Colleen holds a PhD in astrophysics.

Xinran Waibel: Path to Senior Data Engineer (30 minutes) - 10:30am PT | 1:30pm ET | 5:30pm UTC/GMT

  • Are you a data engineer who wants to level up in your career? Drawing on lessons learned from her own career journey, Xinran Waibel describes the technical and soft skills you need to transition from a junior or mid-level data engineering role to a senior or staff role. Learn how to seize growth opportunities in your current role and explore strategies for growth beyond the workplace, including continuous learning and networking, to help you build a strong portfolio.
  • Xinran Waibel is the founder of Data Engineer Things, a global community dedicated to creating and sharing learning resources for data engineering. She builds data applications to power ML algorithms and product innovation on the personalization data engineering team at Netflix. Previously, she was a data engineer at Confluent and Target, where she leveraged big data technologies to enable data-driven decision-making in the marketing and membership space.
  • Break (5 minutes)

Holden Karau: Fighting Health Insurance with AI—E2E Model Training to Deployment (30 minutes) - 11:05am PT | 2:05pm ET | 6:05pm UTC/GMT

  • If you’ve ever had a health insurance claim denied, Holden Karau knows how you feel, and she has done something about it. She and others fine-tuned a model to generate health insurance appeals. Learn about her adventures using various cloud resources for fine-tuning and, ultimately, deploying to an on-premises Kubernetes cluster, including the unexpected challenge of fitting graphics cards into servers.

Session to Come (30 minutes) - 11:35am PT | 2:35pm ET | 6:35pm UTC/GMT

  • Please check back for more information.

Matt Housley: Closing Remarks (5 minutes) - 12:05pm PT | 3:05pm ET | 7:05pm UTC/GMT

  • Matt Housley closes out today’s event.

Your Host

  • Matt Housley

    Matt Housley, a data engineering consultant and cloud specialist, is cofounder of Ternary Data, where he leverages his teaching experience to train future data engineers and advise teams on robust data architecture. After some early programming experience with Logo, BASIC, and 6502 assembly, he completed a PhD in mathematics at the University of Utah. Matt then began working in data science, eventually specializing in cloud-based data engineering. Matt and Joe pontificate on all things data on The Monday Morning Data Chat.


Sponsored by

  • Redpanda