Infrastructure & Ops Superstream: Observability

Beginner

Gain a deeper understanding of system performance

Observability—a measure of how well internal states of a system can be inferred from its outputs—is crucial for engineering, managing, and improving complex business-critical systems and data streams. Teams that adopt observability are much better equipped to ship code quickly and correctly, identify outliers, and understand user experience.

Join us to learn what a modern observability pipeline looks like and how observability can help any software engineering team gain a deeper understanding of system performance, so you can carry out ongoing maintenance and ship the features your customers need.

About the Infrastructure & Ops Superstream Series: This three-part Superstream series guides you through what you need to know about modernizing your organization’s infrastructure and operations, with each event day covering different topics and lasting no more than four hours. They’re packed with the expert insights, skills, and tools that will help you effectively manage existing legacy systems while migrating to modern, scalable, cost-effective infrastructures—with no interruption to your business.

What you’ll learn and how you can apply it

Learn how to build a solid observability platform to empower your development teams
Understand how being proactive at work prepares you to manage incidents better

This live event is for you because...

You’re a developer or work in operations, and you want to learn how to create an effective observability platform across microservice architectures.
You want to learn how to improve incident management.

Prerequisites

Come with your questions
Have a pen and paper handy to capture notes, insights, and inspiration

Recommended follow-up:

Read Observability Engineering (book)
Read Fundamentals of Data Observability (book)
Read The Future of Observability with OpenTelemetry (report)
Watch How Lightstep Implemented Observability (video)
Attend Data Observability Fundamentals in 2 Weeks (live course with Andy Petrella and Sammy El Khammal)

Schedule

The time frames are only estimates and may vary according to how the class is progressing.

Sam Newman: Introduction (5 minutes) - 8:00am PT | 11:00am ET | 3:00pm UTC/GMT

Sam Newman welcomes you to the Infrastructure & Ops Superstream.

Jessica Kerr: Teach Your Software to Talk to You—Observability with OpenTelemetry (45 minutes) - 8:05am PT | 11:05am ET | 3:05pm UTC/GMT

As your software grows, each team needs to care for more and more code, and that code is used in more and more ways. The complexity is more than any one person can track. You need help figuring out what's going on. But how do you scale your understanding as your software scales? You recruit the software to help! Incorporating observability into your growing software teaches it to talk to you—to tell you what’s happening as it happens. This helps dramatically with incident response and debugging, and it helps in changing software too. Here’s the bad news: great observability isn't one-size-fits-all. But community libraries, standards, and tools, in the form of OpenTelemetry, can help. Jessica Kerr illustrates how to make your systems talk to you, with practical details about open source helpers you can use in your software.
Jessica Kerr is a “symmathecist,” in the medium of code, who believes in learning systems made of learning parts: enthusiastic people and evolving software. She’s a principal developer advocate at Honeycomb, where she teaches developers to make their software teach them what’s going on inside. In 20 years of professional software development, she’s programmed in and spoken at conferences about Java, Scala, Clojure, TypeScript, Ruby, and Elm. She lives in St. Louis with two children who invent worlds and draw characters with superpowers, and two cats who meow and knock over water glasses.
Break (5 minutes)

Lesley Cordero: Effective Observability in a Microservices Architecture (45 minutes) - 8:55am PT | 11:55am ET | 3:55pm UTC/GMT

Lesley Cordero discusses a standardized platform-focused approach to building effective observable architectures, including how the approach addresses the new organizational challenges specific to microservices. This approach encompasses three parts: the patterns, the organizational support, and the stack of tools. You'll explore these concepts at a high level, with practical examples, and learn how to address these organizational challenges.
Lesley Cordero is a staff engineer focused on operations engineering at the New York Times. She has spent most of her career as an engineer on edtech teams for Google for Education and edtech startups. She is primarily focused on delivery engineering and reliability management, specifically in the observability space. Lesley also has experience building and leading teams and is an experienced conference speaker.
Break (5 minutes)

Hila Fish: Incident Management, Talk the Talk, Walk the Walk (45 minutes) - 9:45am PT | 12:45pm ET | 4:45pm UTC/GMT

Incident management can be challenging and throw you curveballs, with unexpected issues resulting in data loss, downtime, and wasted resources. But there are practical things you can do to make it a smoother process. Remember when your teachers said, "Actively listening in class guarantees 50% prep for the upcoming test"? The same goes for being proactive at work in ways that prepare you to manage incidents better (at night or in general). Hila Fish shares incident management basics and best practices that should give you a clearer picture of how to manage production incidents efficiently, and she’ll describe the traits of a successful incident manager and how to develop them.
Hila Fish is a senior DevOps engineer at Wix with 15 years of experience in the tech industry. She’s also an AWS Community Builder and an international public speaker who believes the DevOps culture is what drives a company to perform at its best. She also believes in giving back to the community by helping to organize DevOps-related conferences, mentoring, and managing programs for the largest technical women’s community in Israel. She enjoys sharing her knowledge wherever she can, including across diverse technology communities, initiatives, and social media. In her spare time, Hila is lead singer in a cover band.
Break (5 minutes)

Fred Moyer: Techniques for SLOs and Error Budgets at Scale (30 minutes) - 10:35am PT | 1:35pm ET | 5:35pm UTC/GMT

What tactics can you use to implement service level objectives and error budgets to operate enterprise-level services with a thousand (or more) engineers? Fred Moyer takes you through approaches he’s developed working with teams in a large production ecosystem where service reliability was a nonnegotiable business requirement. Engineers and operations folks who are putting SLIs, SLOs, and error budgets into practice in high-scale production environments with diverse service architectures should come away with an understanding of how to democratize SLOs, what objectives they should target for enterprise-level reliability, and how to communicate and implement error budgets across multiple teams.
Fred Moyer is an observability-focused SRE whose career has spanned startups to public companies. His focus over the past decade has been on web service monitoring at scale in the areas of service level objectives, architecting and operating large-scale systems, and applying data science to operational telemetry. He holds a patent on using histograms to efficiently measure web service performance, received an award from Google's Istio dev team for an observability module, and won a White Camel award from the Perl Foundation. Fred lives in San Francisco with his wife and kids, and works in SRE observability for JPMorgan Chase.

Sam Newman: Closing Remarks (5 minutes) - 11:05am PT | 2:05pm ET | 6:05pm UTC/GMT

Sam Newman closes out today’s event.

Upcoming Infrastructure & Ops Superstream events:

Kubernetes - September 6, 2023

Your Host

Sam Newman
Sam Newman is a technologist focusing on the areas of cloud, microservices, and continuous delivery—three topics which seem to overlap frequently. He provides consulting, training, and advisory services to startups and large multinational enterprises alike, drawing on his more than 20 years in IT as a developer, sysadmin, and architect. Sam is the author of the best-selling Building Microservices and Monolith to Microservices, both from O’Reilly, and is also an experienced conference speaker.
