Infrastructure & Ops Superstream: AI-Driven Operations and Observability
by Sam Newman, Niall Richard Murphy, Abi Aryan, Austin Parker, Aman Khan, Dylan Patel, Milly Leadley
Overview
AI isn't just playing around the edges of IT; it's fundamentally reshaping how tech is built, run, and managed.
Host Sam Newman and industry experts share practical insights and actionable strategies for dealing with the dual transformations triggered by AI: how it optimizes operations, and how infrastructure must evolve to truly harness its power. Our panel addresses AI-driven operations and observability (AIOps) to show how machine learning enhances traditional IT functions, including automating crucial tasks such as incident management and system performance monitoring. You'll learn how AIOps helps system reliability, platform, and DevOps teams find root causes faster, reduce alert fatigue, and implement strong predictive maintenance. You'll also learn about the infrastructure essential for AI itself, including the specialized systems needed to reliably power demanding AI and ML workloads, and explore how core principles of operating technology systems are vital for building and maintaining next-generation AI environments. These sessions will leave you better equipped to both optimize your tech operations with AI and confidently deploy it at scale, ultimately driving improved system reliability and efficiency across all your technology.
What you’ll learn and how you can apply it
- Gain a comprehensive understanding of the evolving landscape of AIOps, including the challenges and opportunities infrastructure and operations teams face in supporting and managing AI workloads
- Learn practical methods for evaluating and improving LLM agents in AIOps using measurable, testable techniques
- Master strategies for comprehensive observability across the entire LLM pipeline, from logging to error tracking, to build resilient AI systems
- Discover advanced evaluation and monitoring stacks used by top AI teams to identify and prevent AI system failures
Recommended follow-up:
- Read Observability Engineering, second edition (early release book)
- Read LLMOps (book)
Please note that slides or supplemental materials are not available for download from this recording. Resources are only provided at the time of the live event.