Your service just went down. Here’s what you do next.

Incident management experts explain how to quickly restore service and prevent future outages.

November 30, 2016
Fallen trees (source: PublicDomainPictures via Pixabay).

Watch the incident management experts in this recorded AMA discuss how to get your service back online and prevent future outages.


Jason Hand—Jason Hand is the DevOps Evangelist at VictorOps and author of the O’Reilly book ChatOps: Managing Operations from Group Chat as well as ChatOps For Dummies. A core organizer for DevOpsDays Rockies and host of many DevOps and IT management meetups and events around the country, Jason enjoys speaking on topics such as modern incident management, learning from failure, and, of course, ChatOps.

Heather Mickman—Heather currently builds the platforms used by software engineers at Target and has 20+ years of IT experience. This includes consulting for Fortune 50 companies on supply chain strategy, implementing warehouse automation technologies, running large ops organizations, and building enterprise API programs. She has a passion for technology, building high-performing teams, driving a culture of innovation, and having fun along the way.

J. Paul Reed—J. Paul Reed is the founder of Release Engineering Approaches, a consultancy incorporating a host of tools and techniques to help organizations “Simply Ship. Every time.” Paul has worked across a number of industries, from financial services to cloud-based infrastructure, with teams from 2 to 2,000 on everything from tooling, operational analysis and improvement, and team culture transformation to business value optimization. He speaks internationally on release engineering, DevOps, operational complexity, and human factors, and is currently a Master of Science candidate in Human Factors & Systems Safety at Lund University.

Andrew Smirnov—Andrew is a performance engineer at Catchpoint and the host of AMAs on HTTP/2 and DevOps & SRE. Prior to Catchpoint, Andrew worked on real-time media for Skype at Microsoft. In addition to speaking frequently on behalf of Catchpoint, Andrew helps enterprise clients with their testing strategy and provides visibility into the performance and availability of their applications.

This post is part of a collaboration between O’Reilly and Catchpoint. See our statement of editorial independence.

Post topics: Operations