Chaos debugging: Finding and fixing microservice abnormalities

Video description

Building microservices applications introduces more complexity into your architecture. Highly distributed applications that run on elastic, ephemeral infrastructure and communicate heavily over the network create an environment in which an application is always in a fluid, partially failing state. To help developers transition from the monolithic way of designing and building software to a more service-oriented approach, we need to close the tooling gap: developers need tools that help them diagnose problems, understand what a normal state looks like, and recover from an abnormal one.

Mitchell Kelley and Scott Cranton (solo.io) discuss the types of failures that can occur, namely in networking, application behavior and code, and storage, and present a systematic workflow for prodding and exploring a system to detect faults and abnormal behavior. This workflow builds on the practices known as chaos engineering. Mitchell and Scott look at two open source projects that aim to complement this workflow: Squash, for step-by-step debugging of distributed microservices, and Gloo Shot, a newly created chaos engineering framework.

Prerequisite knowledge

  • Familiarity with debugging applications and building distributed applications
  • A basic understanding of service-oriented applications

What you'll learn

  • Understand chaos engineering
  • Learn how to debug distributed applications with appropriate tooling
  • Investigate the workflow required to explore system behavior

This session was recorded at the 2019 O'Reilly Open Source Software conference in Portland.

Product information

  • Title: Chaos debugging: Finding and fixing microservice abnormalities
  • Author(s): Mitchell Kelley, Scott Cranton
  • Release date: December 2019
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 0636920335665