Modern Network Troubleshooting
Published by Pearson
Improve Network Resilience and Troubleshoot Problems Faster
- Apply formal troubleshooting methodology and more intuitive techniques to speed up troubleshooting and root cause analysis, and to get the network back into operation sooner
- Use ChatGPT, advanced ping and traceroute techniques, MTR, and other tools to decrease the time required to troubleshoot a problem
- Get the theory and hands-on practice you need for solid troubleshooting skills
The first way to troubleshoot faster is to not troubleshoot at all, that is, to build resilient networks. The first section of this class considers the nature of resilience and how design tradeoffs result in different levels of resilience. The class then moves into a theoretical understanding of failures, how network resilience is measured, and how the Mean Time to Repair (MTTR) relates to human and machine-driven factors. One of these factors is the unintended consequences arising from abstractions, covered in the next section of the class.
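As a rough illustration of how MTTR feeds into a resilience measure, the sketch below uses the common steady-state availability formula, MTBF / (MTBF + MTTR). The course description does not say which measures the class uses, so treat the formula choice, the function names, and the sample numbers as assumptions made for illustration.

```python
HOURS_PER_YEAR = 8760.0  # 365 days; an assumption that ignores leap years


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time the system is up."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR must be non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def annual_downtime_hours(mtbf_hours: float, mttr_hours: float) -> float:
    """Expected hours of downtime per year implied by the availability."""
    return (1.0 - availability(mtbf_hours, mttr_hours)) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical link: fails about every 999 hours, takes 1 hour to repair.
    print(f"availability: {availability(999, 1):.4f}")
    print(f"downtime per year: {annual_downtime_hours(999, 1):.2f} hours")
    # Halving the repair time roughly halves the expected downtime.
    print(f"downtime with 30 min MTTR: {annual_downtime_hours(999, 0.5):.2f} hours")
```

Under these assumptions, shortening the repair time improves availability even when the failure rate stays the same, which is why faster troubleshooting counts toward resilience.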
The class then moves into troubleshooting proper, examining the half-split formal troubleshooting method and how it can be combined with more intuitive methods. This section also examines how network models can be used to guide the troubleshooting process. The class then covers two examples of troubleshooting reachability problems in a small network, and considers using ChatGPT and other LLMs in the troubleshooting process. A third, more complex example is then covered in a data center fabric.
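The half-split idea can be sketched as a binary search along a path: test the midpoint, discard the half that works, and repeat. The sketch below is one minimal reading of the method, not the course's own material. It assumes a single fault, an ordered list of hops, and a hypothetical `probe` callable that returns True when traffic reaches a given hop (in practice this might wrap ping or a traceroute check).

```python
from typing import Callable, Optional, Sequence


def half_split(hops: Sequence[str], probe: Callable[[str], bool]) -> Optional[str]:
    """Return the first hop that fails the probe, or None if every hop passes.

    Assumes a single fault: every hop before it passes and every hop from
    it onward fails, so each probe rules out half of the remaining path.
    """
    lo, hi = 0, len(hops)  # the first failing hop, if any, has index in [lo, hi)
    while lo < hi:
        mid = (lo + hi) // 2
        if probe(hops[mid]):
            lo = mid + 1   # path works through mid; the fault is further along
        else:
            hi = mid       # mid is unreachable; the fault is at mid or earlier
    return hops[lo] if lo < len(hops) else None


if __name__ == "__main__":
    # Hypothetical path and fault location, for illustration only.
    path = ["host-a", "sw1", "r1", "r2", "sw2", "host-b"]
    reachable = {"host-a", "sw1", "r1"}
    probes = []

    def fake_probe(hop: str) -> bool:
        probes.append(hop)
        return hop in reachable

    print(half_split(path, fake_probe))  # first unreachable hop
    print(probes)                        # hops actually tested
```

On a path of n hops this takes about log2(n) probes rather than n, which is the speed advantage of the formal method. Intuition can still help by choosing a better starting point than the exact midpoint.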
A short section on proving causation follows, and the class closes with a final example of troubleshooting problems in Internet-level systems.
What you’ll learn and how you can apply it
By the end of the live online course, you’ll understand:
- How to blend formal and informal troubleshooting methods to build an effective skill set
- Common measures of resilience and kinds of fixes
- The relationship between design and resilience
And you’ll be able to:
- Use tools, including ChatGPT, in creative ways to increase your troubleshooting effectiveness
- Use standard troubleshooting tools in creative ways to find problems
- Approach network failures with more confidence
This live event is for you because...
- You are a network engineer who would like to improve your troubleshooting skills
- You are an experienced troubleshooter who would like to understand and incorporate formal troubleshooting methods
- You would like to understand formal definitions of resilience, fixes, and troubleshooting
Prerequisites
- Basic knowledge of routing and forwarding
- Basic knowledge of computer network systems
Recommended Preparation
- Watch: Network Basics by Russ White
Recommended Follow-up
- Read: Computer Networking Problems and Solutions by Russ White and Ethan Banks
- Watch: “Understanding Network Transports” by Russ White and Ethan Banks
Schedule
The time frames are only estimates and may vary according to how the class is progressing.
Segment 1: Resilience (55 minutes)
- Defining resilience
- Elements of resilience
- MTBF
- MTTR
- Abstractions and troubleshooting
- Q&A (5 minutes)
- Break (5 minutes)
Segment 2: Troubleshooting Process (50 minutes)
- The half-split method
- Small network reachability example
- Small network routing example
- Q&A (5 minutes)
- Break (5 minutes)
Segment 3: Troubleshooting Example (50 minutes)
- Thoughts on using LLMs in troubleshooting
- Flapping BGP in a DC fabric example
- Thoughts on correlation and causation
- Internet slow performance example
- Summary
Q&A and course wrap-up (5 minutes)
Your Instructor
Russ White
Russ White has experience in designing, deploying, breaking, and troubleshooting large-scale networks, and is a strong communicator from the whiteboard to the boardroom. He has co-authored more than forty software patents, participated in the development of several Internet standards, helped develop the CCDE and the CCAr, and worked in Internet governance with the Internet Society. Russ has a background covering a broad spectrum of topics, including radio frequency engineering and graphic design, and is an active student of philosophy and culture. Russ is a co-host of the Hedge podcast, serves on the Routing Area Directorate and the Internet Architecture Board at the IETF, co-chairs the BABEL working group, and serves on the Technical Services Council and as a maintainer of the open source FRRouting project. His most recent works are Computer Networking Problems and Solutions and Unintended Dystopia. Russ also regularly teaches live webinars on Internet technology through Safari Books Online.
MSIT, Capella University; MACM, Shepherds Theological Seminary; PhD, Southeastern Baptist Theological Seminary; CCIE #2635, CCDE 2007::1, CCAr