The monitoring landscape of today is vastly different from what it was only a few years ago, and even more so from what it was 10 years ago. The widespread popularity of cloud infrastructure brought new problems for monitoring, as well as new ways to solve old problems.
The rise in popularity of microservices has especially stretched how we think about monitoring. Since there is no longer a monolithic app server, how do we monitor interactions between the dozens, or even hundreds, of small app servers that communicate constantly? A common pattern in microservice architecture is that a server may exist for only hours or even minutes, which has wreaked havoc on the age-old tactics and monitoring tools we once relied on.
Of course, while some things change, some things stay the same. We still worry about web server performance. We still have concerns over root volumes unexpectedly running out of space. Database server performance still keeps many of us awake at night. While some of the problems we have today are similar to (or the same as) the problems we had 10 years ago, the tools and methodologies available to us are much improved. It is my goal to teach you the advances we’ve made and how to leverage them for your purposes.
I believe it helps to keep the purpose and goals of monitoring in mind throughout this book. To that end, allow me to pass along a definition of monitoring. The best definition I’ve heard comes from Greg Poirier, who proposed it at the Monitorama 2016 conference:
Monitoring is the action of observing and checking the behavior and outputs of a system and its components over time.
This definition is broad, but rightly so: there’s a lot under the monitoring umbrella, and we’ll be covering it all: metrics, logging, alerting, on-call, incident management, postmortems, statistics, and much, much more.
If you deal with monitoring, this is the book for you. More specifically, this book is geared toward those seeking a foundational understanding of monitoring. It’s suitable for junior staff as well as nontechnical staff looking to beef up their knowledge of monitoring.
If you already have a great grasp of monitoring, this probably is not the book for you. There are no deep dives into specific tools or discussions about monitoring at Google-scale. Instead, you will find practical, real-world examples and immediately actionable advice geared to those new to the world of monitoring.
Those looking for the next hot monitoring tool to implement will be disappointed. As I will discuss later in this book, there’s no magic bullet for solving your monitoring challenges. As such, this book is tool-agnostic, though I certainly will mention specific tools from time to time as examples of what to do or not to do. Likewise, if you want to go deeper into a particular stack of tools, this book will not help.
A minimum level of technical knowledge is assumed for this book. I assume you know the basics of running servers and writing code. My examples all reference Linux, though the topics apply equally to Windows administrators.
Throughout my career, I’ve found myself as the unwitting champion for better monitoring. As we all know, the one who points out the problem signs up to fix it, which resulted in my doing more monitoring implementations than I can recall. Over time, I’ve noticed many people have the same questions about monitoring, sometimes phrased in different ways.
My monitoring sucks. What should I do about it?
My monitoring is OK, but I know I can do better. What should I be thinking about?
My monitoring is noisy and no one trusts it. How can I permanently fix it?
What stuff is the most important to monitor? Where do I even start?
These are all complex questions with complex answers. There’s no single correct answer, but there are some great guiding principles that will get you where you want to go. This book will walk through these principles with plenty of examples.
This book is not the final word on monitoring, nor is it meant to be. This is the book I wish had existed when I first started getting serious about improving monitoring. There are plenty of great books that go deep on specific topics that I only touch on, so if you find yourself wanting to go there, I encourage it! I see this book as giving you a foundational level of skill in the monitoring domain.
Monitoring is a quickly evolving topic. To make things more challenging, monitoring will never reach a state of true maturity: every time we get close, our entire world changes. In the late ’90s and early ’00s, Nagios was king, and we were all pretty satisfied with it. Before long, the growing size of our infrastructure meant we needed to start automating it. People began doing interesting things to scale Nagios (e.g., Gearman, instant failover with custom heartbeats and DRBD) and to manage its configuration (e.g., external datasources, custom UIs, and MySQL-backed configuration storage), stretching Nagios to its limits and revealing that our ways of thinking about monitoring were beginning to show their age. The cycle has repeated a few times since then, of course: cloud computing, containers, microservices.
While this constant change may frustrate some, it excites others. Take heart, though: the principles I will talk about are timeless.
I have a companion website for this book at https://www.practicalmonitoring.com, which will contain additional resources and errata.
The following typographical conventions are used in this book:
Italic
Indicates new terms, URLs, email addresses, filenames, and file extensions.
Constant width
Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.
Constant width bold
Shows commands or other text that should be typed literally by the user.
Constant width italic
Shows text that should be replaced with user-supplied values or by values determined by context.
This icon signifies a tip, suggestion, or general note.
This element signifies a general note.
This icon indicates a warning or caution.
This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing a CD-ROM of examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission.
We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “Practical Monitoring by Mike Julian (O’Reilly). Copyright 2018, Mike Julian, 978-1-491-95735-6.”
If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at permissions@oreilly.com.
Safari (formerly Safari Books Online) is a membership-based training and reference platform for enterprise, government, educators, and individuals.
Members have access to thousands of books, training videos, Learning Paths, interactive tutorials, and curated playlists from over 250 publishers, including O’Reilly Media, Harvard Business Review, Prentice Hall Professional, Addison-Wesley Professional, Microsoft Press, Sams, Que, Peachpit Press, Adobe, Focal Press, Cisco Press, John Wiley & Sons, Syngress, Morgan Kaufmann, IBM Redbooks, Packt, Adobe Press, FT Press, Apress, Manning, New Riders, McGraw-Hill, Jones & Bartlett, and Course Technology, among others.
For more information, please visit http://oreilly.com/safari.
Please address comments and questions concerning this book to the publisher:
We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at http://shop.oreilly.com/product/0636920050773.do.
To comment or ask technical questions about this book, send email to bookquestions@oreilly.com.
For more information about our books, courses, conferences, and news, see our website at http://www.oreilly.com.
Find us on Facebook: http://facebook.com/oreilly
Follow us on Twitter: http://twitter.com/oreillymedia
Watch us on YouTube: http://www.youtube.com/oreillymedia
This book wouldn’t be here without the help, advice, and support of many.
Many thanks to my technical reviewers, whose feedback made this a much better book than I ever thought it could be: Jess Males, John Wynkoop, Aaron Sachs, Heinrich Hartmann, and Tammy Butow. Thanks to Jason Dixon and Elijah Wright, who reviewed and gave feedback on the first outline of what would become this book and encouraged me to write it.
A huge thank you to my editors at O’Reilly: Brian Anderson, Virginia Wilson, and Angela Rufino. I must have driven you nuts with the many missed deadlines, so thank you for your patience and guidance. As a first-time author, I found your help invaluable.
My writing progress seems to be positively correlated with my coffee consumption, so I wrote the bulk of this book in coffee shops, often while traveling. Therefore, I would like to extend a special thanks to my dealers (er, baristas):
Old City Java, Knoxville, TN
Wild Love Bake House, Knoxville, TN
Workshop Cafe, San Francisco, CA
Hubsy, Paris, France
OR Espresso Bar, Brussels, Belgium
If you ever find yourself in the neighborhood, I recommend stopping in for a great cup of coffee.
Many of the lessons I set out to teach in this book are not new; in fact, some are decades old in concept. Thus, I cannot take all of the credit, for I’ve reworded and presented in new ways the thoughts and ideas of those who came before me. That is to say, there is very little new in the world, and ideas in tech have a way of recycling themselves.