Practical Monitoring

by Mike Julian

October 2017

Beginner to intermediate

170 pages

3h 58m

English

O'Reilly Media, Inc.

Who Should Read This Book
Why I Wrote This Book
A Word on Monitoring Today
Navigating This Book
Online Resources
Conventions Used in This Book
Using Code Examples
O’Reilly Safari
How to Contact Us
Acknowledgments

Anti-Pattern #1: Tool Obsession
Monitoring Is Multiple Complex Problems Under One Name
Avoid Cargo-Culting Tools
Sometimes, You Really Do Have to Build It
The Single Pane of Glass Is a Myth
Anti-Pattern #2: Monitoring-as-a-Job
Anti-Pattern #3: Checkbox Monitoring
What Does “Working” Actually Mean? Monitor That.
OS Metrics Aren’t Very Useful—for Alerting
Collect Your Metrics More Often
Anti-Pattern #4: Using Monitoring as a Crutch
Anti-Pattern #5: Manual Configuration
Wrap-Up

Pattern #1: Composable Monitoring
The Components of a Monitoring Service
Pattern #2: Monitor from the User Perspective
Pattern #3: Buy, Not Build
It’s Cheaper
You’re (Probably) Not an Expert at Architecting These Tools
SaaS Allows You to Focus on the Company’s Product
No, Really, SaaS Is Actually Better
Pattern #4: Continual Improvement
Wrap-Up

What Makes a Good Alert?
Stop Using Email for Alerts
Write Runbooks
Arbitrary Static Thresholds Aren’t the Only Way
Delete and Tune Alerts
Use Maintenance Periods
Attempt Automated Self-Healing First
On-Call
Fixing False Alarms
Cutting Down on Needless Firefighting
Building a Better On-Call Rotation
Incident Management
Postmortems
Wrap-Up

Before Statistics in Systems Operations
Math to the Rescue!
Statistics Isn’t Magic
Mean and Average
Median
Seasonality
Quantiles
Standard Deviation
Wrap-Up

Business KPIs
Two Real-World Examples
Yelp
Reddit
Tying Business KPIs to Technical Metrics
My App Doesn’t Have Those Metrics!
Finding Your Company’s Business KPIs
Wrap-Up

The Cost of a Slow App
Two Approaches to Frontend Monitoring
Document Object Model (DOM)
Frontend Performance Metrics
OK, That’s Great, but How Do I Use This?
Logging
Synthetic Monitoring
Wrap-Up

Instrumenting Your Apps with Metrics
How It Works Under the Hood
Monitoring Build and Release Pipelines
Health Endpoint Pattern
Application Logging
Wait a Minute…Should I Have a Metric or a Log Entry?
What Should I Be Logging?
Write to Disk or Write to Network?
Serverless / Function-as-a-Service
Monitoring Microservice Architectures
Wrap-Up

Standard OS Metrics
CPU
Memory
Network
Disk
Load
SSL Certificates
SNMP
Web Servers
Database Servers
Load Balancers
Message Queues
Caching
DNS
NTP
Miscellaneous Corporate Infrastructure
DHCP
SMTP
Monitoring Scheduled Jobs
Logging
Collection
Storage
Analysis
Wrap-Up

The Pains of SNMP
What Is SNMP?
How Does It Work?
A Word on Security
How Do I Use SNMP?
Interface Metrics
Interface and Logging
Recap
Configuration Tracking
Voice and Video
Routing
Spanning Tree Protocol (STP)
Chassis
CPU and Memory
Hardware
Flow Monitoring
Capacity Planning
Working Backward
Forecasting
Wrap-up

Monitoring and Compliance
User, Command, and Filesystem Auditing
Setting Up auditd
auditd and Remote Logs
Host Intrusion Detection System (HIDS)
rkhunter
Network Intrusion Detection System (NIDS)
Wrap-Up

Business KPIs
Frontend Monitoring
Application and Server Monitoring
Security Monitoring
Alerting
Wrap-Up

Demo App
Metadata
Escalation Procedure
External Dependencies
Internal Dependencies
Tech Stack
Metrics and Logs
Alerts

Content preview from Practical Monitoring

Chapter 1. Monitoring Anti-Patterns

Before we can start off on our journey to great monitoring, we have to identify and correct some bad habits you may have adopted or observed in your environment.

As with many habits, they start off well-meaning. After years of inadequate tools, the realities of keeping legacy applications running, and a general lack of knowledge about modern practices, these bad habits become “the way it’s always been done” and are often taken with people when they leave one job for another. On the surface, they don’t look that harmful. But rest assured—they are ultimately detrimental to a solid monitoring platform. For this reason, we’ll refer to them as anti-patterns.

An anti-pattern is something that looks like a good idea, but which backfires badly when applied.

Jim Coplien

These anti-patterns can often be difficult to fix for various reasons: entrenched practices and culture, legacy infrastructure, or just plain old FUD (fear, uncertainty, and doubt). We’ll work through all of those, too, of course.

Anti-Pattern #1: Tool Obsession

There’s a great quote from Richard Bejtlich in his book The Practice of Network Security Monitoring (No Starch Press, 2013) that underscores the problem with an excessive focus on tools over capabilities:

Too many security organizations put tools before operations. They think “we need to buy a log management system” or “I will assign one analyst to antivirus duty, one to data leakage protection duty.” And so on. A tool-driven ...