Chapter 10. Logging

In his book In Defense of Food (Penguin Press), Michael Pollan explores the evolution of food chains in the United States, from garden-grown vegetables to processed foods created by food scientists. Encouraging readers to return to a focus on nutrition, not nutrients, Pollan distills the book’s 205 pages into the short phrase “Eat food, not too much, mostly plants.”

This phrase came to mind when I was thinking about this chapter. Logging is a distillation of software execution state, ideally striking a balance in which you log just enough information, but not too much, especially in large-scale environments. When it comes to cost-effective logging, I think of a similar phrase: “Write logs, not too much, never sensitive information.”

Done well, logging can provide you with invaluable information about the state and operation of complex systems such as data pipelines. When you’re debugging a problem, it’s a relief to come across a well-considered log message that tells you exactly what went wrong.

Beyond helping you debug issues and observe execution, well-formatted logs can be exported to query tools such as Google BigQuery, effectively turning your logs into a database. Analyzing logs at scale gives you further insight into performance and system health, in addition to generating metrics.
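
As a quick illustration of what “well-formatted” can mean in practice, here is a minimal sketch using Python’s standard logging module with a formatter that emits one JSON object per line. The logger name and the pipeline and duration_ms fields are hypothetical, but newline-delimited JSON like this is straightforward to load into a tool such as BigQuery and query like a table.

import json
import logging

class JSONFormatter(logging.Formatter):
    """Render each log record as a single line of JSON."""

    def format(self, record):
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Merge any structured fields passed via the extra argument.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JSONFormatter())
logger = logging.getLogger("pipeline")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Hypothetical pipeline event; each field becomes a queryable column
# once the newline-delimited output is loaded into BigQuery.
logger.info(
    "batch complete",
    extra={"fields": {"pipeline": "daily_ingest", "duration_ms": 1840}},
)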

Done not so well, logging can be a pit of despair. Excessive logging drags down performance and racks up cloud costs. Noisy or poorly considered logging can ruin the ability to use ...
