I recently sat down with James Turnbull, CTO at Kickstarter, to discuss how the operations role is changing, and what the future may hold for the industry in general and for monitoring and infrastructure tools in particular. Here are some highlights from our talk.
1. What makes a good operations professional in this day and age?
A good operations professional is diverse and flexible. Once upon a time it may have been possible to pigeonhole an “operations person,” usually as a cliched stereotype of a bearded white dude, writing 10,000-line Bash scripts in between craft beers and bitterness. Nowadays, when I think about ops folks, I think of business-savvy, deeply technical, thoughtful, polyglot programmers who pride themselves on delivering the right, high-quality, automated, measured, and managed services. They pride themselves on being part of the engineering community in their organizations, rather than a silo. They think about business requirements, engage with their peers in helping them build deployable and manageable product, and conduct blameless post-mortems when things go wrong.
A good operations professional today cares about user experience, developer happiness, work-life balance, and collaboration. They are learning people who want to make things better in their communities and organizations. They are empathetic to the needs of their colleagues because they own the experience (and the cost!) of putting those colleagues’ hard-built features and product out into the world. They are compassionate and considerate because they have the experience of being on call and of bearing the brunt of customer disappointment if a product or service is unavailable or unacceptable.
A good operations professional looks like Charity Majors, Aaron Suggs, Katherine Daniels, Justin Lintz, Mandi Walls, Nigel Kersten, Alice Goldfuss, and a myriad of other amazing folks who have made me proud to have an operations heritage.
2. You've written extensively on the topic of monitoring. What are the most exciting trends you are seeing lately?
The most exciting trend I’ve seen is the new focus on ensuring the business needs of companies are addressed by monitoring. A lot of folks have started evangelising that instead of monitoring being built from the bottom up, starting with hosts and infrastructure, it should be built from the top down, starting with business metrics and applications. You identify the metrics that represent the health of your business, for example sales volume or checkout conversion. You then work down the stack, identifying the applications, services, and finally hosts that contribute to that health. Now if something changes in the health of your business, you’ll not only know about it, but you can also follow any anomalies down the stack to the misbehaving infrastructure.
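To make the top-down idea concrete, here is a minimal Python sketch of one way it could be modeled. It is an illustration only, not something described in the interview: the `Node` class, the `trace_anomalies` function, and every metric, application, service, and host name are hypothetical. Each business metric points at the applications it depends on, each application at its services, and each service at its hosts, so an unhealthy metric can be followed down to the infrastructure behind it.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One layer of the stack: business metric, application, service, or host."""
    name: str
    kind: str
    healthy: bool = True
    depends_on: list[Node] = field(default_factory=list)


def trace_anomalies(node, path=()):
    """Yield each chain of unhealthy nodes, starting at `node` and going down.

    A healthy node ends the walk, so healthy branches are never reported.
    """
    if node.healthy:
        return
    path = path + (node.name,)
    unhealthy_children = [child for child in node.depends_on if not child.healthy]
    if not unhealthy_children:
        # Nothing below this node is misbehaving, so the chain ends here.
        yield path
        return
    for child in unhealthy_children:
        yield from trace_anomalies(child, path)


# Hypothetical stack, built from the bottom so each layer can reference the next.
pay_01 = Node("pay-01", "host")
pay_02 = Node("pay-02", "host", healthy=False)
payment_service = Node("payment-service", "service", healthy=False,
                       depends_on=[pay_01, pay_02])
cart_service = Node("cart-service", "service")
checkout_app = Node("checkout-app", "application", healthy=False,
                    depends_on=[cart_service, payment_service])
checkout_conversion = Node("checkout_conversion", "business_metric",
                           healthy=False, depends_on=[checkout_app])

paths = list(trace_anomalies(checkout_conversion))
for p in paths:
    print(" -> ".join(p))
# prints: checkout_conversion -> checkout-app -> payment-service -> pay-02
```

In a real system the `healthy` flags would come from whatever checks or anomaly detection you already run; the point of the sketch is only the direction of travel, from the business metric down to the host.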
3. You've written several books on the topic of infrastructure, and worked at places as varied as Puppet and Docker Inc. What do you see as the future of infrastructure tools?
I think the future holds some mix of movement of workloads to more services-oriented architecture (and microservices) and movement to more flexible compute platforms like cloud and PaaS. By extension, I think we’ll see more tooling that makes that movement and use of that architecture easier.
But I do think it is a bit nebulous right now. I think a really interesting phenomenon at the moment is the long tail of adoption for infrastructure tools. For every company in the Bay or New York City going all in on Docker and Kubernetes, there are a hundred just discovering that there might be solutions to the problems of modern infrastructure management.
4. How does one cultivate and maintain a good engineering culture?
I could write a book on this and still not cover everything! We start with a fair and blameless culture where people can make mistakes and learn. Good, transparent communication within the team, with peers in the design and product communities, and into the business is critical. So are solid training, learning, and mentoring patterns and approaches across all levels of the team; a clear understanding of our objectives, and an individual understanding of what you need to do to be successful; and collective ownership of the future of your product, and of important central concerns like availability, security, and performance. Everyone is accountable for making performant, secure, and resilient product.
5. You're one of the chairs for the Velocity Conference in New York this September. What presentations are you looking forward to attending while there?
I think it’s a great program overall. It’s hard to narrow it down to just a few presentations! Some highlights that I am looking forward to are the “Designing large-scale distributed systems” workshop, the “Ops in the time of serverless containerized webscale” panel, Joe Damato’s thought-provoking and funny “Infrastructure as code might be literally impossible,” and Shopify’s talk on scaling their multi-tenant architecture.