2017-08-03

Distribution Center (source: GRAPHICALBRAIN

This post is part of a series of Q&A interviews going behind the scenes with speakers from our October 2017 Velocity Conference in New York. Register for the O’Reilly Velocity Conference to join Joseph Breuer, Robert Reta, and other industry experts. Use code ORM20 to save 20% on your conference pass (Gold, Silver, and Bronze passes).

I recently asked Joseph Breuer and Robert Reta, both Senior Software Engineers at Netflix, to discuss what they have learned through implementing a service at scale at Netflix. Joseph and Robert will be presenting a session on Event Sourcing at Global Scale at Netflix at O’Reilly Velocity Conference, taking place October 1-4 in New York. Here are some highlights from our conversation.

What were some of the obstacles you faced while implementing at scale?

The primary challenge when operating a service in a distributed architecture at scale is managing the behavior of your downstream dependencies. Whether those dependencies are a datastore or a RESTful API, how you define timeouts, fallback data, and the concurrency of the interactions will be the defining factor of your service. Your service may scale wonderfully, but if a dependency is not accounted for, the overall service can quickly fall over.
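
To make the three controls in that answer concrete, here is a minimal Python sketch of a wrapper around a single downstream call. It is an illustration only, not the speakers' implementation: the `GuardedDependency` class, its parameters, and the default values are all hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class GuardedDependency:
    """Hypothetical wrapper: timeout, concurrency cap, and fallback for one dependency."""

    def __init__(self, call, fallback, timeout_s=0.5, max_concurrent=10):
        self._call = call            # the real downstream call (datastore, REST API, ...)
        self._fallback = fallback    # cheap function that returns fallback data
        self._timeout_s = timeout_s  # assumed default, tune per dependency
        self._permits = threading.BoundedSemaphore(max_concurrent)
        self._pool = ThreadPoolExecutor(max_workers=max_concurrent)

    def execute(self, *args):
        # Concurrency limit: shed load instead of queueing behind a slow dependency.
        if not self._permits.acquire(blocking=False):
            return self._fallback(*args)
        try:
            future = self._pool.submit(self._call, *args)
        except Exception:
            self._permits.release()
            return self._fallback(*args)
        # Free the permit only once the worker thread has really finished,
        # even if the caller stopped waiting earlier.
        future.add_done_callback(lambda _done: self._permits.release())
        try:
            return future.result(timeout=self._timeout_s)
        except Exception:
            # Timeout or dependency failure: serve fallback data.
            return self._fallback(*args)
```

A caller would build one wrapper per dependency, for example `GuardedDependency(fetch_profile, lambda user_id: {}, timeout_s=0.2)` with a hypothetical `fetch_profile` function, so that a slow or failing dependency costs at most the timeout and never more than the configured number of threads.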

What were some of the tradeoffs you had to make along the way?

Implementing a service with Event Sourcing and Cassandra required two major paradigm shifts. The first was giving up the mutability of the data model: events are appended, never updated in place. The second was giving up the relational abstraction of data in a CRUD form. We did benefit from beginning with a clean slate in designing our service, but it was still a tradeoff.
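
For readers new to the pattern, the shift away from mutable CRUD records can be sketched in a few lines of Python. This is a generic illustration of Event Sourcing under assumed names, not the Netflix service: the account-balance domain, the `Event` fields, and the in-memory `EventStore` are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)  # events are immutable once written
class Event:
    entity_id: str
    sequence: int  # position of the event in its entity's log
    kind: str      # hypothetical event types: "credited" or "debited"
    amount: int


class EventStore:
    """In-memory, append-only log. There is no update or delete operation."""

    def __init__(self):
        self._log = defaultdict(list)

    def append(self, entity_id, kind, amount):
        log = self._log[entity_id]
        event = Event(entity_id, len(log) + 1, kind, amount)
        log.append(event)
        return event

    def events_for(self, entity_id):
        # Return a read-only copy in write order.
        return tuple(self._log.get(entity_id, ()))


def current_balance(events):
    """Current state is derived by replaying events, never stored and mutated."""
    balance = 0
    for event in events:
        if event.kind == "credited":
            balance += event.amount
        elif event.kind == "debited":
            balance -= event.amount
    return balance
```

Where a CRUD design would overwrite a balance column, this design records a credit and a debit as two separate events and computes the balance on read, which also keeps the full history available.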

What skills or experience do you need to operate at this scale?

You need a willingness to experiment and to try new solutions until one works. Build a system that allows for rapid changes—meaning, avoid an architectural or design decision that prescribes an update or transition strategy.

You advocate for Cassandra over other tools—why?

There are so many tools (our Velocity presentation goes into a number of them). When considering datastore architectures, your first choice is between relational and distributed NoSQL; think CAP theorem. For NoSQL, your next choice is between a document model and a time series model. Cassandra supports time series data modeling, which was necessary for our implementation of Event Sourcing. Cassandra also allows fine-tuning: the consistency levels of reads and writes can be set independently, per execution. Finally, it was critical that there was existing infrastructure within Netflix to manage a multi-region Cassandra schema.
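
The point about tuning read and write consistency independently comes down to simple arithmetic: a read is guaranteed to see the latest acknowledged write when the replicas contacted for the read and for the write must overlap. The Python sketch below works through that rule for a single datacenter. The function names are hypothetical, only a subset of Cassandra's consistency levels is modeled, and multi-datacenter levels such as LOCAL_QUORUM are deliberately left out.

```python
def quorum(replication_factor):
    """A quorum is a majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def replicas_required(level, replication_factor):
    """Replicas that must respond for a given consistency level (subset only)."""
    required = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return required[level]


def read_overlaps_write(read_level, write_level, replication_factor):
    """True when every read is guaranteed to touch a replica holding the latest write."""
    reads = replicas_required(read_level, replication_factor)
    writes = replicas_required(write_level, replication_factor)
    return reads + writes > replication_factor
```

With a replication factor of 3, QUORUM reads paired with QUORUM writes overlap (2 + 2 > 3), while ONE paired with ONE does not (1 + 1 is not greater than 3). Choosing the level per execution lets a service pay for the stronger guarantee only on the statements that need it.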

What other parts of the program for Velocity NY are of interest to you?

Of particular interest are the sessions focusing on load testing and capacity setting of distributed systems. For instance, Susie Xia and Anant Rao’s How LinkedIn Determines the Capacity Limits of its Services Using Live Traffic and Jeffrey Valeo’s Lessons Learned from Load Testing Distributed Systems.