Creating a Data-Driven Enterprise with DataOps by Joydeep Sen Sarma, Ashish Thusoo

Chapter 9. LinkedIn: The Road to Data Craftsmanship

I’ve been with LinkedIn for only 18 months, yet what I’ve seen in data operations has amazed me. As at all consumer web companies, an enormous amount of data has always flowed through LinkedIn. But LinkedIn was relatively early to realize the importance of this data.

At LinkedIn, it wasn’t just about getting the analytics right. The company realized early on that infrastructure had to go hand in hand with analytics to support the data ecosystem. Many open source projects, most famously Apache Kafka, were born at LinkedIn to support this ecosystem. Today at LinkedIn, we rely heavily on the scalability and reliability of Kafka, Hadoop, and a surrounding ecosystem of open source and internally developed tools to serve our analytic needs.

Early on, the company found that different teams, such as the Email Team and the Homepage Team, were using disparate tools when building data pipelines, as illustrated in Figure 9-1.

Figure 9-1. Different teams built and operated different pipelines (source: LinkedIn)1

LinkedIn knew, of course, that it shouldn’t have multiple pipelines for moving and ingesting data, or for computing metrics. It’s inefficient and difficult to manage, and, most important, it leads to inconsistent and unpredictable ...