
Turning big data into knowledge

Date: This event took place live on July 19, 2016.

Presented by: Robert Aboukhalil, Monal Daxini, Homin Lee

Duration: Approximately 3 hours.

Questions? Please send email to


More than at any time in the past, organizations have access to a wealth of data on virtually any matter relevant to running a business. But are they effectively extracting value from it? Collecting data is easy, but putting it to meaningful use is a larger challenge. In this online conference, three data scientists discuss open source tools, methodologies, and processes for turning big data into valuable insights.

Netflix Keystone: Cloud-scale event processing pipeline - 9am PT
Monal Daxini

Keystone processes over 700 billion events per day (1 petabyte) with at-least-once processing semantics in the cloud. Monal Daxini details how Netflix used Kafka, Samza, Docker, and Linux at scale to implement a multi-tenant pipeline in the AWS cloud within a year. He'll also share plans to offer Stream Processing as a Service across Netflix.
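To put the quoted volume in perspective, the short sketch below converts the daily totals into per-second and per-event averages. It assumes a decimal petabyte (10**15 bytes) and a perfectly even load over the day, neither of which the abstract states; the function name is illustrative only. Under those assumptions the figures work out to roughly 8.1 million events per second and about 1.4 KB per event.

```python
# Back-of-the-envelope averages from the figures quoted in the abstract.
# Assumptions (not from the source): decimal petabyte, evenly spread load.
EVENTS_PER_DAY = 700e9           # "over 700 billion events per day"
BYTES_PER_DAY = 1e15             # "1 petabyte", taken as 10**15 bytes
SECONDS_PER_DAY = 24 * 60 * 60


def daily_averages(events_per_day, bytes_per_day):
    """Return (events per second, bytes per event, bytes per second)."""
    events_per_second = events_per_day / SECONDS_PER_DAY
    bytes_per_event = bytes_per_day / events_per_day
    bytes_per_second = bytes_per_day / SECONDS_PER_DAY
    return events_per_second, bytes_per_event, bytes_per_second


eps, bpe, bps = daily_averages(EVENTS_PER_DAY, BYTES_PER_DAY)
print(f"{eps:,.0f} events/s, {bpe:,.0f} bytes/event, {bps / 1e9:.1f} GB/s")
```

These are averages only; real traffic peaks would be higher.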


About Monal Daxini

Monal Daxini is a senior software engineer at Netflix, where he is building a scalable, multi-tenant event processing pipeline. He has worked on Netflix's Cassandra and Dynomite infrastructure and was instrumental in developing the encoding compute infrastructure for all Netflix content. He has over 15 years of experience building scalable distributed systems at organizations like Netflix and Cisco.

A deep dive into R for Python developers - 10am PT
Robert Aboukhalil

Increasingly, R and Python are occupying a large part of the data scientist's toolbox. For Python developers, using R means having access to numerous tools for statistics, data manipulation, machine learning, and graphing. This talk is aimed at Python developers looking for a quick guide to the R language, and will cover R's essential features, its quirks, and how to write efficient R code.


About Robert Aboukhalil

Robert Aboukhalil is a computational biologist at Fluidigm, where he uses R, Python and other data science tools every day to analyze and visualize genomics datasets. Robert holds a PhD in Computational Biology from Cold Spring Harbor Laboratory.

Detecting outliers and anomalies in real-time at Datadog - 11am PT
Homin Lee

Datadog provides outlier and anomaly detection functionality to automatically alert on metrics that are difficult to monitor using thresholds alone. In this presentation, Homin Lee discusses the algorithms and open source tools Datadog uses, the lessons the team has learned from using these alerts on its own systems, and some real-life examples of how to avoid false positives and false negatives.
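As a rough illustration of why a fixed threshold can fall short, the sketch below flags outliers relative to the rest of a group using the median absolute deviation (MAD), a common robust statistic. It is a minimal example under stated assumptions, not a description of the algorithms covered in the talk: the function name, the tolerance of 3, and the sample latencies are all hypothetical.

```python
from statistics import median


def mad_outliers(values, tolerance=3.0):
    """Return indices of values more than `tolerance` scaled MADs from the median.

    Hypothetical helper for illustration; the tolerance is an assumed default.
    """
    center = median(values)
    deviations = [abs(v - center) for v in values]
    mad = median(deviations)
    if mad == 0:
        # At least half the values equal the median: flag anything that differs.
        return [i for i, d in enumerate(deviations) if d > 0]
    # 1.4826 makes the MAD comparable to a standard deviation for normal data.
    scale = 1.4826 * mad
    return [i for i, d in enumerate(deviations) if d / scale > tolerance]


# Made-up per-host latencies in milliseconds; the last host misbehaves.
latencies_ms = [102, 98, 101, 99, 100, 103, 97, 250]
print(mad_outliers(latencies_ms))
```

Because the cutoff is derived from the group itself, it moves with the group's normal level; a static threshold would need retuning whenever that level shifts.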


About Homin Lee

Homin Lee is a data scientist at Datadog, where he writes algorithms that process hundreds of billions of data points a day. Prior to Datadog, Homin built large-scale machine learning systems at several start-ups. Homin has a PhD in computational learning theory from Columbia University and was a Computing Innovation Fellow at the University of Texas at Austin.