More than at any time in the past, organizations have access to a wealth of data on virtually any matter relevant to running a business. But are they effectively extracting value from it? Collecting data is easy, but putting it to meaningful use is a larger challenge. In this online conference, three data scientists discuss open source tools, methodologies, and processes for turning big data into valuable insights.
Netflix Keystone—Cloud scale event processing pipeline - 9amPT
Keystone processes over 700 billion events per day (about 1 petabyte) with at-least-once processing semantics in the cloud. Monal Daxini details how Netflix used Kafka, Samza, Docker, and Linux at scale to implement a multi-tenant pipeline in the AWS cloud within a year. He'll also share plans to offer Stream Processing as a Service for all of Netflix.
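At-least-once semantics means the pipeline may redeliver an event after a retry, so consumers are typically made idempotent. The sketch below illustrates the general idea with deduplication by event ID; the function and field names are illustrative assumptions, not Keystone's actual API.

```python
def process_stream(events, handler):
    """Apply handler to each event exactly once, even if the
    stream redelivers events (at-least-once delivery)."""
    seen = set()       # IDs of events already processed
    results = []
    for event in events:
        if event["id"] in seen:   # duplicate from a redelivery
            continue
        seen.add(event["id"])
        results.append(handler(event))
    return results

# Simulated redelivery: event 2 arrives twice.
stream = [{"id": 1, "v": 10}, {"id": 2, "v": 20},
          {"id": 2, "v": 20}, {"id": 3, "v": 30}]
totals = process_stream(stream, lambda e: e["v"])
# totals == [10, 20, 30]
```

In a real pipeline the `seen` set would live in durable state (or the handler itself would be idempotent), since an in-memory set does not survive restarts.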
A deep dive into R for Python developers - 10amPT
Increasingly, R and Python are occupying a large part of the data scientist's toolbox. For Python developers, using R means having access to numerous tools for statistics, data manipulation, machine learning, and graphing. This talk is aimed at Python developers looking for a quick guide to the R language, and will cover R's essential features, its quirks, and how to write efficient R code.
Detecting outliers and anomalies in real-time at Datadog - 11amPT
Datadog provides outlier and anomaly detection to automatically alert on metrics that are difficult to monitor with static thresholds alone. In this presentation, Homin Lee discusses the algorithms and open source tools Datadog uses, the lessons learned from running these alerts on Datadog's own systems, and real-life examples of how to avoid false positives and false negatives.
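To see why robust statistics beat fixed thresholds, here is a minimal sketch of outlier detection based on the median absolute deviation (MAD). This is a generic textbook technique, not Datadog's production algorithm; the function name and threshold are illustrative assumptions.

```python
import statistics

def mad_outliers(values, threshold=3.0):
    """Flag points whose distance from the median exceeds
    `threshold` times the median absolute deviation (MAD).
    Unlike a fixed threshold, this adapts to the series' own
    spread and is robust to the outliers themselves."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        # Degenerate case: more than half the points are identical.
        return [v != med for v in values]
    return [abs(v - med) / mad > threshold for v in values]

series = [10, 11, 10, 12, 11, 95, 10]
flags = mad_outliers(series)
# Only the spike at 95 is flagged; the steady values are not.
```

A fixed threshold tuned for this series would misfire on a metric with a different baseline, whereas the MAD-based rule rescales automatically.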