Enterprise Data Workflows with Cascading
Date: This event took place live on September 17, 2013
Presented by: Paco Nathan
Duration: Approximately 60 minutes.
Questions? Please send email to
In this hands-on webcast, Paco Nathan, author of Enterprise Data Workflows with Cascading, will discuss what defines a "workflow", in contrast to notions of "dataflow", and the impact that distinction has on the tools required.
Overall, we're talking about middleware for Big Data — how to integrate Hadoop along with other data frameworks to build applications at scale.
Paco will compare and contrast some workflow platforms such as:
We will also discuss some popular tools that do not fit in this category (they are not for workflows) but are commonly mistaken for workflow tools: Apache Pig and Apache Hive in particular. Understanding where those do or don't fit is helpful. Within the context of Cascading, there are also the Scala community (Scalding) and the Clojure community (Cascalog), which account for most of the new production deployments. Paco will compare and contrast both of these as well.
About Paco Nathan
Paco Nathan is a Data Scientist at Concurrent, Inc., and heads up the developer outreach program there. He has a dual background from Stanford in math/stats and distributed computing, with 25+ years of experience in the tech industry. As an expert in Hadoop, R, predictive analytics, machine learning, and natural language processing, Paco has built and led several expert Data Science teams, with data infrastructure based on large-scale cloud deployments. He has presented twice on the AWS Start-Up Tour, and often gives talks about Hadoop, Data Science, and Cloud Computing.