Integrating Apache Spark and NiFi for Data Lakes

Date: This event took place live on November 10, 2016

Presented by: Ron Bodkin, Matt Hutton

Duration: Approximately 60 minutes.

Cost: Free


Capturing and processing big data isn't easy. That's why open-source projects such as Apache Spark, Kafka, Hadoop, and NiFi, which can process and manage immense data volumes at scale, are so popular. The drawback is that, unlike some more enterprise-friendly tools, they do not make it easy for business users to access, optimize, and analyze big data.

This webcast will introduce Kylo, a soon-to-be open-source data lake framework based on Apache Spark and NiFi. Kylo automates many of the tasks associated with data lakes, such as data ingest, preparation, discovery, profiling, and management.

Kylo is used by a number of global enterprises to solve critical problems related to data ingestion and pipeline control across complex big data environments. Kylo integrates on-premises, cloud, and hybrid platforms with an engineering and data science control framework. Implementing a data feed typically takes several weeks; Kylo accelerates time-to-value, often shrinking that effort from weeks or months to hours, and sometimes minutes. Using Kylo, enterprises can empower business analysts, data scientists, and others to perform analytics on the data lake.

After this webcast, you'll be able to:

  • Gain an understanding of the three phases of building a foundation for enterprise analytics using open-source software
  • Examine options to encourage "data democratization," including the soon-to-be open-sourced Kylo, Apache Spark, and NiFi, and how they all work together in a data lake environment
  • Learn more about the newly emerging discipline of Analytics Ops and how it enables the continuous delivery of analytics results

About Ron Bodkin, President and Co-Founder – Think Big, a Teradata company

Ron has a successful history of building engineering and data science teams that transform businesses using technology. He is a master at using open-source technologies to generate value from Big Data and speaks often on the subject, as well as on his favorite topics: Hadoop, Storm, and NoSQL.

Prior to joining Think Big, Bodkin was Vice President of Engineering at Quantcast, where he led the data science and engineering teams that pioneered the use of Hadoop and NoSQL for batch and real-time decision-making. Before that, Bodkin was Founder of New Aspects, which provided enterprise consulting for aspect-oriented programming. Bodkin was also Co-Founder and CTO of B2B applications provider C-Bridge, which he grew to a team of 900 people and led to a successful IPO.

Bodkin graduated with honors from McGill University with a B.S. in Math and Computer Science. He also earned a master's degree in Computer Science from MIT, leaving the PhD program after presenting the idea for C-Bridge and placing in the finals of the 50k Entrepreneurship Contest.

About Matt Hutton, Director of Research & Development – Think Big, a Teradata company

Matt directs Think Big's Research and Development team, which develops technology assets for Think Big Data Lake solutions. Matt has 20 years of director-level experience building and managing software teams in Silicon Valley that develop large-scale, distributed software and data solutions. He provided consulting to early Big Data technology leaders using Hadoop, such as Quantcast. Prior to joining Think Big, Matt led software engineering at Lawrence Livermore National Laboratory (LLNL) for the National Ignition Facility program, a fusion energy research program built around the world's largest laser. Matt designed the software and data architecture for the petascale data processing cluster used for fusion sciences. Before LLNL, Matt was the Director of Software Engineering at ThinkLink (technology purchased by Microsoft), building a unified messaging and IP telephony application service provider with more than 500,000 customers. Prior to this, Matt was Director of Software Engineering at Netcom Online, an Internet service provider and early pioneer during the emergence of the Internet. Prior to Netcom (now EarthLink), Matt held software engineering positions at Symantec and Delrina Software.