Integrating Apache Spark and NiFi for Data Lakes
Date: This event took place live on November 10, 2016
Presented by: Ron Bodkin, Matt Hutton
Duration: Approximately 60 minutes.
Questions? Please send email to
Capturing and processing big data isn't easy. That's why open-source tools like Apache Spark, Kafka, Hadoop, and NiFi, which scale to process and manage immense data volumes, are so popular. The drawback is that they do not let business users access, optimize, and analyze big data the way some enterprise-friendly tools do.
This webcast will introduce Kylo, a soon-to-be open-source data lake framework based on Apache Spark and NiFi. Kylo automates many of the tasks associated with data lakes, such as data ingest, preparation, discovery, profiling, and management.
Kylo is used by a number of global enterprises to solve critical problems related to data ingestion and pipeline control across complex big data environments. Kylo integrates on-premises, cloud, and hybrid platforms with an engineering and data science control framework. Implementing a data feed typically takes several weeks. Kylo accelerates time-to-value, often from weeks or months to hours and sometimes minutes. Using Kylo, enterprises can empower business analysts, data scientists, and others to perform analytics on the data lake.
After this webcast, you'll be able to:
About Ron Bodkin, President and Co-Founder – Think Big, a Teradata company
Ron has a successful history of building engineering and data science teams to transform businesses using technology. He is a master of using open source technologies to generate value from Big Data and often speaks on the topic, as well as on his favorite technologies: Hadoop, Storm, and NoSQL.
Prior to joining Think Big, Bodkin was Vice President of Engineering at Quantcast, where he led the data science and engineering teams that pioneered the use of Hadoop and NoSQL for batch and real-time decision-making. Prior to that, Bodkin was Founder of New Aspects, which provided enterprise consulting for aspect-oriented programming. Bodkin was also Co-Founder and CTO of B2B applications provider C-bridge, which he led to a team of 900 people and a successful IPO.
Bodkin graduated with honors from McGill University with a B.S. in Math and Computer Science. Bodkin also earned his Master's Degree in Computer Science from MIT, leaving the PhD program after presenting the idea for C-bridge and placing in the finals of the 50k Entrepreneurship Contest.
About Matt Hutton, Director of Research & Development – Think Big, a Teradata company
Matt directs our Research and Development team developing technology assets for Think Big Data Lake solutions. Matt has 20 years of director-level experience building and managing software teams in Silicon Valley that develop large-scale, distributed software and data solutions. Matt provided consulting for early technology leaders in Big Data using Hadoop such as Quantcast. Prior to joining Think Big, Matt led software engineering at Lawrence Livermore National Laboratory for the National Ignition Facility program, a fusion energy research program and the world's largest laser. Matt designed the software and data architecture for the petascale data processing cluster used for fusion sciences. Before LLNL, Matt was the Director of Software Engineering at ThinkLink (technology purchased by Microsoft), building a unified messaging and IP telephony application service provider exceeding 500,000 customers. Prior to this, Matt was Director of Software Engineering at Netcom Online, an Internet Service Provider and early pioneer during the emergence of the Internet. Prior to Netcom (now EarthLink), Matt held software engineering positions at Symantec and Delrina Software.