It’s not just about Hadoop core anymore

For maximum business value, big data applications have to involve multiple Hadoop ecosystem components.

By Amr Awadallah

January 25, 2015

Data is deluging today’s enterprise organizations from ever-expanding sources and in ever-expanding formats. To gain insight from this valuable resource, organizations have been adopting Apache Hadoop with increasing momentum. Now, the most successful enterprise big data players no longer use only the Hadoop “core” (i.e., batch processing with MapReduce). They are moving toward analyzing and solving real-world problems, often interactively, using the broader set of tools in an enterprise data hub, including components such as Impala, Apache Spark, Apache Kafka, and Cloudera Search. With this new focus on workload diversity comes an increased demand for developers who are well-versed in a variety of components across the Hadoop ecosystem.

Due to the size and variety of the data we’re dealing with today, focusing on a single use case or tool, no matter how robust, can obscure the full, game-changing potential of Hadoop in the enterprise. Instead, developing end-to-end applications that incorporate multiple tools from the Hadoop ecosystem, not just the Hadoop core, is the first step toward activating the disparate use cases and analytic capabilities that an enterprise data hub offers. Whereas MapReduce code primarily draws on Java skills, developers who want to work on full-scale big data engineering projects need to be able to work with multiple tools, often simultaneously. A true big data application developer can ingest and transform data using Kite SDK, write SQL queries with Impala and Hive, and create an application GUI with Hue.

For that reason, Cloudera University is offering its recently created four-day training course, Designing and Building Big Data Applications, concurrent with Strata + Hadoop World San Jose 2015 (February 17-20, 2015) and in the same venue. The course is aimed at aspiring enterprise data hub professionals who want to learn how to use Hadoop in all its facets. Through instructor-led discussion and interactive, hands-on exercises, participants will navigate the ecosystem, learning topics such as:

Creating a data set with Kite SDK
Developing custom Flume components for data ingestion
Managing a multi-stage workflow with Apache Oozie
Analyzing data with Apache Crunch
Writing user-defined functions for Apache Hive and Impala
Indexing data with Cloudera Search

Thanks to its integration with the conference, attendees of this course will have access to conference keynotes and networking events during the week. Whether you’re an aspiring developer or an experienced one who needs to brush up on the current state of the ecosystem, it’s a great way to sharpen your skills. I sincerely hope you check it out!
