O'Reilly logo

Hadoop For Dummies by Dirk deRoos

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Chapter 10

Developing and Scheduling Application Workflows with Oozie

In This Chapter

arrow Setting up the Oozie server

arrow Developing and running an Oozie workflow

arrow Scheduling and coordinating Oozie workflows

Moving data and running different kinds of applications in Hadoop is great stuff, but it’s only half the battle. For Hadoop’s efficiencies to truly start paying off for you, start thinking about how you can tie together a number of these actions to form a cohesive workflow. This idea is appealing, especially after you and your colleagues have built a number of Hadoop applications and you need to mix and match them for different purposes. At the same time, you inevitably need to prepare or move data as you progress through your workflows and make decisions based on the output of your jobs or other factors. Of course, you can always write your own logic or hack an existing workflow tool to do this in a Hadoop setting — but that’s a lot of work. Your best bet is to use Apache Oozie, a workflow engine and scheduling facility designed specifically for Hadoop.

As a workflow engine, Oozie enables you to run a set of Hadoop applications in a specified sequence known as a workflow. You define ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required