15. Loading a Data File into Cascalog

This chapter covers loading a data file into Cascalog.

Assumptions

In this chapter we assume you have Leiningen set up.

Benefits

This chapter shows what it means in practice that Hadoop is a batch processing system: before Hadoop can process data, it must first load it. The recipe below explains how to load that data.

The Recipe—Code

So far we’ve been working with a data structure defined in memory. Now we’ll work with one that is defined in a file.
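To make the contrast concrete, here is a minimal sketch of the two kinds of source, not the recipe's own code. It assumes Cascalog is already on the classpath; the namespace, the `people` data, and the function names are hypothetical and chosen only for illustration. `hfs-textline` and `??<-` come from `cascalog.api`.

```clojure
(ns cascalog-load-file.sketch
  ;; Hypothetical namespace; assumes Cascalog is a project dependency.
  (:require [cascalog.api :refer [??<- hfs-textline]]))

;; In-memory source: a plain vector of tuples (illustrative data).
(def people [["alice" 28] ["bob" 33]])

(defn in-memory-names
  "Queries the in-memory vector and returns a seq of one-field tuples."
  []
  (??<- [?name]
        (people ?name ?age)))

(defn file-lines
  "Queries a text file instead. hfs-textline produces a source tap that
  emits each line of the file at `path` as a one-field tuple."
  [path]
  (let [lines (hfs-textline path)]
    (??<- [?line]
          (lines ?line))))
```

Calling `(file-lines "data/people.txt")` (a hypothetical path) would return one tuple per line of that file. The query shape is the same in both functions; only the generator changes from a vector to a tap, which is the point of this chapter.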

1. Create a new Leiningen project cascalog-load-file in your projects directory, and change to that directory:

lein new app cascalog-load-file
cd cascalog-load-file

2. Put the following in your project.clj file: ...
