In this chapter we cover loading a data file in Cascalog.
We assume you already have Leiningen set up.
The key idea to take away from this chapter is that Hadoop is a batch processing system: before it can process data, it must load that data. This chapter explains how to do the loading.
So far we’ve been working with a data structure defined in memory. Now we’ll work with one that is defined in a file.
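To make the contrast concrete, here is a minimal sketch in plain Clojure (deliberately not Cascalog) of the same records held once as an in-memory vector and once as lines in a file. The sample values, the tab-separated layout, and the helper name load-rows are assumptions made up for illustration; they are not taken from this chapter's project.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

;; Data defined in memory, in the style of the earlier chapters.
;; (Hypothetical sample values.)
(def people-in-memory
  [["alice" 28] ["bob" 33]])

;; The same data defined in a file. A temp file keeps the sketch
;; self-contained; the tab-separated layout is an assumption.
(def sample-file
  (doto (java.io.File/createTempFile "people" ".tsv")
    (.deleteOnExit)))

(spit sample-file "alice\t28\nbob\t33\n")

(defn load-rows
  "Hypothetical helper: reads tab-separated lines from `path` and
  returns a vector of [person age] pairs."
  [path]
  (with-open [rdr (io/reader path)]
    ;; mapv is eager, so every line is read before the reader closes.
    (mapv (fn [line]
            (let [[person age] (str/split line #"\t")]
              [person (Long/parseLong age)]))
          (line-seq rdr))))

(def people-from-file (load-rows sample-file))
```

Once loaded, the file-backed rows are ordinary Clojure data, so anything that worked on the in-memory vector works on them too. The difference is only where the data comes from, which is the step the rest of this chapter hands over to Cascalog.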
1. Create a new Leiningen project cascalog-load-file in your projects directory, and change to that directory:
    lein new app cascalog-load-file
    cd cascalog-load-file
2. Put the following in your project.clj file: ...