Chapter 9. An example batch layer: Implementation

This chapter covers

  • Ingesting new data into the master dataset
  • Managing the details of a batch workflow
  • Integrating Thrift-based graph schemas, Pail, and JCascalog

In the last chapter you saw the architecture and algorithms for the batch layer of SuperWebAnalytics.com. Let’s now translate them into a complete working implementation using the tools you’ve learned about, such as Thrift, Pail, and JCascalog. In the process, you’ll see that the code closely matches the pipe diagrams and workflows developed in the previous chapter. This is a sign that the abstractions are sound: you can write code that mirrors how you think about the problems.

As always happens with real-world tools, ...
