Chapter 9. An example batch layer: Implementation

This chapter covers

  • Ingesting new data into the master dataset
  • Managing the details of a batch workflow
  • Integrating Thrift-based graph schemas, Pail, and JCascalog

In the last chapter you saw the architecture and algorithms for the batch layer of SuperWebAnalytics.com. Let’s now translate them into a complete working implementation using the tools you’ve learned about, such as Thrift, Pail, and JCascalog. In the process, you’ll see that the code closely matches the pipe diagrams and workflows developed in the previous chapter. This is a sign that the abstractions are sound, because the code you write mirrors how you think about the problems.

As always happens with real-world tools, ...
