Chapter 7. Advanced Patterns and Future Work

In the previous chapter, you studied Big Data reduction techniques that aim to reduce the amount of data being analyzed or processed. We explored design patterns that perform dimensionality reduction using Principal Component Analysis and numerosity reduction using clustering, sampling, and histogram techniques.

In this chapter, we will start with design patterns that deal primarily with text data, and we will explore a range of analytics pipelines that can be built using Pig as the key ingestion and processing engine.

We will cover the following patterns:

  • Clustering textual data
  • Topic discovery
  • Natural language processing
  • Classification

We will also speculate ...
