Ingest and egress patterns for multistructured data

The next sections describe design patterns for ingesting unstructured data (images) and semi-structured text data (Apache logs and custom logs). The following is a brief overview of the formats:

  • Apache log format: Extracting intelligence from Apache logs is a common enterprise use case that applies across many kinds of applications.
  • Custom log format: This format represents an arbitrary log that can be parsed with a regex. Understanding this pattern will help you extend it to similar use cases where a custom loader has to be written.
  • Image format: This is the only pattern dealing with nontext data, and the pattern described to ingest images can be tweaked and applied to any type ...
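
To make the custom log idea concrete, here is a minimal sketch of regex-based parsing. It is written in Python purely for illustration and is not the loader from this book. The log layout, the field names, and the `parse_line` helper are all assumptions made up for this example. A custom loader would apply the same kind of pattern to each input line.

```python
import re

# Hypothetical custom log layout (an assumption, not taken from the source):
#   2013-05-01 12:00:03 [ERROR] payment-service: Timeout after 30s
LOG_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) "   # calendar date
    r"(?P<time>\d{2}:\d{2}:\d{2}) "    # wall-clock time
    r"\[(?P<level>[A-Z]+)\] "          # severity in square brackets
    r"(?P<component>[\w.-]+): "        # emitting component
    r"(?P<message>.*)$"                # free-text remainder
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line.rstrip("\n"))
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = "2013-05-01 12:00:03 [ERROR] payment-service: Timeout after 30s"
    print(parse_line(sample))
```

Returning `None` for non-matching lines, rather than raising, lets a caller count or skip malformed records, which is usually what you want when ingesting large, noisy logs.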
