Appendix B. Ingest pipelines
Data that makes its way into Elasticsearch is not always clean. Often it needs to be transformed, enriched, or reformatted first. One option is to clean the data before it reaches Elasticsearch, for example by writing custom transformers or using ETL (extract, transform, load) tools. Elasticsearch also provides these capabilities itself through ingest pipelines, which offer first-class support for manipulating data: we can split, remove, modify, and enhance fields before a document is indexed.
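To make the split/remove/modify/enhance idea concrete, the following is a minimal sketch, not the book's own example. It builds the JSON body for a pipeline that uses four built-in processors (`split`, `remove`, `uppercase`, `set`) and wraps it in a request body for the `_simulate` API. The pipeline id `clean-docs` and the field names `tags`, `internal_note`, `title`, and `ingested_by` are hypothetical. The `emulate` function is only a rough local approximation of what those processors do, so the effect can be seen without a cluster. It is not how Elasticsearch implements them.

```python
import json
import re

# Hypothetical pipeline definition; register it with PUT _ingest/pipeline/clean-docs.
# Field names are illustrative assumptions, not taken from the book.
pipeline = {
    "description": "Clean up incoming documents (illustrative)",
    "processors": [
        {"split": {"field": "tags", "separator": ","}},        # "a,b" -> ["a", "b"]
        {"remove": {"field": "internal_note"}},                # drop a field
        {"uppercase": {"field": "title"}},                     # modify a field
        {"set": {"field": "ingested_by", "value": "pipeline"}}, # enhance with a new field
    ],
}

sample = {
    "title": "elasticsearch in action",
    "tags": "search,analytics,logs",
    "internal_note": "do not index",
}

# Body for POST _ingest/pipeline/_simulate, which runs an inline pipeline
# against sample documents without indexing them.
simulate_body = {"pipeline": pipeline, "docs": [{"_source": sample}]}


def emulate(doc):
    """Rough local approximation of the four processors above (not Elasticsearch's code)."""
    out = dict(doc)  # copy so the input document is left untouched
    for proc in pipeline["processors"]:
        name, opts = next(iter(proc.items()))  # each processor is a one-key dict
        field = opts["field"]
        if name == "split":
            out[field] = re.split(opts["separator"], out[field])
        elif name == "remove":
            del out[field]
        elif name == "uppercase":
            out[field] = out[field].upper()
        elif name == "set":
            out[field] = opts["value"]
    return out


if __name__ == "__main__":
    print(json.dumps(simulate_body, indent=2))
    print(emulate(sample))
```

On a real cluster, the simulated output should show the same kind of changes. Once the pipeline is registered, it can be applied per request (for example, `PUT my-index/_doc/1?pipeline=clean-docs`) or by default through the `index.default_pipeline` index setting.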
B.1 Overview
Data to be indexed into Elasticsearch may need to undergo transformation and manipulation. Consider an example of loading millions of legal documents represented as PDF files into Elasticsearch for searching. Although bulk ...