Building Your Data-Processing Pipeline

Complex use cases may require a data-processing pipeline with one or more producers, several producer-consumers in between, and a consumer stage at the end. However, the main principles stay the same, so we’re going to start with a two-stage pipeline, a producer and a consumer, and demonstrate how it works.
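
To make the two-stage idea concrete, here is a minimal sketch of a producer and a consumer wired together with GenStage. The module names `UrlProducer` and `UrlConsumer`, the example URLs, and the demand settings are hypothetical and chosen only for illustration. The sketch assumes it runs as a standalone script with network access so `Mix.install/1` (Elixir 1.12+) can fetch the `gen_stage` package. It shows the key property of the design: the producer emits events only in response to demand sent up from the consumer.

```elixir
# Assumption: run as a standalone script (elixir pipeline.exs) with network
# access so Mix.install/1 can fetch the gen_stage package.
Mix.install([{:gen_stage, "~> 1.2"}])

defmodule UrlProducer do
  # Hypothetical producer stage: holds a list of URLs and hands them out
  # only when a downstream stage asks for them (demand).
  use GenStage

  def start_link(urls), do: GenStage.start_link(__MODULE__, urls)

  @impl true
  def init(urls), do: {:producer, urls}

  @impl true
  def handle_demand(demand, urls) when demand > 0 do
    # Emit at most `demand` URLs and keep the rest as state.
    {events, remaining} = Enum.split(urls, demand)
    {:noreply, events, remaining}
  end
end

defmodule UrlConsumer do
  # Hypothetical consumer stage: "processes" each URL by reporting it back
  # to a caller process instead of doing real work.
  use GenStage

  def start_link(reply_to), do: GenStage.start_link(__MODULE__, reply_to)

  @impl true
  def init(reply_to), do: {:consumer, reply_to}

  @impl true
  def handle_events(urls, _from, reply_to) do
    Enum.each(urls, fn url -> send(reply_to, {:processed, url}) end)
    # Consumers never emit events, so the returned event list is empty.
    {:noreply, [], reply_to}
  end
end

urls = ["https://example.com/1", "https://example.com/2", "https://example.com/3"]

{:ok, producer} = UrlProducer.start_link(urls)
{:ok, consumer} = UrlConsumer.start_link(self())

# Subscribing starts the flow: the consumer asks for at most 2 events at a
# time, and the producer only does work in response to that demand.
{:ok, _subscription} =
  GenStage.sync_subscribe(consumer, to: producer, max_demand: 2, min_demand: 1)

# Collect one confirmation per URL, failing loudly if the pipeline stalls.
processed =
  for _ <- urls do
    receive do
      {:processed, url} -> url
    after
      1_000 -> raise "timed out waiting for the consumer"
    end
  end

IO.inspect(processed)
```

Because demand flows upstream from the consumer, the producer never hands out more URLs than were requested. This demand-driven flow is the back-pressure mechanism GenStage is built around.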

We will build a fake service that scrapes data from web pages. Scraping is normally an intensive task that depends on system resources and a reliable network connection. Our goal is to submit a batch of URLs for scraping and let the data pipeline take care of the workload.
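
A fake scraper can be as simple as a function that sleeps to imitate slow work. The sketch below is a hypothetical stand-in, not the book's implementation: the module name `FakeScraper`, the `work/1` function, and the 100 to 500 ms delay range are assumptions made for illustration. It also times a sequential run with `:timer.tc/1` to show why this workload benefits from a pipeline.

```elixir
defmodule FakeScraper do
  # Hypothetical stand-in for a real scraper. It performs no network I/O;
  # it sleeps for a random 100 to 500 ms to simulate slow, resource-hungry work.
  def work(url) do
    100..500
    |> Enum.random()
    |> Process.sleep()

    {:ok, url}
  end
end

urls = Enum.map(1..3, fn n -> "https://example.com/page/#{n}" end)

# Scraping sequentially: the total time is the sum of every individual job.
# :timer.tc/1 returns {elapsed_microseconds, result}.
{micros, results} = :timer.tc(fn -> Enum.map(urls, &FakeScraper.work/1) end)

IO.puts("scraped #{length(results)} pages in #{div(micros, 1000)} ms")
```

Because the jobs run one after another, the total time is at least the sum of the individual delays (300 ms or more for three pages here), which is the kind of workload a data pipeline is meant to take over.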

First, let’s create a new application with a supervision tree like we’ve done before. We will name it scraper and pretend we’re going to scrape data from web ...
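
For reference, a new project with a supervision tree can be generated with `mix new scraper --sup`, which creates a `Scraper.Application` module and registers it in `mix.exs` through `mod: {Scraper.Application, []}`. The sketch below approximates that generated module (the exact comments Mix emits vary between Elixir versions); the empty `children` list is where processes such as pipeline stages would go. So that it can also run as a standalone script, it calls `start/2` directly instead of letting the runtime boot the application.

```elixir
defmodule Scraper.Application do
  # Approximation of the module generated by `mix new scraper --sup`.
  @moduledoc false

  use Application

  @impl true
  def start(_type, _args) do
    # Child processes (for example, pipeline stages) would be listed here.
    children = []

    # :one_for_one restarts only the child process that crashed.
    opts = [strategy: :one_for_one, name: Scraper.Supervisor]
    Supervisor.start_link(children, opts)
  end
end

# In a Mix project the runtime calls start/2 for you; in a script we call
# it directly to bring the (still empty) supervisor up.
{:ok, sup} = Scraper.Application.start(:normal, [])
IO.inspect(Supervisor.count_children(sup))
```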
