September 2017
Beginner to intermediate
304 pages
7h 2m
English
Each pipeline specification in Pachyderm can include a parallelism_spec field. Together with the glob patterns in your inputs, this field lets you parallelize a pipeline stage over its input data. Each pipeline stage scales independently of all the other stages.
The parallelism_spec field in the pipeline specification lets you control how many workers Pachyderm will spin up to process data in that pipeline stage. For example, the following parallelism_spec would tell Pachyderm to spin up 10 workers for a pipeline:
"parallelism_spec": { "constant": "10" },
The glob patterns in your inputs tell Pachyderm how it can share your input data over the workers declared by the parallelism_spec ...
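To make the interaction between the two settings concrete, here is a minimal Go sketch. It is a toy model, not Pachyderm's code: the functions datums and assign are hypothetical names, the file list is made up, only top-level files are considered, and the round-robin assignment is an assumption chosen for simplicity rather than a description of Pachyderm's actual scheduling. The sketch relies only on the facts that a glob pattern of / treats the whole input as a single unit of work, while /* treats each top-level item as its own unit.

```go
package main

import (
	"fmt"
	"path"
)

// datums groups input files into units of work for a glob pattern.
// "/" yields a single unit containing every file; any other pattern
// yields one unit per matching path. Simplification: flat files only.
func datums(glob string, files []string) [][]string {
	if glob == "/" {
		return [][]string{files}
	}
	var out [][]string
	for _, f := range files {
		ok, err := path.Match(glob, f)
		if err == nil && ok {
			out = append(out, []string{f})
		}
	}
	return out
}

// assign spreads units of work over the given number of workers.
// Round-robin is an assumption made for illustration only.
func assign(units [][]string, workers int) [][][]string {
	plan := make([][][]string, workers)
	for i, u := range units {
		plan[i%workers] = append(plan[i%workers], u)
	}
	return plan
}

func main() {
	files := []string{"/a.csv", "/b.csv", "/c.csv"} // hypothetical input files
	for _, glob := range []string{"/", "/*"} {
		units := datums(glob, files)
		fmt.Printf("glob %q: %d unit(s) of work\n", glob, len(units))
		for w, work := range assign(units, 2) {
			fmt.Printf("  worker %d: %v\n", w, work)
		}
	}
}
```

In this model, a glob of / leaves every worker but one idle regardless of the worker count, because a single unit of work cannot be split. Extra workers only help when the glob pattern produces enough separate units to hand out.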