August 2019
Beginner
482 pages
12h 56m
English
But how can you define a pipeline? How should you split your code into steps so that it stays both cost-efficient and low-maintenance? In our experience, it mainly depends on a combination of two factors:
As a rule of thumb, if the dataset is external and hence unreliable (for example, a Wikipedia page or an Open Data Portal), we'd recommend splitting your ingestion pipeline into distinct steps. In the first step, you'll collect all of the data the way it is provided: say, store the whole HTML page. For data, use JSON or CSV, something with no strict schema. After raw data is stored, you ...
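The first step described above, collecting the data exactly as it is provided, might look roughly like the sketch below. It is only an illustration under stated assumptions: the function names `fetch` and `store_raw`, the `raw_dir` folder, and the timestamped filename scheme are all hypothetical and are not taken from the book.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional
from urllib.request import urlopen


def fetch(url: str, timeout: float = 30.0) -> bytes:
    """Download the response body as raw bytes, without parsing it."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def store_raw(
    content: bytes,
    source: str,
    raw_dir: Path,
    fetched_at: Optional[datetime] = None,
) -> Path:
    """Write the untouched payload to disk and return the file path.

    The filename scheme (source name plus UTC timestamp) is an assumption
    made for this sketch; it keeps every download instead of overwriting.
    """
    fetched_at = fetched_at or datetime.now(timezone.utc)
    raw_dir.mkdir(parents=True, exist_ok=True)
    path = raw_dir / f"{source}_{fetched_at:%Y%m%dT%H%M%SZ}.html"
    path.write_bytes(content)
    return path


# Hypothetical usage (requires network access):
#   html = fetch("https://en.wikipedia.org/wiki/Data_pipeline")
#   store_raw(html, "wiki_data_pipeline", Path("data/raw"))
```

Keeping the download and the storage in separate functions means the raw copy never depends on parsing logic, so a change in the page layout cannot lose data that was already collected.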