The ETL process
The earlier stages of big data processing evolved over several decades under the name data mining, before the field adopted the popular name big data.
One of the most useful outcomes of these disciplines is the specification of the Extract, Transform, Load (ETL) process.
This process starts with a mix of data from many business systems, moves that data through a system that transforms it into a readable state, and finishes by generating a data mart with highly structured and documented data types.
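The three stages described above can be sketched in a few lines of Python. This is only an illustrative outline using the standard library: the two in-memory CSV sources, their column names, and the `extract`, `transform`, and `load` helpers are all hypothetical, and a CSV string stands in for the data mart.

```python
import csv
import io

# Hypothetical raw exports from two business systems (made-up sample data).
CRM_CSV = "customer_id,name\n1, Alice \n2,Bob\n"
BILLING_CSV = "customer_id,amount\n1,19.90\n2,not_available\n1,5.10\n"


def extract(raw_csv):
    """Extract: read the raw rows of one source as dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_csv)))


def transform(customers, payments):
    """Transform: clean values, enforce types, and join the two sources."""
    totals = {}
    for row in payments:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip amounts that cannot be parsed as numbers
        cid = int(row["customer_id"])
        totals[cid] = totals.get(cid, 0.0) + amount

    table = []
    for row in customers:
        cid = int(row["customer_id"])
        table.append({
            "customer_id": cid,
            "name": row["name"].strip(),  # remove stray whitespace
            "total_paid": round(totals.get(cid, 0.0), 2),
        })
    return table


def load(table):
    """Load: write the structured table to a CSV string (data mart stand-in)."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=["customer_id", "name", "total_paid"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(table)
    return out.getvalue()


structured = transform(extract(CRM_CSV), extract(BILLING_CSV))
mart_csv = load(structured)
print(mart_csv)
```

Keeping each stage in its own function makes it easy to swap one source or one output target without touching the cleaning logic; a real pipeline would read from actual systems and write to a database or warehouse rather than strings.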
To apply this concept, we will combine the elements of this process with its final outcome, a structured dataset that in its final form includes an additional label column (in the case of supervised learning ...