Data collection
The first step is always the most generic because it depends on each single context. However, before working with any data, it's necessary to collect it from all the sources where it's stored. The ideal situation is to have a comma separated values (CSV) (or another suitable format) dump that can be immediately loaded, but more often, the engineer has to look for all the database tables, define the right SQL query to collect all the pieces of information, and manage data type conversion and encoding. We're not going to discuss this topic, but it's important not to under-evaluate this stage because it can be much more difficult than expected. I suggest, whenever possible, to extract flattened tables, where all the fields are ...
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Read now
Unlock full access