Chapter 4

Stage 1: Data Extraction

Summary

This chapter describes Data Extraction, a stage of the Guerrilla Analytics workflow. It discusses the pitfalls and risks of extracting data from systems, then recommends ways to apply the Guerrilla Analytics principles to reduce these risks, avoid these pitfalls, and maintain data provenance.

Keywords

Data Extraction
File Formats
Checksums

4.1. Guerrilla Analytics workflow

Data Extraction is the first stage in the Guerrilla Analytics workflow (Section 2.1), as illustrated in Figure 9. It involves taking data out of some system or location so it can be brought into the analytics team’s Data Manipulation Environment (DME). The place the data is extracted from is called ...
