This chapter describes the Data Extraction stage of the Guerrilla Analytics workflow. It discusses the pitfalls and risks associated with extracting data from systems, and then makes a set of recommendations that apply Guerrilla Analytics principles to reduce these risks, avoid these pitfalls, and maintain data provenance.
4.1. Guerrilla Analytics workflow
Data Extraction is the first stage in the Guerrilla Analytics workflow (Section 2.1), as illustrated in Figure 9. It involves taking data out of a system or location so that it can be brought into the analytics team’s Data Manipulation Environment (DME). The place the data is extracted from is called ...