In this phase, data integration, selection, cleaning, and preprocessing are performed. This is often the most time-consuming phase, and arguably the most important one, because the quality of every later result depends on the quality of the data. Larger datasets also tend to be dirtier: the more data you have, the more errors, gaps, and inconsistencies you should expect to find in it.
Again, this phase resembles a database development project. System integration, querying and selection, cleaning, and other preprocessing steps are expected before the data can be used in a new database model. This often involves aggregating the data, building primary key/foreign key relationships, cleansing, and so on.
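To make the database analogy concrete, here is a minimal sketch of the steps named above (cleansing, building a key/foreign key relationship, and aggregating), using only Python's built-in `sqlite3` module. The tables, column names, sample records, and helper functions are all hypothetical and chosen purely for illustration; a real project would apply its own cleaning rules to its own sources.

```python
import sqlite3

# Hypothetical raw records from two source systems (illustrative data only).
raw_customers = [
    (1, " Alice "),
    (2, "bob"),
    (2, "bob"),   # duplicate row
    (3, None),    # missing name
]
raw_orders = [
    (10, 1, "25.50"),
    (11, 1, "4.50"),
    (12, 2, "n/a"),    # amount that cannot be parsed
    (13, 99, "7.00"),  # refers to a customer that does not exist
]


def clean_customers(rows):
    """Drop rows with missing names, trim and normalise names, de-duplicate by id."""
    seen = set()
    cleaned = []
    for cust_id, name in rows:
        if name is None or cust_id in seen:
            continue
        seen.add(cust_id)
        cleaned.append((cust_id, name.strip().title()))
    return cleaned


def clean_orders(rows, valid_customer_ids):
    """Keep only orders with a numeric amount and a known customer."""
    cleaned = []
    for order_id, cust_id, amount in rows:
        try:
            value = float(amount)
        except ValueError:
            continue
        if cust_id in valid_customer_ids:
            cleaned.append((order_id, cust_id, value))
    return cleaned


def build_database(customers, orders):
    """Load the cleaned rows into two tables linked by a foreign key."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE orders ("
        " id INTEGER PRIMARY KEY,"
        " customer_id INTEGER NOT NULL REFERENCES customer(id),"
        " amount REAL NOT NULL)"
    )
    conn.executemany("INSERT INTO customer VALUES (?, ?)", customers)
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
    conn.commit()
    return conn


def total_per_customer(conn):
    """Aggregate order amounts per customer by joining on the key relationship."""
    return conn.execute(
        "SELECT c.name, SUM(o.amount) FROM customer c "
        "JOIN orders o ON o.customer_id = c.id "
        "GROUP BY c.id ORDER BY c.id"
    ).fetchall()


customers = clean_customers(raw_customers)
orders = clean_orders(raw_orders, {cust_id for cust_id, _ in customers})
conn = build_database(customers, orders)
totals = total_per_customer(conn)
print(totals)  # [('Alice', 30.0)]
```

In this sketch the cleaning happens before loading, so the foreign key constraint acts as a safety net rather than as the primary filter. Whether to reject, repair, or quarantine dirty rows is a design decision that depends on the project; silently dropping them, as done here, is only the simplest option.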