September 2019
Beginner to intermediate
494 pages
13h
English
As the last step of our process, let's combine our two datasets and write the processed result to a file so that we can work from this cleaned version in the future. The code in the next cell does this:
combined_user_df = pd.concat([user_df, user_tappy_df], axis=1)
print(combined_user_df.head())
combined_user_df.to_csv('data/combined_user.csv')
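To see what pd.concat does with axis=1, here is a minimal sketch on toy data. The user IDs, column names, and values are hypothetical stand-ins, not the real contents of user_df and user_tappy_df. With axis=1, pandas places the frames side by side and aligns rows on the index; by default it keeps every index label found in either frame and fills the gaps with NaN.

```python
import pandas as pd

# Hypothetical stand-ins for the two per-user tables (values are made up).
toy_user_df = pd.DataFrame(
    {"birth_year": [1952, 1959, 1946]},
    index=["U1", "U2", "U3"],
)
toy_tappy_df = pd.DataFrame(
    {"mean_hold_time": [105.3, 98.7]},
    index=["U1", "U2"],  # U3 has no typing data in this toy example
)

# axis=1 concatenates column-wise and aligns rows on the index labels.
toy_combined_df = pd.concat([toy_user_df, toy_tappy_df], axis=1)

# U3 is kept, but its mean_hold_time is missing (NaN).
u3_is_missing = pd.isna(toy_combined_df.loc["U3", "mean_hold_time"])
print(toy_combined_df)
```

If the two frames do not share the same index labels, it is worth checking for such missing values before relying on the combined table.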
Saving intermediate results like this is generally good practice in a data pipeline. A processed, cleaned copy of the dataset lets data engineers resume from a known-good state if something goes wrong later, instead of repeating the earlier steps. It also offers flexibility if and when we want to change or extend the pipeline further.
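As a rough illustration of picking the work back up later, the sketch below writes a small table to CSV and reads it back. The data and the temporary output path are assumptions made for the example; the pipeline above writes to 'data/combined_user.csv' instead. Because to_csv writes the index as the first column by default, passing index_col=0 to read_csv restores it as the index rather than as an extra column.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the combined table (values are made up).
sample_combined_df = pd.DataFrame(
    {"birth_year": [1952, 1959], "mean_hold_time": [105.3, 98.7]},
    index=["U1", "U2"],
)

# Write to a temporary directory so the example does not touch real data.
out_path = os.path.join(tempfile.mkdtemp(), "combined_user.csv")
sample_combined_df.to_csv(out_path)

# index_col=0 tells pandas to treat the first CSV column as the index.
reloaded_df = pd.read_csv(out_path, index_col=0)
print(reloaded_df.head())
```

A later notebook or script can then start from the reloaded table without rerunning the cleaning steps.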
One interesting note about this cleaned version of our data is that, ...