Machine learning solution workflow
Best practices in the data preparation stage
    Best practice 1 – Completely understanding the project goal
    Best practice 2 – Collecting all fields that are relevant
    Best practice 3 – Maintaining the consistency of field values
    Best practice 4 – Dealing with missing data
    Best practice 5 – Storing large-scale data
Best practices in the training sets generation stage
    Best practice 6 – Identifying categorical features with numerical values
    Best practice 7 – Deciding whether to encode categorical features
    Best practice 8 – Deciding whether to select features, and if so, how to do so
    Best practice 9 – Deciding whether to reduce dimensionality, and if so, how to do so
    Best practice 10 – Deciding whether to rescale features
    Best practice 11 – Performing feature engineering with domain expertise
    Best practice 12 – Performing feature engineering without domain expertise
        Binarization
        Discretization
        Interaction
        Polynomial transformation
    Best practice 13 – Documenting how each feature is generated
    Best practice 14 – Extracting features from text data
        Tf and tf-idf
        Word embedding
        Word embedding with pre-trained models
Best practices in the model training, evaluation, and selection stage
    Best practice 15 – Choosing the right algorithm(s) to start with
        Naïve Bayes
        Logistic regression
        SVM
        Random forest (or decision tree)
        Neural networks
    Best practice 16 – Reducing overfitting
    Best practice 17 – Diagnosing overfitting and underfitting
    Best practice 18 – Modeling on large-scale datasets
Best practices in the deployment and monitoring stage
    Best practice 19 – Saving, loading, and reusing models
        Saving and restoring models using pickle
        Saving and restoring models in TensorFlow
    Best practice 20 – Monitoring model performance
    Best practice 21 – Updating models regularly
Summary
Exercises