Removing or adding features in a dataset directly affects both the schema and the recipe. The schema is used when creating the datasources, while the recipe is needed to train the model, since it specifies which data transformations are performed before training.
To remove a feature from the dataset, we modify the schema by adding the name of the variable to the excludedAttributeNames field. We start from the initial schema, and each time we remove a feature from the initial feature list, we add its name to the excludedAttributeNames list. The steps are as follows:
- Load the JSON-formatted schema into a schema dict
- Append the feature name to schema['excludedAttributeNames']
- Save the schema to a ...
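The schema steps can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the function name exclude_feature and both file paths are hypothetical, and the output location is left to the caller. The sketch also assumes that excludedAttributeNames may be missing from the initial schema, in which case it is created as an empty list.

```python
import json


def exclude_feature(schema_path, feature_name, output_path):
    """Add feature_name to excludedAttributeNames and save the schema.

    Hypothetical helper: the name and arguments are illustrative only.
    """
    # Load the JSON-formatted schema into a dict.
    with open(schema_path) as f:
        schema = json.load(f)

    # Create the list if the schema does not exclude anything yet,
    # then append the feature name once (no duplicates).
    excluded = schema.setdefault("excludedAttributeNames", [])
    if feature_name not in excluded:
        excluded.append(feature_name)

    # Write the modified schema to the path chosen by the caller.
    with open(output_path, "w") as f:
        json.dump(schema, f, indent=2)

    return schema
```

Calling the function repeatedly, each time passing the previous output as the new input, accumulates the excluded features one by one, which matches the incremental removal described above.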