So far, we have seen how to preprocess images so that features can be extracted from them and fed into CNNs. We have also seen how to extract metadata, map it, and link it to the original images. Now it's time to extract features from those preprocessed images.
While doing so, we need to keep track of the provenance of each image's metadata. As you might guess, feature extraction therefore relies on three maps. For details, see the imageFeatureExtractor.scala script:
- Business map of the form imageID → businessID
- Data map of the form imageID → image data
- Label map of the form businessID → labels
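To see how the three maps fit together, here is a minimal plain-Scala sketch (no Spark) that joins them so each image ends up with its business ID, its image data, and its business's labels. It assumes IDs are strings, image data is a flattened `Vector[Int]` of pixel values, and labels are a `Set[Int]`; the object name `ImageFeatureMaps`, the `joinMaps` helper, and all sample values are hypothetical and are not taken from imageFeatureExtractor.scala.

```scala
object ImageFeatureMaps {
  // Hypothetical type aliases: both kinds of ID are plain strings in this sketch.
  type ImageID = String
  type BusinessID = String

  // Business map: imageID -> businessID (made-up sample values)
  val businessMap: Map[ImageID, BusinessID] =
    Map("1001" -> "b1", "1002" -> "b1", "1003" -> "b2", "1004" -> "b3")

  // Data map: imageID -> image data (toy 2x2 grayscale images, flattened)
  val dataMap: Map[ImageID, Vector[Int]] = Map(
    "1001" -> Vector(0, 255, 128, 64),
    "1002" -> Vector(10, 20, 30, 40),
    "1003" -> Vector(255, 255, 0, 0),
    "1004" -> Vector(1, 2, 3, 4)
  )

  // Label map: businessID -> labels ("b3" deliberately has no labels)
  val labelMap: Map[BusinessID, Set[Int]] =
    Map("b1" -> Set(1, 3), "b2" -> Set(0))

  // Hypothetical helper: follow imageID -> businessID -> labels for every image
  // that has data; images whose business or labels are missing are dropped.
  def joinMaps(
      business: Map[ImageID, BusinessID],
      data: Map[ImageID, Vector[Int]],
      labels: Map[BusinessID, Set[Int]]
  ): Seq[(ImageID, BusinessID, Vector[Int], Set[Int])] =
    data.toSeq.sortBy(_._1).flatMap { case (imgId, pixels) =>
      for {
        bizId     <- business.get(imgId)
        bizLabels <- labels.get(bizId)
      } yield (imgId, bizId, pixels, bizLabels)
    }

  def main(args: Array[String]): Unit =
    joinMaps(businessMap, dataMap, labelMap).foreach { case (img, biz, px, lbl) =>
      println(s"$img -> $biz, pixels=${px.mkString(",")}, labels=${lbl.mkString(",")}")
    }
}
```

Because the lookups return `Option`s, an image whose business has no entry in the label map (here, image `1004` of business `b3`) simply drops out of the result instead of raising an error.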
We first define a regular expression pattern to extract the .jpg name ...