The diabetic retinopathy challenge, despite the challenges, is not as complicated as it gets. The actual images were provided in JPEG format, but most medical data is not in JPEG format. They are usually within container formats such as DICOM. DICOM stands for Digital Imaging and Communications in Medicine and has a number of versions and variations. It contains the medical image, but also header data. The header data often includes general demographic and study data, but it can contain dozens of other custom fields. If you are lucky, it will also contain a diagnosis, which you can use as a label.
DICOM data adds another step to the pipeline we discussed earlier because we now need to read the DICOM file, extract the ...