The challenge
Before we dive into the code, recall that most machine learning efforts pursue one of two simple goals: classification or ranking. In many cases, classification is itself a ranking, because we end up choosing the class with the greatest rank (often a probability). Our foray into medical imaging will be no different. We will either classify images into one of two binary categories:
- Disease state/positive
- Normal state/negative
Alternatively, we will classify them into multiple classes or rank them. In the case of diabetic retinopathy, we'll rank them as follows:
- Class 0: No Diabetic Retinopathy
- Class 1: Mild
- Class 2: Moderate
- Class 3: Severe
- Class 4: Proliferative Diabetic Retinopathy
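To make the "classification as ranking" idea concrete, here is a minimal sketch in plain Python. It assumes a model that emits one raw score (logit) per class, converts those scores to probabilities with a softmax, and picks the class with the highest probability. The function names and the example scores are hypothetical and chosen only for illustration; they are not taken from the challenge or from any particular library.

```python
import math


def softmax(logits):
    """Convert raw model outputs into probabilities that sum to 1."""
    peak = max(logits)
    # Subtracting the largest value keeps exp() numerically stable.
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_class(logits):
    """Return (class_index, probability) for the highest-ranked class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best, probs[best]


# Hypothetical raw scores for one image, one value per class 0 to 4.
example_logits = [0.2, 1.1, 3.0, 0.7, -0.5]
grade, confidence = predict_class(example_logits)
print(f"Predicted class {grade} with probability {confidence:.2f}")
```

The same function covers the binary case: pass two scores (negative, positive) and it returns whichever class ranks higher.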
Assigning a grade like this is often called scoring. Kaggle kindly provides participants with over 32 GB of training data, which includes over 35,000 images. The test data is even larger, at 49 GB. The goal is to train on the 35,000+ images using the known scores and propose scores for the test set. The training labels look...