Introduction

A pulmonary nodule is a small round or oval-shaped growth in the lung. It may also be called a “spot on the lung” or a “coin lesion.” Pulmonary nodules are smaller than three centimeters (around 1.2 inches) in diameter. If the growth is larger than that, it is called a pulmonary mass and is more likely to represent a cancer than a nodule [http://my.clevelandclinic.org/health/articles/pulmonary-nodules].

Nodules appear in chest CT images as objects that are usually, though not always, rounded, with an intensity that is higher (brighter) than that of the surrounding lung parenchyma.

If a nodule is detected, guidelines have to be followed to decide on the best management for the patient.

For this purpose, the LungRADS guidelines have been released. They describe the type of follow-up analysis based on the type and size of the detected nodules.

LungRADS considers five main categories of nodules:

  • solid nodules
  • ground-glass nodules (also called GGN, non-solid nodules)
  • semi-solid nodules (also called part-solid nodules)
  • calcified nodules
  • spiculated nodules

Solid nodules are characterized by a homogeneous texture, a well-defined shape and an intensity above -450 Hounsfield Units (HU) on CT. Spiculated nodules appear as solid lesions with characteristic spikes at the border, which are often considered an indicator of malignancy. Non-solid nodules (also called ground-glass opacities) have a lower intensity on CT than solid nodules (above -750 HU). Part-solid nodules (also called semi-solid nodules) contain both a non-solid and a solid part, the latter normally referred to as the solid core. Compared with solid nodules, non-solid and part-solid nodules are more frequently malignant. Finally, calcified nodules are characterized by a high intensity and a well-defined rounded shape on CT. If a nodule is completely calcified, it is a benign lesion.
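
The two intensity thresholds quoted above can be read as coarse bands on the HU scale. The sketch below only illustrates that reading. The function names and band labels are hypothetical, and it is not a nodule classifier, since nodule type also depends on texture and shape. No threshold is given in the text for calcified nodules, so none is assumed here.

```python
# Thresholds quoted in the text, in Hounsfield Units (HU).
SOLID_MIN_HU = -450       # solid nodules: intensity above this value
NON_SOLID_MIN_HU = -750   # non-solid nodules: intensity above this value


def intensity_band(hu):
    """Return the coarse intensity band of a single HU value.

    Band names are illustrative labels, not LungRADS terminology.
    """
    if hu > SOLID_MIN_HU:
        return "solid-range"
    if hu > NON_SOLID_MIN_HU:
        return "non-solid-range"
    return "below-non-solid-range"


def solid_fraction(hu_values):
    """Fraction of HU values that fall in the solid range (hypothetical helper)."""
    values = list(hu_values)
    if not values:
        return 0.0
    return sum(1 for v in values if v > SOLID_MIN_HU) / len(values)
```

For a part-solid nodule, a helper such as solid_fraction would return a value strictly between 0 and 1 when applied to the voxels of the lesion, reflecting the mix of solid core and non-solid part.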

As you can see, the five categories are mentioned in the table, as well as nodule size.

While nodule size can be easily measured using segmentation software, discriminating between nodule types is not trivial. The figure on the right shows examples of pulmonary nodules at different scales. For each nodule, 2D views in the axial, coronal and sagittal planes are shown.

In this challenge, we are going to develop a system based on machine learning to automatically classify pulmonary nodules detected in chest CT scans. For this purpose, we will use data from the publicly available dataset LIDC-IDRI (https://wiki.cancerimagingarchive.net/display/Public/LIDC-IDRI). In LIDC-IDRI, nodules have been annotated and labeled by four radiologists. Based on their annotations, we extracted a subset of nodules that will be used in this assignment for training and for test purposes.

The idea of this assignment is to develop a multi-class classification system using machine learning, in particular using neural networks. The goal is to achieve the best classification accuracy on the test set, which contains 50 nodules for each class. For each nodule in both the training and test set, we provide both raw data (cubes of 40x40x40 mm containing nodules) and a representation of nodules, meaning a feature vector of 256 values (more details are provided later).

The purpose of this assignment is to use the provided features to develop a system based on neural networks that classifies pulmonary nodule type.
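
To make the shape of such a system concrete, here is a minimal sketch of the forward pass of a small network that maps a 256-value feature vector to one of the five labels. It is written in plain Python for readability, uses random untrained weights, and assumes a single hidden layer of 16 units, which is an arbitrary choice. It also assumes that output index 0 corresponds to label 1, index 1 to label 2, and so on. In practice you would train the weights on the training set, most likely with a neural network library.

```python
import math
import random

N_FEATURES = 256   # length of the feature vector provided per nodule
N_CLASSES = 5      # labels 1..5
N_HIDDEN = 16      # hypothetical hidden-layer size, chosen arbitrarily


def init_layer(n_in, n_out, rng):
    """Return random weights (n_out rows of n_in values) and zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(z):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def predict(features, params):
    """Return (label, class probabilities) for one feature vector."""
    (w1, b1), (w2, b2) = params
    hidden = relu(dense(features, w1, b1))
    probs = softmax(dense(hidden, w2, b2))
    label = probs.index(max(probs)) + 1   # shift index 0..4 to label 1..5
    return label, probs


# Untrained example: random weights and a random stand-in feature vector.
rng = random.Random(0)
params = (init_layer(N_FEATURES, N_HIDDEN, rng),
          init_layer(N_HIDDEN, N_CLASSES, rng))
example_features = [rng.gauss(0.0, 1.0) for _ in range(N_FEATURES)]
example_label, example_probs = predict(example_features, params)
```

The softmax output gives one probability per class, and the predicted label is the class with the highest probability.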

Submissions

You will need to submit a .csv file containing your predictions. The csv file should look like:

  nodule_id,label
  1.3.6.1.4.1.14519.5.2.1.6279.6001.325580698241281352835338693869_-61.8072474_75.40607439_-140.7201903_7.554917919, 4
  1.3.6.1.4.1.14519.5.2.1.6279.6001.961063442349005937536597225349_-72.8850919067_0.678575316667_-100.546239347_4.40666871133, 4
  ...
  1.3.6.1.4.1.14519.5.2.1.6279.6001.436403998650924660479049012235_117.59098275_45.37120507_-178.11363635_5.662942113, 1
  1.3.6.1.4.1.14519.5.2.1.6279.6001.669435869708883155232318480131_83.86702735_15.13661035_-90.17647059_5.988521394, 4

There should be 243 predictions in total, and the labels should be integers in the range 1 to 5 (inclusive). Please check that your labels are plain numbers and not wrapped in brackets, e.g. 1 and not [1].
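
One possible way to produce such a file and catch the mistakes mentioned above is sketched below, using only the Python standard library. The function name write_submission and the placeholder ids nodule_a and nodule_b are hypothetical. The example writes to an in-memory buffer with two rows so that it stays short; for a real submission you would pass an open file and keep the default of 243 rows.

```python
import csv
import io
import operator

EXPECTED_ROWS = 243        # number of predictions required by the assignment
VALID_LABELS = range(1, 6)  # integers 1 to 5, inclusive


def write_submission(predictions, handle, expected_rows=EXPECTED_ROWS):
    """Write (nodule_id, label) pairs as CSV to an open text handle.

    Raises TypeError if a label is not a plain integer (for example [1]),
    and ValueError if a label is out of range or the row count is wrong.
    """
    rows = []
    for nodule_id, label in predictions:
        label = operator.index(label)   # rejects lists, floats and strings
        if label not in VALID_LABELS:
            raise ValueError(f"label {label} for {nodule_id} is not in 1..5")
        rows.append((nodule_id, label))
    if len(rows) != expected_rows:
        raise ValueError(f"expected {expected_rows} rows, got {len(rows)}")
    writer = csv.writer(handle)
    writer.writerow(["nodule_id", "label"])
    writer.writerows(rows)
    return len(rows)


# Small in-memory example with placeholder ids (not real nodule ids).
buffer = io.StringIO()
write_submission([("nodule_a", 4), ("nodule_b", 1)], buffer, expected_rows=2)
example_csv = buffer.getvalue()
```

When writing to disk, open the file with newline="" (for example open("submission.csv", "w", newline="")), as the csv module documentation recommends, to avoid extra blank lines on some platforms.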