# Predict-Lung-Disease-through-Chest-X-Ray
We obtained this repository by refactoring the [code](https://github.com/Azure/AzureChestXRay) for the blog post [Using Microsoft AI to Build a Lung-Disease Prediction Model using Chest X-Ray Images](https://blogs.technet.microsoft.com/machinelearning/2018/03/07/using-microsoft-ai-to-build-a-lung-disease-prediction-model-using-chest-x-ray-images/). These instructions aim to help newcomers build the system in a short time.
# Installation
1. Clone this repository
   ```Shell
   git clone <URL of this repository>
   ```
   We'll call the directory that you cloned into `ROOT`.

2. Install all essential dependencies: tqdm, opencv-python (`cv2`), numpy, pandas, scikit-learn (`sklearn`), keras, tensorflow, and keras_contrib; the standard-library modules `pickle`, `random`, `re`, and `collections.Counter` are also used. A quick import check is sketched below.

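A quick way to verify that the third-party packages are importable before running any script (this check is a sketch, not part of the repository; the standard-library modules need no installation):

```Python
# Optional sanity check: verify that the third-party dependencies import cleanly.
import importlib

THIRD_PARTY = ['tqdm', 'cv2', 'numpy', 'pandas', 'sklearn',
               'keras', 'tensorflow', 'keras_contrib']

for name in THIRD_PARTY:
    try:
        importlib.import_module(name)
        print('{:<14s} OK'.format(name))
    except ImportError:
        print('{:<14s} MISSING - install it with pip or conda'.format(name))
```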
# Data set up
1. Download the NIH Chest X-ray Dataset from here:
   https://nihcc.app.box.com/v/ChestXray-NIHCC.
   You need all the image files (everything under the `images` folder of the NIH dataset), the `Data_Entry_2017.csv` file, and the bounding-box data `BBox_List_2017.csv`.

2. Create the data directories
   ```Shell
   mkdir -p ROOT/azure-share/chestxray/data/ChestX-ray8/ChestXray-NIHCC
   mkdir -p ROOT/azure-share/chestxray/data/ChestX-ray8/ChestXray-NIHCC_other
   ```
3. Save all images under `ROOT/azure-share/chestxray/data/ChestX-ray8/ChestXray-NIHCC`.

4. Save `Data_Entry_2017.csv` and `BBox_List_2017.csv` under `ROOT/azure-share/chestxray/data/ChestX-ray8/ChestXray-NIHCC_other`.

5. Process the data
   ```Shell
   mkdir -p ROOT/azure-share/chestxray/output/data_partitions
   ```
   Run `000_preprocess.py` to create the `*.pickle` data-partition files under this directory (a sanity-check sketch follows this step).
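Before moving on, it can help to confirm that the files ended up in the right places. The snippet below is a small sanity-check sketch, not part of the repository: `ROOT` and the `*.png` glob are assumptions about the local layout, and the exact contents of the `*.pickle` files depend on what `000_preprocess.py` writes.

```Python
# Hedged sanity check for the data set up steps; paths follow the directory
# layout described above, with ROOT replaced by your clone location.
import glob
import os
import pickle

import pandas as pd

ROOT = '.'  # adjust to your clone location
other_dir = os.path.join(ROOT, 'azure-share/chestxray/data/ChestX-ray8/ChestXray-NIHCC_other')
image_dir = os.path.join(ROOT, 'azure-share/chestxray/data/ChestX-ray8/ChestXray-NIHCC')
partition_dir = os.path.join(ROOT, 'azure-share/chestxray/output/data_partitions')

# The NIH metadata: one row per image, plus the bounding-box annotations.
labels = pd.read_csv(os.path.join(other_dir, 'Data_Entry_2017.csv'))
bboxes = pd.read_csv(os.path.join(other_dir, 'BBox_List_2017.csv'))
print('label rows:', len(labels), ' bounding boxes:', len(bboxes))
print('images on disk:', len(glob.glob(os.path.join(image_dir, '*.png'))))

# Inspect whatever 000_preprocess.py wrote into the partition directory.
for path in sorted(glob.glob(os.path.join(partition_dir, '*.pickle'))):
    with open(path, 'rb') as f:
        obj = pickle.load(f)
    size = len(obj) if hasattr(obj, '__len__') else 'n/a'
    print(os.path.basename(path), type(obj).__name__, size)
```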
# Test
1. We provide the pretrained model `azure_chest_xray_14_weights_712split_epoch_054_val_loss_191.2588.hdf5` under `ROOT/azure-share/chestxray/output/fully_trained_models`. You can also download it separately from [here](https://chestxray.blob.core.windows.net/chestxraytutorial/tutorial_xray/chexray_14_weights_712split_epoch_054_val_loss_191.2588.hdf5).

2. Run `020_evaluate.py`; it will create `weights_only_azure_chest_xray_14_weights_712split_epoch_054_val_loss_191.2588.hdf5`, which stores only the weights of the pretrained model, in the same directory.

3. Below are the AUC scores for all 14 diseases, compared with the Stanford (CheXNet) results (a sketch of how such per-disease scores can be computed follows the table):

   | Disease            | Our AUC Score | Stanford AUC Score | Delta (Stanford - Ours) |
   |--------------------|--------------:|-------------------:|------------------------:|
   | Atelectasis        |      0.822334 |             0.8094 |               -0.012934 |
   | Cardiomegaly       |      0.933610 |             0.9248 |               -0.008810 |
   | Effusion           |      0.882471 |             0.8638 |               -0.018671 |
   | Infiltration       |      0.744504 |             0.7345 |               -0.010004 |
   | Mass               |      0.858467 |             0.8676 |                0.009133 |
   | Nodule             |      0.784230 |             0.7802 |               -0.004030 |
   | Pneumonia          |      0.800054 |             0.7680 |               -0.032054 |
   | Pneumothorax       |      0.829764 |             0.8887 |                0.058936 |
   | Consolidation      |      0.811969 |             0.7901 |               -0.021869 |
   | Edema              |      0.894102 |             0.8878 |               -0.006302 |
   | Emphysema          |      0.847477 |             0.9371 |                0.089623 |
   | Fibrosis           |      0.882602 |             0.8047 |               -0.077902 |
   | Pleural Thickening |      1.000000 |             0.8062 |               -0.193800 |
   | Hernia             |      0.916610 |             0.9164 |               -0.000210 |

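The "Our AUC Score" column contains standard one-vs-rest ROC AUC values, one per disease. Below is a minimal sketch of how such per-disease scores can be computed with scikit-learn; the `y_true`/`y_prob` arrays here are random placeholders standing in for the ground-truth labels and the model probabilities that `020_evaluate.py` works with.

```Python
# Minimal sketch: per-disease ROC AUC over (num_images, 14) label/probability matrices.
import numpy as np
from sklearn.metrics import roc_auc_score

DISEASES = ['Atelectasis', 'Cardiomegaly', 'Effusion', 'Infiltration', 'Mass',
            'Nodule', 'Pneumonia', 'Pneumothorax', 'Consolidation', 'Edema',
            'Emphysema', 'Fibrosis', 'Pleural Thickening', 'Hernia']

# Placeholder data so the sketch runs on its own; in practice these come from
# the test partition and the model's predictions.
rng = np.random.RandomState(0)
y_true = rng.randint(0, 2, size=(200, len(DISEASES)))
y_prob = rng.rand(200, len(DISEASES))

for i, disease in enumerate(DISEASES):
    auc = roc_auc_score(y_true[:, i], y_prob[:, i])
    print('{:<20s} AUC = {:.6f}'.format(disease, auc))
```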
# Visualization
1. Create a test-image folder
   ```Shell
   mkdir -p ROOT/azure-share/chestxray/data/ChestX-ray8/test_images
   ```
   Copy any number of images from `ChestXray-NIHCC` to `test_images` and resize them to 224x224 pixels.

2. Run `004_cam_simple.py` and it will output a Class Activation Map (CAM). The CAM shows which regions of the image were most relevant to the predicted class (a sketch of the resizing and CAM computation follows the example image).

   
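For reference, here is a rough sketch of the resizing and CAM computation, in the spirit of `004_cam_simple.py` and the CAM paper listed under "Referenced Papers". The DenseNet-121 backbone, the `predictions` layer name, the image path, and the 1/255 preprocessing are illustrative assumptions, not necessarily what the script does.

```Python
# Hedged CAM sketch: weight the last convolutional feature maps by the dense
# layer's weights for one class, then upsample the result onto the input image.
import cv2
import numpy as np
from keras.applications.densenet import DenseNet121
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model

NUM_CLASSES = 14

# Assumed architecture: DenseNet-121 features -> global average pooling -> sigmoid head.
base = DenseNet121(weights=None, include_top=False, input_shape=(224, 224, 3))
features = base.output                                  # (7, 7, 1024) feature maps
pooled = GlobalAveragePooling2D()(features)
preds = Dense(NUM_CLASSES, activation='sigmoid', name='predictions')(pooled)
model = Model(inputs=base.input, outputs=[preds, features])
# model.load_weights('...')  # only if the weights file matches this architecture

# Load a test image and resize it to the 224x224 input size (file name is a placeholder).
img = cv2.imread('ROOT/azure-share/chestxray/data/ChestX-ray8/test_images/example.png')
img = cv2.resize(img, (224, 224))
batch = img[np.newaxis].astype('float32') / 255.0       # preprocessing is an assumption

probs, fmaps = model.predict(batch)
class_idx = int(np.argmax(probs[0]))                    # most probable disease

# CAM = feature maps weighted by the dense weights of the chosen class.
class_weights = model.get_layer('predictions').get_weights()[0]   # shape (1024, 14)
cam = np.dot(fmaps[0], class_weights[:, class_idx])     # shape (7, 7)
cam = cv2.resize(cam, (224, 224))
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)

# Overlay the heatmap on the resized image and save the result.
heatmap = cv2.applyColorMap(np.uint8(255 * cam), cv2.COLORMAP_JET)
overlay = cv2.addWeighted(img, 0.6, heatmap, 0.4, 0)
cv2.imwrite('cam_overlay.png', overlay)
```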

# Referenced Papers
- Baseline result: https://arxiv.org/abs/1705.02315
- Image localization: http://arxiv.org/abs/1512.04150
- The original CheXNet work described on the [StanfordML website](https://stanfordmlgroup.github.io/projects/chexnet/) and in the corresponding [paper](https://arxiv.org/abs/1711.05225)
- http://cs231n.stanford.edu/reports/2017/pdfs/527.pdf for pre-processing the data
- https://arxiv.org/abs/1711.08760 for further thoughts on the model architecture and the relationships between different diseases

# Notes
Please contact [email protected] if you run into any problems.