This repository contains the code for the paper:

> Ben Glocker, Charles Jones, Mélanie Roschewitz, Stefan Winzeck. *Risk of Bias in Chest Radiography Deep Learning Foundation Models.* Radiology: Artificial Intelligence (2023). DOI: 10.1148/ryai.230060
The CheXpert imaging dataset can be downloaded from https://stanfordmlgroup.github.io/competitions/chexpert/. The corresponding demographic information (`CHEXPERT DEMO.xlsx`) is provided by the dataset authors.
In our work, we analyze the CXR foundation model by Google Health. Information on how to use the model to generate feature embeddings for the CheXpert dataset is available in the original GitHub repository.
For running the code, we recommend setting up a dedicated Python environment.
Create and activate a Python 3 conda environment:

```shell
conda create -n chexploration python=3
conda activate chexploration
```

Install PyTorch using conda (for CUDA Toolkit 11.3):

```shell
conda install pytorch torchvision cudatoolkit=11.3 -c pytorch
```
Alternatively, create and activate a Python 3 virtual environment:

```shell
virtualenv -p python3 <path_to_envs>/chexploration
source <path_to_envs>/chexploration/bin/activate
```

Install PyTorch using pip:

```shell
pip install torch torchvision
```

Install the remaining Python packages:

```shell
pip install matplotlib jupyter pandas seaborn pytorch-lightning scikit-learn scikit-image tensorboard tqdm openpyxl tabulate statsmodels
```
The code has been tested on Windows 10 and Ubuntu 18.04/20.04. The data analysis does not require any specific hardware and can be run on a standard laptop. Training and testing the disease detection models, however, requires a high-end GPU workstation; for our experiments, we used an NVIDIA Titan X RTX 24 GB.
In order to replicate the results presented in the paper, please follow these steps:

- Download the CheXpert dataset and copy the file `train.csv` to the `datafiles/chexpert` folder.
- Download the CheXpert demographics data and copy the file `CHEXPERT DEMO.xlsx` to the `datafiles/chexpert` folder.
- Run the notebook `chexpert.sample.ipynb` to generate the study data.
- Run the notebook `chexpert.resample.ipynb` to perform test-set resampling.
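Conceptually, the study-data step joins the per-image labels from `train.csv` with the per-patient demographic records. A minimal stand-alone sketch of that join, using only the standard library (the column names, path layout, and toy data below are illustrative assumptions, not the notebook's actual code):

```python
import csv, io

# Toy stand-ins for train.csv and CHEXPERT DEMO.xlsx (the column names are
# illustrative assumptions, not the real CheXpert schema).
labels_csv = io.StringIO(
    "Path,Pleural Effusion\n"
    "CheXpert-v1.0/train/patient00001/study1/view1_frontal.jpg,1\n"
    "CheXpert-v1.0/train/patient00002/study1/view1_frontal.jpg,0\n"
)
demo_rows = {
    "patient00001": {"GENDER": "Female", "PRIMARY_RACE": "White"},
    "patient00002": {"GENDER": "Male", "PRIMARY_RACE": "Asian"},
}

def build_study_data(labels_file, demographics):
    """Join each image row with its patient's demographic record."""
    joined = []
    for row in csv.DictReader(labels_file):
        # CheXpert image paths embed the patient ID as the third component.
        patient_id = row["Path"].split("/")[2]  # e.g. 'patient00001'
        record = dict(row)
        record.update(demographics.get(patient_id, {}))
        joined.append(record)
    return joined

study_data = build_study_data(labels_csv, demo_rows)
print(study_data[0]["GENDER"])
```

The real notebook works on the full CSV/XLSX files, but the per-patient join is the essential operation.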
To train and analyze the CheXpert model:

- Adjust the variable `img_data_dir` to point to the CheXpert imaging data and run the following scripts:
  - Run the script `disease-prediction.chexpert-model.py` to train the disease detection model.
  - Run the script `evaluate_disease_detection.py` to evaluate the prediction performance.
  - Run the notebook `chexpert.bias-inspection.chexpert-model.ipynb` for the statistical bias analysis.
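The bias-inspection notebook compares prediction performance across demographic subgroups. As a rough illustration of the core metric, here is a per-subgroup ROC AUC computed with plain NumPy on synthetic scores (the data and the group split are made up for illustration; the notebook's actual analysis is more extensive):

```python
import numpy as np

def auc(scores, labels):
    """ROC AUC via the Mann-Whitney U statistic (ties get average rank)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    order = np.argsort(scores)
    ranks = np.empty_like(order, dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    for s in np.unique(scores):  # average ranks over tied scores
        mask = scores == s
        ranks[mask] = ranks[mask].mean()
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Synthetic predictions with a binary subgroup attribute: scores correlate
# with the labels, slightly less so for group "M" (purely illustrative).
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 200)
group = rng.choice(["F", "M"], 200)
noise = np.where(group == "M", 0.8, 0.4)
scores = y + noise * rng.standard_normal(200)

for g in ["F", "M"]:
    m = group == g
    print(g, round(auc(scores[m], y[m]), 3))
```

Comparing such per-subgroup metrics (with appropriate statistical testing, as in the notebook) is what reveals performance disparities.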
To train and analyze the CXR foundation model:

- Adjust the variable `data_dir` to point to the CheXpert embeddings from the CXR foundation model and run the following scripts:
  - Run the script `disease-prediction.cxr-foundation.py` to train the disease detection model. The default is a linear prediction head; check the code for the CXR-MLP-3 and CXR-MLP-5 variants.
  - Run the script `evaluate_disease_detection.py` to evaluate the prediction performance.
  - Run the notebook `chexpert.bias-inspection.cxr-foundation.ipynb` for the statistical bias analysis.
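The default linear prediction head can be pictured as logistic regression on frozen embedding vectors. A minimal NumPy sketch of that setup, using synthetic 32-dimensional "embeddings" in place of the real CXR foundation outputs (all names, sizes, and hyperparameters here are assumptions for illustration, not the script's actual code):

```python
import numpy as np

# Synthetic stand-ins for frozen foundation-model embeddings and binary labels.
rng = np.random.default_rng(42)
n, d = 512, 32
X = rng.standard_normal((n, d))          # "embeddings"
true_w = rng.standard_normal(d)
y = (X @ true_w + 0.5 * rng.standard_normal(n) > 0).astype(float)

# Train a linear head (logistic regression) by plain gradient descent.
w = np.zeros(d)
b = 0.0
lr = 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # sigmoid predictions
    w -= lr * (X.T @ (p - y) / n)            # gradient of logistic loss
    b -= lr * (p - y).mean()

preds = (1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5).astype(float)
acc = (preds == y).mean()
print(f"training accuracy: {acc:.2f}")
```

The actual script trains the head with PyTorch Lightning on the precomputed embeddings; the MLP variants simply insert hidden layers before the output.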
Note: for the CXR foundation model, it is assumed that all embeddings for the CheXpert dataset are already available and converted from TensorFlow to NumPy. See the notebook `chexpert.convert.cxr-foundation.ipynb` for how to convert the embeddings.
This work is supported through funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant Agreement No. 757173, Project MIRA, ERC-2017-STG) and by the UKRI London Medical Imaging & Artificial Intelligence Centre for Value Based Healthcare.
This project is licensed under the Apache License 2.0.