cvasl is an open source collaborative Python library for analysis of brain MRIs, with a particular focus on arterial spin labeled sequences. This library supports ongoing research at the University of Amsterdam Medical Center on brain aging, but is built for the entire community of radiology researchers across university and academic medical centers worldwide.
cvasl is not a pure Python package. If you want to run all functionalities, you will need R installed as well.
To install cvasl, you can use pip:
pip install cvasl
For developers or those who want the latest version, you can install directly from the repository:
git clone https://github.com/brainspinner/cvasl.git
cd cvasl
pip install -e .
The repository is organized as follows:
-
cvasl/: Core package modules
- dataset.py: Contains the MRIdataset class for loading and preprocessing data
- harmonizers.py: Implements various harmonization methods
- harmonizer_cli.py: Command-line interface for harmonization
- prediction.py: Brain age prediction functionality
- prediction_cli.py: Command-line interface for brain age prediction
-
docs/: Documentation files
-
tests/: Test scripts and data
The cvasl package is designed to work with MRI datasets. For derived value datasets (which use measurements instead of images), you will need CSV or TSV files arranged in a specific format as outlined in seperated_values_specifications.md.
When working with patient data, ensure you do not expose any identifying information if using the code for demonstrations or pull requests.
cvasl provides two primary command-line interfaces:
harmonizer_cli.py
: For performing data harmonizationprediction_cli.py
: For brain age prediction
These tools allow you to process MRI data without writing Python code.
The MRIdataset
class in cvasl.dataset
is designed to handle MRI datasets from different sites, preparing them for harmonization and analysis.
path
(str or list): Path to the CSV file or a list of paths for multiple files.site_id
(int or str): Identifier for the data acquisition site.patient_identifier
(str, optional): Column name for patient IDs. Defaults to"participant_id"
.cat_features_to_encode
(list, optional): List of categorical features to encode. Defaults toNone
.ICV
(bool, optional): Whether to add Intracranial Volume (ICV) related features. Defaults toFalse
.decade
(bool, optional): Whether to add decade-related features based on age. Defaults toFalse
.features_to_drop
(list, optional): List of features to drop during preprocessing. Defaults to["m0", "id"]
.features_to_bin
(list, optional): List of features to bin. Defaults toNone
.binning_method
(str, optional): Binning method to use;"equal_width"
or"equal_frequency"
. Defaults to"equal_width"
.num_bins
(int, optional): Number of bins for binning. Defaults to10
.bin_labels
(list, optional): Labels for bins. Defaults toNone
.
from cvasl.dataset import MRIdataset, encode_cat_features
# Initialize datasets from different sites
edis = MRIdataset(path='../data/EDIS_input.csv', site_id=0, decade=True, ICV=True,
patient_identifier='participant_id', features_to_drop=["m0", "id"])
helius = MRIdataset(path='../data/HELIUS_input.csv', site_id=1, decade=True, ICV=True,
patient_identifier='participant_id', features_to_drop=["m0", "id"])
# For datasets spanning multiple files
topmri = MRIdataset(path=['../data/TOP_input.csv','../data/StrokeMRI_input.csv'],
site_id=3, decade=True, ICV=True,
patient_identifier='participant_id', features_to_drop=["m0", "id"])
# Preprocess datasets
datasets = [edis, helius, topmri]
[d.preprocess() for d in datasets]
# Encode categorical features across datasets
features_to_map = ['readout', 'labelling', 'sex']
datasets = encode_cat_features(datasets, features_to_map)
cvasl offers multiple harmonization methods to reduce site-specific variance in MRI data. You can access these through the harmonizer_cli.py
command-line tool.
General command structure:
python harmonizer_cli.py --dataset_paths <dataset_paths> --site_ids <site_ids> --method <harmonization_method> [method_specific_options] [dataset_options]
Parameters:
--dataset_paths
: Comma-separated paths to dataset CSV files. For datasets with multiple files, use semicolons within a dataset and commas between datasets (e.g.,path1,path2,"path3;path4",path5
).--site_ids
: Comma-separated site IDs corresponding to each dataset.--method
: Harmonization method to use (options listed below).[method_specific_options]
: Options specific to the chosen harmonization method.[dataset_options]
: Options for dataset loading and preprocessing.
- Method Class:
NeuroHarmonize
- CLI method value:
neuroharmonize
- Method-specific Options:
--nh_features_to_harmonize
: Features to harmonize (comma-separated)--nh_covariates
: Covariates (comma-separated)--nh_smooth_terms
: Smooth terms (comma-separated, optional)--nh_site_indicator
: Site indicator column name--nh_empirical_bayes
: Use empirical Bayes (True/False)
Example:
python harmonizer_cli.py --dataset_paths ../data/EDIS_input.csv,../data/HELIUS_input.csv --site_ids 0,1 --method neuroharmonize --patient_identifier participant_id --features_to_drop m0,id --features_to_map readout,labelling,sex --decade True --icv True --nh_features_to_harmonize aca_b_cov,mca_b_cov,pca_b_cov,totalgm_b_cov --nh_covariates age,sex,icv,site --nh_site_indicator site
- Method Class:
Covbat
- CLI method value:
covbat
- Method-specific Options:
--cb_features_to_harmonize
: Features to harmonize (comma-separated)--cb_covariates
: Covariates (comma-separated)--cb_site_indicator
: Site indicator column name--cb_patient_identifier
: Patient identifier column name--cb_numerical_covariates
: Numerical covariates (comma-separated)--cb_empirical_bayes
: Use empirical Bayes (True/False)
Example:
python harmonizer_cli.py --dataset_paths ../data/EDIS_input.csv,../data/HELIUS_input.csv --site_ids 0,1 --method covbat --patient_identifier participant_id --features_to_drop m0,id --features_to_map readout,labelling,sex --decade True --icv True --cb_features_to_harmonize participant_id,site,age,sex,site,aca_b_cov,mca_b_cov,pca_b_cov,totalgm_b_cov --cb_covariates age,sex --cb_numerical_covariates age --cb_site_indicator site
- Method Class:
NeuroCombat
- CLI method value:
neurocombat
- Method-specific Options:
--nc_features_to_harmonize
: Features to harmonize (comma-separated)--nc_discrete_covariates
: Discrete covariates (comma-separated)--nc_continuous_covariates
: Continuous covariates (comma-separated)--nc_site_indicator
: Site indicator column name--nc_patient_identifier
: Patient identifier column name--nc_empirical_bayes
: Use empirical Bayes (True/False)--nc_mean_only
: Mean-only adjustment (True/False)--nc_parametric
: Parametric adjustment (True/False)
Example:
python harmonizer_cli.py --dataset_paths ../data/EDIS_input.csv,../data/HELIUS_input.csv --site_ids 0,1 --method neurocombat --patient_identifier participant_id --features_to_drop m0,id --features_to_map readout,labelling,sex --decade True --icv True --nc_features_to_harmonize ACA_B_CoV,MCA_B_CoV,PCA_B_CoV,TotalGM_B_CoV --nc_discrete_covariates sex --nc_continuous_covariates age --nc_site_indicator site
- Method Class:
NestedComBat
- CLI method value:
nestedcombat
- Method-specific Options:
--nest_features_to_harmonize
: Features to harmonize (comma-separated)--nest_batch_list_harmonisations
: Batch variables for nested ComBat (comma-separated)--nest_site_indicator
: Site indicator column name--nest_discrete_covariates
: Discrete covariates (comma-separated)--nest_continuous_covariates
: Continuous covariates (comma-separated)--nest_intermediate_results_path
: Path for intermediate results--nest_patient_identifier
: Patient identifier column name--nest_return_extended
: Return extended outputs (True/False)--nest_use_gmm
: Use Gaussian Mixture Model (True/False)
Example:
python harmonizer_cli.py --dataset_paths ../data/EDIS_input.csv,../data/HELIUS_input.csv --site_ids 0,1 --method nestedcombat --patient_identifier participant_id --features_to_drop m0,id --features_to_map readout,labelling,sex --decade True --icv True --nest_features_to_harmonize ACA_B_CoV,MCA_B_CoV,PCA_B_CoV,TotalGM_B_CoV --nest_batch_list_harmonisations readout,ld,pld --nest_site_indicator site --nest_discrete_covariates sex --nest_continuous_covariates age --nest_use_gmm False
- Method Class:
CombatPlusPlus
- CLI method value:
combat++
- Method-specific Options:
--compp_features_to_harmonize
: Features to harmonize (comma-separated)--compp_discrete_covariates
: Discrete covariates (comma-separated)--compp_continuous_covariates
: Continuous covariates (comma-separated)--compp_discrete_covariates_to_remove
: Discrete covariates to remove (comma-separated)--compp_continuous_covariates_to_remove
: Continuous covariates to remove (comma-separated)--compp_site_indicator
: Site indicator column name--compp_patient_identifier
: Patient identifier column name--compp_intermediate_results_path
: Path for intermediate results
Example:
python harmonizer_cli.py --dataset_paths ../data/EDIS_input.csv,../data/HELIUS_input.csv --site_ids 0,1 --method combat++ --patient_identifier participant_id --features_to_drop m0,id --features_to_map readout,labelling,sex --decade True --icv True --compp_features_to_harmonize aca_b_cov,mca_b_cov,pca_b_cov,totalgm_b_cov --compp_discrete_covariates sex --compp_continuous_covariates age --compp_discrete_covariates_to_remove labelling --compp_continuous_covariates_to_remove ld --compp_site_indicator site
- Method Class:
ComscanNeuroCombat
- CLI method value:
comscanneuroharmonize
- Method-specific Options:
--csnh_features_to_harmonize
: Features to harmonize (comma-separated)--csnh_discrete_covariates
: Discrete covariates (comma-separated)--csnh_continuous_covariates
: Continuous covariates (comma-separated)--csnh_site_indicator
: Site indicator column name
Example:
python harmonizer_cli.py --dataset_paths ../data/EDIS_input.csv,../data/HELIUS_input.csv --site_ids 0,1 --method comscanneuroharmonize --patient_identifier participant_id --features_to_drop m0,id --features_to_map readout,labelling,sex --decade True --icv True --csnh_features_to_harmonize aca_b_cov,mca_b_cov,pca_b_cov,totalgm_b_cov --csnh_discrete_covariates sex --csnh_continuous_covariates decade --csnh_site_indicator site
- Method Class:
AutoCombat
- CLI method value:
autocombat
- Method-specific Options:
--ac_features_to_harmonize
: Features to harmonize (comma-separated)--ac_data_subset
: Data subset features (comma-separated)--ac_discrete_covariates
: Discrete covariates (comma-separated)--ac_continuous_covariates
: Continuous covariates (comma-separated)--ac_site_indicator
: Site indicator column name(s), comma-separated if multiple--ac_discrete_cluster_features
: Discrete cluster features (comma-separated)--ac_continuous_cluster_features
: Continuous cluster features (comma-separated)--ac_metric
: Metric for cluster optimization (distortion
,silhouette
,calinski_harabasz
)--ac_features_reduction
: Feature reduction method (pca
,umap
,None
)--ac_feature_reduction_dimensions
: Feature reduction dimensions (int)--ac_empirical_bayes
: Use empirical Bayes (True/False)
Example:
python harmonizer_cli.py --dataset_paths ../data/EDIS_input.csv,../data/HELIUS_input.csv --site_ids 0,1 --method autocombat --patient_identifier participant_id --features_to_drop m0,id --features_to_map readout,labelling,sex --decade True --icv True --ac_features_to_harmonize aca_b_cov,mca_b_cov,pca_b_cov,totalgm_b_cov --ac_data_subset aca_b_cov,mca_b_cov,pca_b_cov,totalgm_b_cov,site,readout,labelling,pld,ld,sex,age --ac_discrete_covariates sex --ac_continuous_covariates age --ac_site_indicator site,readout,pld,ld --ac_discrete_cluster_features site,readout --ac_continuous_cluster_features pld,ld
- Method Class:
RELIEF
- CLI method value:
relief
- Method-specific Options:
--relief_features_to_harmonize
: Features to harmonize (comma-separated)--relief_covariates
: Covariates (comma-separated)--relief_patient_identifier
: Patient identifier column name--relief_intermediate_results_path
: Path for intermediate results
Example:
python harmonizer_cli.py --dataset_paths ../data/EDIS_input.csv,../data/HELIUS_input.csv --site_ids 0,1 --method relief --patient_identifier participant_id --features_to_drop m0,id --features_to_map readout,labelling,sex --decade True --icv True --relief_features_to_harmonize aca_b_cov,mca_b_cov,pca_b_cov,totalgm_b_cov --relief_covariates sex,age --relief_patient_identifier participant_id
- Multiple Files: For datasets with multiple paths (like TOPMRI in examples), use semicolons (
;
) to separate paths within a dataset entry, and commas (,
) to separate different datasets. - Path Adjustment: Replace example paths with your actual data file paths.
- Parameter Tuning: Adjust harmonization parameters based on your dataset and harmonization goals.
- R Requirement: Methods like RELIEF and Combat++ require R to be installed with necessary packages (denoiseR, RcppCNPy, matrixStats).
- Output Files: Harmonized datasets are saved as new CSV files in the same directory as input datasets, with filenames appended with
output_<harmonization_method>
.
The prediction_cli.py
script provides a command-line interface for brain age prediction based on MRI data.
python prediction_cli.py --training_dataset_paths <training_paths> --training_site_ids <training_site_ids> --testing_dataset_paths <testing_paths> --testing_site_ids <testing_site_ids> --model <model_type> [model_specific_options] [dataset_options]
Parameters:
--training_dataset_paths
: Comma-separated paths to training dataset CSV files--training_site_ids
: Comma-separated site IDs for training datasets--testing_dataset_paths
: Comma-separated paths to testing dataset CSV files--testing_site_ids
: Comma-separated site IDs for testing datasets--model
: Type of prediction model to use[model_specific_options]
: Options specific to the chosen model[dataset_options]
: Options for dataset loading and preprocessing
python prediction_cli.py --training_dataset_paths ../data/EDIS_input.csv --training_site_ids 0 --testing_dataset_paths ../data/HELIUS_input.csv --testing_site_ids 1 --model linear_regression --features ACA_B_CoV,MCA_B_CoV,PCA_B_CoV,TotalGM_B_CoV --target age --output_file brain_age_predictions.csv
python prediction_cli.py --training_dataset_paths ../data/EDIS_input.csv --training_site_ids 0 --testing_dataset_paths ../data/HELIUS_input.csv --testing_site_ids 1 --model random_forest --features ACA_B_CoV,MCA_B_CoV,PCA_B_CoV,TotalGM_B_CoV --target age --n_estimators 100 --max_depth 10 --output_file brain_age_predictions.csv
python prediction_cli.py --training_dataset_paths ../data/EDIS_input.csv --training_site_ids 0 --testing_dataset_paths ../data/HELIUS_input.csv --testing_site_ids 1 --model svr --features ACA_B_CoV,MCA_B_CoV,PCA_B_CoV,TotalGM_B_CoV --target age --kernel rbf --C 1.0 --epsilon 0.1 --output_file brain_age_predictions.csv
python prediction_cli.py --training_dataset_paths ../data/EDIS_input.csv --training_site_ids 0 --testing_dataset_paths ../data/HELIUS_input.csv --testing_site_ids 1 --model neural_network --features ACA_B_CoV,MCA_B_CoV,PCA_B_CoV,TotalGM_B_CoV --target age --hidden_layer_sizes 100,50 --activation relu --solver adam --output_file brain_age_predictions.csv
Licensed under the terms specified in the LICENSE file.