BioASQ Task A

Getting the data

Register for BioASQ here: http://participants-area.bioasq.org/accounts/register/
Tick task A from the list of Tasks at the bottom of the registration page.
Navigate to the datasets: http://participants-area.bioasq.org/datasets/
Download Training v.2021 - the txt version which gives allMeSH_2021.zip.
Unzip it and place the file in the experiments/bioasq/data folder.

(Re)Creating the dataset

In what follows, we have provided scripts to recreate the dataset. Each abstract has a unique key, its pmid. We recreate the splits we created for the paper by filtering the downloaded dataset according to the pmids. For more information on how we created the original splits - e.g. got the pmids, see the appendix in the paper and the scripts under bin/preprocess. The script below assumes you have placed allMeSH_2021.json in the experiments/bioasq/data folder.

pip install awscli
mkdir -p data/subsets-v-20000
aws s3 cp s3://sigmoid-bottleneck/bioasq/data/train-100k-part-1.csv --no-sign-request  data/subsets-v-20000
aws s3 cp s3://sigmoid-bottleneck/bioasq/data/valid-5k.csv --no-sign-request  data/subsets-v-20000
aws s3 cp s3://sigmoid-bottleneck/bioasq/data/test-10k.csv --no-sign-request  data/subsets-v-20000
aws s3 cp s3://sigmoid-bottleneck/bioasq/data/vocab.txt --no-sign-request  data/subsets-v-20000
python construct_dataset.py --data allMeSH_2021.json

The script should take approximately 5 minutes to run and it will create 3 json files, so the directory structure should look like:

.
├── allMeSH_2021.json
├── construct_dataset.py
└── subsets-v-20000
    ├── test-10k.csv
    ├── train-100k-part-1.csv
    ├── valid-5k.csv
    ├── test-10k.json
    ├── train-100k-part-1.json
    ├── valid-5k.json
    └── vocab.txt

Running the experiments

You can use the scripts run-bsl.sh and run-dft.sh to train the BSL and DFT models, correspondingly. Set the SEED environment variable to change the random state.

export SEED=0
mkdir -p logs
./run-bsl.sh
./run-dft.sh

Each experiment is written to the experiment folder.

Verifying results

# Verification via the LP is parallelisable (the larger you can afford to make NUM_PROC, the better)
export MLBL_NUM_PROC=10
./eval.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

BioASQ Task A

Getting the data

(Re)Creating the dataset

Running the experiments

Verifying results

Files

README.md

Latest commit

History

README.md

File metadata and controls

BioASQ Task A

Getting the data

(Re)Creating the dataset

Running the experiments

Verifying results