Because of its size, the real data is not included in the GitHub repository. At present it can be downloaded from the links below:
AC587 (a small dataset, < 100 samples)
AA041 (a larger dataset, ~ 1000 samples)
England201618 (a dataset of more than 7,000 samples, described in the publication "M. tuberculosis microvariation is common and is associated with transmission: analysis of three years prospective universal sequencing in England")
Download the zip files, unzip them, and place them in /demos.
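The unzip step can also be scripted. The following is a minimal sketch using only the Python standard library; the function name `extract_demo_archive` and the example paths in the comments are hypothetical, and whether each archive already contains a top-level folder named after the dataset is an assumption, so check the result after extraction.

```python
import zipfile
from pathlib import Path


def extract_demo_archive(zip_path, demos_dir):
    """Extract one downloaded archive into demos_dir and return the member names."""
    demos_dir = Path(demos_dir)
    # Create the target directory if it does not exist yet.
    demos_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(demos_dir)
        return archive.namelist()


# Example call (hypothetical paths, adjust to your download location and clone):
# extract_demo_archive("AC587.zip", "demos")
```

After extraction, confirm that the config file referenced in the demo commands (for example `demos/AC587/config/config.json`) is present.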
The input to the relatedness server is FASTA files derived from reference-mapped, consensus base-called data. The sample data provided here comes from the Public Health England bioinformatics pipeline used for TB processing, which is freely available at https://github.com/oxfordmmm/CompassCompact.
AC587 is a collection of 43 mapped samples containing related TB isolates, together with unrelated control TB samples. The controls are added before the 43 related samples because the server uses them to estimate expected N frequencies in real data. To run the demo:
- make sure MongoDB is running
- from the src directory
-- start the server
python findNeighbour3-server.py ../demos/AC587/config/config.json
-- run the script that adds the samples to the server
python demo_ac587.py
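To get a feel for the N frequencies that the control samples are used to estimate, the proportion of N calls in a consensus sequence can be computed directly. This is an illustrative sketch only, not the server's own estimation method; the helper names `read_fasta` and `n_fraction` are hypothetical, and the reader assumes uncompressed, plain-text FASTA files.

```python
def read_fasta(path):
    """Return {header: sequence} for an uncompressed, plain-text FASTA file."""
    sequences = {}
    header = None
    chunks = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new record starts: store the previous one, if any.
                if header is not None:
                    sequences[header] = "".join(chunks)
                header = line[1:]
                chunks = []
            else:
                chunks.append(line)
    if header is not None:
        sequences[header] = "".join(chunks)
    return sequences


def n_fraction(sequence):
    """Proportion of positions called as N (case-insensitive); 0.0 if empty."""
    if not sequence:
        return 0.0
    return sequence.upper().count("N") / len(sequence)
```

Comparing `n_fraction` across samples shows how much the proportion of uncalled bases varies between sequences.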
AA041 is a larger collection of ~ 1000 mapped samples containing related TB data. To run the demo:
- make sure MongoDB is running
- from the src directory
-- start the server
python findNeighbour3-server.py ../demos/AA041/config/config.json
-- run the script that adds the samples to the server
python demo_aa041.py
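The demos require MongoDB to be running before the server is started. One quick way to check is to test whether something is accepting TCP connections on the MongoDB port. The sketch below assumes MongoDB is listening on `localhost` at its default port 27017; if the config file points elsewhere, pass that host and port instead. The function name `port_is_open` is hypothetical, and an open port only shows that something is listening there, not that it is a healthy MongoDB instance.

```python
import socket


def port_is_open(host="localhost", port=27017, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolved host names.
        return False


# Example call: port_is_open() checks localhost:27017 (assumed default setup).
```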
England201618 is a collection of over 7,000 TB samples from England, 2016 to 2018. This data set includes:
i) .csv files listing the positions in each genome at which different bases map with high quality to the same site
ii) FASTA files in which these positions are marked with IUPAC codes.
These FASTA files can be loaded by scripts similar to those above.
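To illustrate what "marked with IUPAC codes" means in practice, the sketch below lists the positions in a sequence that carry an IUPAC ambiguity code (R, Y, S, W, K, M, B, D, H or V). It treats N and `-` as uncalled rather than mixed, which is an assumption about how these files should be read. The function name `mixed_positions` is hypothetical and is not part of the findNeighbour3 code.

```python
# IUPAC nucleotide codes that represent two or three possible bases.
# N (any base) is deliberately excluded: here it is treated as "no call".
IUPAC_AMBIGUITY_CODES = set("RYSWKMBDHV")


def mixed_positions(sequence):
    """Return a list of (1-based position, code) for each ambiguity code."""
    return [
        (index, base)
        for index, base in enumerate(sequence.upper(), start=1)
        if base in IUPAC_AMBIGUITY_CODES
    ]


# Example: position 4 is R (A or G) and position 7 is Y (C or T).
print(mixed_positions("ACGRTNY-"))  # [(4, 'R'), (7, 'Y')]
```

The positions returned this way can be compared against those listed in the accompanying .csv files.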