This repository contains the bioinformatics pipelines used to test accuracy and performance and to generate figures for the manuscript submission entitled "Taxonomic Classification Methods for Animal Barcode Datasets: Benchmarking for Accuracy and Speed".
Additionally, the fold sequence datasets (compressed FASTA files) generated during cross-validation can be found below, listed by taxonomic group.
Each FASTA file name consists of "fold" plus the fold numbering used during cross-validation, followed by a three-letter abbreviation of the taxonomic group and then F, G, or S to indicate family, genus, or species level. For example, fold1_3mamS refers to fold sequence datasets 1 and 3 for Mammalia at species level, and fold3gasG refers to fold sequence dataset 3 only for Gastropoda at genus level.
- Actinopterygii (ray-finned fishes): ActinopterygiiFoldSequenceData.zip
- Amphibia: AmphibiaFoldSequenceData.zip
- Anthophila (bees): AnthophilaFoldSequenceData.zip
- Araneae (spiders): AraneaeFoldSequenceData.zip
- Aves (birds): AvesFoldSequenceData.zip
- Diptera (flies): DipteraFoldSequenceData.zip
- Gastropoda (snails, slugs): GastropodaFoldSequenceData.zip
- Hymenoptera (bees, wasps, ants): HymenopteraFoldSequenceData.zip
- Lepidoptera (moths, butterflies): LepidopteraFoldSequenceData.zip
- Mammalia: MammaliaFoldSequenceData.zip
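
The naming convention described above can be sketched as a small parser. This is only an illustration: the regular expression is inferred from the two example names in this README (fold1_3mamS and fold3gasG), and `parse_fold_name`, `FoldName`, and `FOLD_NAME` are hypothetical names that are not part of the pipelines. The sketch expects the file name without its extension and keeps the fold numbering as a plain string rather than interpreting it.

```python
import re
from typing import NamedTuple

# Assumed pattern, inferred from the example names in this README:
# "fold" + fold numbering + three-letter group abbreviation + F, G, or S.
FOLD_NAME = re.compile(
    r"^fold(?P<folds>\d+(?:_\d+)*)(?P<group>[A-Za-z]{3})(?P<level>[FGS])$"
)

# Taxonomic level indicated by the final letter of the name.
LEVELS = {"F": "family", "G": "genus", "S": "species"}


class FoldName(NamedTuple):
    folds: str  # fold numbering exactly as written, e.g. "1_3" or "3"
    group: str  # three-letter group abbreviation, e.g. "mam"
    level: str  # "family", "genus", or "species"


def parse_fold_name(stem: str) -> FoldName:
    """Split a fold file name (without extension) into its three parts."""
    match = FOLD_NAME.match(stem)
    if match is None:
        raise ValueError(f"not a recognised fold file name: {stem!r}")
    return FoldName(match["folds"], match["group"], LEVELS[match["level"]])


if __name__ == "__main__":
    for name in ("fold1_3mamS", "fold3gasG"):
        print(name, parse_fold_name(name))
```

Names that do not match the assumed pattern raise a `ValueError`, so a file that follows a different convention is reported rather than silently misread.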