Sicily project repository for identifying caged birds in wildlife trade

CV4EcologySchool/cv4e_cagedbird_ID

Multi-class, single target classification of birds sold in wildlife marketplaces

  • As a general rule, if you are altering or moving any data, make a copy first so that the original data and file/folder structure stay untouched
  • Structure of data and labels
    • images of bird species from different sensors are stored in folders under data_root/high; there is one bird per image (single target)
    • image files are named species_name_index_source.fileextension
    • the overall annotations_new.json file contains an entry for every image, its annotation (1-1, i.e. there is one annotation per image) and the overall class, or category, that it belongs to
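
As an illustration of the naming convention above, here is a minimal sketch that splits a file name into its parts. It assumes the last two underscore-separated fields are the index and the source, so that the species name itself may contain underscores. The helper `parse_image_name` and the example file name are hypothetical and are not part of the repository.

```python
from pathlib import Path
from typing import NamedTuple


class ImageName(NamedTuple):
    species: str
    index: int
    source: str


def parse_image_name(filename: str) -> ImageName:
    """Split 'species_name_index_source.ext' into species, index and source.

    Assumption: the last two underscore-separated fields are the index and
    the source; everything before them is the species name.
    """
    stem = Path(filename).stem
    species, index, source = stem.rsplit("_", 2)
    return ImageName(species=species, index=int(index), source=source)


# Hypothetical file name, used only to show the call.
print(parse_image_name("java_sparrow_12_phone.jpg"))
```

Splitting from the right keeps multi-word species names intact, at the cost of assuming that the source field never contains an underscore.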

Folder structure

  • preprocessing folder
    • annotations_backup: copies of the annotation files generated by the scripts in the preprocessing folder. When running the code, these files are stored in the ./data folder, as they are read in along with the imagePaths
    • train.json is a random split (using a fixed random seed for reproducibility) containing 80% of the original data across all 29 initial classes. training_18_08.json is an older version of the split; it contained some bad data and one class, the coal tit, which has since been removed completely because its crops were corrupted
    • val.json is the remaining 20% of the original data, generated at the same time as train.json (val_18_08.json is the corresponding older version)
    • the upsampling .json is an attempt to balance the classes: it contains balanced data across all 29 classes
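
The split and the class balancing described above could be sketched roughly as follows. This is not the code in the preprocessing scripts: the function names, the default seed, and the choice to balance by resampling each class with replacement up to the size of the largest class are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def split_80_20(items, seed=0, train_fraction=0.8):
    """Shuffle a copy of items with a fixed seed and cut it into train/val."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def upsample(items, label_of, seed=0):
    """Resample every class with replacement up to the largest class size.

    Assumption: this is one common way to balance classes; the repository's
    own upsampling may work differently.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for item in items:
        by_class[label_of(item)].append(item)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced
```

Using a dedicated `random.Random(seed)` instance keeps the split reproducible without touching the global random state.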

delete_backup_folders.py

  • I initially copied the data into folders, where folder_name/random_images held, for each species, the subset of images that I knew had been contributed by different labellers and therefore came from different sensors (or had been collected online)

  • delete_files.py: used to delete files with _random in the file name after they were copied to the wrong folder early on. The same code could be used to remove images from sources that might not be relevant (e.g. from a certain sensor, if the image is labelled as species_name_sensor_source.fileextension)
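
A minimal sketch of this kind of clean-up is shown below, in the spirit of the advice at the top of this README about not altering original data. The function `delete_matching` and its `dry_run` flag are hypothetical and are not taken from delete_files.py; by default nothing is deleted and the matching paths are only returned for inspection.

```python
from pathlib import Path


def delete_matching(root, marker="_random", dry_run=True):
    """Return files under root whose name contains marker.

    The files are deleted only when dry_run is False.
    """
    matches = sorted(
        p for p in Path(root).rglob("*") if p.is_file() and marker in p.name
    )
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches
```

Calling it once with `dry_run=True` and checking the returned list before running it again with `dry_run=False` is a cheap safeguard against removing the wrong images.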

  • visualisation_training.py and visualisation_validation.py print the distribution of data across classes in the training and validation sets, by loading the training and validation .json files for visualisation
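
Counting images per class could look roughly like the sketch below. It assumes a COCO-style layout with `categories` and `annotations` lists and `id`, `name` and `category_id` keys; the actual keys in train.json and val.json may differ, and the example data is made up.

```python
from collections import Counter


def class_distribution(split):
    """Count annotations per category name in a COCO-style dict (assumed layout)."""
    names = {c["id"]: c["name"] for c in split["categories"]}
    return Counter(names[a["category_id"]] for a in split["annotations"])


# Hypothetical miniature split; a real one would be read with json.load.
example = {
    "categories": [{"id": 0, "name": "species_a"}, {"id": 1, "name": "species_b"}],
    "annotations": [{"category_id": 0}, {"category_id": 0}, {"category_id": 1}],
}
for name, count in class_distribution(example).most_common():
    print(f"{name}: {count}")
```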

  • configs folder contains the default config .yaml which has most of the hyperparameters and CometML logging details

  • all_model_states: currently listed in the .gitignore file. Each experiment now gets its own folder, named after the experiment name in the default config file, together with a config file that carries the same experiment name
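
One way to produce such a per-experiment layout is sketched below. The helper `prepare_experiment_dir` is hypothetical: it takes the experiment name as an argument rather than reading it from the config, and simply copies the config file into the new folder under the experiment's name.

```python
import shutil
from pathlib import Path


def prepare_experiment_dir(experiment_name, config_path, root="all_model_states"):
    """Create root/<experiment_name>/ and copy the config in under the same name."""
    exp_dir = Path(root) / experiment_name
    exp_dir.mkdir(parents=True, exist_ok=True)
    suffix = Path(config_path).suffix  # keep the original extension, e.g. .yaml
    shutil.copy(config_path, exp_dir / f"{experiment_name}{suffix}")
    return exp_dir
```

Keeping a copy of the config next to the saved model states makes it possible to tell later which hyperparameters produced which checkpoint.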

  • scripts folder: not really used by me

  • ct_classifier

  • contains most if not all of the training code and dataset structure and information

  • class_mapping.pickle is a list of all the class_names; it is generated by json_generate_80_20.py in the preprocessing folder

  • dataset.py file: contains the FixedHeightResize class, which pads the images

  • it also performs the other image transformations, such as the augmentations
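
The geometry behind a fixed-height resize with padding can be sketched in plain Python. This is not the FixedHeightResize class from dataset.py: the function name, its arguments, and the choice to split the padding evenly between left and right are assumptions made for illustration.

```python
def fixed_height_geometry(width, height, target_height, target_width):
    """Return (new_width, pad_left, pad_right) for an aspect-preserving resize.

    The image is scaled so that its height equals target_height; any
    remaining horizontal space up to target_width is split between left
    and right padding. Images that end up wider than target_width get no
    padding and would need separate handling.
    """
    new_width = max(1, round(width * target_height / height))
    total_pad = max(0, target_width - new_width)
    pad_left = total_pad // 2
    return new_width, pad_left, total_pad - pad_left


# A 100x200 portrait image resized to height 224 and padded to width 224.
print(fixed_height_geometry(100, 200, 224, 224))
```

Padding rather than stretching keeps the bird's proportions unchanged, which matters when species differ mainly in shape.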

  • testing metrics

  • evaluation metrics on the val set

  • histogram scores for upsampling, plotting the average precision on the val data

    • plot the confusion matrix
    • plot the high-confusion species: for any species with an overall score below 0.7, we show a brief sample to see which species are being confused with each other
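
The idea of flagging low-scoring species and seeing what they are confused with can be sketched as below. Note the substitution: this sketch uses per-class recall computed from the confusion counts as the score, which is a stand-in and may not be the score used in the repository. The function names are hypothetical.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count (true_label, predicted_label) pairs."""
    return Counter(zip(y_true, y_pred))


def low_scoring_classes(y_true, y_pred, threshold=0.7):
    """Return {class: (recall, most common wrong prediction)} for recall < threshold."""
    counts = confusion_counts(y_true, y_pred)
    totals = Counter(y_true)
    flagged = {}
    for label, total in totals.items():
        recall = counts[(label, label)] / total
        if recall < threshold:
            wrong = {p: n for (t, p), n in counts.items() if t == label and p != label}
            worst = max(wrong, key=wrong.get) if wrong else None
            flagged[label] = (recall, worst)
    return flagged
```

The same `(true, predicted)` counts can be laid out as a matrix for plotting, with true classes on one axis and predicted classes on the other.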
  • train.py contains the definitions of the dataloaders and the model loading (either starting from epoch 0 or retraining from an already saved model). You can load either the training data in the .json or the upsampling data (in the dl_train dataloader)

    • the training loop is defined as a while loop
    • it defines the statistics you want to measure
    • it also initialises the CometML experiments
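
A while-loop training driver that resumes from a saved state could be structured roughly as follows. This is a sketch, not the code in train.py: the `<epoch>.pt` checkpoint naming, the function names, and the `train_one_epoch` callback are assumptions, and an empty file stands in for the saved model state.

```python
from pathlib import Path


def latest_epoch(checkpoint_dir):
    """Return the highest epoch among '<epoch>.pt' files, or 0 if none exist."""
    epochs = [
        int(p.stem) for p in Path(checkpoint_dir).glob("*.pt") if p.stem.isdigit()
    ]
    return max(epochs, default=0)


def run_training(checkpoint_dir, num_epochs, train_one_epoch):
    """Resume from the latest saved epoch and loop until num_epochs is reached."""
    epoch = latest_epoch(checkpoint_dir)
    history = []
    while epoch < num_epochs:
        epoch += 1
        stats = train_one_epoch(epoch)  # e.g. loss and accuracy for this epoch
        history.append((epoch, stats))
        # Stand-in for saving the model state after each epoch.
        (Path(checkpoint_dir) / f"{epoch}.pt").touch()
    return history
```

With no checkpoints present the loop starts from epoch 0; with a checkpoint for epoch 3 present it continues from epoch 4.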
