Multi-class, single-target classification of birds sold in wildlife marketplaces
- as a general rule, if you are altering or moving any data, make a copy first, so that you are sure you are not altering the original data and file/folder structure
- Structure of data and labels
- images of bird species from different sensors are stored in folders under data_root/high; there is one bird per image (single target)
- images are named species_name_index_source.fileextension
- the overall annotations_new.json file has an entry for every image, its annotation (which is 1-1, i.e. there is one annotation per image), and the overall class, or category, that it belongs to; a reading sketch follows below
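For orientation, here is a minimal sketch of reading that file; the COCO-style keys ("images", "annotations", "categories") are an assumption about its layout, not confirmed from the file itself:

```python
# Sketch: inspect annotations_new.json; the COCO-style keys used here are
# assumed, not confirmed from the actual file.
import json

with open("data/annotations_new.json") as f:
    ann = json.load(f)

print(len(ann["images"]), "images,", len(ann["annotations"]), "annotations")
# the mapping is 1-1: one annotation per image
assert len(ann["images"]) == len(ann["annotations"])
print(len(ann["categories"]), "classes")
```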
Folder structure
- preprocessing folder
- annotations_backup: copies of the annotation files generated by the scripts in the preprocessing folder; when running the code, these are stored in the ./data folder, as they are read in along with the imagePaths
- train.json is a random split (using a fixed random seed for reproducibility) containing 80% of the original data, across all 29 initial classes; see the split sketch after this list (training_18_08.json is an older version of the split, which contained some bad data plus one class, the coal tit, that has since been removed completely because its crops were corrupted)
- val.json is the remaining 20% of the original data, generated at the same time as train.json (val_18_08.json is the older counterpart)
- the upsampling json is an attempt to balance the classes: it contains balanced data across all 29 classes
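A minimal sketch of what the seeded 80/20 split in json_generate_80_20.py could look like; the annotation keys, the seed value, and the output paths are assumptions, not the actual script:

```python
# Sketch of a seeded 80/20 split in the spirit of json_generate_80_20.py;
# keys, seed, and paths are assumptions, not the actual script.
import json
import pickle
import random

with open("data/annotations_new.json") as f:
    ann = json.load(f)

random.seed(42)  # fixed seed for reproducibility (the real seed may differ)
images = list(ann["images"])
random.shuffle(images)
cut = int(0.8 * len(images))

for name, subset in [("train", images[:cut]), ("val", images[cut:])]:
    ids = {img["id"] for img in subset}
    split = {
        "images": subset,
        "annotations": [a for a in ann["annotations"] if a["image_id"] in ids],
        "categories": ann["categories"],
    }
    with open(f"data/{name}.json", "w") as f:
        json.dump(split, f)

# class_mapping.pickle (a list of all class names) is produced at this stage too
class_names = [c["name"] for c in ann["categories"]]
with open("class_mapping.pickle", "wb") as f:
    pickle.dump(class_names, f)
```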
delete_backup_folders.py
- I initially copied the data into folders, where folder_name/random_images was the subset for each species that I knew at the start had been contributed by different labellers, and thus captured using different sensors (or were images collected online)
delete_files.py
- this was used to delete files with _random in the file name when these were incorrectly copied to the wrong folder at the beginning; this code could also be used to remove images from sources that are not relevant (e.g. from a certain sensor, if the image is named species_name_sensor_source.fileextension); a sketch follows below
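A minimal sketch of that kind of pattern-based deletion; the root directory and the _random token are placeholders, and a dry run is used first, in the spirit of the copy-before-you-touch advice above:

```python
# Sketch: delete files whose names contain a token; root and token are
# placeholders, and dry_run=True prints instead of deleting.
from pathlib import Path

def delete_matching(root: str, token: str, dry_run: bool = True) -> None:
    for path in Path(root).rglob(f"*{token}*"):
        if path.is_file():
            print(("would delete" if dry_run else "deleting"), path)
            if not dry_run:
                path.unlink()

delete_matching("data_root/high", "_random")  # dry run first, for safety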
visualisation_training.py and visualisation_validation.py print the distribution of data across classes in the training and validation sets, by loading the training and validation .json files for visualisation; a sketch follows below
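A minimal sketch of such a class-distribution plot, again assuming COCO-style keys in the split file:

```python
# Sketch: plot per-class image counts from a split file; COCO-style keys
# are assumed, as above.
import json
from collections import Counter

import matplotlib.pyplot as plt

with open("data/train.json") as f:
    split = json.load(f)

id_to_name = {c["id"]: c["name"] for c in split["categories"]}
counts = Counter(id_to_name[a["category_id"]] for a in split["annotations"])

plt.figure(figsize=(12, 4))
plt.bar(list(counts.keys()), list(counts.values()))
plt.xticks(rotation=90)
plt.ylabel("images per class")
plt.tight_layout()
plt.show()
```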
the configs folder contains the default config .yaml, which holds most of the hyperparameters and the CometML logging details; a loading sketch follows below
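A minimal sketch of loading the config; the file path and the keys named in the comments are assumptions about what the .yaml might hold:

```python
# Sketch: load the default config; the path and the keys named in the
# comment are assumptions, not the actual file contents.
import yaml

with open("configs/default.yaml") as f:
    cfg = yaml.safe_load(f)

# cfg might expose entries such as experiment_name, num_epochs,
# learning_rate, batch_size, and the CometML project/API details
print(cfg.get("experiment_name"))
```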
the 'all_model_states' folder: its contents are currently in the .gitignore file, but there is now a folder named after whatever experiment name is set in the default config file, so you now have individual folders per experiment, each with a config file named after the same experiment
the scripts folder is not really being used by me
ct_classifier
- contains most, if not all, of the training code and the dataset structure and information
- class_mapping.pickle is a list of all the class names; it is generated by json_generate_80_20.py in the preprocessing folder (see the split sketch above)
dataset.py
- contains the FixedHeightResize class, which does the padding of the model inputs
- also performs the other transformations on the images, such as the augmentations; a transform sketch follows below
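A minimal sketch of a fixed-height resize-and-pad transform on PIL images; the actual FixedHeightResize implementation may differ:

```python
# Sketch: resize to a fixed height (preserving aspect ratio), then pad the
# width; an assumed shape for FixedHeightResize, not the actual class.
from PIL import Image, ImageOps

class FixedHeightResize:
    def __init__(self, height: int):
        self.height = height

    def __call__(self, img: Image.Image) -> Image.Image:
        w, h = img.size
        new_w = max(1, round(w * self.height / h))
        img = img.resize((new_w, self.height), Image.BILINEAR)
        pad = self.height - new_w
        if pad > 0:  # pad the width symmetrically up to a square canvas
            img = ImageOps.expand(img, border=(pad // 2, 0, pad - pad // 2, 0))
        return img
```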
testing metrics
- evaluation metrics on the val set
- histogram scores for the upsampling, plotting the average precision on the val data
- plot the confusion matrix
- plot the species with high confusion: for any species with an overall score of less than 0.7, we pulled a brief sample to see which species were getting confused with each other; a plotting sketch follows below
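A minimal sketch of the confusion-matrix plot on the val set; y_true, y_pred, and class_names are placeholders for values collected during evaluation:

```python
# Sketch: confusion matrix over the val set; y_true/y_pred/class_names are
# placeholders for values collected during evaluation.
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay

y_true = [0, 1, 2, 2, 1]       # ground-truth class indices (placeholder)
y_pred = [0, 2, 2, 2, 1]       # model predictions (placeholder)
class_names = ["a", "b", "c"]  # from class_mapping.pickle in practice

ConfusionMatrixDisplay.from_predictions(
    y_true, y_pred, display_labels=class_names, xticks_rotation=90
)
plt.tight_layout()
plt.show()
```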
the train.py file contains the definitions for the dataloaders and loads the model (whether starting from model epoch 0 or retraining from an already saved model); you can load either the training data .json or the upsampling data in the dl_train dataloader
- the training loop is defined as a while loop (see the sketch after this list)
- it defines the statistics you want to measure
- it also initialises the CometML experiments
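A minimal, self-contained sketch of a while-style training loop with resume-from-checkpoint logic, in the spirit of what train.py does; the model, data, config keys, and checkpoint layout here are placeholders, not the repository's actual definitions:

```python
# Sketch: while-style training loop with resume logic; everything here is a
# placeholder (tiny linear model, random data), not the repository's code.
import glob
import os

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

cfg = {"num_epochs": 5, "lr": 1e-3, "states_dir": "all_model_states/demo"}
os.makedirs(cfg["states_dir"], exist_ok=True)

model = nn.Linear(16, 29)  # stand-in for the real classifier (29 classes)
optim = torch.optim.SGD(model.parameters(), lr=cfg["lr"])
loss_fn = nn.CrossEntropyLoss()
dl_train = DataLoader(
    TensorDataset(torch.randn(64, 16), torch.randint(0, 29, (64,))),
    batch_size=8, shuffle=True,
)

# Resume from the latest saved state if one exists, otherwise start at epoch 0
checkpoints = sorted(glob.glob(os.path.join(cfg["states_dir"], "*.pt")))
epoch = 0
if checkpoints:
    state = torch.load(checkpoints[-1])
    model.load_state_dict(state["model"])
    epoch = state["epoch"]

while epoch < cfg["num_epochs"]:
    running_loss = 0.0
    for x, y in dl_train:
        optim.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optim.step()
        running_loss += loss.item()
    epoch += 1
    print(f"epoch {epoch}: mean loss {running_loss / len(dl_train):.4f}")
    torch.save({"model": model.state_dict(), "epoch": epoch},
               os.path.join(cfg["states_dir"], f"{epoch}.pt"))
```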