Image-based Deepfake Speech Detection

Warning: all code in this repository is experimental and is not guaranteed to run in every environment.

Official implementation of the following paper: Anton Firc, Kamil Malinka, and Petr Hanáček. 2024. Deepfake Speech Detection: A Spectrogram Analysis. In Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing (SAC '24). Association for Computing Machinery, New York, NY, USA, 1312–1320. https://doi.org/10.1145/3605098.3635911

Requirements

TensorFlow 2.3.0
cuDNN version 7.6
CUDA version 10.1
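
Because the pinned versions above are fairly old, it can help to confirm what is actually installed before running the scripts. The following is a minimal sketch, assuming Python 3.8+ (for `importlib.metadata`); the helper names `installed_version` and `report` are illustrative and not part of this repository.

```python
from importlib import metadata

# Pinned version taken from the Requirements list above.
PINNED = {"tensorflow": "2.3.0"}


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def report(pinned=PINNED):
    """Map each pinned package to a (pinned, installed, matches) tuple."""
    result = {}
    for name, wanted in pinned.items():
        found = installed_version(name)
        result[name] = (wanted, found, found == wanted)
    return result


if __name__ == "__main__":
    for name, (wanted, found, ok) in report().items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name}: pinned {wanted}, installed {found} [{status}]")
```

CUDA and cuDNN are not Python packages, so this sketch does not check them; their versions have to be verified separately on the target machine.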

Running scripts

python3 eval_model.py -i <dataset_path> -m _max_10
- dataset_path = directory that contains the mel directory and the other data directories

python3 train_model.py -i <dataset_path> -n <run_name>
- dataset_path = directory that contains the real and fake directories, each of which holds spectrograms
- run_name = final model name, used to save and load model weights (stored in the ./models directory)
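
Before starting a training run, it may be worth checking that the dataset directory has the layout described above. The following is a minimal sketch, not code from this repository: the function name `count_spectrograms` is hypothetical, and the `.png` suffix is an assumption, since the image format of the spectrograms is not stated here.

```python
from pathlib import Path

# Class subdirectories named in the description above.
CLASS_DIRS = ("real", "fake")


def count_spectrograms(dataset_path, suffix=".png"):
    """Count files ending in `suffix` inside each class directory.

    `suffix` is an assumption; adjust it to the actual spectrogram format.
    Raises FileNotFoundError if a class directory is missing.
    """
    root = Path(dataset_path)
    counts = {}
    for name in CLASS_DIRS:
        class_dir = root / name
        if not class_dir.is_dir():
            raise FileNotFoundError(f"missing directory: {class_dir}")
        counts[name] = sum(
            1
            for p in class_dir.iterdir()
            if p.is_file() and p.suffix.lower() == suffix
        )
    return counts
```

Calling `count_spectrograms("<dataset_path>")` returns a dictionary with one count per class, which also makes a strong class imbalance easy to spot before training.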

Trained models

Models are available to download here: https://nextcloud.fit.vutbr.cz/s/8yRcMqxH3nYB6EC

Each model file name encodes the input feature, the pooling layer, and the pooling setting: feature_pooling-layer_pooling-setting.h5
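
The naming pattern above can be split back into its three parts when selecting a downloaded model. The following is a minimal sketch: `parse_model_name` is a hypothetical helper, and it assumes that the pooling layer and pooling setting contain no underscores, so any extra underscores belong to the feature name.

```python
from pathlib import Path


def parse_model_name(filename):
    """Split '<feature>_<pooling-layer>_<pooling-setting>.h5' into its parts.

    Splits from the right, so a feature name that contains underscores
    stays in one piece (an assumption about the naming scheme).
    """
    path = Path(filename)
    if path.suffix != ".h5":
        raise ValueError(f"expected a .h5 file, got: {filename}")
    parts = path.stem.rsplit("_", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"unexpected model name: {filename}")
    feature, pooling_layer, pooling_setting = parts
    return {
        "feature": feature,
        "pooling_layer": pooling_layer,
        "pooling_setting": pooling_setting,
    }
```

For a file named `mel_max_10.h5` (an illustrative name, not necessarily one of the published models), the helper returns the feature `mel`, the pooling layer `max`, and the pooling setting `10`.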

Dataset of modified speech

The dataset is a modification of the FoR (for-2-seconds) validation set. For any information on the original dataset please visit:

The dataset of modified speech is available here: https://nextcloud.fit.vutbr.cz/s/8yRcMqxH3nYB6EC

Metacentrum modules

module add anaconda3-2019.10
module add ffmpeg
module add cuda-10.1
module add cudnn-7.6.4-cuda10.0

source activate IDSD

Citation

@inproceedings{10.1145/3605098.3635911,
author = {Firc, Anton and Malinka, Kamil and Han\'{a}\v{c}ek, Petr},
title = {Deepfake Speech Detection: A Spectrogram Analysis},
year = {2024},
isbn = {9798400702433},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3605098.3635911},
doi = {10.1145/3605098.3635911},
abstract = {The current voice biometric systems have no natural mechanics to defend against deepfake spoofing attacks. Thus, supporting these systems with a deepfake detection solution is necessary. One of the latest approaches to deepfake speech detection is representing speech as a spectrogram and using it as an input for a deep neural network. This work thus analyzes the feasibility of different spectrograms for deepfake speech detection. We compare types of them regarding their performance, hardware requirements, and speed. We show the majority of the spectrograms are feasible for deepfake detection. However, there is no general, correct answer to selecting the best spectrogram. As we demonstrate, different spectrograms are suitable for different needs.},
booktitle = {Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing},
pages = {1312–1320},
numpages = {9},
keywords = {deepfake, speech, image-based, deepfake detection, spectrogram},
location = {Avila, Spain},
series = {SAC '24}
}
