This is an experimental project extending *Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP* (Schick et al., 2021). It tests the hypothesis that taking the output of Schick's de-biasing procedure as labels and fine-tuning the model on them directly leads to similar or reduced toxicity scores according to Perspective API.
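As a rough illustration of the idea only (not this repo's actual implementation; the script name `finetune_gpt2_with_logits` suggests the de-biased logits serve as soft labels), one fine-tuning step of this kind could look like the sketch below. `input_ids` and `debiased_logits` are hypothetical placeholders for tokenized training text and precomputed de-biased outputs:

```python
# Sketch only: distill GPT-2 toward precomputed de-biased next-token logits.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2-xl")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def distillation_step(input_ids, debiased_logits):
    """One update: pull the model's per-position next-token distribution
    toward the de-biased distribution via KL divergence."""
    log_probs = F.log_softmax(model(input_ids).logits, dim=-1)
    target_probs = F.softmax(debiased_logits, dim=-1)
    loss = F.kl_div(log_probs, target_probs, reduction="batchmean")
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```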
To train with your own data, put the data in `model-input/prompts+continuations/` following the corresponding format, then run `python3 ./finetune_gpt2_with_logits`.
```python
from transformers import AutoModel

# Available fine-tuned checkpoints on the Hugging Face Hub:
#   0 = fine-tuned on 1k examples
#   1 = fine-tuned on 5k examples
#   2 = fine-tuned on 10k examples
#   3 = fine-tuned on 25k examples
model_idx = 0  # choose from 0, 1, 2, or 3
model = AutoModel.from_pretrained(f"newtonkwan/gpt2-xl-ft-{model_idx}")
```
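For text generation, the same checkpoint can be loaded with `AutoModelForCausalLM`, which attaches the language-modeling head that `AutoModel` omits. A minimal usage sketch, assuming the repositories ship tokenizer files (otherwise the base `gpt2-xl` tokenizer should work):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_idx = 0
tokenizer = AutoTokenizer.from_pretrained(f"newtonkwan/gpt2-xl-ft-{model_idx}")
model = AutoModelForCausalLM.from_pretrained(f"newtonkwan/gpt2-xl-ft-{model_idx}")

inputs = tokenizer("The quick brown fox", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20, do_sample=True)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```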
RealToxicityPrompts - A dataset of 100k sentence snippets from the web for researchers to further address the risk of neural toxic degeneration in models (Gehman et al., 2020)
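The dataset can be pulled from the Hugging Face Hub; the hub ID `allenai/real-toxicity-prompts` and the field layout below are assumptions, not something this repo pins down:

```python
from datasets import load_dataset

# Assumed hub ID; each record holds nested "prompt" and "continuation" dicts.
dataset = load_dataset("allenai/real-toxicity-prompts", split="train")
print(dataset[0]["prompt"]["text"])
```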