Code and data for EMNLP2024 paper: WikiBias as an Extrapolation Corpus for Bias Detection

KarlaDSJ/WikiBias-Detection

WikiBias as an Extrapolation Corpus for Bias Detection

Code and data for the EMNLP 2024 paper.

This paper explores whether it is possible to train a machine learning model using Wikipedia data to detect subjectivity in sentences and generalize effectively to other domains.

Unzip the data.zip and lexicon.zip files before running the notebooks.
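
The extraction step can also be done from Python, for example in a first notebook cell. The following is a minimal sketch, not part of the repository: it assumes both archives sit in the repository root and should be extracted in place, and the helper name `extract_archives` is hypothetical.

```python
import zipfile
from pathlib import Path


def extract_archives(names=("data.zip", "lexicon.zip"), root="."):
    """Extract each named archive into `root`.

    Returns the list of archive names that were not found, so the caller
    can decide how to react. The location of the archives (repository
    root) is an assumption, not something the README states.
    """
    root = Path(root)
    missing = []
    for name in names:
        archive = root / name
        if not archive.is_file():
            missing.append(name)
            continue
        # Extract the archive contents next to the archive itself.
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(root)
    return missing


if __name__ == "__main__":
    not_found = extract_archives()
    if not_found:
        print("Not found:", ", ".join(not_found))
```

If the notebooks expect the extracted folders somewhere else, pass a different `root` or move the folders afterwards.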

Cite as

@inproceedings{salas-jimenez-etal-2024-wikibias,
    title = "{W}iki{B}ias as an Extrapolation Corpus for Bias Detection",
    author = "Salas-Jimenez, K.  and
      Lopez-Ponce, Francisco Fernando  and
      Ojeda-Trueba, Sergio-Luis  and
      Bel-Enguix, Gemma",
    editor = "Lucie-Aim{\'e}e, Lucie  and
      Fan, Angela  and
      Gwadabe, Tajuddeen  and
      Johnson, Isaac  and
      Petroni, Fabio  and
      van Strien, Daniel",
    booktitle = "Proceedings of the First Workshop on Advancing Natural Language Processing for Wikipedia",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.wikinlp-1.10",
    pages = "46--52",
    abstract = "This paper explores whether it is possible to train a machine learning model using Wikipedia data to detect subjectivity in sentences and generalize effectively to other domains. To achieve this, we performed experiments with the WikiBias corpus, the BABE corpus, and the CheckThat! Dataset. Various classical models for ML were tested, including Logistic Regression, SVC, and SVR, including characteristics such as Sentence Transformers similarity, probabilistic sentiment measures, and biased lexicons. Pre-trained models like DistilRoBERTa, as well as large language models like Gemma and GPT-4, were also tested for the same classification task.",
}
