This repository contains our code and run submissions for task 1 & 2 of CENTRE@CLEF19. Our replicated/reproduced runs WCRobust04
and WCRobust0405
are based on this submission by Grossman and Cormack to TREC Common Core 2017.
-
Install requirements with pip:
pip install -r requirements.txt
-
Run
python -m nltk.downloader stopwords
-
Clone trec_eval, compile it and move the compiled file to the top-level of this repository.
-
Edit config_template.py. Make sure all relative paths are accessible from the top-level of the repository.
-
All path which have to be added are annotated with a TODO-comment
-
Make sure to set all paths for the compressed corpora files, in particular:
- Washington Post: path to json lines file of Washington Post corpus
- New York Times: path to compressed files from New York Times corpus (the following folder should be in this directory: data/)
- TREC45: path to compressed files from TREC disks 4 & 5 (minus cr) (the following folders should be in this directory: fbis/, fr94/, ft/, latimes/)
- AQUAINT: path to compressed files from AQUAINT corpus (the following folders should be in this directory: apw/, nyt/, xie/)
-
Optionally pre-configured paths can be adapted.
-
Rename config_template.py to config.py
-
From the top-level of this repository run
python -m <task>.<system-name>.main
for producing the respective run. For example, if you want to produce the run for task 1 executepython -m replicability.wcrobust04.main
-
If data preparation has already been done in a previous step, it can be omitted by setting the parameter
data_prep
toFalse
. -
If the
complete_run
entry in theconfig.py
-file remains unedited, the run will be found inartifact/runs/
Results from trec_eval are shown below.
You can find a copy of the article here. If you find the work helpful, please cite it as follows.
@inproceedings{DBLP:conf/clef/BreuerS19,
author = {Timo Breuer and
Philipp Schaer},
editor = {Linda Cappellato and
Nicola Ferro and
David E. Losada and
Henning M{\"{u}}ller},
title = {Replicability and Reproducibility of Automatic Routing Runs},
booktitle = {Working Notes of {CLEF} 2019 - Conference and Labs of the Evaluation
Forum, Lugano, Switzerland, September 9-12, 2019},
series = {{CEUR} Workshop Proceedings},
volume = {2380},
publisher = {CEUR-WS.org},
year = {2019},
url = {https://ceur-ws.org/Vol-2380/paper\_84.pdf},
timestamp = {Thu, 01 Jun 2023 10:09:02 +0200},
biburl = {https://dblp.org/rec/conf/clef/BreuerS19.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}