Skip to content

osirrc/terrier-docker

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

OSIRRC Docker Image for Terrier

Build Status Docker Cloud Build Status DOI

Arthur Câmara and Craig Macdonald

This is the docker image for the Terrier toolkit (v5.2) conforming to the OSIRRC jig for the Open-Source IR Replicability Challenge (OSIRRC) at SIGIR 2019. This image is available on Docker Hub

  • Supported test collections: robust04, gov2, cw09b, cw12b (web), core18 (newswire)
  • Supported hooks: init, index, train, search
  • Check the most current release version at the Releases page, and replace vx.y.z with the desired release version or with current for the latest release

Quick Start

The following jig command can be used to index TREC disks 4/5 for robust04:

python run.py prepare --repo osirrc2019/terrier --tag vx.y.z --collections robust04=/tmp/disk45/=trectext

The following jig command can be used to perform a retrieval run on the collection with the robust04 test collection, using BM25 as ranker:

python run.py search  \
    --repo osirrc2019/terrier \
    --tag vx.y.z \
    --collection robust04 \
    --topic topics/topics.robust04.txt \
    --qrels qrels/qrels.robust04.txt\
    --output /tmp/runs

Retrieval Methods:

This image supports the following weighting models: BM25 (bm25), PL2 (pl2) and DPH (dph).

Additionally, it supports Query Expansion and Proximity-based (DFRD) search, by including qe, prox or prox_qe to the --opts config argument: --opts config=<retrieval_model>_<extra>:

(BM25)

python run.py search  --repo osirrc2019/terrier --tag vx.y.z --collection robust04  --topic topics/topics.robust04.txt --qrels qrels/qrels.robust04.txt   --output /tmp/runs --opts config=bm25

(BM25 + query expansion)

python run.py search  --repo osirrc2019/terrier --tag vx.y.z --collection robust04  --topic topics/topics.robust04.txt --qrels qrels/qrels.robust04.txt   --output /tmp/runs --opts config=bm25_qe

(BM25 + Proximity)

python run.py search  --repo osirrc2019/terrier --tag vx.y.z --collection robust04  --topic topics/topics.robust04.txt --qrels qrels/qrels.robust04.txt   --output /tmp/runs --opts config=bm25_prox

(BM25 + Proximity + query expansion)

python run.py search  --repo osirrc2019/terrier --tag vx.y.z --collection robust04  --topic topics/topics.robust04.txt --qrels qrels/qrels.robust04.txt   --output /tmp/runs --opts config=bm25_prox_qe

(PL2)

python run.py search  --repo osirrc2019/terrier --tag vx.y.z --collection robust04  --topic topics/topics.robust04.txt --qrels qrels/qrels.robust04.txt   --output /tmp/runs --opts config=pl2

(PL2 + query expansion)

python run.py search  --repo osirrc2019/terrier --tag vx.y.z --collection robust04  --topic topics/topics.robust04.txt --qrels qrels/qrels.robust04.txt   --output /tmp/runs --opts config=pl2_qe

(PL2 + Proximity)

python run.py search  --repo osirrc2019/terrier --tag vx.y.z --collection robust04  --topic topics/topics.robust04.txt --qrels qrels/qrels.robust04.txt   --output /tmp/runs --opts config=pl2_prox

(PL2 + Proximity + query expansion)

python run.py search  --repo osirrc2019/terrier --tag vx.y.z --collection robust04  --topic topics/topics.robust04.txt --qrels qrels/qrels.robust04.txt   --output /tmp/runs --opts config=pl2_prox_qe

(DPH)

python run.py search  --repo osirrc2019/terrier --tag vx.y.z --collection robust04  --topic topics/topics.robust04.txt --qrels qrels/qrels.robust04.txt   --output /tmp/runs --opts config=dph

(DPH + query expansion)

python run.py search  --repo osirrc2019/terrier --tag vx.y.z --collection robust04  --topic topics/topics.robust04.txt --qrels qrels/qrels.robust04.txt   --output /tmp/runs --opts config=dph_qe

(DPH + Proximity)

python run.py search  --repo osirrc2019/terrier --tag vx.y.z --collection robust04  --topic topics/topics.robust04.txt --qrels qrels/qrels.robust04.txt   --output /tmp/runs --opts config=dph_prox

(DPH + Proximity + query expansion)

python run.py search  --repo osirrc2019/terrier --tag vx.y.z --collection robust04  --topic topics/topics.robust04.txt --qrels qrels/qrels.robust04.txt   --output /tmp/runs --opts config=dph_prox_qe

NOTE: for running DFRD (Proximity-based model), the index must be build using the --opts=block.indexing=true param

Learning to Rank Runs

Learning-to-rank will typically require that the index has more information, e.g. fields or blocks.

Indexing:

python run.py prepare     --repo osirrc2019/terrier --tag vx.y.z   --collections robust04=/tmp/disk45/=trectext --opts "FieldTags.process=HEADLINE"

Training:

You need to specify the features to be used by Terrier - see http://terrier.org/docs/v5.1/learning.html for more information about Terrier feature definitions.

python run.py train  --repo osirrc2019/terrier --tag vx.y.z --collection robust04  --topic topics/topics.robust04.txt --qrels qrels/qrels.robust04.txt    --test_split $PWD/sample_training_validation_query_ids/robust04_test.txt  --validation_split $PWD/sample_training_validation_query_ids/robust04_validation.txt --model_folder /tmp/runs --opts features="SAMPLE;WMODEL:SingleFieldModel(BM25,0);QI:SingleFieldModel(Dl,0)"

Retrieval:

You will need to specify the bm25_ltr_jforest configuration.

python run.py search  --repo osirrc2019/terrier --tag vx.y.z --collection robust04  --topic topics/topics.robust04.txt --qrels qrels/qrels.robust04.txt   --output /tmp/runs --opts config=bm25_ltr_jforest

Expected Results

robust04

MAP BM25 +QE +Prox +Prox + QE DPH + QE +Prox +Prox +QE PL2 +QE
TREC 2004 Robust Track Topics 0.2363 0.2762 0.2404 0.2781 0.2479 0.2821 0.2501 0.2869 0.2241 0.2538

core18

MAP BM25 +QE +Prox +Prox + QE DPH + QE +Prox +Prox +QE PL2 +QE
TREC 2018 Common Core Track Topics 0.2326 0.2975 0.2369 0.2960 0.2427 0.3055 0.2428 0.3035 0.2225 0.2787

GOV2

MAP BM25 +QE +Prox +Prox + QE DPH + QE +Prox +Prox +QE PL2 +QE
TREC 2004 Terabyte Track: Topics 701-750 0.2461 0.2621 0.2537 0.2715 0.2804 0.3120 0.2834 0.3064 0.2334 0.2478
TREC 2005 Terabyte Track: Topics 751-800 0.3081 0.3506 0.3126 0.3507 0.3311 0.3754 0.3255 0.3095 0.2884 0.3160
TREC 2006 Terabyte Track: Topics 801-850 0.2629 0.3118 0.2724 0.3085 0.2917 0.3494 0.2904 0.3288 0.2363 0.2739

Interact hooks

This image also supports the interact hooks from the OSIRRC JIG. After initializing the image (with python run.py prepare):

python run.py interact --repo terrier --tag vx.y.z

The following (internal) ports will be made available:

  • Port 1980: Search interface: A user-friendly HTML search interface, reachable using a web browser. (more information can be found at the Terrier Documentation)

  • Port 1981: REST API: This port provides a REST API for Terrier. You can either directly submit requests to this port or use another Terrier instance as a client, like:

$ /bin/terrier interactive -I http://dockerhost:1981/
terrier query> information retrieval end:5
  Displaying 1-6 results
0 FBIS4-20699 10.268754805435458
1 FBIS4-20702 9.768490153503198
2 FR941027-2-00046 9.491347902606723
3 FBIS4-20701 9.456022500508775
4 FBIS3-24510 9.31403481019499
5 FBIS4-20700 8.79234249484928
  • Port 1982: Terrier-Spark Jupyter Notebook: This port initialises a Scala Jupyter Notebook that is able to interact with the Terrier Index previously initialized. By acessing the notebook, a sample notebook (simpleRun.ipynb) is available, with minimal working examples. For more information, check Macdonald, 2018 and Macdonald et al., 2018.

NOTE Currently, the JIG may redirect these ports to diverse ones in the host machine. This is due to the way that Docker deals with port assignment. Therefore, you should check the correct port assignment by running:

$> docker ps

and check the right port assignment under the PORTS column.

Reviews