Article Relevance Prediction
This page outlines the primary workflow and points to the required references to develop the Article Relevance Prediction model.
The article relevance prediction component requires a list of journals that are relevant to Neotoma. The dataset used to train and develop the model is available for download HERE. Download all files and extract the contents into MetaExtractor/data/article-relevance/raw/.
The prediction pipeline requires the trained model object. The model is available HERE. Download the model file and put the .joblib file in MetaExtractor/models/article-relevance/.
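Before running the pipeline, it can help to confirm that the model file landed in the expected folder. The following is a minimal sketch, not part of MetaExtractor: the helper name find_model_files is hypothetical, and it only assumes that the model is a single file with the .joblib extension placed directly in the models folder named above.

```python
from pathlib import Path


def find_model_files(model_dir):
    """Return the .joblib files found directly inside model_dir, sorted by name.

    Hypothetical helper for illustration; it is not a MetaExtractor function.
    An empty list means the folder is missing or holds no .joblib file.
    """
    directory = Path(model_dir)
    if not directory.is_dir():
        return []
    return sorted(directory.glob("*.joblib"))
```

Calling find_model_files("MetaExtractor/models/article-relevance") from the folder that contains the repository should return one path if the download step succeeded. A file saved with joblib is typically loaded with joblib.load, but how the pipeline itself loads the model is not described on this page.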
To train the model and reproduce the results, see Model Training.
To retrain the Article Relevance Prediction model, set the following environment variables:
- USE_REVIEWED_DATA: Defaults to true, in which case newly reviewed articles are used to train the model. If set to False, the pipeline reproduces the original model.
- REVIEWED_FOLDER_PATH: Lets the pipeline use the results of the Data Review Tool to retrain a new model. Set this to the folder where the result parquet file is stored.
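As an illustration of how these two variables interact, the sketch below reads them from the environment. It is an assumption about one reasonable way to interpret the values, not the pipeline's actual code: the function names, the accepted spellings of true and false, and the returned dictionary keys are all hypothetical.

```python
import os

# Assumed spellings; the real pipeline may accept a different set.
TRUE_VALUES = {"1", "true", "yes", "on"}
FALSE_VALUES = {"0", "false", "no", "off"}


def parse_bool(value, default):
    """Interpret an environment string as a boolean, case-insensitively."""
    if value is None or value.strip() == "":
        return default
    normalized = value.strip().lower()
    if normalized in TRUE_VALUES:
        return True
    if normalized in FALSE_VALUES:
        return False
    raise ValueError(f"Cannot interpret {value!r} as a boolean")


def read_retraining_settings(environ=None):
    """Collect the retraining settings from a mapping (os.environ by default)."""
    env = os.environ if environ is None else environ
    return {
        # Unset means true: train with newly reviewed articles.
        "use_reviewed_data": parse_bool(env.get("USE_REVIEWED_DATA"), default=True),
        # Folder holding the parquet output of the Data Review Tool, if set.
        "reviewed_folder_path": env.get("REVIEWED_FOLDER_PATH"),
    }
```

With USE_REVIEWED_DATA unset, the sketch reports use_reviewed_data as true, matching the default described above. Setting it to False selects the path that reproduces the original model.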
To run the prediction pipeline, see Article Relevance Prediction.
See a problem or have an idea to improve the project? Please submit an issue here: Submit New Issue to MetaExtractor