SMART-KPE

Code for the paper "Incorporating Multimodal Information in Open-Domain Web Keyphrase Extraction". You can download the data here.

Update:

Since we released the code, we have been working on documentation and comments for you. To make it easy to replicate the results and compare to previous works, we are generating checkpoints for all 15 variants based on BERT-KPE, built on their version of the code (included in the BERT-KPE-BASED folder), and we will release them as soon as possible.

We provide final checkpoints for the BERT-KPE-based models here. Currently we have only uploaded the best model (Roberta2Joint-based SMART-KPE, F@3: 0.405), and we will add more variants soon. You can use test.sh in the script folder to check the results.

To run the code, make sure you are using PyTorch 1.4.0; otherwise the data-parallel and transformer parts may not work properly.
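As a quick sanity check before training, a minimal sketch of a version check (not part of the original scripts; the printed CUDA line is only illustrative):

```python
# Minimal environment sanity check (a sketch, not part of the original scripts).
import torch

# The data-parallel and transformer parts are reported to misbehave on other versions.
assert torch.__version__.startswith("1.4"), (
    f"Expected PyTorch 1.4.0, found {torch.__version__}"
)
print("PyTorch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
```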

If you would like to replicate the best result before we release the other checkpoints, you can try the following steps:

  1. Download the image data and title data.
  2. Add the title data to the dataset files. In the original JSONL file, each line corresponds to one example and contains three fields: url, text, and VDOM. To use the title data, add a new field named title whose value is the title string (see the sketch after this list; we will also add a script to help you process this in the near future).
  3. Follow the instructions of BERT-KPE to preprocess the data and run the experiments with the scripts provided in this repo.
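Until the helper script mentioned in step 2 is released, a minimal sketch of that merging step might look like the following. The file names and the assumption that the title data is itself a JSONL file mapping url to title are hypothetical; adapt them to however your copy of the downloaded data is stored.

```python
# Sketch of step 2: add a 'title' field to each JSONL record, keyed by its 'url'.
# File names and the url -> title mapping format below are assumptions.
import json

def add_titles(dataset_path, titles_path, output_path):
    """Merge downloaded titles into the dataset JSONL file."""
    with open(titles_path, encoding="utf-8") as f:
        # Assumed format: one JSON object per line with 'url' and 'title' keys.
        url_to_title = {rec["url"]: rec["title"] for rec in map(json.loads, f)}

    with open(dataset_path, encoding="utf-8") as fin, \
         open(output_path, "w", encoding="utf-8") as fout:
        for line in fin:
            record = json.loads(line)  # existing fields: url, text, VDOM
            record["title"] = url_to_title.get(record["url"], "")
            fout.write(json.dumps(record) + "\n")

if __name__ == "__main__":
    # Hypothetical file names; replace with your actual dataset and title files.
    add_titles("openkp.train.jsonl", "titles.jsonl", "openkp.train.with_title.jsonl")
```

After writing the merged file, point the BERT-KPE preprocessing step at it as usual.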

Thanks again for your interest in our work!