Re-implementation of the paper Metric Learning for Dynamic Text Classification

xelibrion/dynamic-text-classification

catalyst-dynamic-text-classification

This code is a re-implementation of the paper Metric Learning for Dynamic Text Classification by ASAPP Research, built with the Catalyst framework.

The original code for the paper is available at asappresearch/dynamic-classification.

How to run

  1. Clone the repository

    git clone git@github.com:xelibrion/catalyst-dynamic-text-classification.git
    cd catalyst-dynamic-text-classification
    
  2. Install dependencies

    pip install -e .
    
  3. Fetch data

    cd dynamic_class
    ./get_data.py
    
  4. Run the training script once to build the vocabulary. This run is expected to fail at the model training stage, because the embeddings do not exist yet.

    ./train.py
    
  5. Compute word vectors for the vocabulary using a fastText model. The model can be downloaded here.

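    # Extract the first column (the words) from the vocabulary file
    awk '{print $1}' input/vocab.txt > vocab_words.txt
    # Look up a vector for every word; adjust both paths to your own fastText binary and model
    ~/projects/fasttext/fasttext print-word-vectors ~/projects/fasttext/cc.en.300.bin < vocab_words.txt > vocab_vectors.txt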
    

    Note that the original paper used GloVe word embeddings, so you might want to experiment with the choice of embeddings.

    The tokenizer could also be much better: at the moment it simply splits on whitespace.

  6. Train the model

    ./train.py
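
To sanity-check the output of step 5 before training, the vectors file can be parsed with a few lines of Python. This is a minimal sketch, not code from this repository: it assumes that `vocab_vectors.txt` holds one word per line followed by its space-separated vector components, and `load_word_vectors` is a hypothetical helper name.

```python
import io


def load_word_vectors(lines):
    """Parse lines of the form 'word v1 v2 ... vN' into a dict (hypothetical helper)."""
    vectors = {}
    dim = None
    for line_no, line in enumerate(lines, start=1):
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank lines and lines without any vector components
        word = parts[0]
        values = [float(x) for x in parts[1:]]
        if dim is None:
            dim = len(values)  # the first valid line fixes the expected dimension
        elif len(values) != dim:
            raise ValueError(
                f"line {line_no}: expected {dim} values, got {len(values)}"
            )
        vectors[word] = values
    return vectors


# Tiny in-memory sample standing in for vocab_vectors.txt (assumed format).
sample = io.StringIO("hello 0.1 0.2 0.3\nworld -0.4 0.5 0.6\n")
vectors = load_word_vectors(sample)
print(len(vectors), len(vectors["hello"]))
```

For the real file, pass an open file handle instead of the sample, for example `with open("vocab_vectors.txt") as f: vectors = load_word_vectors(f)`. A dimension mismatch usually points to a word that contains whitespace or to a truncated file.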
    

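The whitespace tokenizer mentioned in step 5 keeps punctuation attached to words, so tokens such as `world!` are unlikely to match an entry in a pretrained embedding vocabulary. The sketch below contrasts whitespace splitting with one possible alternative based on a regular expression. Both function names are hypothetical and are not taken from this repository.

```python
import re


def whitespace_tokenize(text):
    """Split on runs of whitespace, as the current pipeline is described to do."""
    return text.split()


# One alternative (an assumption, not the repository's tokenizer):
# runs of word characters become tokens, and every other
# non-whitespace character becomes a token of its own.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def regex_tokenize(text):
    """Separate punctuation from words using TOKEN_PATTERN."""
    return TOKEN_PATTERN.findall(text)


text = "Hello, world! It's here."
print(whitespace_tokenize(text))
print(regex_tokenize(text))
```

If the tokenizer is changed, the vocabulary presumably changes with it, so steps 4 and 5 would need to be repeated before training.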
Gotchas

This pipeline uses the sru package, which can be difficult to install and get running. See my comment here.
