catalyst-dynamic-text-classification

This code is a re-implementation of the paper "Metric Learning for Dynamic Text Classification" by ASAPP Research, built with the Catalyst framework.

The original code for the paper is available at asappresearch/dynamic-classification.

How to run

Clone the repository
```
git clone git@github.com:xelibrion/catalyst-dynamic-text-classification.git
cd catalyst-dynamic-text-classification
```
Install dependencies
```
pip install -e .
```
Fetch data
```
cd dynamic_class
./get_data.py
```
Run the training script once to build the vocabulary (the training itself will fail at this point because the embeddings do not exist yet)
```
./train.py
```
Compute word vectors for the vocabulary using a fastText model. The commands below use the pretrained cc.en.300.bin model, which can be downloaded from the fastText website.
```
cat input/vocab.txt | awk -F ' ' '{print $1}' > vocab_words.txt
~/projects/fasttext/fasttext print-word-vectors  ~/projects/fasttext/cc.en.300.bin < vocab_words.txt > vocab_vectors.txt
```
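To sanity-check the result, the generated file can be parsed back into memory. The sketch below assumes the usual `print-word-vectors` output layout, one line per word with the word first and its vector components after it, separated by spaces. The helper name `load_word_vectors` is hypothetical and is not part of this repository; the way `train.py` actually reads the vectors may differ.

```python
import io


def load_word_vectors(lines):
    """Parse lines of the form `word v1 v2 ... vN` into a dict.

    Hypothetical helper for inspection only; not taken from this repository.
    """
    vectors = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            # Skip blank or malformed lines.
            continue
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


# Small in-memory sample standing in for vocab_vectors.txt.
sample = io.StringIO("hello 0.1 0.2 0.3 \nworld -0.4 0.5 0.6 \n")
vectors = load_word_vectors(sample)
dims = {len(v) for v in vectors.values()}
```

For the real file, pass an open file handle instead of the sample, for example `load_word_vectors(open("vocab_vectors.txt", encoding="utf-8"))`. With cc.en.300.bin every vector should have 300 components, so `dims` should contain a single value.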
Please note that the original paper used GloVe word embeddings. You might want to experiment with the choice of embeddings.

Also, the tokenizer could be much better - at the moment it simply splits on whitespace.
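As an illustration of the tokenizer limitation, here is a minimal sketch that compares plain whitespace splitting with a simple regex-based alternative. Both function names and the regex are assumptions made for this example and are not code from this repository; lowercasing is likewise an assumption and only makes sense if it matches the casing of the chosen embeddings.

```python
import re

# One token per run of word characters, or per single punctuation mark.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def whitespace_tokenize(text):
    """Baseline: split on whitespace only, as described above."""
    return text.split()


def regex_tokenize(text):
    """Hypothetical alternative: lowercase and separate punctuation."""
    return TOKEN_RE.findall(text.lower())


text = "Hello, world! Reset my password."
baseline = whitespace_tokenize(text)
improved = regex_tokenize(text)
```

With whitespace splitting, punctuation stays glued to the words (`"Hello,"`, `"world!"`), so these tokens are less likely to match entries in a pretrained embedding vocabulary. The regex variant yields `"hello"`, `","`, `"world"`, `"!"` and so on.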

Train the model
```
./train.py
```

This pipeline uses the sru package, which can be difficult to get running. See my comment here.
