Author: Horace He
This work collects 12 representative Chinese online fantasy novels by 3 well-known authors and forms a medium-scale dataset of 768K sentences, each paired with its book title and author.
This work applies RNNLM and VAE models to the proposed Chinese online fantasy dataset. To alleviate the KL vanishing problem, several approaches to aiding VAE optimization are studied.
This work evaluates the performance of unconditional fantasy generation and analyzes the learned VAE latent space with t-SNE visualization.
- anaconda==4.8.3
- torch==1.5.0
- torchtext==0.6.0
- tensorflow==2.1.0
- tensorboard==2.1.1
- HanLP==2.0.0a44
- transformers==2.10.0
export PYTHONPATH=/path/to/project/:$PYTHONPATH
The program downloads and cleans the text files automatically.
python prepare_data.py
Build the torchtext dataset after preparation.
python dataset.py
python src/train/train_lm.py
Run with "-h" option to see argument details.
python src/train/train_vae.py
Run with "-h" option to see argument details. Supported parameters include KL annealing method, KL annealing cycle, weight of reconstruction loss, whether to predict hidden state of decoder by latent variable, whether to use aggressive training, etc.
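For readers unfamiliar with KL annealing, the idea behind the annealing method and cycle parameters can be sketched as below. This is a minimal illustration, not the code in `train_vae.py`: the function names, the `ramp_fraction` parameter, and the way the weights enter the loss are all assumptions made for the example. A monotonic schedule ramps the KL weight from 0 to 1 once, while a cyclical schedule restarts the ramp every cycle.

```python
# Hypothetical helpers for illustration only; they are not taken from this repository.


def linear_kl_weight(step, warmup_steps):
    """Ramp the KL weight linearly from 0 to 1 over warmup_steps, then hold at 1."""
    if warmup_steps <= 0:
        return 1.0
    return min(1.0, step / warmup_steps)


def cyclical_kl_weight(step, cycle_steps, ramp_fraction=0.5):
    """Restart the linear ramp every cycle_steps.

    The weight rises from 0 to 1 during the first ramp_fraction of each cycle
    and stays at 1 for the remainder of that cycle.
    """
    position = (step % cycle_steps) / cycle_steps  # position within the cycle, in [0, 1)
    return min(1.0, position / ramp_fraction)


def vae_loss(recon_nll, kl, kl_weight, recon_weight=1.0):
    """Weighted negative ELBO: recon_weight * reconstruction NLL + kl_weight * KL."""
    return recon_weight * recon_nll + kl_weight * kl


if __name__ == "__main__":
    # Print both schedules at a few training steps for comparison.
    for step in (0, 250, 500, 1000, 1250):
        print(step, linear_kl_weight(step, 1000), cyclical_kl_weight(step, 1000))
```

Keeping the KL weight small early in training lets the decoder learn to use the latent variable before the KL penalty pushes the posterior toward the prior, which is the usual motivation for annealing as a remedy for KL vanishing.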
The training process can be visualized by running tensorboard in the results directory.
tensorboard --logdir .
Then open port 6006 in a browser.
Run the following command to write the embeddings into the results directory, then restart TensorBoard to see the visualizations.
python src/visualize/latent_space_visualize.py
Run with "-h" option to see argument details.