While PyData Amsterdam 2024 was not that interesting, I wrote a kind-of word2vec (CBOW), as I understood it. I didn't care about anything other than killing some time and training a small NN on an M2 Pro.
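Roughly, CBOW averages the embeddings of the words around a position and trains a softmax over the vocabulary to predict the word in the middle. Below is a minimal pure-Python sketch of that idea. It assumes a full softmax (no negative sampling or hierarchical softmax), a symmetric window of 2 and a toy corpus. It is not the code in `train.py`, and `train_cbow` and `cbow_step` are hypothetical names.

```python
import math
import random


def cbow_step(w_in, w_out, context, center, lr):
    """One SGD step of CBOW with a full softmax; returns the cross-entropy loss."""
    dim = len(w_in[0])
    # Hidden layer: mean of the context word embeddings.
    h = [sum(w_in[c][d] for c in context) / len(context) for d in range(dim)]
    # One score per vocabulary word, then a numerically stable softmax.
    scores = [sum(row[d] * h[d] for d in range(dim)) for row in w_out]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    loss = -math.log(probs[center])
    # d(loss)/d(scores) = probs - one_hot(center).
    grad = probs[:]
    grad[center] -= 1.0
    # Gradient w.r.t. h, computed before w_out is modified.
    grad_h = [sum(grad[v] * w_out[v][d] for v in range(len(w_out))) for d in range(dim)]
    for v, row in enumerate(w_out):
        for d in range(dim):
            row[d] -= lr * grad[v] * h[d]
    # Each context word contributed 1/len(context) of h.
    for c in context:
        for d in range(dim):
            w_in[c][d] -= lr * grad_h[d] / len(context)
    return loss


def train_cbow(tokens, dim=8, window=2, epochs=100, lr=0.05, seed=0):
    rng = random.Random(seed)
    idx_to_word = sorted(set(tokens))
    word_to_idx = {w: i for i, w in enumerate(idx_to_word)}
    ids = [word_to_idx[t] for t in tokens]
    n = len(idx_to_word)
    # Input embeddings (the vectors you keep afterwards) and output weights.
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(n)]
    w_out = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(n)]
    losses = []
    for _ in range(epochs):
        epoch_loss, steps = 0.0, 0
        for i, center in enumerate(ids):
            # Up to `window` words on each side, fewer at the corpus edges.
            context = ids[max(0, i - window):i] + ids[i + 1:i + 1 + window]
            if context:
                epoch_loss += cbow_step(w_in, w_out, context, center, lr)
                steps += 1
        losses.append(epoch_loss / steps)
    return w_in, word_to_idx, idx_to_word, losses


if __name__ == "__main__":
    corpus = "the cat sat on the mat the dog sat on the rug".split()
    _, _, vocab, history = train_cbow(corpus)
    print(len(vocab), round(history[0], 3), round(history[-1], 3))
```

The full softmax costs one dot product per vocabulary word per step, which is fine for a toy vocabulary but is exactly what negative sampling avoids on a real corpus.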
- Run `data.py` to preprocess `hp.txt` into a vocabulary plus word-to-index and index-to-word mappings.
- Run `train.py` to start the far-from-optimal training loop.
- Run `run.py`, like `python run.py 'harry+ron-hermione'`, to get the top-5 words closest to the resulting vector in the learned space.
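The preprocessing and the `harry+ron-hermione` style query can be sketched as follows. This assumes a simple lowercase regex tokenizer, a sorted vocabulary, plain vector addition and subtraction, cosine similarity, and that the query words themselves are excluded from the results (a common convention for analogy queries; whether `run.py` does the same is an assumption). `tokenize`, `build_vocab`, `parse_query` and `nearest` are hypothetical names, and the vectors in the demo are hand-made toy values, not learned embeddings.

```python
import math
import re


def tokenize(text):
    # Lowercase and keep runs of letters and apostrophes (an assumed scheme).
    return re.findall(r"[a-z']+", text.lower())


def build_vocab(tokens):
    # Vocabulary plus word-to-index and index-to-word mappings.
    idx_to_word = sorted(set(tokens))
    word_to_idx = {w: i for i, w in enumerate(idx_to_word)}
    return word_to_idx, idx_to_word


def parse_query(query):
    # 'harry+ron-hermione' -> [(1, 'harry'), (1, 'ron'), (-1, 'hermione')]
    return [(-1 if sign == "-" else 1, word.strip().lower())
            for sign, word in re.findall(r"([+-]?)([^+-]+)", query)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest(query, vectors, k=5):
    # Sum the signed word vectors, then rank every other word by cosine similarity.
    terms = parse_query(query)
    dim = len(next(iter(vectors.values())))
    target = [0.0] * dim
    for sign, word in terms:
        for d, x in enumerate(vectors[word]):
            target[d] += sign * x
    used = {word for _, word in terms}
    ranked = sorted(
        ((cosine(target, vec), w) for w, vec in vectors.items() if w not in used),
        key=lambda pair: pair[0],
        reverse=True,
    )
    return [w for _, w in ranked[:k]]


if __name__ == "__main__":
    # Hand-made toy vectors, NOT output from the trained model.
    toy = {
        "harry": [1.0, 0.0, 0.0],
        "ron": [0.0, 1.0, 0.0],
        "hermione": [0.0, 0.0, 1.0],
        "ginny": [0.9, 0.9, -0.5],
        "dumbledore": [0.0, 0.0, -1.0],
        "hagrid": [-1.0, 0.0, 0.0],
    }
    print(nearest("harry+ron-hermione", toy, k=2))  # ['ginny', 'dumbledore']
```

With real embeddings, `vectors` would map each entry of the vocabulary to its learned input vector instead of these toy values.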