Translator

Russian-Belarusian neural translator

The data is part of my bachelor thesis on neural machine translation for the Russian-Belarusian language pair.

Repository

The repo consists of

  • 429k aligned sentence pairs (under Data/AlignedData), split into 10 batches

  • chunks to align (under Data/ChunksToAlign)

  • Data/TabbedCorpusMiddleSent.txt, a sample of 65,966 sentences of at most 80 characters each, handy for training a model on a subset of the data

  • neural network code.
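
As a rough illustration of how the sample file might be read, here is a minimal Python sketch. It rests on assumptions that the README does not confirm: that each line of Data/TabbedCorpusMiddleSent.txt holds one Russian sentence and its Belarusian counterpart separated by a single tab (guessed from the file name), and that the 80-character cap applies to each sentence separately. The helper name `read_pairs` is hypothetical and is not part of the repository's code.

```python
from typing import Iterable, List, Tuple

MAX_CHARS = 80  # length cap stated for the sample file


def read_pairs(lines: Iterable[str], max_chars: int = MAX_CHARS) -> List[Tuple[str, str]]:
    """Parse sentence pairs from tab-separated lines.

    ASSUMPTION: every line looks like "Russian<TAB>Belarusian".
    Lines that do not have exactly two non-empty fields, or whose
    fields exceed max_chars, are skipped rather than raising.
    """
    pairs: List[Tuple[str, str]] = []
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 2:
            continue  # malformed line: wrong number of fields
        src, tgt = parts[0].strip(), parts[1].strip()
        if not src or not tgt:
            continue  # one side of the pair is empty
        if len(src) > max_chars or len(tgt) > max_chars:
            continue  # sentence longer than the cap
        pairs.append((src, tgt))
    return pairs
```

To try it on the real file, one could open it with `open("Data/TabbedCorpusMiddleSent.txt", encoding="utf-8")` and pass the file object to `read_pairs`. The UTF-8 encoding is also an assumption. If the file uses a different layout, only the `split("\t")` line should need changing.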

Data source

? The main source of the data (web-pages,..)

Collection

? How the data was collected

This is an open-source project; the data can be used freely. Any reviews are more than welcome.


Author: Tsimafei Prakapenka