Project Title

An aid to learning to read foreign languages

Project Description

This project builds a simple Chinese-English translation web tool. Users can paste a URL, and the tool extracts the paragraph text from that page and translates it automatically. Users can also manually paste a paragraph or a single word they want translated.
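The URL workflow described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code shipped in this repository: the names `ParagraphExtractor`, `extract_paragraphs`, `fetch_html` and `translate_page` are hypothetical, and `translate` stands in for whichever trained model is loaded.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class ParagraphExtractor(HTMLParser):
    """Collect the text of every <p> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []   # finished paragraphs, in document order
        self._in_p = False     # True while inside a <p> element
        self._chunks = []      # text pieces of the paragraph being read

    def flush(self):
        # Join the collected pieces and normalize whitespace.
        text = " ".join("".join(self._chunks).split())
        if text:
            self.paragraphs.append(text)
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.flush()       # also closes a previous unclosed <p>
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.flush()
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self._chunks.append(data)


def extract_paragraphs(html):
    """Return the paragraph texts found in an HTML string."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    parser.flush()             # keep a trailing paragraph without </p>
    return parser.paragraphs


def fetch_html(url, timeout=10.0):
    """Download a page and decode it (assumes UTF-8 if no charset is sent)."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def translate_page(html, translate):
    """Pair each extracted paragraph with its translation.

    `translate` is an assumed callable (str -> str) wrapping the MT model.
    """
    return [(p, translate(p)) for p in extract_paragraphs(html)]
```

In use, `translate_page(fetch_html(url), translate)` would return a list of (source, translation) pairs that the front end could display side by side.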

An ASR (automatic speech recognition) component is added as an auxiliary input method, so ASR and MT (machine translation) can be combined into a Spoken Language Translation (SLT) pipeline. The ASR component is inspired by wav2letter, which was proposed by Facebook in 2017. The ASR model is an experimental end-to-end speech recognizer built solely from CNN layers.
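Chaining ASR and MT into an SLT pipeline can be sketched as below. This is only an illustration of the composition: `make_slt`, `fake_asr` and `fake_mt` are hypothetical names, and the two stub functions stand in for the real speech recognizer and translation model.

```python
from typing import Callable


def make_slt(asr: Callable[[bytes], str],
             mt: Callable[[str], str]) -> Callable[[bytes], str]:
    """Build a spoken-language-translation function from an ASR and an MT step."""

    def slt(audio: bytes) -> str:
        transcript = asr(audio)          # speech -> source-language text
        if not transcript.strip():
            return ""                    # nothing recognized, nothing to translate
        return mt(transcript)            # source text -> target-language text

    return slt


# Stub components for illustration only (not the trained models).
def fake_asr(audio: bytes) -> str:
    return "你好" if audio else ""


def fake_mt(text: str) -> str:
    return {"你好": "hello"}.get(text, text)


slt = make_slt(fake_asr, fake_mt)
```

Because the ASR output is plain text, the same MT function can serve both typed and spoken input.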

Demo-Preview

The repository is organized as follows:

Algorithm: the implemented machine translation algorithms.
Corpus: the corpora used for machine translation.
Demo_FrontEnd: the front-end pages used to present the demo.
bysms: the whole system, embedded in the Django framework.
database: files related to database format conversion.
results: evaluation files for the machine translation algorithms.
speech2text: files related to speech recognition.
requirements: the packages and testing software that need to be installed on the local computer.

Installation

To use this project, first clone the repo on your device using the command below:

git clone https://github.com/michelle-chou25/An-aid-to-learning-to-read-foreign-languages.git

Then install the packages and testing software listed in the requirements file.

Then change into the bysms folder (adjust the path to wherever you cloned the project):

cd /Users/username/VScodeProjects/ProjectName/bysms

After starting the Django server and MongoDB, open http://127.0.0.1:8000/view/ in a browser

(if the Django server is listening on a port other than 8000, replace 8000 with that port).

Word lookup and sentence translation are then available in the front end.
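As a quick check that the front end is being served, the URL can be built for a given port and probed from Python. This is a minimal sketch, not part of the repository: `view_url` and `is_reachable` are hypothetical helper names, and 8000 is assumed only because it is the default port of Django's development server.

```python
from urllib.error import URLError
from urllib.request import urlopen


def view_url(port=8000, host="127.0.0.1"):
    """Build the front-end URL for the port the Django server listens on."""
    return f"http://{host}:{port}/view/"


def is_reachable(url, timeout=3.0):
    """Return True if the URL answers with a successful HTTP response."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False
```

For example, if the server was started on port 8080, `is_reachable(view_url(8080))` should return True once both the Django server and MongoDB are running.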

Corpora

The Google Drive links for the trained machine translation models are as follows (format: corpus name_algorithm name_translation direction):

Corpus1w_Transformer_Chinese-English: https://drive.google.com/file/d/1-q6RZcZyxEBfXzOFnJkLRNIa9ctQLWMD/view?usp=sharing
Corpus1w_Transformer_English-Chinese: https://drive.google.com/drive/folders/13y7P1uFnKijU8BS1J2YOVH_QyaHD114o?usp=sharing

Machine Translation Model

The Google Drive links for the trained machine translation models are as follows (format: corpus name_algorithm name_translation direction):

Corpus1w_Transformer_Chinese-English: https://drive.google.com/file/d/11269OHJtp9E6xLK1V2RdPEVDKj9ImljQ/view?usp=sharing
Corpus1w_Transformer_English-Chinese: https://drive.google.com/file/d/12V2-ysI5hJpd7iY0OMkYBZpjhFg0cWUE/view?usp=sharing
Corpus1w_seq2seq+attention_Chinese-English: https://drive.google.com/drive/folders/1-P3U4B7RNtwdeeM0KjpyfHB7Aji4s8nq?usp=sharing
Corpus_education_Transformer_Chinese-English: https://drive.google.com/drive/folders/1-R4lTnch3UEF_BQTAeUS74Yk9lSH5jIj?usp=sharing
Seq2seq_testcorpus_seq2seq+attention_Chinese-English: https://drive.google.com/drive/folders/1UXjT5NN3JdNIImfNdWg3bHCXCNIPi9Ne?usp=sharing
ASR model: https://drive.google.com/file/d/1n9zJQhNyEXAlD6nQVULpHOov1A5tSr2W/view?usp=sharing

Requirement package

All required packages are listed in the requirements file.

Corpus

All machine translation corpora used in this project can be downloaded from the link below. After downloading, put the corpora into the Corpus folder. Download link: https://drive.google.com/file/d/1lAh29Qxmo1oqa9WphnU3K7L4CNGRVv4L/view?usp=sharing

ASR uses AISHELL-1 as its corpus. AISHELL-1 is a Mandarin Chinese corpus that includes 178 hours of speech recordings of Chinese news, recorded in a quiet room by 400 speakers. Corpus address: https://www.openslr.org/33/

Contributors

Ruochen Xue & Nanjun Zhou
