Replies: 1 comment
I list here some sources suggested by colleagues.

From Steve:
https://odin.linguistlist.org/ (the download function doesn't work at the moment; I have emailed them asking for a fix)
CommonVoice (hundreds of languages, many low-resourced)
https://www.openslr.org/79/ (Kannada, critically endangered)
https://www.openslr.org/126/ (Kannada, critically endangered)
https://www.amazon.science/blog/amazon-releases-51-language-dataset-for-language-understanding

From Chris:
Check the 1000Langs corpus of parallel bible texts for overlap with our sample of languages:
Please list here suggestions for corpora sources that might be useful for the TeDDi Sample.