Python package for Natural Language Processing (NLP), focused on low-resource languages spoken in Mexico.
This is a project of Comunidad Elotl.
Developed by:
- Paul Aguilar @penserbjorne, [email protected]
- Robert Pugh @Lguyogiro, [email protected]
- Diego Barriga @umoqnier, [email protected]
Requires python>=3.11
- Development Status: read the package Classifiers
- pip package: elotl
- GitHub repository: ElotlMX/py-elotl
pip install elotl
git clone
cd py-elotl
pip install -e .
import elotl.corpus
list_of_corpus = elotl.corpus.list_of_corpus()
for row in list_of_corpus:
    print(row)
Name Description
['axolotl', 'Is a Spanish-Nahuatl parallel corpus']
['tsunkua', 'Is a Spanish-Otomí parallel corpus']
['kolo', 'Is a Spanish-Mixteco parallel corpus']
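Since each row in the listing above is a name and description pair, it can be convenient to turn the listing into a dictionary for lookups. The following is a minimal sketch, assuming every row has exactly two elements as shown; `describe` is a hypothetical helper, not part of the package, and the rows are copied from the output above so the example runs without the package installed.

```python
# Rows copied from the listing above. With the package installed they would
# come from elotl.corpus.list_of_corpus() instead.
rows = [
    ['axolotl', 'Is a Spanish-Nahuatl parallel corpus'],
    ['tsunkua', 'Is a Spanish-Otomí parallel corpus'],
    ['kolo', 'Is a Spanish-Mixteco parallel corpus'],
]


def describe(rows):
    """Map each corpus name to its description (hypothetical helper)."""
    # Assumption: every row is a [name, description] pair.
    return {name: description for name, description in rows}


descriptions = describe(rows)
print(descriptions['tsunkua'])
```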
If a non-existent corpus is requested, a value of 0 is returned.
axolotl = elotl.corpus.load('axolotlr')
if axolotl == 0:
    print("The name entered does not correspond to any corpus")
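Callers who prefer an exception over checking for 0 could wrap the loader. This is only a sketch: `load_or_raise` and `fake_load` are hypothetical names, and `fake_load` is a stand-in that mimics the documented behavior (a list for a known corpus, 0 otherwise) so the example runs without the package. With the package installed, `elotl.corpus.load` would be passed as the `loader` argument.

```python
def load_or_raise(name, loader):
    """Call loader(name) and raise if it signals a missing corpus with 0."""
    corpus = loader(name)
    # A loaded corpus is a list, so only the integer 0 marks a failure.
    # Checking the type keeps an empty list from being treated as an error.
    if isinstance(corpus, int) and corpus == 0:
        raise ValueError(f"The name {name!r} does not correspond to any corpus")
    return corpus


def fake_load(name):
    """Hypothetical stand-in that mimics the documented return values."""
    data = {'axolotl': [['l1 text', 'l2 text', 'variant', 'document']]}
    return data.get(name, 0)


try:
    load_or_raise('axolotlr', fake_load)
except ValueError as error:
    print(error)
```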
If an existing corpus is entered, a list is returned.
axolotl = elotl.corpus.load('axolotl')
[
    'Y así, cuando hizo su ofrenda de fuego, se sienta delante de los demás y una persona se queda junto a él.',
    'Auh in ye yuhqui in on tlenamacac niman ye ic teixpan on motlalia ce tlacatl itech mocaua.',
    'Classical Nahuatl',
    'Vida económica de Tenochtitlan',
]
Each element of the list has four indices, plus an optional fifth:
- non_original_language (l1)
- original_language (l2)
- variant
- document_name
- iso lang (optional)
tsunkua = elotl.corpus.load('tsunkua')
for row in tsunkua:
    print(row[0]) # language 1
    print(row[1]) # language 2
    print(row[2]) # variant
    print(row[3]) # document
Una vez una señora se emborrachó
nándi na ra t'u̱xú bintí
Otomí del Estado de México (ots)
El otomí de toluca, Yolanda Lastra
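Numeric indices are easy to mix up, so one option is to wrap each row in a named record. The sketch below assumes rows follow the index order described above, with the ISO language code as an optional fifth element; `ParallelRow` and `to_record` are hypothetical names that the package does not provide. The sample row is copied from the output above.

```python
from typing import NamedTuple, Optional


class ParallelRow(NamedTuple):
    """Named view of one corpus row (hypothetical, not part of elotl)."""
    non_original_language: str
    original_language: str
    variant: str
    document_name: str
    iso_lang: Optional[str] = None  # optional fifth element


def to_record(row):
    """Build a ParallelRow from a 4- or 5-element corpus row."""
    return ParallelRow(*row[:5])


sample = [
    'Una vez una señora se emborrachó',
    "nándi na ra t'u̱xú bintí",
    'Otomí del Estado de México (ots)',
    'El otomí de toluca, Yolanda Lastra',
]
record = to_record(sample)
print(record.original_language)
print(record.variant)
```

Because `ParallelRow` is a tuple subclass, index access such as `record[0]` keeps working alongside the named fields.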
The following structure is a reference. As the package grows it will be better documented.
├── dist
├── docs
├── elotl                       Top-level package
│   ├── corpora                 Corpus data
│   ├── corpus                  Subpackage to load corpora
│   ├── huave                   Huave language subpackage
│   │   └── Module to normalize Huave orthography and phonemes
│   ├── Initialize the package
│   ├── nahuatl                 Nahuatl language subpackage
│   │   └── Module to normalize Nahuatl orthography and phonemes
│   ├── otomi                   Otomi language subpackage
│   │   └── Module to normalize Otomi orthography and phonemes
│   ├── __pycache__
│   └── utils                   Subpackage with common functions and files
│       └── fst                 Finite State Transducer functions
│           └── att             Module with static .att files
├── Makefile
├── pyproject.toml
└── tests
poetry env use 3.x
poetry shell
make all
Where 3.x is your local Python version. See "Managing environments" in the Poetry documentation.
Build the FSTs with make
make fst
poetry env use 3.x
poetry shell
python -m pip install --upgrade pip
poetry build
python -m pip install -e .
poetry publish
Remember to configure your PyPI credentials before publishing.