Python package for Natural Language Processing (NLP), focused on low-resource languages spoken in Mexico.
This is a project of Comunidad Elotl.
Developed by:
- Paul Aguilar @penserbjorne, [email protected]
- Robert Pugh @Lguyogiro, [email protected]
- Diego Barriga @umoqnier, [email protected]
Requires python>=3.11

- Development Status: Beta (see the package classifiers for details)
- pip package: elotl
- GitHub repository: ElotlMX/py-elotl
Install from PyPI:

pip install elotl

Or install from source:

git clone https://github.com/ElotlMX/py-elotl.git
cd py-elotl
pip install -e .
import elotl.corpus

# List the available corpora: each entry is a [name, description] pair
print("Name\t\tDescription")
list_of_corpus = elotl.corpus.list_of_corpus()
for row in list_of_corpus:
    print(row)
Output:
Name Description
['axolotl', 'Is a Spanish-Nahuatl parallel corpus']
['tsunkua', 'Is a Spanish-Otomí parallel corpus']
['kolo', 'Is a Spanish-Mixteco parallel corpus']
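If you prefer aligned columns over the raw lists, the same call can drive a small formatting loop. This is a minimal sketch, assuming each entry is a [name, description] pair as in the output above:

import elotl.corpus

# Print the corpus catalogue as two aligned columns
for name, description in elotl.corpus.list_of_corpus():
    print(f"{name:<10} {description}")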
If a non-existent corpus is requested, a value of 0 is returned.
# load() returns 0 when the requested corpus does not exist
axolotl = elotl.corpus.load('axolotlr')
if axolotl == 0:
    print("The name entered does not correspond to any corpus")
If an existing corpus is entered, a list is returned.
axolotl = elotl.corpus.load('axolotl')
print(axolotl[0])
[
'Y así, cuando hizo su ofrenda de fuego, se sienta delante de los demás y una persona se queda junto a él.',
'Auh in ye yuhqui in on tlenamacac niman ye ic teixpan on motlalia ce tlacatl itech mocaua.',
'Classical Nahuatl',
'Vida económica de Tenochtitlan',
'nci'
]
Each entry in a corpus is a list with the following fields:
- non_original_language (l1)
- original_language (l2)
- variant
- document_name
- ISO language code (optional; present only in some corpora)
tsunkua = elotl.corpus.load('tsunkua')
for row in tsunkua:
    print(row[0])  # language 1
    print(row[1])  # language 2
    print(row[2])  # variant
    print(row[3])  # document
Una vez una señora se emborrachó
nándi na ra t'u̱xú bintí
Otomí del Estado de México (ots)
El otomí de toluca, Yolanda Lastra
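For downstream NLP work you will typically want the parallel sentence pairs, and sometimes the optional ISO code. A minimal sketch using only the fields documented above (variable names are illustrative):

import elotl.corpus

axolotl = elotl.corpus.load('axolotl')

# Collect (Spanish, Nahuatl) sentence pairs from the parallel corpus
pairs = [(row[0], row[1]) for row in axolotl]

# The ISO code is optional, so check the entry length before indexing
first_entry = axolotl[0]
iso_code = first_entry[4] if len(first_entry) > 4 else None
print(len(pairs), iso_code)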
The following directory structure is provided as a reference; it will be documented in more detail as the package grows.
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── dist
├── docs
├── elotl                      Top-level package
│   ├── corpora                Corpus data files
│   ├── corpus                 Subpackage to load the corpora
│   ├── huave                  Huave language subpackage
│   │   └── orthography.py     Module to normalize Huave orthography and phonemes
│   ├── __init__.py            Initialize the package
│   ├── nahuatl                Nahuatl language subpackage
│   │   └── orthography.py     Module to normalize Nahuatl orthography and phonemes
│   ├── otomi                  Otomi language subpackage
│   │   └── orthography.py     Module to normalize Otomi orthography and phonemes
│   └── utils                  Subpackage with common functions and files
│       └── fst                Finite-state transducer functions
│           └── att            Static .att files
├── LICENSE
├── Makefile
├── MANIFEST.in
├── pyproject.toml
├── README.md
└── tests
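The orthography modules under each language subpackage are what most users will reach for after the corpora. The snippet below is only a sketch: it assumes a Normalizer class that takes the name of a normalization scheme and exposes a normalize() method, which is how recent versions are organized, but check the module's docstrings for the exact API in your installed version.

from elotl.nahuatl import orthography

# Assumption: Normalizer takes a scheme name (e.g. "sep") and offers
# a normalize() method; verify against your installed version.
normalizer = orthography.Normalizer("sep")
print(normalizer.normalize("itech mocaua"))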
poetry env use 3.x
poetry shell
make all
where 3.x is your local Python version. See Poetry's documentation on managing environments.
Build the FSTs with make:

make fst
poetry env use 3.x
poetry shell
python -m pip install --upgrade pip
poetry build
python -m pip install -e .
poetry publish
Remember to configure your PyPI credentials.
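One common way to do this is with a PyPI API token (the token value below is a placeholder):

poetry config pypi-token.pypi <your-pypi-token>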