Lexicological framework for pipeline text processing
RAPOSA processes texts word by word, applying different filters in a conveyor-belt-like fashion.
You define a pipeline with its tokenization method and the different tubes through which the tokens will travel. A tube may modify a token, discard it, tag it, or do any combination of the three. Some basic pipelines and tubes are included, but every case is different, so customization was the key guiding principle. As such, we encourage you to check the demo.py file and the code itself to learn how to create and combine your own derived classes.
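The sketch below illustrates the conveyor-belt idea only; the class names (Token, Tube, Pipeline) and the example tubes are hypothetical stand-ins, not RAPOSA's actual API, which lives in demo.py and the source.

```python
# Illustrative sketch only: RAPOSA's real class names and signatures may differ.
# A tokenizer feeds tokens through a chain of tubes; each tube may modify,
# discard, or tag a token before passing it on.

class Token:
    def __init__(self, text):
        self.text = text
        self.tags = set()

class Tube:
    def process(self, token):
        """Return the (possibly modified) token, or None to discard it."""
        return token

class LowercaseTube(Tube):
    def process(self, token):
        token.text = token.text.lower()          # modify the token
        return token

class StopwordTube(Tube):
    def __init__(self, stopwords):
        self.stopwords = set(stopwords)
    def process(self, token):
        # discard the token if it is a stopword
        return None if token.text in self.stopwords else token

class UnknownWordTagTube(Tube):
    def __init__(self, lexicon):
        self.lexicon = set(lexicon)
    def process(self, token):
        if token.text not in self.lexicon:
            token.tags.add("candidate-neologism")  # tag the token
        return token

class Pipeline:
    def __init__(self, tokenizer, tubes):
        self.tokenizer = tokenizer
        self.tubes = tubes
    def run(self, text):
        for token in self.tokenizer(text):
            for tube in self.tubes:
                token = tube.process(token)
                if token is None:
                    break                          # token was discarded
            else:
                yield token                        # token survived all tubes

if __name__ == "__main__":
    pipeline = Pipeline(
        tokenizer=lambda text: (Token(w) for w in text.split()),
        tubes=[LowercaseTube(),
               StopwordTube({"a", "o", "e"}),
               UnknownWordTagTube({"palabra", "texto"})],
    )
    for tok in pipeline.run("O texto ten unha palabra descoñecida"):
        print(tok.text, sorted(tok.tags))
```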
The intended use case for RAPOSA is lexicological analysis, and it is particularly convenient for neology, lexicography, and morphology, but its open-endedness and customizability allow for many other kinds of purposes. For this reason, it also includes many other NLP/CompLing goodies.
RAPOSA is not tied to any specific language, though it currently ships filters for only a few languages due to development time constraints. RAPOSA is especially proud to support minorized and minority languages.
Contributions are always warmly welcome and appreciated!
As the software is under development, look at demo.py for examples until proper docs are in place.
The initial version of this package was developed under a research scholarship from the Deputación da Coruña for the year 2016.
The software is released under an MIT License (see the LICENSE file in the root folder for details), except for the following resources, which are derivative works:
- Module langs.gl.stemmer is an adaptation of the code at http://bvg.udc.es/recursos_lingua/stemming.jsp, copyright 2006 Biblioteca Virtual Galega.
- The data in langs/es/data/drae_2011.dat is taken from the lemmas for the Diccionario de la Real Academia Española as released at http://dirae.es/
- The data in langs/es/data/nombres_2016.dat and langs/es/data/apellidos_2016.dat is taken from official data from the Spanish Instituto Nacional de Estadística: http://www.ine.es/dyngs/INEbase/es/operacion.htm?c=Estadistica_C&cid=1254736177009&menu=resultados&idp=1254734710990
- The data in langs/gl/data/corga_1.7.dat is taken from the frequency list of the Corpus de Referencia do Galego Actual: http://corpus.cirp.es/corga/ As such, this modified lexicon is released, like the original, under the terms of the Lesser General Public License For Linguistic Resources. See the LICENSE file in that folder for details.
- The data in langs/gl/data/xiada_2.6.dat is taken from the XIADA project by the Centro Ramón Piñeiro para a Investigación en Humanidades: http://corpus.cirp.es/xiada/ As such, this modified lexicon is released, like the original, under the terms of the Lesser General Public License For Linguistic Resources. See the LICENSE file in that folder for details.
- The data in langs/gl/data/estraviz_09_2017.dat is taken from the sitemaps for the Dicionário Estraviz: http://estraviz.org/
- The data in langs/gl/data/toponimia_2013.dat is taken from official data from the Xunta de Galicia: http://abertos.xunta.gal/catalogo/territorio-vivienda-transporte/-/dataset/0159/microtoponimia-galicia