Definition Relation Extractor

The Definition Relation Extractor is a set of tools to pre-annotate and post-process natural language dictionary definitions and to generate an RDF graph from them, following the conceptual model proposed in the following work:

Vivian S. Silva, Siegfried Handschuh and André Freitas. Categorization of Semantic Roles for Dictionary Definitions. Cognitive Aspects of the Lexicon (CogALex-V), Workshop at the 26th International Conference on Computational Linguistics (COLING), Osaka, 2016.

The definitions are pre-annotated based on syntactic patterns. Sample data generated by the pre-annotation can be manually curated to feed a machine learning classifier that can, in turn, classify a whole linguistic resource. This final classified data can then be converted into an RDF graph.

The WordNetGraph is an example of a graph generated by the Definition Relation Extractor. Pre-annotated data was curated with the help of the Brat annotation tool and then used to train an RNN model. The trained model was used to classify all of WordNet's noun and verb definitions, which were then post-processed to fix mistakes in the label sequences and finally converted to an RDF graph.

Dependencies

Pre-annotation

Class `extraction.RoleExtractor`

Reads a list of natural language definitions and identifies the semantic roles in each one.

Input:

List of definitions: one per line in the format id|POS|word_list|def, where:

id: the synset id (an integer, starting from 1)
POS: noun or verb
word_list: a comma-separated list of words that compose the synset (1 to n)
def: the definition text
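
As an illustration of the input format described above, the following is a minimal Python sketch of how one `id|POS|word_list|def` line could be parsed. It is not part of the toolkit: the `Definition` type, the `parse_definition_line` function and the sample line are hypothetical, and the sketch assumes that POS is spelled out as `noun` or `verb`.

```python
from typing import List, NamedTuple


class Definition(NamedTuple):
    """One input line; a hypothetical container, not a toolkit class."""
    synset_id: int
    pos: str
    words: List[str]
    text: str


def parse_definition_line(line: str) -> Definition:
    # Split on the first three pipes only, so that a "|" inside the
    # definition text (if any) is kept as part of the text.
    synset_id, pos, word_list, text = line.rstrip("\n").split("|", 3)
    # The word list is comma-separated and holds one or more words.
    words = [w.strip() for w in word_list.split(",") if w.strip()]
    if not words:
        raise ValueError("word_list must contain at least one word")
    return Definition(int(synset_id), pos.strip(), words, text.strip())


# Made-up sample line, for illustration only.
example = parse_definition_line("7|noun|lake, loch|a body of water surrounded by land")
```

Splitting with a limit of three keeps the definition text whole even if it happens to contain the separator character.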

Output:

Pre-annotated data file: definitions classified in IOB format

Post-Processing

Class `extraction.PostProcessing`

Reads classified data generated by a machine learning classifier and prepares it to be converted into an RDF graph.

Input:

List of definitions: one per line in the format id|POS|word_list|def, where:

id: the synset id (an integer, starting from 1)
POS: noun or verb
word_list: a comma-separated list of words that compose the synset (1 to n)
def: the definition text

Classified data: file in IOB format (returned by the RNN classifier)

Note: the definitions must appear in the same order in both files

Output:

Fixed classified data: file in IOB format with all classifications fixed (missing supertypes added and inconsistent IOB sequences adjusted)
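
To make "inconsistent IOB sequences adjusted" concrete, here is a minimal Python sketch of one common repair: an `I-` tag that does not continue a span of the same role is promoted to `B-`. This is an assumption about what such a fix can look like, not the logic of `extraction.PostProcessing`; the function name and the role labels `SUPERTYPE` and `QUALITY` are hypothetical, and adding missing supertypes is not covered.

```python
from typing import List, Optional


def repair_iob(tags: List[str]) -> List[str]:
    """Promote any I- tag that does not continue a span to B-."""
    fixed: List[str] = []
    prev_role: Optional[str] = None  # role of the previous token, None if outside
    for tag in tags:
        if tag == "O":
            fixed.append(tag)
            prev_role = None
            continue
        prefix, _, role = tag.partition("-")
        # An I- tag is only valid right after a B- or I- tag of the same role.
        if prefix == "I" and role != prev_role:
            tag = "B-" + role
        fixed.append(tag)
        prev_role = role
    return fixed


repaired = repair_iob(["O", "I-SUPERTYPE", "I-SUPERTYPE", "I-QUALITY", "O"])
```

In this example the first `I-SUPERTYPE` follows an `O` and the `I-QUALITY` follows a different role, so both become `B-` tags while the well-formed continuation is left alone.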

RDF Model Construction

Class `model.ModelBuilder`

Input:

List of definitions: one per line in the format id|POS|word_list|def, where:

id: the synset id (an integer, starting from 1)
POS: noun or verb
word_list: a comma-separated list of words that compose the synset (1 to n)
def: the definition text

Classified data: file in IOB format (preferably the one returned by the PostProcessing routine)

Note: the definitions must appear in the same order in both files

Output:

RDF files in XML and/or N-TRIPLES format (options must be set in the configuration file params.txt in the conf folder)

Utils

Auxiliary routines to convert data between different formats

Class `util.IOBtoStandoff`

Generates a file in the standoff format to be read by the Brat annotation tool

Input:

Classified data: file in IOB format

Output:

File in standoff format as defined by the Brat tool
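
The conversion can be sketched in Python as follows. Brat text-bound annotations are lines of the form `ID<TAB>TYPE START END<TAB>TEXT`, with character offsets into the accompanying text. Everything else here is an assumption made for illustration: the function name, the role labels, joining tokens with single spaces, and working on in-memory lists rather than on the toolkit's actual IOB files.

```python
from typing import List, Tuple


def iob_to_standoff(tokens: List[str], tags: List[str]) -> Tuple[str, List[str]]:
    """Return the text and Brat-style text-bound annotation lines."""
    text = " ".join(tokens)  # assumption: tokens are separated by one space
    spans: List[List] = []   # each span is [role, start, end]
    current = None
    offset = 0
    for token, tag in zip(tokens, tags):
        start, end = offset, offset + len(token)
        if tag.startswith("B-"):
            if current:
                spans.append(current)
            current = [tag[2:], start, end]
        elif tag.startswith("I-") and current and current[0] == tag[2:]:
            current[2] = end  # extend the open span to cover this token
        else:
            # "O", or an I- tag that continues nothing (treated as outside;
            # this sketch expects IOB sequences that were already fixed).
            if current:
                spans.append(current)
            current = None
        offset = end + 1  # skip the joining space
    if current:
        spans.append(current)
    lines = [
        f"T{i}\t{role} {start} {end}\t{text[start:end]}"
        for i, (role, start, end) in enumerate(spans, 1)
    ]
    return text, lines


text, ann = iob_to_standoff(
    ["large", "body", "of", "water"],
    ["B-QUALITY", "B-SUPERTYPE", "I-SUPERTYPE", "I-SUPERTYPE"],
)
```

The returned text would go into the `.txt` file and the annotation lines into the matching `.ann` file that Brat reads.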

Class `util.StandofftoIOB`

Reads the standoff file generated by the Brat tool after data annotation and converts it back to IOB format

Input:

List of definitions: one per line in the format id|POS|word_list|def, where:

id: the synset id (an integer, starting from 1)
POS: noun or verb
word_list: a comma-separated list of words that compose the synset (1 to n)
def: the definition text

Standoff file: file generated by the Brat tool

Note: the definitions must be listed in the same order as in the file that was given to the Brat tool

Output:

File in IOB format
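
The reverse direction can be sketched in the same spirit. This Python example is hypothetical and is not the code of `util.StandofftoIOB`: it assumes whitespace tokenization, contiguous spans (no `;`-separated fragments) and span boundaries that coincide with token boundaries. The function name and role labels are made up for illustration.

```python
import re
from typing import List, Tuple


def standoff_to_iob(text: str, ann_lines: List[str]) -> List[Tuple[str, str]]:
    """Return (token, IOB tag) pairs for a text and its Brat annotations."""
    spans = []  # (start, end, role)
    for line in ann_lines:
        if not line.startswith("T"):
            continue  # only text-bound annotations carry spans
        fields = line.rstrip("\n").split("\t")
        role, start, end = fields[1].split(" ")  # assumes a contiguous span
        spans.append((int(start), int(end), role))

    result = []
    for match in re.finditer(r"\S+", text):
        tag = "O"
        for start, end, role in spans:
            if start <= match.start() and match.end() <= end:
                # The token that opens the span gets B-, later ones I-.
                tag = ("B-" if match.start() == start else "I-") + role
                break
        result.append((match.group(), tag))
    return result


pairs = standoff_to_iob(
    "large body of water",
    ["T1\tQUALITY 0 5\tlarge", "T2\tSUPERTYPE 6 19\tbody of water"],
)
```

Tokens that fall outside every annotated span receive the `O` tag.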

Class `util.DataScriptBuilder`

Generates a Python script that creates the dataset used as input for the RNN model

Input:

Classified data: file in IOB format

Output:

A Python script that generates a pickle file to feed the RNN model
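
The layout of the pickled dataset is not documented here, so the following Python sketch only illustrates the general idea of turning IOB-tagged definitions into index sequences and serializing them with `pickle`. The dictionary keys, the `build_dataset` function and the role label are assumptions, not the structure that the generated script or the RNN model actually uses.

```python
import pickle
from typing import Dict, List, Tuple


def build_dataset(sentences: List[Tuple[List[str], List[str]]]) -> Dict:
    """Map tokens and IOB tags to integer indices (hypothetical layout)."""
    word2idx: Dict[str, int] = {}
    label2idx: Dict[str, int] = {}
    xs, ys = [], []
    for tokens, tags in sentences:
        # setdefault assigns the next free index the first time an item is seen.
        xs.append([word2idx.setdefault(t, len(word2idx)) for t in tokens])
        ys.append([label2idx.setdefault(t, len(label2idx)) for t in tags])
    return {"x": xs, "y": ys, "word2idx": word2idx, "label2idx": label2idx}


dataset = build_dataset([
    (["a", "cat"], ["O", "B-SUPERTYPE"]),
    (["a", "dog"], ["O", "B-SUPERTYPE"]),
])

# Serialize to bytes; writing to a file would use pickle.dump instead.
payload = pickle.dumps(dataset)
restored = pickle.loads(payload)
```

Keeping the index dictionaries next to the sequences allows predicted label indices to be mapped back to IOB tags after classification.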
