Home
This semantic network is intended for use in practical NLP applications. It is developed primarily for the Czech language by marxsk and students of Computational Linguistics at Masaryk University.
Currently, the following semantic classes are present (and a few more are in development):
- _person = an entity that can act of its own will (people, companies, animals, ...)
- _person/individual = a word in the singular always represents a single entity
- _person/animal = the word represents an animal
- _event = something with a start/end in time
- _substance
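The slash in names such as _person/individual suggests a hierarchy in which every subclass also belongs to its parent class. The following Python sketch assumes exactly that reading. The CLASSES set and the helpers ancestors() and belongs_to() are hypothetical illustrations, not part of the project's published data or tools.

```python
# Hypothetical sketch: the "/" in names like "_person/individual" is read here
# as a parent/child link, so a subclass label implies all of its ancestors.
CLASSES = {
    "_person",
    "_person/individual",
    "_person/animal",
    "_event",
    "_substance",
}


def ancestors(label: str) -> list[str]:
    """Return the label and every parent class above it, most specific first."""
    if label not in CLASSES:
        raise ValueError(f"unknown semantic class: {label}")
    parts = label.split("/")
    # "_person/individual" -> ["_person/individual", "_person"]
    return ["/".join(parts[:i]) for i in range(len(parts), 0, -1)]


def belongs_to(label: str, semantic_class: str) -> bool:
    """True if a token tagged with `label` also falls under `semantic_class`."""
    return semantic_class in ancestors(label)


print(ancestors("_person/individual"))  # ['_person/individual', '_person']
print(belongs_to("_person/animal", "_person"))  # True
print(belongs_to("_person", "_person/individual"))  # False
```

Under this reading, a token tagged _person/animal is automatically counted as _person, while the reverse does not hold.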
_person was the first class to be defined and annotated. The idea for this class is based on verb valency lexicons, where person+institution+organization+... is the most frequent group because the actor usually belongs there. If our approach does not work on this class, there is only a small chance that it will work elsewhere. We have merged several "traditional" semantic classes together because we did not want to distinguish difficult cases immediately. So, we treat 'bank' as a person, because a phrase like 'person from the bank' is usually not used explicitly in text.
_person/individual is much closer to the traditional 'person' semantic class because it does not contain any companies or social groups. Animals, ghosts, ... are still part of this group. We created this subgroup out of curiosity: we want to know whether there is a difference between this group and its parent group. The project strongly encourages and supports creating such subgroups, because they are very cheap and fast to annotate: all you have to do is annotate the _person+ tokens, which make up <5% of all tokens. For sub-sub-groups the amount is even smaller.
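The cost argument for subgroups can be illustrated with a short Python sketch: annotating a subclass only requires a decision for the tokens already marked with the parent class in an earlier pass. The (word, label) token format, the helpers candidates_for_subclass() and review_share(), and the toy Czech sentence are all hypothetical. A single short sentence is also far denser in _person tokens than real text, where the share is reported as <5%.

```python
# Hypothetical sketch: tokens are (word, label) pairs taken from a first,
# parent-level annotation pass; label is None for tokens outside every class.
from typing import Optional

Token = tuple[str, Optional[str]]


def candidates_for_subclass(tokens: list[Token], parent: str) -> list[int]:
    """Indices of tokens an annotator must review to refine `parent`.

    Only tokens already positive for the parent class (e.g. _person+) need a
    decision about a subclass such as _person/individual; all others are skipped.
    """
    return [i for i, (_, label) in enumerate(tokens) if label == parent]


def review_share(tokens: list[Token], parent: str) -> float:
    """Fraction of the corpus an annotator must look at for one subclass."""
    if not tokens:
        return 0.0
    return len(candidates_for_subclass(tokens, parent)) / len(tokens)


# Made-up example: "Banka půjčila Petrovi nové auto." (The bank lent Petr a new car.)
corpus: list[Token] = [
    ("Banka", "_person"),
    ("půjčila", None),
    ("Petrovi", "_person"),
    ("nové", None),
    ("auto", None),
]
print(candidates_for_subclass(corpus, "_person"))  # [0, 2]
print(review_share(corpus, "_person"))  # 0.4
```

Here the annotator of _person/individual would decide only about "Banka" (not an individual) and "Petrovi" (an individual), and would never see the remaining tokens.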
_event matters for prepositional attachment, where it is quite important to know whether a noun phrase can represent time.
_substance can be defined easily, but in Czech it is hard to distinguish between a substance and a specific object that instantiates the substance. The annotators were also quite confused, so this class has to be re-annotated later with better annotation guidelines and an improved formalism for testing. This semantic class serves as an example of how not to define a good class :) Its main positive outcome is that it led to further analysis of the 'missing token' phenomenon and of type inference.