SemanticAnalyzer SDK Sample Java Code

These code samples illustrate how to use the Lemmatizer and Sentiment Analysis SDKs for the Russian language.

To start using these technologies in your projects, you need to acquire a license. Get in touch at [email protected]

Sentiment Analysis technology can be consumed as an API on rapidapi.com:

https://rapidapi.com/insider-insider-default/api/russiansentimentanalyzer

Looking for a Chinese sentiment analyzer? Try out the Fuxi API: https://rapidapi.com/insider-insider-default/api/fuxiapi

Lemmatizer features

Documentation is available in PDF form in Russian and in English.

Lemmatization consists of deriving the lemma and a POS tag for a given surface form (word). Because Russian is highly inflectional, it is important to derive the lemma of a word and use it instead of the stem, which is a cruder way of normalizing Russian text.

The application area of the lemmatizer is very wide:

information retrieval (we have a token filter for Lucene / Solr / Elasticsearch; contact us if you need one)
sentiment analysis (read on if you are interested in this)
machine translation: lemmatizing word forms before translating helps avoid issues caused by the sparse space of word forms
your project / research

Dictionary size

The dictionary contains on the order of 100k lemmas, which translates to several million word forms, including grammatical case forms as well as polysemous words (homonyms with multiple meanings).

Part of Speech (POS) tags

For each word, the lemmatizer returns its POS tag. A given word can have several POS tags.
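Because one surface form can map to several lemma and POS pairs, calling code usually has to handle a list of analyses rather than a single result. The sketch below illustrates that shape with a tiny hand-written lookup table. The `ToyLemmatizer` and `Analysis` classes, the tag names `VERB` and `NOUN`, and the table itself are assumptions made for illustration; they do not reflect the SDK's actual API or tag set. Russian strings are written as Unicode escapes so the file compiles regardless of source encoding.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Illustrative only: a toy lookup table, not the SDK's lemmatizer.
final class ToyLemmatizer {

    // One possible reading of a surface form: its lemma plus a POS tag.
    static final class Analysis {
        final String lemma;
        final String pos; // hypothetical tag names, not the SDK's tag set

        Analysis(String lemma, String pos) {
            this.lemma = lemma;
            this.pos = pos;
        }

        @Override
        public String toString() {
            return lemma + "/" + pos;
        }
    }

    private static final Map<String, List<Analysis>> TABLE = new HashMap<>();

    static {
        // "stali" is both a past-tense form of "stat'" (to become)
        // and an inflected form of "stal'" (steel).
        TABLE.put("\u0441\u0442\u0430\u043b\u0438", Arrays.asList(
                new Analysis("\u0441\u0442\u0430\u0442\u044c", "VERB"),
                new Analysis("\u0441\u0442\u0430\u043b\u044c", "NOUN")));
    }

    // Returns every known analysis, or an empty list for unknown words.
    static List<Analysis> analyze(String word) {
        List<Analysis> result = TABLE.get(word.toLowerCase(Locale.ROOT));
        return result == null ? Collections.<Analysis>emptyList() : result;
    }

    public static void main(String[] args) {
        // Capitalized input is normalized to lower case before lookup.
        for (Analysis a : analyze("\u0421\u0442\u0430\u043b\u0438")) {
            System.out.println(a);
        }
    }
}
```

Returning an empty list for unknown words keeps the caller's loop simple; a real integration would follow whatever convention the SDK documentation describes.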

User dictionary

If you disagree with the lemma and POS tag predicted for a particular word, you can override this behaviour in your personal user dictionary. You do this by linking the word to an existing word whose grammatical features are the closest to those of your target. For instance, assuming the lemmatizer does not recognize the word инет (social media slang for Internet), you define it via the linked word Интернет (Internet):

инет\tинтернет

(\t stands for the tab character)
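A file in this format can be checked before handing it to the SDK. The following is a minimal sketch, assuming one tab-separated entry per line and that blank lines may be skipped; the class name `UserDictionaryParser` and its `parse` method are hypothetical helpers and not part of the SDK.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the SDK: parses user dictionary lines
// of the form "<new word>\t<linked known word>".
final class UserDictionaryParser {

    // Returns a map from each new word to the known word it is linked to.
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> links = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // assumption: blank lines are ignored
            }
            String[] parts = line.split("\t");
            if (parts.length != 2) {
                throw new IllegalArgumentException(
                        "Expected <word>\\t<linked word>: " + line);
            }
            links.put(parts[0].trim(), parts[1].trim());
        }
        return links;
    }

    public static void main(String[] args) {
        // The entry from the example, written with Unicode escapes
        // so that the source compiles under any file encoding.
        Map<String, String> links = parse(Arrays.asList(
                "\u0438\u043d\u0435\u0442\t\u0438\u043d\u0442\u0435\u0440\u043d\u0435\u0442"));
        System.out.println(links.size() + " entry parsed");
    }
}
```

Rejecting malformed lines early makes it easier to spot an entry that was typed with spaces instead of a tab.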

Sentiment Analyzer features

Documentation is available in PDF form in Russian.

3-way classification

The system returns one of the following labels for a given text (or sentence): NEUTRAL, POSITIVE, NEGATIVE.

Object oriented sentiment detection

Most of the time, especially when monitoring a brand, person, or company in social or news media, it is important to know the sentiment directed at that particular object. In the following example:

I like Phone1, but Phone2 is ugly.

we expect to get the POSITIVE label for the object Phone1 and the NEGATIVE label for the object Phone2.
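To make the idea concrete, here is a toy sketch of object-oriented classification that scores only the clauses mentioning the target object. It is a deliberately naive heuristic and not the SDK's algorithm; the class `ToyObjectSentiment`, the two-word lexicon, and the clause-splitting rule are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Toy heuristic for illustration; the SDK's real algorithm is not shown here.
final class ToyObjectSentiment {

    enum Label { NEUTRAL, POSITIVE, NEGATIVE }

    // Tiny hand-made polarity lexicon (assumption, not the SDK's dictionary).
    private static final Map<String, Integer> LEXICON = new HashMap<>();

    static {
        LEXICON.put("like", 1);
        LEXICON.put("ugly", -1);
    }

    // Sums lexicon scores over the clauses that mention the target object.
    static Label classify(String text, String object) {
        String target = object.toLowerCase(Locale.ROOT);
        int score = 0;
        // Split into clauses on commas, semicolons and sentence-final punctuation.
        for (String clause : text.toLowerCase(Locale.ROOT).split("[,;.!?]")) {
            if (!clause.contains(target)) {
                continue; // this clause is about something else
            }
            for (String token : clause.trim().split("\\s+")) {
                score += LEXICON.getOrDefault(token, 0);
            }
        }
        if (score > 0) {
            return Label.POSITIVE;
        }
        if (score < 0) {
            return Label.NEGATIVE;
        }
        return Label.NEUTRAL;
    }

    public static void main(String[] args) {
        String text = "I like Phone1, but Phone2 is ugly.";
        System.out.println(classify(text, "Phone1")); // POSITIVE
        System.out.println(classify(text, "Phone2")); // NEGATIVE
    }
}
```

An object that does not occur in the text contributes no score, so the sketch falls back to NEUTRAL in that case.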

Object synonyms

Because an object can be referred to using different words or word sequences (like "Android" or "Droid"), the system supports describing the target object with an array of object synonyms. The first object synonym found in the given text triggers the sentiment detection algorithm.
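The synonym lookup can be pictured with a short sketch. It interprets "first" as the synonym occurring earliest in the text, which is an assumption; the SDK may instead follow the order of the array. The class `SynonymMatcher` and its `firstMatch` method are hypothetical names and not part of the SDK.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper illustrating synonym lookup; not the SDK's API.
final class SynonymMatcher {

    // Returns the synonym that occurs earliest in the text, if any.
    // Matching is case-insensitive and substring-based (an assumption).
    static Optional<String> firstMatch(String text, List<String> synonyms) {
        String haystack = text.toLowerCase(Locale.ROOT);
        String best = null;
        int bestIndex = Integer.MAX_VALUE;
        for (String synonym : synonyms) {
            int index = haystack.indexOf(synonym.toLowerCase(Locale.ROOT));
            // Strict comparison keeps the earlier array entry on ties.
            if (index >= 0 && index < bestIndex) {
                best = synonym;
                bestIndex = index;
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        List<String> synonyms = Arrays.asList("Android", "Droid");
        System.out.println(firstMatch("My new Droid is fast", synonyms).isPresent());
    }
}
```

Returning an `Optional` makes the "no synonym found" case explicit, so the caller can skip sentiment detection for texts that never mention the object.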

Sentiment detection quality control

The quality can be controlled by overriding existing sentiment words or introducing new ones in the user polarity dictionaries.
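Conceptually, a user polarity dictionary acts as an overlay on top of the built-in one: user entries replace built-in entries for the same word and add words that were missing. The sketch below shows only that override rule. The class `PolarityDictionaries`, the integer polarity values, and the sample words are assumptions; the actual dictionary format is described in the SDK documentation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the override rule; not the SDK's API.
final class PolarityDictionaries {

    // User entries override base entries for the same word and add new words.
    // Neither input map is modified.
    static Map<String, Integer> merge(Map<String, Integer> base,
                                      Map<String, Integer> user) {
        Map<String, Integer> merged = new HashMap<>(base);
        merged.putAll(user);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Integer> base = new HashMap<>();
        base.put("cheap", -1); // built-in polarity (made-up value)

        Map<String, Integer> user = new HashMap<>();
        user.put("cheap", 1);  // override: positive in this domain
        user.put("laggy", -1); // new sentiment word

        System.out.println(merge(base, user).size() + " words after merge");
    }
}
```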

DocTop API

A system for topical grouping of unstructured content. It is built for large-scale use: you can generate topics from text collections as large as millions of documents. It supports multiple languages.

Access / subscribe to the API here: https://rapidapi.com/insider-insider-default/api/doctop
