A parser for sentiment analysis that uses the Apache OpenNLP and Apache Tika libraries to analyze text from the Large Movie Review Dataset. Negative and positive reviews were combined into a single file, "result", in which each review is preceded by a "positive" or a "negative" label.
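As a rough illustration of how such a labeled file could be assembled, the sketch below prefixes each review with its label and writes one review per line. It is not the script used by this project: the `label_reviews` helper is hypothetical, and the directory layout (one `.txt` file per review, split into `pos` and `neg` directories) and the single-space separator between label and text are assumptions.

```shell
#!/bin/sh
# Hypothetical helper, not part of this repository.
# label_reviews LABEL DIR
#   Prints every *.txt file in DIR as a single line prefixed by LABEL.
label_reviews() {
    label=$1
    dir=$2
    for f in "$dir"/*.txt; do
        # Skip the unexpanded pattern when DIR contains no .txt files.
        [ -e "$f" ] || continue
        # Join the lines of the review and drop trailing spaces,
        # so that each review occupies exactly one output line.
        text=$(tr '\n' ' ' < "$f" | sed 's/ *$//')
        printf '%s %s\n' "$label" "$text"
    done
}

# Example use (the paths are assumptions about where the dataset was unpacked):
#   { label_reviews positive aclImdb/train/pos
#     label_reviews negative aclImdb/train/neg; } > result
```

Keeping each review on one line makes the file easy to inspect with standard tools such as `grep -c '^positive ' result`.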
$ cd $HOME/src
$ git clone https://github.com/USCDataScience/SentimentAnalysisParser
$ cd SentimentAnalysisParser
$ mvn install assembly:assembly
$ cd target/sentiment
$ mkdir -p model/org/apache/tika/parser/sentiment/topic/
$ bin/sentiment SentimentTrainer -model model/org/apache/tika/parser/sentiment/topic/en-sentiment.bin -lang en -data ./../../examples/categorical_dataset -encoding UTF-8
The model is written to model/org/apache/tika/parser/sentiment/topic/en-sentiment.bin.
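Before moving on, it can help to confirm that training actually produced a model. The sketch below only checks that the file exists and is not empty; it does not validate the model's contents. The `check_model` helper is hypothetical and not part of this repository.

```shell
#!/bin/sh
# Hypothetical helper, not part of this repository.
# check_model PATH
#   Succeeds if PATH exists and is a non-empty file, fails otherwise.
check_model() {
    model=$1
    if [ -s "$model" ]; then
        echo "model found: $model"
    else
        echo "model missing or empty: $model" >&2
        return 1
    fi
}

# Example use, run from target/sentiment:
#   check_model model/org/apache/tika/parser/sentiment/topic/en-sentiment.bin
```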
Make sure you are in target/sentiment before running the parser:
$ bin/sentiment Tika -model model/org/apache/tika/parser/sentiment/topic/en-sentiment.bin -o ../../examples/gun-output1 -j ../../examples/gun-ads
- Chris A. Mattmann, JPL
- Anastasija Mensikova, Trinity College, CT
This project began as the Google Summer of Code 2016 project of Anastasija Mensikova for the Apache Software Foundation, under the supervision of Chris Mattmann.