TomatoEngine

An application that crawls Rotten Tomatoes movie reviews into a text corpus, serves as a search engine over that corpus, and performs text classification and clustering.

This repo is structured into four main folders:

TomatoCrawler
TomatoClassifier
TomatoSearch
OkTomato

TomatoCrawler

A crawling module implemented in Node.js.

To install the dependencies,

$ npm install

To run the crawler,

$ node TomatoCrawler/main.js

TomatoClassifier

First, install the following dependencies manually, because the installation process is not consistent across platforms:

Install Matplotlib
Install SciPy
Install NumPy
Install scikit-learn
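
Because these packages are installed by hand, it can help to confirm they are importable before running the classifier. The following is a minimal sketch and not part of this repo: the helper name `missing_modules` is hypothetical, and the import names are the standard ones for the packages listed above (scikit-learn is imported as `sklearn`).

```python
import importlib.util

# Import names for the dependencies listed above.
# Note: scikit-learn is imported as "sklearn".
REQUIRED = ["matplotlib", "scipy", "numpy", "sklearn"]


def missing_modules(names):
    """Return the module names that cannot be found by the current interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All classifier dependencies are importable.")
```

Run it with the same `python3` interpreter you use for `main.py`, since packages installed for a different interpreter will not be found.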

To run the classifier,

$ python3 main.py

It tries several classifiers and reports the precision of each. To tune a classifier, tweak its parameters in main.py.
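
To illustrate what comparing classifiers by precision means, here is a small standard-library sketch. It is not the code in main.py, which presumably relies on scikit-learn. The two rule-based classifiers, the sample reviews, and the function names are all hypothetical stand-ins chosen for illustration.

```python
def precision(y_true, y_pred, positive=1):
    """Precision = true positives / all predicted positives."""
    predicted_pos = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not predicted_pos:
        return 0.0  # no positive predictions, so precision is reported as 0
    true_pos = sum(1 for t in predicted_pos if t == positive)
    return true_pos / len(predicted_pos)


# Hypothetical stand-in classifiers: each maps a review string to 1 (fresh) or 0 (rotten).
def always_positive(review):
    return 1


def keyword_rule(review):
    text = review.lower()
    return 1 if ("great" in text or "good" in text) else 0


def compare(classifiers, reviews, labels):
    """Run every classifier over the reviews and return its precision by name."""
    return {
        name: precision(labels, [clf(r) for r in reviews])
        for name, clf in classifiers.items()
    }


# Made-up sample data, for illustration only.
REVIEWS = ["A great film", "Dull and slow", "Good fun", "Not good at all"]
LABELS = [1, 0, 1, 0]

scores = compare(
    {"always_positive": always_positive, "keyword_rule": keyword_rule},
    REVIEWS,
    LABELS,
)
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: precision={score:.2f}")
```

The same loop shape applies when the stand-ins are replaced by real trained models: predict on held-out reviews, compute precision for each model, and compare.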

To label all the data using the classifier,

$ python3 label_data.py

TomatoSearch

There are two folders, config and website, which contain the code for indexing and for the website respectively. The instructions can be found as follows:

OkTomato

This folder is mainly used to download the entities from Elasticsearch and upload them to Wit.ai.

In the OkTomato directory:

To download the entities, run

$ python data/populate_data.py

To upload to Wit.ai, run

$ python upload_entities.py
