NLP-TOS-Summarization

Clone the repository and go to its root directory.
pip install -r requirements.txt
NLTK may need additional setup, since some of its data may have to be downloaded separately.
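The NLTK setup step can be sketched as below. This is a minimal helper, not part of the repository: the function name `ensure_nltk_data` is hypothetical, and the resource ids shown in the usage comment (`punkt`, `stopwords`) are assumptions. Use whichever resources the `LookupError` message asks for when you run the models.

```python
def ensure_nltk_data(resources):
    """Download each NLTK resource that is not installed yet.

    `resources` maps a download id (e.g. "punkt") to the path NLTK uses
    to look it up (e.g. "tokenizers/punkt").
    Returns the list of ids that had to be downloaded.
    """
    import nltk  # imported lazily so the helper can be defined without NLTK

    downloaded = []
    for package_id, lookup_path in resources.items():
        try:
            # nltk.data.find raises LookupError when the resource is missing
            nltk.data.find(lookup_path)
        except LookupError:
            nltk.download(package_id)
            downloaded.append(package_id)
    return downloaded


# Example call (assumed resource list, needs network access):
# ensure_nltk_data({"punkt": "tokenizers/punkt", "stopwords": "corpora/stopwords"})
```

Running the helper a second time should download nothing, since every resource is then found locally.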
For sentence compression, download the Stanford Parser:
- Note: version 3.9.2 is known to work
- Place the zip in a known location
- Unzip it
Set the 'CLASSPATH' environment variable:
- export CLASSPATH=$CLASSPATH:<path to stanford-parser directory>
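If you prefer to set the variable from Python instead of the shell, the export above can be mirrored as in this sketch. The helper name `add_to_classpath` and the example path are hypothetical. The change only affects the current Python process and any processes it starts, so it must run before the parser is used.

```python
import os


def add_to_classpath(parser_dir, env=os.environ):
    """Append parser_dir to CLASSPATH, like `export CLASSPATH=$CLASSPATH:<dir>`.

    Existing entries are kept, and the directory is not added twice.
    """
    current = env.get("CLASSPATH", "")
    entries = [entry for entry in current.split(os.pathsep) if entry]
    if parser_dir not in entries:
        entries.append(parser_dir)
    env["CLASSPATH"] = os.pathsep.join(entries)
    return env["CLASSPATH"]


# Example call (hypothetical location of the unzipped parser):
# add_to_classpath("/opt/stanford-parser-full-2018-10-17")
```

`os.pathsep` is used as the separator so the same code works where the separator is `;` rather than `:`.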

For help running model 1, cd into model1 and run python main.py -h to list the command line options.

Ensure that you are using Python 3.

Start with the original Terms of Service/legal text in a .txt file, and navigate into the model2 directory.
Run the text file through super_parser9000.py to normalize the text formatting. The script takes two arguments: the name/path of the text you wish to re-format, and the name to give the re-formatted text file that the script returns.
Next, extract the features that will be used to train (or test) the model. Run feature_creation.py with three arguments: the name/path of the original ground truth text, the name/path of the re-formatted text file that super_parser9000.py returned, and the name to give the returned JSON file.
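The two preprocessing steps above can be chained from a small driver script. The sketch below only assembles the command lines in the argument order described above. The function names and all file names are hypothetical, and it assumes both scripts take plain positional arguments.

```python
import subprocess
import sys


def preprocessing_commands(raw_txt, formatted_txt, ground_truth_txt, features_json):
    """Build the argv lists for the normalization and feature extraction steps."""
    # Step 1: normalize formatting (input text, then output name)
    normalize = [sys.executable, "super_parser9000.py", raw_txt, formatted_txt]
    # Step 2: extract features (ground truth, re-formatted text, output JSON)
    extract = [sys.executable, "feature_creation.py",
               ground_truth_txt, formatted_txt, features_json]
    return [normalize, extract]


def run_all(commands, cwd="model2"):
    """Run each command in order from the model2 directory, stopping on failure."""
    for argv in commands:
        subprocess.run(argv, check=True, cwd=cwd)


# Example call (hypothetical file names):
# run_all(preprocessing_commands("tos.txt", "tos_formatted.txt",
#                                "tos_truth.txt", "tos_features.json"))
```

Using `check=True` makes the driver stop at the first failing step, so feature extraction never runs on a missing or partial re-formatted file.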
To train the model, uncomment the supervised machine learning model you wish to use in train.py, and run it with the following flags:

-tf -- the path to the files you wish to train the model on
-ts -- the path to the JSON file, previously created by feature_creation.py, that the model will be tested on
-of -- the path where the model's output summary should be written
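As an illustration of how the three flags fit together, the sketch below assembles a train.py command line and prints it. The function name and the example paths are hypothetical, and it assumes each flag takes exactly one value. Check train.py itself for the exact form it expects.

```python
import shlex
import sys


def train_command(train_path, test_json, output_path):
    """Build the argv list for train.py from the three documented flags."""
    return [
        sys.executable, "train.py",
        "-tf", train_path,   # files to train the model on
        "-ts", test_json,    # JSON from feature_creation.py to test on
        "-of", output_path,  # where the output summary is written
    ]


# Print the command with hypothetical paths so it can be inspected or pasted
print(shlex.join(train_command("training_features/", "tos_features.json", "summary.txt")))
```

`shlex.join` requires Python 3.8 or newer and quotes any argument that contains spaces.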

The ROUGE metrics were calculated using the pyrouge wrapper around the original ROUGE-1.5.5.pl script, and rely on Python 2.7.
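For readers unfamiliar with the metric, the sketch below shows what ROUGE-1 measures: unigram overlap between a generated summary and a reference summary. It is a simplified illustration under assumed whitespace tokenization, not the pyrouge or ROUGE-1.5.5.pl implementation, and its scores will not exactly match the official script.

```python
from collections import Counter


def rouge_1(candidate, reference):
    """Return ROUGE-1 recall, precision, and F1 using whitespace tokens."""
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared word
    overlap = sum((cand_counts & ref_counts).values())
    ref_total = sum(ref_counts.values())
    cand_total = sum(cand_counts.values())
    recall = overlap / ref_total if ref_total else 0.0
    precision = overlap / cand_total if cand_total else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"recall": recall, "precision": precision, "f1": f1}


scores = rouge_1("the cat sat", "the cat sat on the mat")
print(scores["recall"], scores["precision"])  # prints 0.5 1.0
```

In the example, all three candidate words appear in the six-word reference, so precision is 1.0 and recall is 0.5.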
