Media Cloud Story Quote Extractor

A helper that extracts quotes from a database of Media Cloud stories. It starts from a Mongo database full of stories, where each document is a story with a story_text property.

Requirements:

Python3 - we use pyenv to manage different versions
Stanford CoreNLP Server - you need to be running a copy of the Stanford CoreNLP Server (here is my fork of the Docker install, with some tweaks for the annotators we use for quote extraction)
Redis - we use this via Celery as a queue for parallel processing
Mongo - this holds the story information

Dev Installation

Install the dependencies: pip install -r requirements.txt

Configuration

Copy .env.template to .env, then edit the values for your environment.
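
The contents of .env.template are not reproduced in this README, so the variable names below (MONGO_DSN, BROKER_URL, CORENLP_URL) are hypothetical placeholders, not the project's real keys. As a rough sketch, assuming the file uses plain KEY=VALUE lines, a script could read such settings like this:

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=" themselves.
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


# Hypothetical example values; the real keys are whatever .env.template defines.
example = """
# connection settings
MONGO_DSN=mongodb://localhost:27017/
BROKER_URL=redis://localhost:6379/0
CORENLP_URL="http://localhost:9000"
"""

config = parse_env(example)
print(config["BROKER_URL"])
```

In practice you would pass the text of your own .env file to parse_env, or use whichever loader the project already depends on.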

Use

Open up one terminal window and start the workers waiting: celery worker -A quoteworker -l info. Watch the log to confirm that stories are being processed.

In another window, start filling up the queue: python queue-stories-from-db.py
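
The queueing step can be pictured roughly as follows. This is a minimal sketch, not the contents of queue-stories-from-db.py: the function name queue_stories and the shape of the queued payload are assumptions made for illustration. The only detail taken from this README is that each story document has a story_text property.

```python
def queue_stories(stories, enqueue):
    """Send every story that has text to the queue; return how many were sent.

    `stories` is any iterable of dict-like documents (for example a Mongo cursor).
    `enqueue` is any callable that accepts one job payload (for example the
    .delay method of a Celery task).
    """
    queued = 0
    for story in stories:
        text = story.get("story_text")
        if not text:
            continue  # nothing to extract quotes from
        # Convert the id to a string so the payload is easy to serialize.
        enqueue({"id": str(story.get("_id")), "story_text": text})
        queued += 1
    return queued


# Stand-in data and a list as the "queue", so the sketch runs without Mongo or Redis.
sample_stories = [
    {"_id": 1, "story_text": 'The mayor said, "We will rebuild."'},
    {"_id": 2, "story_text": ""},
    {"_id": 3},
]
sent = []
count = queue_stories(sample_stories, sent.append)
print(count)
```

Taking the enqueue step as a parameter keeps the loop easy to test on its own. Only the first sample story has text, so only one job is queued.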

Notes

To empty out your queue of jobs, run redis-cli FLUSHALL.
Run a few quick sanity tests to make sure you are connected to the NLP server: test.sh
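
If you would rather check the NLP server connection from Python, the sketch below builds the kind of HTTP request a CoreNLP server accepts: a POST with the text as the body and a JSON-encoded properties query parameter. It is not a copy of test.sh, whose contents are not shown here. The server address (http://localhost:9000) and the annotator list are assumptions, so adjust both to match your setup.

```python
import json
import urllib.parse
import urllib.request


def build_corenlp_request(text, server_url="http://localhost:9000"):
    """Build (but do not send) a POST request asking CoreNLP for JSON output."""
    properties = {
        # Assumed annotator list; the project's Docker fork may use a different one.
        "annotators": "tokenize,ssplit,pos,lemma,ner,depparse,coref,quote",
        "outputFormat": "json",
    }
    query = urllib.parse.urlencode({"properties": json.dumps(properties)})
    return urllib.request.Request(
        f"{server_url}/?{query}",
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


def annotate(text, server_url="http://localhost:9000", timeout=60):
    """Send the request and return the parsed JSON; needs a running server."""
    request = build_corenlp_request(text, server_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_corenlp_request('She said, "It works."')
print(request.get_method(), request.full_url)
```

Calling annotate(...) with the server running should return a dictionary. If it raises a connection error instead, the server address is the first thing to check.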

mitmedialab/Quote-Annotator