Categorizer (a PragueHacks 2016 project)

A simple classification engine for government and municipality documents, built with TensorFlow. Documents are tagged based on the occurrence of certain words and on other characteristics of the document.
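To make "occurrence of certain words" concrete, here is a minimal sketch of turning a document into a word-count feature vector. It is an illustration only and not necessarily how this project builds features.csv; the function name `occurrence_features`, the vocabulary, and the sample text are all hypothetical.

```python
import re
from collections import Counter

def occurrence_features(text, vocabulary):
    """Return one count per vocabulary word, in vocabulary order."""
    # Lowercase and split into word tokens so matching is case-insensitive.
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)
    # Counter yields 0 for words that never occur in the text.
    return [counts[word] for word in vocabulary]

# Hypothetical vocabulary and document, for illustration only.
vocabulary = ["budget", "tender", "permit"]
document = "The budget committee approved the tender. Budget details follow."
print(occurrence_features(document, vocabulary))  # [2, 1, 0]
```

A vector like this, one row per document, is the kind of input a classifier can learn tags from.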

This project is a prototype for

Built during Prague Hacks 2016

Setup

Requirements:

sudo pip install numpy
export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.10.0-cp27-none-linux_x86_64.whl
sudo pip install --upgrade $TF_BINARY_URL
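After installing, it can help to confirm that both packages import in the interpreter you plan to use. The helper below is a hypothetical convenience and not part of this repository; it only reports which module names fail to import.

```python
import importlib

def missing_modules(names):
    """Return the module names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing

if __name__ == "__main__":
    # An empty list means both requirements are importable.
    print(missing_modules(["numpy", "tensorflow"]))
```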

Run

Prepare data:

  1. copy tagged content files to ./input
  2. copy feature vector to features.csv
  3. export CATS=`cat cats.txt`
  4. bash generate-all.sh features.csv $CATS
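The preparation steps above assume that ./input, features.csv, and cats.txt are all in place. The sketch below checks for them before the scripts are run. Both helpers are hypothetical and not part of this repository. `read_categories` roughly mirrors what the shell does with an unquoted `$CATS`, splitting the contents of cats.txt on whitespace.

```python
import os

def read_categories(path="cats.txt"):
    """Split the file on whitespace, roughly as the shell splits an unquoted $CATS."""
    with open(path) as handle:
        return handle.read().split()

def missing_inputs(root="."):
    """Return a list of human-readable problems; an empty list means all inputs were found."""
    problems = []
    input_dir = os.path.join(root, "input")
    if not os.path.isdir(input_dir) or not os.listdir(input_dir):
        problems.append("input/ is missing or empty")
    for name in ("features.csv", "cats.txt"):
        if not os.path.isfile(os.path.join(root, name)):
            problems.append(name + " not found")
    return problems
```

Calling `missing_inputs()` from the project root before `generate-all.sh` gives an early warning about a forgotten step, and `read_categories()` shows the category names that will be passed on the command line.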

Train DNN

  1. python train.py $CATS

Run classification on new data

  1. python predict.py features.csv $CATS output.csv