To Do

- Clean up/refactor
- Eliminate the use of pandas
- Document
- Test cases
- Make setup.py put the files in data somewhere accessible
- Optimize (big frequencies)