slanglab/freq-e

freq-e = (class) frequency estimation

Use this software to infer the class frequencies in a collection of items (e.g., documents or images). For example, given all blog posts about Barack Obama during a certain time period, what is the overall prevalence of positive sentiment towards him?

In our academic paper we show that naive approaches, which aggregate the hard labels or soft probabilities output by a trained discriminative classifier, are often biased. Instead, we use an implicit likelihood method that embeds a discriminative classifier in a generative framework and allows for more robust estimation when the true class prevalences of the training and test groups differ. See also
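The bias of naive aggregation can be illustrated with a small pure-Python simulation. This is a sketch under stated assumptions, not freq-e's implementation: all function names are hypothetical, the classifier is simulated as perfectly calibrated at a training prevalence of 0.5, and the likelihood shown is one common form that reweights each posterior by the training prior and is maximised by a simple grid search.

```python
import math
import random


def simulate_probs(n, test_prevalence, seed=0):
    """Simulate positive-class probabilities from a classifier trained at 50/50.

    Assumption: scores are N(+1, 1) for positives and N(-1, 1) for negatives,
    so the calibrated posterior under a 0.5 training prior is sigmoid(2 * x).
    """
    rng = random.Random(seed)
    probs = []
    for _ in range(n):
        is_positive = rng.random() < test_prevalence
        x = rng.gauss(1.0 if is_positive else -1.0, 1.0)
        probs.append(1.0 / (1.0 + math.exp(-2.0 * x)))
    return probs


def classify_and_count(probs, threshold=0.5):
    """Naive estimate 1: fraction of hard positive labels."""
    return sum(p >= threshold for p in probs) / len(probs)


def probabilistic_average(probs):
    """Naive estimate 2: mean of the soft probabilities."""
    return sum(probs) / len(probs)


def likelihood_estimate(probs, train_prevalence, grid_size=199):
    """Grid-search the prevalence theta that maximises the log-likelihood.

    Each item contributes log(theta * p / pi + (1 - theta) * (1 - p) / (1 - pi)),
    where pi is the training prevalence (an assumed form of the likelihood).
    """
    best_theta, best_ll = None, -math.inf
    for k in range(1, grid_size + 1):
        theta = k / (grid_size + 1)
        ll = sum(
            math.log(
                theta * p / train_prevalence
                + (1.0 - theta) * (1.0 - p) / (1.0 - train_prevalence)
            )
            for p in probs
        )
        if ll > best_ll:
            best_theta, best_ll = theta, ll
    return best_theta


# Test set whose true prevalence (0.1) differs from the training prevalence (0.5).
probs = simulate_probs(n=2000, test_prevalence=0.1, seed=0)
print("classify and count:   ", classify_and_count(probs))
print("probabilistic average:", probabilistic_average(probs))
print("likelihood estimate:  ", likelihood_estimate(probs, train_prevalence=0.5))
```

In this simulated setting both naive estimates are pulled towards the training prevalence, while the likelihood-based estimate stays close to the true test prevalence.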

This software currently supports only binary predictions. Future work will extend it to the multiclass setting.

Installation

Installing the freq-e package, assuming Python 3:

  1. pip install freq-e
  2. Then follow py_tutorial/tutorial.ipynb.

Usage

As described in py_tutorial/tutorial.ipynb, there are three ways to obtain class frequency estimates:

  1. Create a FreqEstimator object and use the built-in training method.
  2. Train a scikit-learn classifier yourself and pass it to freq-e. The model class is restricted to scikit-learn models that have a .decision_function() method.
  3. Use the standalone infer_freq_from_predictions() method and pass in the predicted positive-class probabilities for the test set. This may be useful when your classifier is not built with scikit-learn (e.g. an LSTM or CNN).
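For the third option, the input is one positive-class probability per test item. If a custom model emits raw logits instead, they need to be converted first. The following is a minimal sketch, assuming a model that outputs a single logit per item; the helper names are hypothetical and not part of freq-e, and the exact arguments of infer_freq_from_predictions() are documented in the tutorial rather than reproduced here.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function for a single float."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def positive_class_probs(logits):
    """Convert raw positive-class logits into probabilities in [0, 1].

    Hypothetical helper: `logits` is assumed to hold one float per test item,
    e.g. the pre-activation output of an LSTM or CNN binary classifier.
    """
    probs = [sigmoid(float(z)) for z in logits]
    if not probs:
        raise ValueError("expected at least one test item")
    return probs


# Example with made-up logits; the resulting list is the kind of input
# that the third option expects (see the tutorial for the actual call).
test_logits = [-2.3, 0.0, 1.7, 4.2]
test_probs = positive_class_probs(test_logits)
print(test_probs)
```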

Citing

If you use this software, please cite our paper. Here is the BibTeX entry:

@inproceedings{keith2018uncertainty,
  title={Uncertainty-aware generative models for inferring document class prevalence},
  author={Keith, Katherine and O'Connor, Brendan},
  booktitle={Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
  year={2018}
}

Contact

Contact the software authors with any questions: Katherine Keith ([email protected]) and Brendan O'Connor ([email protected]).