A lightweight but fast Bloomfilter written in Python(2.7).
Based on mmap and MurMur, Spooky hash functions as the base hashes. Use double hashing to reduce the number of hashes to two.
Install boost and boost-python
- Ubuntu
sudo apt-get install libboost-all-dev
- Centos
sudo yum install boost-devel
- OSX
brew install boost boost-python
Install pyhash
sudo pip install pyhash
Install fastbloom
sudo pip install fastbloom
>>> from fastbloom import BloomFilter
>>> filter_ = BloomFilter(10000, 0.001) # set size of input 1000, error rate 0.1%
>>> filter_.add('www.google.com')
>>> 'www.google.com' in filter_
True
>>> 'www.github.com' in filter_
False
When input_size set to 1000000, accepted_error_rate set to 0.01%
- Memory consumption: 4 MB
- Add operation: 7322.42 ops
- Check operation: 24788.79 ops
- Actual fault positive rate: 0.0000
You can just run the benchmark.py to see the actual benchmarks
- add a faster bloomfilter
- add save and restore functions(as files)
- write a scalable bloomfilter