HTM community detector (py3) #5

Merged
merged 16 commits into from
Jul 29, 2019
2 changes: 2 additions & 0 deletions .gitignore
@@ -8,6 +8,7 @@ deprecated/
.idea/
.project
.pydevproject
*.swp
htmjava.log
nab/detectors/htmjava/.pydevproject
scripts/.ipynb_checkpoints/
@@ -19,3 +20,4 @@ plot_*/
pyenv2/
build/
dist/
htm.core/
149 changes: 75 additions & 74 deletions README.md
@@ -1,4 +1,4 @@
The Numenta Anomaly Benchmark [![Build Status](https://travis-ci.org/numenta/NAB.svg?branch=master)](https://travis-ci.org/numenta/NAB)
# The Numenta Anomaly Benchmark [![Build Status](https://travis-ci.org/numenta/NAB.svg?branch=master)](https://travis-ci.org/numenta/NAB)
-----------------------------

Welcome. This repository contains the data and scripts comprising the Numenta
@@ -28,26 +28,44 @@ Ahmad, S., Lavin, A., Purdy, S., & Agha, Z. (2017). Unsupervised real-time
anomaly detection for streaming data. Neurocomputing, Available online 2 June
2017, ISSN 0925-2312, https://doi.org/10.1016/j.neucom.2017.04.070

#### Scoreboard
## Community edition

This repo is the [NAB community edition](https://github.com/htm-community/NAB), a fork of the original [Numenta NAB](https://github.com/numenta/NAB). One of the reasons for forking
was a lack of developer activity in the upstream repo.

### Features:

- [x] Identical algorithms and datasets to Numenta's NAB, so the results are reproducible.
- [x] Python 3 codebase (Python 2 reaches end-of-life on 1/1/2020, and Numenta's NAB has not yet been ported).
- [x] Additional community-provided detectors:
  - `htmcore`: currently the only HTM implementation able to run in NAB natively in Python 3, with many improvements from [htm.core, the community HTM implementation and successor of nupic.core](https://github.com/htm-community/htm.core/).
  - `numenta`, `numentaTM`: the original Numenta detectors, made compatible with the Py3 codebase (they only require Py2 to be installed).
- [ ] Additional datasets
  - TBD, none so far

Statement: We'll try to upstream any changes, new detectors, and datasets to Numenta's NAB when its developers have time to apply them.

## Scoreboard

The NAB scores are normalized such that the maximum possible is 100.0 (i.e. the perfect detector), and a baseline of 0.0 is determined by the "null" detector (which makes no detections).
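
As a rough illustration of the normalization described above, the sketch below linearly rescales a raw score so that the null detector maps to 0.0 and the perfect detector maps to 100.0. The function name and the raw values in the usage lines are hypothetical, and the linear form is an assumption based on the description here; the scoring code in the repository remains authoritative.

```python
def normalize_score(raw, null_raw, perfect_raw):
    """Linearly rescale a raw score: null detector -> 0.0, perfect detector -> 100.0.

    Assumed form, inferred from the README description (not copied from NAB).
    """
    span = perfect_raw - null_raw
    if span == 0:
        raise ValueError("perfect and null raw scores must differ")
    return 100.0 * (raw - null_raw) / span


# Hypothetical raw scores, chosen only to show the mapping.
null_raw, perfect_raw = -50.0, 50.0
print(normalize_score(null_raw, null_raw, perfect_raw))     # the null detector
print(normalize_score(perfect_raw, null_raw, perfect_raw))  # the perfect detector
print(normalize_score(0.0, null_raw, perfect_raw))          # halfway between the two
```

Under this reading, a detector that scores below the null detector ends up with a negative normalized score.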

| Detector | Standard Profile | Reward Low FP | Reward Low FN |
|---------------|------------------|---------------|---------------|
| Perfect | 100.0 | 100.0 | 100.0 |
| [Numenta HTM](https://github.com/numenta/nupic)* | 70.5-69.7 | 62.6-61.7 | 75.2-74.2 |
| [CAD OSE](https://github.com/smirmik/CAD)† | 69.9 | 67.0 | 73.2 |
| [earthgecko Skyline](https://github.com/earthgecko/skyline) | 58.2 | 46.2 | 63.9 |
| [KNN CAD](https://github.com/numenta/NAB/tree/master/nab/detectors/knncad)† | 58.0 | 43.4 | 64.8 |
| [Relative Entropy](http://www.hpl.hp.com/techreports/2011/HPL-2011-8.pdf) | 54.6 | 47.6 | 58.8 |
| [Random Cut Forest](http://proceedings.mlr.press/v48/guha16.pdf) **** | 51.7 | 38.4 | 59.7 |
| [Twitter ADVec v1.0.0](https://github.com/twitter/AnomalyDetection)| 47.1 | 33.6 | 53.5 |
| [Windowed Gaussian](https://github.com/numenta/NAB/blob/master/nab/detectors/gaussian/windowedGaussian_detector.py) | 39.6 | 20.9 | 47.4 |
| [Etsy Skyline](https://github.com/etsy/skyline) | 35.7 | 27.1 | 44.5 |
| Bayesian Changepoint** | 17.7 | 3.2 | 32.2 |
| [EXPoSE](https://arxiv.org/abs/1601.06602v3) | 16.4 | 3.2 | 26.9 |
| Random*** | 11.0 | 1.2 | 19.5 |
| Null | 0.0 | 0.0 | 0.0 |
| Detector | Standard Profile | Reward Low FP | Reward Low FN | Detector name | Time (s) |
|---------------|------------------|---------------|---------------|---------------|------------|
| Perfect | 100.0 | 100.0 | 100.0 | | |
| [Numenta HTM](https://github.com/numenta/nupic)* | 70.5-69.7 | 62.6-61.7 | 75.2-74.2 | `numenta` | |
| [CAD OSE](https://github.com/smirmik/CAD)† | 69.9 | 67.0 | 73.2 | | |
| [earthgecko Skyline](https://github.com/earthgecko/skyline) | 58.2 | 46.2 | 63.9 | | |
| [KNN CAD](https://github.com/htm-community/NAB/tree/master/nab/detectors/knncad)† | 58.0 | 43.4 | 64.8 | | |
| [Relative Entropy](http://www.hpl.hp.com/techreports/2011/HPL-2011-8.pdf) | 54.6 | 47.6 | 58.8 | | |
| [Random Cut Forest](http://proceedings.mlr.press/v48/guha16.pdf) **** | 51.7 | 38.4 | 59.7 | | |
| [htm.core](https://github.com/htm-community/htm.core/) | 50.83 | 49.95 | 52.64 | `htmcore` | |
Member Author commented:

first shot results are not really good, we have to pump it up guys! 📌

Member Author commented:

I've pushed a commit that uses numenta_detector's params; benchmarks are currently running.

| [Twitter ADVec v1.0.0](https://github.com/twitter/AnomalyDetection)| 47.1 | 33.6 | 53.5 | | |
| [Windowed Gaussian](https://github.com/htm-community/NAB/blob/master/nab/detectors/gaussian/windowedGaussian_detector.py) | 39.6 | 20.9 | 47.4 | | |
| [Etsy Skyline](https://github.com/etsy/skyline) | 35.7 | 27.1 | 44.5 | | |
| Bayesian Changepoint** | 17.7 | 3.2 | 32.2 | | |
| [EXPoSE](https://arxiv.org/abs/1601.06602v3) | 16.4 | 3.2 | 26.9 | | |
| Random*** | 11.0 | 1.2 | 19.5 | | |
| Null | 0.0 | 0.0 | 0.0 | | |

*As of NAB v1.0*

@@ -64,22 +82,6 @@ The NAB scores are normalized such that the maximum possible is 100.0 (i.e. the

Please see [the wiki section on contributing algorithms](https://github.com/numenta/NAB/wiki/NAB-Contributions-Criteria#anomaly-detection-algorithms) for discussion on posting algorithms to the scoreboard.

#### Corpus

The NAB corpus of 58 timeseries data files is designed to provide data for research
in streaming anomaly detection. It is comprised of both
real-world and artificial timeseries data containing labeled anomalous periods of behavior.

The majority of the data is real-world from a variety of sources such as AWS
server metrics, Twitter volume, advertisement clicking metrics, traffic data,
and more. All data is included in the repository, with more details in the [data
readme](https://github.com/numenta/NAB/tree/master/data). We are in the process
of adding more data, and actively searching for more data. Please contact us at
[[email protected]](mailto:[email protected]) if you have similar data (ideally with
known anomalies) that you would like to see incorporated into NAB.

The NAB version will be updated whenever new data (and corresponding labels) is
added to the corpus; NAB is currently in v1.0.

#### Additional Scores

@@ -88,35 +90,49 @@ For comparison, here are the NAB V1.0 scores for some additional flavors of HTM.
* Numenta HTM using NuPIC v.0.5.6: This version of NuPIC was used to generate the data for the paper mentioned above (Unsupervised real-time anomaly detection for streaming data. Neurocomputing, ISSN 0925-2312, https://doi.org/10.1016/j.neucom.2017.04.070). If you are interested in replicating the results shown in the paper, use this version.
* [HTM Java](https://github.com/numenta/htm.java) is a Community-Driven Java port of HTM.
* [nab-comportex](https://github.com/floybix/nab-comportex) is a twist on HTM anomaly detection using [Comportex](https://github.com/htm-community/comportex), a community-driven HTM implementation in Clojure. Please see [Felix Andrew's blog post](http://floybix.github.io/2016/07/01/attempting-nab) on experiments with this algorithm.
* NumentaTM HTM detector uses the implementation of temporal memory found
[here](https://github.com/numenta/nupic.core/blob/master/src/nupic/algorithms/TemporalMemory.hpp).
* Numenta HTM detector with no likelihood uses the raw anomaly scores directly. To
run without likelihood, set the variable `self.useLikelihood` in
[numenta_detector.py](https://github.com/numenta/NAB/blob/master/nab/detectors/numenta/numenta_detector.py)
to `False`.

* NumentaTM HTM detector uses the implementation of temporal memory found [here](https://github.com/numenta/nupic.core/blob/master/src/nupic/algorithms/TemporalMemory.hpp).
* Numenta HTM detector with no likelihood uses the raw anomaly scores directly. To run without likelihood, set the variable `self.useLikelihood` in [numenta_detector.py](https://github.com/numenta/NAB/blob/master/nab/detectors/numenta/numenta_detector.py) to `False`.



| Detector |Standard Profile | Reward Low FP | Reward Low FN |
|---------------|---------|------------------|---------------|
| Numenta HTM using NuPIC v0.5.6* | 70.1 | 63.1 | 74.3 |
| [nab-comportex](https://github.com/floybix/nab-comportex)† | 64.6 | 58.8 | 69.6 |
| [NumentaTM HTM](https://github.com/numenta/NAB/blob/master/nab/detectors/numenta/numentaTM_detector.py)* | 64.6 | 56.7 | 69.2 |
| [HTM Java](https://github.com/numenta/NAB/blob/master/nab/detectors/htmjava) | 56.8 | 50.7 | 61.4 |
| [NumentaTM HTM](https://github.com/htm-community/NAB/blob/master/nab/detectors/numenta/numentaTM_detector.py)* | 64.6 | 56.7 | 69.2 |
| [HTM Java](https://github.com/htm-community/NAB/blob/master/nab/detectors/htmjava) | 56.8 | 50.7 | 61.4 |
| Numenta HTM*, no likelihood | 53.62 | 34.15 | 61.89 |

\* From NuPIC version 0.5.6 ([available on PyPI](https://pypi.python.org/pypi/nupic/0.5.6)).

† Algorithm was an entry to the [2016 NAB Competition](http://numenta.com/blog/2016/08/10/numenta-anomaly-benchmark-nab-competition-2016-winners/).

Installing NAB 1.0


## Corpus

The NAB corpus of 58 timeseries data files is designed to provide data for research
in streaming anomaly detection. It is comprised of both
real-world and artificial timeseries data containing labeled anomalous periods of behavior.

The majority of the data is real-world from a variety of sources such as AWS
server metrics, Twitter volume, advertisement clicking metrics, traffic data,
and more. All data is included in the repository, with more details in the [data
readme](https://github.com/numenta/NAB/tree/master/data). We are in the process
of adding more data, and actively searching for more data. Please contact us at
[[email protected]](mailto:[email protected]) if you have similar data (ideally with
known anomalies) that you would like to see incorporated into NAB.

The NAB version will be updated whenever new data (and corresponding labels) is
added to the corpus; NAB is currently in v1.0.
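
To make the idea of labeled anomalous periods concrete, here is a minimal sketch that parses a small inline time series and marks which rows fall inside an anomaly window. The `timestamp,value` column layout, the sample rows, and the window bounds are all assumptions made for illustration; see the data readme for the actual file formats.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample in an assumed "timestamp,value" layout.
SAMPLE_CSV = """timestamp,value
2014-04-01 00:00:00,20.1
2014-04-01 00:05:00,19.8
2014-04-01 00:10:00,95.3
2014-04-01 00:15:00,20.4
"""

# Hypothetical labeled anomaly window as (start, end), both inclusive.
WINDOWS = [("2014-04-01 00:08:00", "2014-04-01 00:12:00")]

FMT = "%Y-%m-%d %H:%M:%S"


def parse(ts):
    """Parse a timestamp string in the assumed format."""
    return datetime.strptime(ts, FMT)


def label_rows(csv_text, windows):
    """Return (timestamp, value, in_window) for every row of the series."""
    bounds = [(parse(start), parse(end)) for start, end in windows]
    rows = []
    for rec in csv.DictReader(io.StringIO(csv_text)):
        ts = parse(rec["timestamp"])
        inside = any(start <= ts <= end for start, end in bounds)
        rows.append((ts, float(rec["value"]), inside))
    return rows


for ts, value, inside in label_rows(SAMPLE_CSV, WINDOWS):
    print(ts, value, "anomalous" if inside else "normal")
```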


## Installing NAB 1.0
------------------

### Supported Platforms

- OSX 10.9 and higher
- Amazon Linux (via AMI)
- Linux

Other platforms may work but have not been tested.

@@ -125,34 +141,22 @@ Other platforms may work but have not been tested.

You need to manually install the following:

- [Python 2.7](https://www.python.org/download/)
- [Python 3](https://www.python.org/download/)
- [pip](https://pip.pypa.io/en/latest/installing.html)
- [NumPy](http://www.numpy.org/)
- [NuPIC](http://www.github.com/numenta/nupic) (only required if running the Numenta detector)

##### Download this repository

Use the Github links provided in the right sidebar.
#### Download this repository

##### Install the Python requirements
Use the GitHub [download links](https://github.com/htm-community/NAB/archive/master.zip) provided in the right sidebar,
or run `git clone https://github.com/htm-community/NAB`

cd NAB
(sudo) pip install -r requirements.txt

This will install the required modules.

##### Install NAB
#### Install NAB

Recommended:

cd NAB
pip install . --user


> Note: If NuPIC is not already installed, the version specified in
`NAB/requirements.txt` will be installed. If NuPIC is already installed, it
will not be re-installed.


If you want to manage dependency versions yourself, you can skip dependencies
with:

@@ -198,13 +202,11 @@ follow the directions below to "Run a subset of NAB".

##### Run HTM with NAB

First make sure NuPIC is installed and working properly. Then:

cd /path/to/nab
python run.py -d numenta --detect --optimize --score --normalize
python run.py -d htmcore --detect --optimize --score --normalize

This will run the Numenta detector only and produce normalized scores. Note that
by default it tries to use all the cores on your machine. The above command
This will run the community HTM detector `htmcore` (to run Numenta's detector use `-d numenta`) and produce normalized scores.
Note that by default it tries to use all the cores on your machine. The above command
should take about 20-30 minutes on a current powerful laptop with 4-8 cores.
For debugging you can run subsets of the data files by modifying and specifying
specific label files (see section below). Please type:
@@ -229,11 +231,10 @@ the specific version of NuPIC (and associated nupic.core) that is noted in the

This will run everything and produce results files for all anomaly detection
methods. Several algorithms are included in the repo, such as the Numenta
HTM anomaly detection method, as well as methods from the [Etsy
Skyline](https://github.com/etsy/skyline) anomaly detection library, a sliding
window detector, Bayes Changepoint, and so on. This will also pass those results
files to the scoring script to generate final NAB scores. **Note**: this option
will take many many hours to run.
HTM anomaly detection method, as well as methods from the [Etsy Skyline](https://github.com/etsy/skyline) anomaly detection library,
a sliding window detector, Bayes Changepoint, and so on.
This will also pass those results files to the scoring script to generate final NAB scores.
**Note**: this option will take many many hours to run.

##### Run subset of NAB data files

16 changes: 15 additions & 1 deletion config/thresholds.json
@@ -55,6 +55,20 @@
"threshold": 0.9947875976562506
}
},
"htmcore": {
"reward_low_FN_rate": {
"score": -1.6511067861578468,
"threshold": 0.5014187204194446
},
"reward_low_FP_rate": {
"score": 20.39992539458499,
"threshold": 0.5122987896930875
},
"standard": {
"score": 30.348893213842153,
"threshold": 0.5014187204194446
}
},
"htmjava": {
"reward_low_FN_rate": {
"score": 8.764037437134272,
@@ -209,4 +223,4 @@
"threshold": 1.0
}
}
}
}
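
The `htmcore` entry added above stores one optimized threshold and the corresponding score per application profile. A minimal sketch of reading such an entry and applying a threshold follows. The JSON fragment is copied from the diff; the helper name `is_anomaly`, the sample anomaly scores, and the `>=` comparison are assumptions for illustration rather than NAB's actual code.

```python
import json

# Fragment of config/thresholds.json, copied from the hunk above.
FRAGMENT = """
{
  "htmcore": {
    "reward_low_FN_rate": {"score": -1.6511067861578468, "threshold": 0.5014187204194446},
    "reward_low_FP_rate": {"score": 20.39992539458499, "threshold": 0.5122987896930875},
    "standard": {"score": 30.348893213842153, "threshold": 0.5014187204194446}
  }
}
"""

thresholds = json.loads(FRAGMENT)


def is_anomaly(anomaly_score, detector, profile, table=thresholds):
    """Hypothetical helper: flag a record whose score reaches the profile threshold."""
    return anomaly_score >= table[detector][profile]["threshold"]


# List the threshold chosen for each application profile.
for profile, entry in sorted(thresholds["htmcore"].items()):
    print(profile, entry["threshold"])

# The same anomaly score can be flagged under one profile and not another.
print(is_anomaly(0.51, "htmcore", "standard"))
print(is_anomaly(0.51, "htmcore", "reward_low_FP_rate"))
```

Note that the "reward low FP" profile ends up with a slightly higher threshold than the other two, which is consistent with a profile that penalizes false positives more heavily.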
32 changes: 32 additions & 0 deletions nab/detectors/htmcore/README.md
@@ -0,0 +1,32 @@
# HtmcoreDetector: HTM implementation from [htm.core](https://github.com/htm-community/htm.core/)

This detector provides the HTM implementation from [htm.core](https://github.com/htm-community/htm.core/),
which is an actively developed community version of Numenta's [nupic.core](https://github.com/numenta/nupic.core).

This is a Python 3 detector called `htmcore`. As Numenta is switching NAB to Python 3, this is the closest detector you can get to the
`numenta` and `numentaTM` detectors.

`htm.core` offers an API and features similar to, and compatible with, the official HTM implementations `nupic` and `nupic.core`, along with
significant speed and feature improvements. For more details, please see [the htm.core project's README](https://github.com/htm-community/htm.core/blob/master/README.md).
Bugs and questions should also be reported there.

## Installation

The `htmcore` detector is automatically installed with your `NAB` installation (`python setup.py install`),
so you don't have to do anything to have it available.

### Requirements to install

- [Python 3](https://www.python.org/download/)
- [Git](https://git-scm.com/downloads)


## Usage

Usage is the same as for the default detectors; see the [NAB README section Usage](https://github.com/htm-community/NAB/blob/master/README.md#usage).

### Example
Follow the instructions in the main README to run optimization, scoring, and normalization, e.g.:

`python run.py -d htmcore --optimize --score --normalize`
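
If you prefer to drive the benchmark from a script, the command above can be assembled in Python. This is only a sketch: `build_nab_command` is a hypothetical helper, not part of NAB, and it uses only the `run.py` flags shown in the READMEs. The call that would actually launch the run is left commented out.

```python
import subprocess  # only needed if the commented-out launch line is enabled
import sys

# Step flags that appear in the NAB README commands, in the documented order.
KNOWN_STEPS = ("detect", "optimize", "score", "normalize")


def build_nab_command(detector, steps=("optimize", "score", "normalize")):
    """Hypothetical helper: build the argument list for `python run.py ...`."""
    unknown = [s for s in steps if s not in KNOWN_STEPS]
    if unknown:
        raise ValueError("unknown step(s): %s" % ", ".join(unknown))
    cmd = [sys.executable, "run.py", "-d", detector]
    # Emit flags in the documented order regardless of how `steps` is ordered.
    cmd += ["--" + s for s in KNOWN_STEPS if s in steps]
    return cmd


cmd = build_nab_command("htmcore")
print(" ".join(cmd[1:]))
# subprocess.run(cmd, check=True, cwd="/path/to/nab")  # uncomment to launch the run
```

Passing `steps=("detect", "optimize", "score", "normalize")` would reproduce the full command from the main README.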

Empty file.