Feature: pipeline spark k-sigma anomaly filtering #13

FelipeTrost · 2024-10-28T19:52:53Z

Summary

Adds anomaly detection with the k-sigma method for Spark.

This method computes either the mean and standard deviation, or the median and the median absolute deviation (MAD), of the data. The k-sigma method then filters out all data points that lie more than k standard deviations from the mean, or more than k times the MAD from the median.
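The plain-Python sketch below shows the filtering rule just described. It is a reference illustration only and is not the PR's Spark implementation. The function name `k_sigma_filter` and its parameters are hypothetical. The summary does not say which estimators are used, so the sketch assumes the sample standard deviation (matching Spark's `stddev`, an alias for `stddev_samp`) and an unscaled MAD.

```python
import statistics
from typing import List, Sequence


def k_sigma_filter(values: Sequence[float], k: float = 3.0, use_median: bool = False) -> List[float]:
    """Hypothetical reference helper: keep values within k * spread of the center.

    use_median=False: center = mean, spread = sample standard deviation
                      (assumption: mirrors Spark's `stddev`, i.e. `stddev_samp`).
    use_median=True:  center = median, spread = unscaled median absolute deviation.
    """
    if use_median:
        center = statistics.median(values)
        spread = statistics.median([abs(v - center) for v in values])
    else:
        center = statistics.fmean(values)
        spread = statistics.stdev(values)  # needs at least two values
    # Anything more than k * spread away from the center is treated as an anomaly.
    return [v for v in values if abs(v - center) <= k * spread]


data = [10, 11, 9, 10, 12, 10, 100]
print(k_sigma_filter(data, k=3, use_median=True))  # [10, 11, 9, 10, 12, 10]
print(k_sigma_filter(data, k=3))                   # [10, 11, 9, 10, 12, 10, 100]
```

On this data, the single value 100 inflates the mean and standard deviation so much that it survives the mean/stddev filter at `k=3`. The median/MAD variant drops it at the same `k`. This robustness to the outliers themselves is the usual reason to offer the MAD option.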

A future improvement could be to support multiple columns of the data.

Signed-off-by: Felipe Trost <[email protected]>

Signed-off-by: Dominik Hoffmann <[email protected]>

dh1542 · 2024-10-29T11:13:29Z

LGTM. Merge.

dh1542 · 2024-10-29T11:25:29Z

The test pipeline fails when the test file is named according to the convention (`test_...`). This also happened in another PR.

dh1542 · 2024-10-29T12:28:01Z

The implementation uses a function that is only available in later PySpark versions (`pyspark.sql.functions.median`, added in 3.4.0):
https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.median.html

Our tests against older PySpark versions fail. The pipeline passed earlier only because the test file was not discovered, since its name lacked the `test_` prefix.
@FelipeTrost, can you have a look? Maybe you can find a workaround.

FelipeTrost added 3 commits October 28, 2024 20:02

feat: spark k-sigma anomaly detection

aa84a72

Signed-off-by: Felipe Trost <[email protected]>

test: unit tests for spark k-sigma anomaly detection

7e7b8dd

Signed-off-by: Felipe Trost <[email protected]>

docs: accurate comment for k-sigma anomaly detection

0ff2fc2

Signed-off-by: Felipe Trost <[email protected]>

FelipeTrost changed the title from "Pipeline: spark k-sigma anomaly filtering" to "Feature: pipeline spark k-sigma anomaly filtering" Oct 28, 2024

#6-PR: Renamed test to suit naming convention

bd7f047

Signed-off-by: Dominik Hoffmann <[email protected]>

dh1542 self-requested a review October 29, 2024 11:13

dh1542 assigned FelipeTrost Oct 29, 2024

dh1542 merged commit 2b068c7 into main Oct 29, 2024
5 of 11 checks passed

dh1542 mentioned this pull request Oct 29, 2024

Revert "Feature: pipeline spark k-sigma anomaly filtering" #15

Merged
