Feature: pipeline spark k-sigma anomaly filtering #13

Merged
merged 4 commits into from
Oct 29, 2024


FelipeTrost

Summary

Adds anomaly detection with the k-sigma method for Spark.

The method computes either the mean and standard deviation of the data, or the median and the median absolute deviation (MAD). It then filters out every data point that lies more than k standard deviations from the mean, or more than k MADs from the median.
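The rule described above can be illustrated in plain Python, independent of Spark. This is a minimal sketch, not the code from this PR: the function name `k_sigma_filter`, the `use_median` flag, the use of the sample standard deviation, the unscaled MAD, and the inclusive threshold are all assumptions made for illustration.

```python
import statistics


def k_sigma_filter(values, k=3.0, use_median=False):
    """Keep only the values within k spreads of the center.

    Hypothetical helper for illustration; not the PR's implementation.
    center/spread is mean/sample standard deviation by default,
    or median/unscaled MAD when use_median is True.
    """
    values = list(values)
    if len(values) < 2:
        # Spread is undefined for fewer than two points; keep the data as-is.
        return values

    if use_median:
        center = statistics.median(values)
        # MAD: median of the absolute deviations from the median.
        spread = statistics.median([abs(v - center) for v in values])
    else:
        center = statistics.mean(values)
        spread = statistics.stdev(values)  # sample standard deviation

    # Assumption: points exactly k spreads away are kept.
    return [v for v in values if abs(v - center) <= k * spread]


data = [10, 11, 9, 10, 12, 10, 100]
print(k_sigma_filter(data, k=3))                   # mean/std variant
print(k_sigma_filter(data, k=3, use_median=True))  # median/MAD variant
```

On this sample the outlier inflates the standard deviation enough that the mean-based variant with k=3 keeps it, while the median/MAD variant removes it. That difference is the usual reason to offer the median-based option.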

A future improvement could be to support filtering on multiple columns of the data.

@FelipeTrost FelipeTrost changed the title Pipeline: spark k-sigma anomaly filtering Feature: pipeline spark k-sigma anomaly filtering Oct 28, 2024
@dh1542 dh1542 self-requested a review October 29, 2024 11:13

dh1542 commented Oct 29, 2024

LGTM. Merging.

@dh1542 dh1542 merged commit 2b068c7 into main Oct 29, 2024
5 of 11 checks passed

dh1542 commented Oct 29, 2024

The test pipeline fails when the test file is named according to conventions (test_....). This also happened in another PR.


dh1542 commented Oct 29, 2024

This uses a function that is only available in later PySpark versions:
https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.median.html

Our tests against older PySpark versions fail. The pipeline passed before only because the test file was not recognized: its name was missing the test_ prefix.
@FelipeTrost can you have a look? Maybe you can find a workaround.
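One possible workaround, sketched here as an assumption rather than as the fix that was eventually applied, is to compute the median through `DataFrame.approxQuantile`, which has been part of the DataFrame API since Spark 2.0. The helper name `median_compat` and its parameters are hypothetical.

```python
def median_compat(df, column, relative_error=0.0):
    """Median of a numeric column without pyspark.sql.functions.median.

    Hypothetical helper; `df` is expected to be a Spark DataFrame.
    relative_error=0.0 asks approxQuantile for an exact quantile,
    which can be expensive on large data.
    """
    quantiles = df.approxQuantile(column, [0.5], relative_error)
    if not quantiles:
        # Assumption: an empty result means there were no non-null values.
        raise ValueError(f"no values to compute a median for column {column!r}")
    return quantiles[0]
```

Note that `approxQuantile` returns an actual element of the column, so for an even number of rows the result may differ from a median that averages the two middle values. The MAD could be obtained the same way by applying the helper to a column of absolute deviations from the median.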


2 participants