feat (monitoring): [alerts] enable new recommended experience for aks clusters #435

Open
wants to merge 17 commits into base: main

Commits on Nov 1, 2024

  1. 828fd66

Commits on Nov 6, 2024

  1. replace legacy Completed job count CI alert w/ KubeJobStale

    Pod level alert: at least one Job instance has not completed successfully
    in the last 6 hours.
    ferantivero committed Nov 6, 2024
    07d62c7
  2. replace legacy Container CPU % CI alert w/ KubeContainerAverageCPUHigh

    Pod level alert: The average CPU usage per container exceeds 95%
    for the last 5 minutes.
    ferantivero committed Nov 6, 2024
    098242a
  3. replace legacy Container working set memory % CI alert w/ KubeContainerAverageMemoryHigh
    
    Pod level alert: The average memory usage per container exceeds 95% for
    the last 5 minutes
    ferantivero committed Nov 6, 2024
    0853d3b
  4. replace legacy Failed Pod counts CI alert w/ KubePodFailedState

    Pod level alert: One or more pods have been in a failed state for the
    last 5 minutes
    ferantivero committed Nov 6, 2024
    d1585a9
  5. disabling legacy Node CPU % CI alert

    Replaced by the platform level alert "Node cpu percentage"
    ferantivero committed Nov 6, 2024
    68581e2
  6. disabling legacy Node Disk Usage % CI alert

    no replacement available
    ferantivero committed Nov 6, 2024
    2b29202
  7. replace legacy Node NotReady status CI alert w/ KubeNodeUnreachable

    Node level alert: A node has been unreachable for the last 15 minutes
    ferantivero committed Nov 6, 2024
    c8b46c9
  8. disabling legacy Node working set memory % CI alert

    Replaced by the platform level alert "Node memory working set percentage
    is greater than 100%"
    ferantivero committed Nov 6, 2024
    dff578e
  9. replace legacy OOM Killed Containers CI alert w/ KubeContainerOOMKilledCount
    
    Cluster level alert: One or more containers within pods have been killed
    due to out-of-memory (OOM) events in the last 5 minutes
    ferantivero committed Nov 6, 2024
    d0becd6
  10. replace legacy Persistent Volume Usage % CI alert w/ KubePVUsageHigh

    Pod level alert: The average usage of Persistent Volumes (PVs)
    on a pod exceeds 80% for the last 15 minutes
    ferantivero committed Nov 6, 2024
    9adcf3b
  11. replace legacy Pods ready % CI alert w/ KubePodReadyStateLow

    Pod level alert: The percentage of pods in a ready state falls below 80%
    for any deployment or daemonset in the Kubernetes cluster for the last 5 minutes
    ferantivero committed Nov 6, 2024
    5c07b9a
  12. replace legacy Restarting container count CI alert w/ KubePodContainerRestart
    
    Pod level alert: One or more containers within pods in the Kubernetes
    cluster have been restarted at least once within the last hour
    ferantivero committed Nov 6, 2024
    8aab44b
  13. decf1db
  14. 36f9e57
  15. 779d85f
  16. remove unused module

    ferantivero committed Nov 6, 2024
    40ab855
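
The Nov 6 commits above amount to a migration table from the legacy CI alerts to their replacements. As a reading aid only, that table can be sketched in Python. The names, levels, and the single "no replacement" entry are copied from the commit messages; the `Migration` class and the helper functions are hypothetical and are not part of this PR or of any Azure SDK.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Migration:
    """One legacy CI alert and what the commit list says happens to it (hypothetical type)."""
    legacy: str                 # legacy alert name, as written in the commit title
    replacement: Optional[str]  # replacement alert, or None if the commit names none
    level: Optional[str]        # alert level stated in the commit message


# Entries transcribed from commits 07d62c7 through 8aab44b, in order.
MIGRATIONS: List[Migration] = [
    Migration("Completed job count", "KubeJobStale", "Pod"),
    Migration("Container CPU %", "KubeContainerAverageCPUHigh", "Pod"),
    Migration("Container working set memory %", "KubeContainerAverageMemoryHigh", "Pod"),
    Migration("Failed Pod counts", "KubePodFailedState", "Pod"),
    Migration("Node CPU %", "Node cpu percentage", "Platform"),
    Migration("Node Disk Usage %", None, None),
    Migration("Node NotReady status", "KubeNodeUnreachable", "Node"),
    Migration("Node working set memory %",
              "Node memory working set percentage is greater than 100%", "Platform"),
    Migration("OOM Killed Containers", "KubeContainerOOMKilledCount", "Cluster"),
    Migration("Persistent Volume Usage %", "KubePVUsageHigh", "Pod"),
    Migration("Pods ready %", "KubePodReadyStateLow", "Pod"),
    Migration("Restarting container count", "KubePodContainerRestart", "Pod"),
]


def without_replacement(migrations: List[Migration]) -> List[str]:
    """Legacy alerts that are disabled with no replacement named."""
    return [m.legacy for m in migrations if m.replacement is None]


def count_by_level(migrations: List[Migration]) -> Counter:
    """Number of replacement alerts per stated level (Pod, Node, Cluster, Platform)."""
    return Counter(m.level for m in migrations if m.level is not None)


if __name__ == "__main__":
    print("No replacement:", without_replacement(MIGRATIONS))
    for level, count in sorted(count_by_level(MIGRATIONS).items()):
        print(f"{level}: {count}")
```

Listing the entries this way makes the one coverage gap easy to spot: the legacy Node Disk Usage % alert is disabled without a replacement.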