[Monitor-OpenTelemetry-Exporter] Add 15 Second Warmup for Long-Interval Statsbeat #41229
Closed
JacksonWeber wants to merge 15 commits into Azure:main from JacksonWeber:jacksonweber/add-statsbeat-warmup
41d5f1c Add support for 15 second delay in long interval stats. (JacksonWeber)
3d6add6 Update _statsbeat.py (JacksonWeber)
182cdd6 Update CHANGELOG.md (JacksonWeber)
babb38d Update sdk/monitor/azure-monitor-opentelemetry-exporter/azure/monitor… (JacksonWeber)
facfb41 Add constant for 15 second warmup value. (JacksonWeber)
78fdf7b Fix lint. (JacksonWeber)
60749f4 Attempt to fix lint. (JacksonWeber)
465e125 Update _statsbeat.py (JacksonWeber)
b74c20e Update _statsbeat.py (JacksonWeber)
729545c Fix pylint. (JacksonWeber)
8cf7735 Fix lint. (JacksonWeber)
79f9782 Address PR comments. (JacksonWeber)
793b7c6 Clean up tests. (JacksonWeber)
25edfc8 Remove unneeded code. (JacksonWeber)
31b0078 Add type information for the exporter param. (JacksonWeber)
@@ -1,11 +1,17 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
"""
Internal module for statsbeat functionality in Azure Monitor OpenTelemetry exporter.
This module handles collection and exporting of statsbeat metrics.
"""
import threading
from threading import Timer

from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.sdk.resources import Resource

from azure.monitor.opentelemetry.exporter._constants import _INITIAL_DELAY_SECONDS
from azure.monitor.opentelemetry.exporter.statsbeat._exporter import _StatsBeatExporter
from azure.monitor.opentelemetry.exporter.statsbeat._statsbeat_metrics import _StatsbeatMetrics
from azure.monitor.opentelemetry.exporter.statsbeat._state import (
@@ -18,14 +24,42 @@
    _get_stats_short_export_interval,
)


_STATSBEAT_METRICS = None
_STATSBEAT_LOCK = threading.Lock()

def _delayed_export_statsbeat():
    """
    Function to perform a delayed export of statsbeat metrics
    after the initial delay period has passed.
    """
    # Check if we're in a shutdown state
    with _STATSBEAT_STATE_LOCK:
        if _STATSBEAT_STATE["SHUTDOWN"]:
            return

[Review thread: JacksonWeber marked this conversation as resolved.]

    with _STATSBEAT_LOCK:
        if _STATSBEAT_METRICS is not None and _STATSBEAT_METRICS._meter_provider is not None:  # pylint: disable=protected-access
            try:
                # Trigger a forced export of the metrics after the delay
                _STATSBEAT_METRICS._meter_provider.force_flush()  # pylint: disable=protected-access
            except:  # pylint: disable=bare-except
                pass

# pylint: disable=global-statement
# pylint: disable=protected-access
def collect_statsbeat_metrics(exporter) -> None:
    """
    Initialize and collect statsbeat metrics from the exporter.

    Sets up a periodic metric reader to export metrics and initializes required
    metrics for collecting statistics about the exporter's behavior.

    :param exporter: The exporter instance to collect metrics from. Contains information
        about instrumentation key, endpoint, and other configuration details.
    :type exporter: ~azure.monitor.opentelemetry.exporter.AzureMonitorLogExporter or
        ~azure.monitor.opentelemetry.exporter.AzureMonitorTraceExporter or
        ~azure.monitor.opentelemetry.exporter.AzureMonitorMetricExporter
    """
    global _STATSBEAT_METRICS
    # Only start statsbeat if did not exist before
    if _STATSBEAT_METRICS is None:
@@ -36,15 +70,17 @@ def collect_statsbeat_metrics(exporter) -> None:
             )
             reader = PeriodicExportingMetricReader(
                 statsbeat_exporter,
-                export_interval_millis=_get_stats_short_export_interval() * 1000,  # 15m by default
+                export_interval_millis=_get_stats_short_export_interval() * 1000,
[Review comment: Nit: There's still some of these random refactors and new variables?]
             )
             mp = MeterProvider(
                 metric_readers=[reader],
                 resource=Resource.get_empty(),
             )
             # long_interval_threshold represents how many collects for short interval
             # should have passed before a long interval collect
-            long_interval_threshold = _get_stats_long_export_interval() // _get_stats_short_export_interval()
+            long_export = _get_stats_long_export_interval()
+            short_export = _get_stats_short_export_interval()
+            long_interval_threshold = long_export // short_export
             _STATSBEAT_METRICS = _StatsbeatMetrics(
                 mp,
                 exporter._instrumentation_key,
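Editorial note, not part of the PR: the threshold computed in this hunk is the number of short-interval collections that fit into one long interval. The sketch below works through the arithmetic. The 900-second short interval follows the "15m by default" comment in the diff; the 86,400-second long interval is an assumption made for the example, and the helper name is hypothetical.

```python
# Illustrative only: these values are assumptions for the example,
# not read from the exporter's configuration.
SHORT_EXPORT_INTERVAL_SECS = 900     # 15 minutes, per the "15m by default" comment
LONG_EXPORT_INTERVAL_SECS = 86_400   # assumed to be 24 hours for this example


def long_interval_threshold(long_export: int, short_export: int) -> int:
    """Return how many short-interval collections make up one long interval."""
    return long_export // short_export


threshold = long_interval_threshold(LONG_EXPORT_INTERVAL_SECS, SHORT_EXPORT_INTERVAL_SECS)
print(threshold)  # 96
```

Under these assumed values, a long-interval metric would be emitted once every 96 short-interval collections, which is why metrics on the long interval need the separate warmup export this PR adds.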
@@ -54,13 +90,22 @@ def collect_statsbeat_metrics(exporter) -> None:
                exporter._credential is not None,
                exporter._distro_version,
            )
        # Export some initial stats on program start
        mp.force_flush()
        # initialize non-initial stats
        _STATSBEAT_METRICS.init_non_initial_metrics()

        # Schedule a second export after the initial delay to send feature, instrumentation,
        # and attach statsbeat metrics (which have a 15-second delay)
        timer = Timer(_INITIAL_DELAY_SECONDS, _delayed_export_statsbeat)
        timer.daemon = True  # Set as daemon so it doesn't block program exit
        timer.start()

def shutdown_statsbeat_metrics() -> None:
    """
    Shut down the statsbeat metrics collection system.

    This ensures proper cleanup of resources and marks the system as shut down
    to prevent further metric collection.
    """
    global _STATSBEAT_METRICS
    shutdown_success = False
    if _STATSBEAT_METRICS is not None:

@@ -7,6 +7,7 @@
import re
import sys
import threading
import time
from typing import Any, Dict, Iterable, List

import requests  # pylint: disable=networking-import-outside-azure-core-transport
@@ -18,6 +19,7 @@
from azure.monitor.opentelemetry.exporter._constants import (
    _ATTACH_METRIC_NAME,
    _FEATURE_METRIC_NAME,
    _INITIAL_DELAY_SECONDS,
    _KUBERNETES_SERVICE_HOST,
    _REQ_DURATION_NAME,
    _REQ_EXCEPTION_NAME,
@@ -138,6 +140,9 @@ def __init__(
            _FEATURE_METRIC_NAME[0]: sys.maxsize,
        }
        self._long_interval_lock = threading.Lock()
        # Add startup timestamp and delay for initial statsbeat metrics
        self._startup_time = time.time()
[Review comment: Are these needed?]
        self._initial_delay_seconds = _INITIAL_DELAY_SECONDS  # 15 second delay for initial metrics
        _StatsbeatMetrics._COMMON_ATTRIBUTES["cikey"] = instrumentation_key
        if _utils._is_attach_enabled():
            _StatsbeatMetrics._COMMON_ATTRIBUTES["attach"] = _AttachTypes.INTEGRATED
@@ -266,6 +271,7 @@ def _get_feature_metric(self, options: CallbackOptions) -> Iterable[Observation]
        return observations

    def _meets_long_interval_threshold(self, name) -> bool:
        # For feature and attach metrics, check if the initial delay has passed
        with self._long_interval_lock:
            # if long interval threshold not met, it is not time to export
            # statsbeat metrics that are long intervals
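Editorial note, not part of the PR: the rest of `_meets_long_interval_threshold` is not shown on this page, so the sketch below is only one plausible way to express the "has the initial delay passed" check described in the new comment. `WarmupGate` and its injectable clock are hypothetical. It also defaults to `time.monotonic()` rather than the `time.time()` used in the diff, so that wall-clock adjustments cannot affect the elapsed time.

```python
import time
from typing import Callable


class WarmupGate:
    """Hypothetical helper that reports whether a startup delay has elapsed."""

    def __init__(self, delay_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._startup_time = clock()  # captured once, at construction
        self._delay_seconds = delay_seconds

    def warmed_up(self) -> bool:
        """Return True once at least delay_seconds have passed since construction."""
        return self._clock() - self._startup_time >= self._delay_seconds


# Demo with a fake clock so the result does not depend on real time.
now = [100.0]
gate = WarmupGate(15.0, clock=lambda: now[0])
before = gate.warmed_up()  # 0 seconds elapsed on the fake clock
now[0] += 15.0
after = gate.warmed_up()   # 15 seconds elapsed on the fake clock
print(before, after)
```

Injecting the clock keeps such a check testable without sleeping, which may help with the kind of "testing value" constants raised in the review comments.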
Review comment: Why 60s?
Reply: Testing value - cleaning.