SQLServer Extended Event Handlers #20229

Open
wants to merge 131 commits into
base: master
Commits
87cb158
poc test first pass
azhou-datadog Apr 16, 2025
5ed38c3
log events
azhou-datadog Apr 17, 2025
3329ed8
logging
azhou-datadog Apr 17, 2025
72b1592
run_job_loop, not start
azhou-datadog Apr 17, 2025
3742c0b
params correction
azhou-datadog Apr 17, 2025
749458f
rpc_events xml parsing basic
azhou-datadog Apr 17, 2025
46deb8a
batch_events and share utils
azhou-datadog Apr 17, 2025
0ac008d
timestamp and timing implementation
azhou-datadog Apr 18, 2025
653d88f
event file implement
azhou-datadog Apr 18, 2025
6e71063
fix file path
azhou-datadog Apr 18, 2025
5546bbd
return complete xml
azhou-datadog Apr 18, 2025
feb1781
parse xml on client side
azhou-datadog Apr 21, 2025
a67591e
time parsing and query section separately
azhou-datadog Apr 21, 2025
3f88433
convert string to bytes
azhou-datadog Apr 21, 2025
bc87a76
now test sqlserver parsing
azhou-datadog Apr 21, 2025
9c53d48
remove sqlserver parsing version
azhou-datadog Apr 21, 2025
0c11ffe
missing statement from rpc_events
azhou-datadog Apr 21, 2025
7b6f621
print event payload
azhou-datadog Apr 21, 2025
7dc8c3f
fix json parsing
azhou-datadog Apr 21, 2025
a0a85af
add event source to event payload
azhou-datadog Apr 21, 2025
a3490c0
implement error events
azhou-datadog Apr 21, 2025
2b5dc00
remove config
azhou-datadog Apr 21, 2025
7ee61ad
test start time timestamp calculation
azhou-datadog Apr 21, 2025
bee2f6e
make allen test check more loose
azhou-datadog Apr 21, 2025
7f1a516
log host and session id as well
azhou-datadog Apr 22, 2025
7c81019
delete log
azhou-datadog Apr 22, 2025
1b49409
delete correct log
azhou-datadog Apr 22, 2025
f157eff
use resolved hostname
azhou-datadog Apr 22, 2025
20486f7
try to detect ring buffer event loss
azhou-datadog Apr 22, 2025
7158822
more visibility on timestamp gaps
azhou-datadog Apr 22, 2025
5182ad4
do not limit max events for testing
azhou-datadog Apr 22, 2025
4c901c9
temp increase of max events
azhou-datadog Apr 22, 2025
3621e59
fill in dbm_type based on event session name
azhou-datadog Apr 22, 2025
cc624a4
implement sql statement events
azhou-datadog Apr 23, 2025
83018c3
implement sp statement events
azhou-datadog Apr 23, 2025
de60988
combine query completions to a single event session
azhou-datadog Apr 23, 2025
5c743ef
refactors
azhou-datadog Apr 23, 2025
a4f3a4f
implement attention events
azhou-datadog Apr 23, 2025
98339ae
remove joined event handlers, add query start timing data
azhou-datadog Apr 24, 2025
6b09ff0
clean up
azhou-datadog Apr 24, 2025
b8ab2f0
clean up
azhou-datadog Apr 24, 2025
47b518c
more clean up
azhou-datadog Apr 24, 2025
d6373ac
RQT and obfuscate queries first pass
azhou-datadog Apr 25, 2025
bcab373
get query completion timestamp into rqt event
azhou-datadog Apr 25, 2025
fdd96ea
better timing data
azhou-datadog Apr 25, 2025
4ddff8c
add more logging
azhou-datadog Apr 25, 2025
351773b
remove caching for now to get visibility for debugging
azhou-datadog Apr 25, 2025
842ac5a
calculate raw query signature
azhou-datadog Apr 25, 2025
da0383b
normalize timestamps
azhou-datadog Apr 25, 2025
ae0ae64
add xe_type
azhou-datadog Apr 25, 2025
0788348
fix event_name for error events
azhou-datadog Apr 25, 2025
76074ff
add query_signature to non-RQT event
azhou-datadog Apr 25, 2025
1061c3e
refactor obfuscating logic
azhou-datadog Apr 25, 2025
3b6bbac
clean up dead code
azhou-datadog Apr 25, 2025
52ba8b6
consolidate more code
azhou-datadog Apr 25, 2025
7b9a3f0
normalize timestamp for timestamp filtering
azhou-datadog Apr 25, 2025
31eaa9a
simplify timestamp filtering
azhou-datadog Apr 28, 2025
c77b6d1
fix timestamp gap logging
azhou-datadog Apr 28, 2025
a60284e
simplify event logging
azhou-datadog Apr 28, 2025
7531bb4
omit duration and query_start from query error RQT
azhou-datadog Apr 28, 2025
313b37a
omit in XE event too
azhou-datadog Apr 28, 2025
d512f07
refactors
azhou-datadog Apr 28, 2025
1e95e6f
missed path fix
azhou-datadog Apr 28, 2025
df08540
add sql fields back
azhou-datadog Apr 28, 2025
4527c8a
explicitly state sql fields expected for each event session
azhou-datadog Apr 28, 2025
aa4146f
move raw query signature calculation
azhou-datadog Apr 28, 2025
7536cb6
implement configuration
azhou-datadog Apr 28, 2025
3c2d84a
unit test first pass
azhou-datadog Apr 29, 2025
7056265
change imports
azhou-datadog Apr 29, 2025
bce1233
import change
azhou-datadog Apr 29, 2025
5a8a7b2
add handlers test
azhou-datadog Apr 29, 2025
34ac207
fix stub import
azhou-datadog Apr 29, 2025
efdd4a1
don't mock event handler
azhou-datadog Apr 29, 2025
cc5d249
mock keys return dict
azhou-datadog Apr 29, 2025
e197407
fix tests
azhou-datadog Apr 29, 2025
cd8c3b0
timestamp mock fixes
azhou-datadog Apr 29, 2025
1b96927
TimeMock class
azhou-datadog Apr 29, 2025
911a60d
avoid mocking time.time
azhou-datadog Apr 29, 2025
a807663
refactors
azhou-datadog Apr 29, 2025
6a16ec3
fix expected types in rqt event
azhou-datadog Apr 29, 2025
cd20039
module end test
azhou-datadog Apr 30, 2025
bb749f8
space in file name!!
azhou-datadog Apr 30, 2025
dbee3f5
add attention test
azhou-datadog Apr 30, 2025
d17df4f
fix attention test
azhou-datadog Apr 30, 2025
9b9b196
add integration test
azhou-datadog Apr 30, 2025
350ea9c
send events to datadog
azhou-datadog Apr 30, 2025
f69c9ac
check if sleep makes test consistent
azhou-datadog Apr 30, 2025
543bbb9
debug test
azhou-datadog Apr 30, 2025
3598748
fix cursor call
azhou-datadog Apr 30, 2025
5358d9a
grant select to datadog user
azhou-datadog Apr 30, 2025
c856ac5
grant to bob
azhou-datadog Apr 30, 2025
1b1a4c6
wrong setup
azhou-datadog Apr 30, 2025
bea0170
delete extra vars
azhou-datadog Apr 30, 2025
8ddc87c
log all calls
azhou-datadog Apr 30, 2025
79a886f
run check
azhou-datadog Apr 30, 2025
f728617
follow activity.py pattern
azhou-datadog Apr 30, 2025
fa1d339
fix event type
azhou-datadog Apr 30, 2025
df124d2
debug logging
azhou-datadog Apr 30, 2025
a3da104
fix config
azhou-datadog Apr 30, 2025
533f171
refactor test
azhou-datadog Apr 30, 2025
26d4b12
remove sleep
azhou-datadog Apr 30, 2025
65f76d3
enable cache, add timestamp test
azhou-datadog Apr 30, 2025
bdbc957
fix happy path test
azhou-datadog Apr 30, 2025
06fb391
linter fixes part 1
azhou-datadog Apr 30, 2025
02d1dd2
linters part 2
azhou-datadog Apr 30, 2025
e4328b2
concat strings for linter
azhou-datadog Apr 30, 2025
be9cb15
delete statement level event files
azhou-datadog Apr 30, 2025
5a6b7df
Add database instance to events
azhou-datadog May 2, 2025
ab99147
batch events for query_completion and query_errors
azhou-datadog May 5, 2025
789ddb0
fix unit test serialization and add test for checking batching logic
azhou-datadog May 5, 2025
cae8150
add method tracking and code clean up
azhou-datadog May 6, 2025
a781337
add change log
azhou-datadog May 6, 2025
a92bc0d
fix conditional logging
azhou-datadog May 6, 2025
7c3c773
remove timing data now that we have tracked methods
azhou-datadog May 6, 2025
4409099
log ANY first rqt event
azhou-datadog May 6, 2025
8d56f9c
validate config
azhou-datadog May 6, 2025
3441911
fix import
azhou-datadog May 7, 2025
809e135
license fix
azhou-datadog May 7, 2025
a913fd1
validate models
azhou-datadog May 7, 2025
eede6bc
make collection interval a number, not int
azhou-datadog May 7, 2025
4ec6a98
fix unit tests
azhou-datadog May 7, 2025
dace4e6
update all setup scripts to set up XE sessions
azhou-datadog May 7, 2025
a625e4d
add query visibility into error
azhou-datadog May 7, 2025
19a6803
clean up code
azhou-datadog May 7, 2025
4807906
add raw query signature to query completion and error
azhou-datadog May 8, 2025
0f8cfef
revert to execute with retries
azhou-datadog May 8, 2025
751af58
debug pipeline, only run on 2022 sqlserver
azhou-datadog May 8, 2025
ae7b15a
use convert syntax for adodbapi
azhou-datadog May 8, 2025
bfb1f1d
add back 2019 sqlserver version
azhou-datadog May 9, 2025
7bb8b32
address review comments
azhou-datadog May 9, 2025
7750c81
delete dead code
azhou-datadog May 9, 2025
56 changes: 55 additions & 1 deletion sqlserver/assets/configuration/spec.yaml
@@ -885,7 +885,9 @@ files:
display_default: false
- name: collect_raw_query_statement
description: |
Configure the collection of raw query statements in query activity and execution plans.
Configure the collection of raw query statements in query activity, execution plans, and XE events.
To collect raw query statements from XE events, set `xe_collection.query_completions.enabled` and
`xe_collection.query_errors.enabled` to `true`.
Raw query statements and execution plans may contain sensitive information (e.g., passwords)
or personally identifiable information in query text.
Enabling this option will allow the collection and ingestion of raw query statements and
@@ -997,6 +999,58 @@ files:
value:
example: false
type: boolean
- name: xe_collection
description: |
Configure the collection of events from XE (Extended Events) sessions. Requires `dbm: true`.

Set `collect_raw_query_statement.enabled` to `true` to collect the raw query statements for each event.
options:
- name: debug_sample_events
description: |
Set the maximum number of XE events to log in debug mode per collection. Used for troubleshooting.
This only affects logging when debug mode is enabled. Defaults to 3.
hidden: true
value:
type: integer
example: 3
display_default: 3
- name: query_completions
description: |
Configure the collection of completed queries from the `datadog_query_completions` XE session.

Set `query_completions.enabled` to `true` to enable the collection of query completion events.
Contributor

nit: You can set these as descriptions of the properties themselves

Contributor Author

This is actually really annoying: we can't set descriptions on the properties of an "object", only on individual options. I'm using an object here to get the nested configuration:

  • xe_collection
    • query_completions
      • enabled
    • query_errors
      • enabled

We can't get this nesting if we use the "option" value. I'm going to keep this format unless we want to discuss this piece further.

Use `query_completions.collection_interval` to set the interval (in seconds) for the collection of
query completion events. Defaults to 10 seconds. If you intend on updating this value,
it is strongly recommended to use a consistent value throughout all SQL Server agent deployments.
value:
type: object
properties:
- name: enabled
type: boolean
example: false
- name: collection_interval
type: number
example: 10
display_default: 10
- name: query_errors
description: |
Configure the collection of query errors from the `datadog_query_errors` XE session.

Set `query_errors.enabled` to `true` to enable the collection of query error events.

Use `query_errors.collection_interval` to set the interval (in seconds) for the collection of
query error events. Defaults to 10 seconds. If you intend on updating this value,
it is strongly recommended to use a consistent value throughout all SQL Server agent deployments.
value:
type: object
properties:
- name: enabled
type: boolean
example: false
- name: collection_interval
type: number
example: 10
display_default: 10
- name: deadlocks_collection
description: |
Configure the collection of deadlock data.
2 changes: 2 additions & 0 deletions sqlserver/changelog.d/20229.added
@@ -0,0 +1,2 @@
Added SQLServer Extended Event Handlers

1 change: 1 addition & 0 deletions sqlserver/datadog_checks/sqlserver/config.py
@@ -57,6 +57,7 @@ def __init__(self, init_config, instance, log):
        self.activity_config: dict = instance.get('query_activity', {}) or {}
        self.schema_config: dict = instance.get('schemas_collection', {}) or {}
        self.deadlocks_config: dict = instance.get('deadlocks_collection', {}) or {}
        self.xe_collection_config: dict = instance.get('xe_collection', {}) or {}
        self.cloud_metadata: dict = {}
        aws: dict = instance.get('aws', {}) or {}
        gcp: dict = instance.get('gcp', {}) or {}
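The `instance.get('xe_collection', {}) or {}` idiom in this hunk guards against a key that is present but empty, which is what a YAML key written with no value typically parses to. A minimal sketch of the difference, using plain dicts in place of a parsed instance config (the helper name `section` is hypothetical and not part of the PR):

```python
def section(instance: dict, key: str) -> dict:
    """Return the mapping stored under key, or {} when it is missing or null."""
    # The default passed to dict.get only covers a missing key; the trailing
    # `or {}` also covers a key that is present with a None value.
    return instance.get(key, {}) or {}


missing = {}                          # key absent
null_value = {"xe_collection": None}  # key present with no value
populated = {"xe_collection": {"query_errors": {"enabled": True}}}

print(section(missing, "xe_collection"))
print(section(null_value, "xe_collection"))
print(section(populated, "xe_collection"))
```

Without the trailing `or {}`, the second case would return `None`, and any later `.get(...)` call on the result would raise `AttributeError`.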
29 changes: 29 additions & 0 deletions sqlserver/datadog_checks/sqlserver/config_models/instance.py
@@ -347,6 +347,34 @@ class SchemasCollection(BaseModel):
    max_execution_time: Optional[float] = None


class QueryCompletions(BaseModel):
    model_config = ConfigDict(
        arbitrary_types_allowed=True,
        frozen=True,
    )
    collection_interval: Optional[float] = Field(None, examples=[10])
    enabled: Optional[bool] = Field(None, examples=[False])


class QueryErrors(BaseModel):
    model_config = ConfigDict(
        arbitrary_types_allowed=True,
        frozen=True,
    )
    collection_interval: Optional[float] = Field(None, examples=[10])
    enabled: Optional[bool] = Field(None, examples=[False])


class XeCollection(BaseModel):
    model_config = ConfigDict(
        arbitrary_types_allowed=True,
        frozen=True,
    )
    debug_sample_events: Optional[int] = None
    query_completions: Optional[QueryCompletions] = None
    query_errors: Optional[QueryErrors] = None


class InstanceConfig(BaseModel):
    model_config = ConfigDict(
        validate_default=True,
@@ -406,6 +434,7 @@ class InstanceConfig(BaseModel):
    tags: Optional[tuple[str, ...]] = None
    use_global_custom_queries: Optional[str] = None
    username: Optional[str] = None
    xe_collection: Optional[XeCollection] = None

    @model_validator(mode='before')
    def _initial_validation(cls, values):
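The models in this hunk are nested, optional, and frozen. The sketch below illustrates that shape only. It uses stdlib `dataclasses` as a stand-in for the pydantic models so it runs without third-party packages, which means it does not perform pydantic's type validation. The `build` helper is hypothetical and `QueryErrors` is omitted for brevity.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Optional


@dataclass(frozen=True)
class QueryCompletions:
    collection_interval: Optional[float] = None
    enabled: Optional[bool] = None


@dataclass(frozen=True)
class XeCollection:
    debug_sample_events: Optional[int] = None
    query_completions: Optional[QueryCompletions] = None


def build(raw: dict) -> XeCollection:
    """Hypothetical helper: build the nested objects from a plain dict."""
    qc = raw.get("query_completions")
    return XeCollection(
        debug_sample_events=raw.get("debug_sample_events"),
        query_completions=QueryCompletions(**qc) if qc is not None else None,
    )


cfg = build({"query_completions": {"enabled": True, "collection_interval": 10}})
print(cfg.query_completions.enabled)

try:
    cfg.debug_sample_events = 5  # frozen: attribute assignment is rejected
except FrozenInstanceError:
    print("config is read-only")
```

Every field defaults to `None`, so a caller can tell an omitted option apart from one explicitly set to `False` or `0`.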
31 changes: 30 additions & 1 deletion sqlserver/datadog_checks/sqlserver/data/conf.yaml.example
@@ -643,7 +643,9 @@ instances:
#
# keep_identifier_quotation: false

## Configure the collection of raw query statements in query activity and execution plans.
## Configure the collection of raw query statements in query activity, execution plans, and XE events.
## To collect raw query statements from XE events, set `xe_collection.query_completions.enabled` and
## `xe_collection.query_errors.enabled` to `true`.
## Raw query statements and execution plans may contain sensitive information (e.g., passwords)
## or personally identifiable information in query text.
## Enabling this option will allow the collection and ingestion of raw query statements and
@@ -797,6 +799,33 @@ instances:
#
# propagate_agent_tags: false

## Configure the collection of events from XE (Extended Events) sessions. Requires `dbm: true`.
##
## Set `collect_raw_query_statement.enabled` to `true` to collect the raw query statements for each event.
#
# xe_collection:

## @param query_completions - mapping - optional
## Configure the collection of completed queries from the `datadog_query_completions` XE session.
##
## Set `query_completions.enabled` to `true` to enable the collection of query completion events.
## Use `query_completions.collection_interval` to set the interval (in seconds) for the collection of
## query completion events. Defaults to 10 seconds. If you intend on updating this value,
## it is strongly recommended to use a consistent value throughout all SQL Server agent deployments.
#
# query_completions: {}

## @param query_errors - mapping - optional
## Configure the collection of query errors from the `datadog_query_errors` XE session.
##
## Set `query_errors.enabled` to `true` to enable the collection of query error events.
##
## Use `query_errors.collection_interval` to set the interval (in seconds) for the collection of
## query error events. Defaults to 10 seconds. If you intend on updating this value,
## it is strongly recommended to use a consistent value throughout all SQL Server agent deployments.
#
# query_errors: {}

## Configure the collection of deadlock data.
#
# deadlocks_collection:
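The `xe_collection` options documented in this hunk give each session an `enabled` flag and a `collection_interval` that defaults to 10 seconds. The sketch below shows one way such settings could be resolved from an instance mapping. The helper `resolve_session` is hypothetical, and treating a missing `enabled` as `False` is an assumption based on the `example: false` value, not something the PR states outright.

```python
from typing import Tuple

DEFAULT_COLLECTION_INTERVAL = 10  # seconds, per the option description


def resolve_session(xe_collection: dict, session: str) -> Tuple[bool, float]:
    """Hypothetical helper: return (enabled, collection_interval) for one session."""
    # Tolerate a session key that is missing or explicitly null.
    session_config = xe_collection.get(session, {}) or {}
    # Assumption: a session is disabled unless `enabled` is set to true.
    enabled = bool(session_config.get("enabled", False))
    interval = session_config.get("collection_interval", DEFAULT_COLLECTION_INTERVAL)
    return enabled, float(interval)


xe_collection = {
    "query_completions": {"enabled": True, "collection_interval": 15},
    "query_errors": {},
}
print(resolve_session(xe_collection, "query_completions"))
print(resolve_session(xe_collection, "query_errors"))
```

The interval is converted to `float` because the PR types `collection_interval` as a number rather than an integer.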
26 changes: 26 additions & 0 deletions sqlserver/datadog_checks/sqlserver/sqlserver.py
@@ -53,6 +53,7 @@
from datadog_checks.sqlserver.statements import SqlserverStatementMetrics
from datadog_checks.sqlserver.stored_procedures import SqlserverProcedureMetrics
from datadog_checks.sqlserver.utils import Database, construct_use_statement, parse_sqlserver_major_version
from datadog_checks.sqlserver.xe_collection.registry import get_xe_session_handlers

try:
    import datadog_agent
@@ -157,6 +158,9 @@ def __init__(self, name, init_config, instances):
        self.agent_history = SqlserverAgentHistory(self, self._config)
        self.deadlocks = Deadlocks(self, self._config)

        # XE Session Handlers
        self.xe_session_handlers = []

        # _database_instance_emitted: limit the collection and transmission of the database instance metadata
        self._database_instance_emitted = TTLCache(
            maxsize=1,
@@ -169,6 +173,7 @@ def __init__(self, name, init_config, instances):
        self.check_initializations.append(self.load_static_information)
        self.check_initializations.append(self.config_checks)
        self.check_initializations.append(self.make_metric_list_to_collect)
        self.check_initializations.append(self.initialize_xe_session_handlers)

        # Query declarations
        self._query_manager = None
@@ -177,6 +182,13 @@ def __init__(self, name, init_config, instances):

        self._schemas = Schemas(self, self._config)

    def initialize_xe_session_handlers(self):
        """Initialize the XE session handlers without starting them"""
        # Initialize XE session handlers if not already initialized
        if not self.xe_session_handlers:
            self.xe_session_handlers = get_xe_session_handlers(self, self._config)
            self.log.debug("Initialized %d XE session handlers", len(self.xe_session_handlers))

    def cancel(self):
        self.statement_metrics.cancel()
        self.procedure_metrics.cancel()
@@ -185,6 +197,13 @@ def cancel(self):
        self._schemas.cancel()
        self.deadlocks.cancel()

        # Cancel all XE session handlers
        for handler in self.xe_session_handlers:
            try:
                handler.cancel()
            except Exception as e:
                self.log.error("Error canceling XE session handler for %s: %s", handler.session_name, e)

    def config_checks(self):
        if self._config.autodiscovery and self.instance.get("database"):
            self.log.warning(
@@ -810,6 +829,13 @@ def check(self, _):
                self.sql_metadata.run_job_loop(self.tags)
                self._schemas.run_job_loop(self.tags)
                self.deadlocks.run_job_loop(self.tags)

                # Run XE session handlers
                for handler in self.xe_session_handlers:
                    try:
                        handler.run_job_loop(self.tags)
                    except Exception as e:
                        self.log.error("Error running XE session handler for %s: %s", handler.session_name, e)
        else:
            self.log.debug("Skipping check")
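The loop added to `check` wraps each handler call in its own `try`/`except`, so a failure in one XE session handler does not prevent the others from running. A minimal standalone sketch of that isolation pattern follows. `StubHandler` and `run_handlers` are hypothetical stand-ins, not the PR's classes, and returning the list of failed sessions is an addition for illustration.

```python
import logging

log = logging.getLogger("xe_sketch")


class StubHandler:
    """Hypothetical stand-in for an XE session handler."""

    def __init__(self, session_name, fail=False):
        self.session_name = session_name
        self.fail = fail
        self.runs = 0

    def run_job_loop(self, tags):
        if self.fail:
            raise RuntimeError("session unavailable")
        self.runs += 1


def run_handlers(handlers, tags):
    """Run every handler; log and record the ones that raise."""
    failed = []
    for handler in handlers:
        try:
            handler.run_job_loop(tags)
        except Exception as e:
            # One failing session must not stop the remaining handlers.
            log.error("Error running XE session handler for %s: %s", handler.session_name, e)
            failed.append(handler.session_name)
    return failed


handlers = [
    StubHandler("datadog_query_completions", fail=True),
    StubHandler("datadog_query_errors"),
]
failed = run_handlers(handlers, ["env:test"])
print(failed)
```

Catching the broad `Exception` mirrors the diff. In this sketch the second handler still runs once even though the first one raised.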

@@ -0,0 +1,3 @@
# (C) Datadog, Inc. 2025-present
# All rights reserved
# Licensed under a 3-clause BSD style license (see LICENSE)