SQLServer Extended Event Handlers #20229

azhou-datadog · 2025-05-06T20:16:33Z

What does this PR do?

Implements the SQLServer Extended Event (XE) handlers. This enables deobfuscation and query error visibility. This is a beefy PR, so I will describe each component at a high level. See the RFC here

Configuration

Adds new xe_collection config section for two handlers: query_completions and query_errors
Each handler has enabled and collection_interval settings
Updates documentation for collect_raw_query_statement to mention XE events support - if you want RQT events sourced from the XE collection, you need to enable this config.
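As an illustration of the configuration shape described above, here is a minimal sketch of how the nested `xe_collection` section might be resolved into per-handler settings. The key names (`xe_collection`, `query_completions`, `query_errors`, `enabled`, `collection_interval`, `collect_raw_query_statement`) come from this PR; the helper name, the dataclass, and the default values are assumptions for illustration only, not the integration's actual code.

```python
from dataclasses import dataclass

# Assumption: the PR text does not state the real defaults, so these are placeholders.
ASSUMED_DEFAULT_ENABLED = False
ASSUMED_DEFAULT_INTERVAL_SECONDS = 10.0

XE_HANDLERS = ("query_completions", "query_errors")


@dataclass(frozen=True)
class HandlerSettings:
    enabled: bool
    collection_interval: float


def resolve_xe_settings(instance: dict) -> dict:
    """Return a {handler_name: HandlerSettings} map from an instance config dict."""
    xe_config = instance.get("xe_collection") or {}
    settings = {}
    for handler in XE_HANDLERS:
        raw = xe_config.get(handler) or {}
        settings[handler] = HandlerSettings(
            enabled=bool(raw.get("enabled", ASSUMED_DEFAULT_ENABLED)),
            collection_interval=float(
                raw.get("collection_interval", ASSUMED_DEFAULT_INTERVAL_SECONDS)
            ),
        )
    return settings


# Example instance config, written as the dict the check would receive.
instance = {
    "collect_raw_query_statement": {"enabled": True},
    "xe_collection": {
        "query_completions": {"enabled": True, "collection_interval": 5},
        "query_errors": {"enabled": True},
    },
}
settings = resolve_xe_settings(instance)
```

With this instance, both handlers resolve as enabled, and `query_errors` falls back to the placeholder interval because it does not set one.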

XE Handler

Implements XESessionBase for all XE session interaction
Handles connection to SQL Server and efficient XML event processing
Reads events from the ring buffer target. Event file reading is not yet fully implemented and is left as a future improvement.
Provides standardized event normalization and payload generation
Calls SQL obfuscation on relevant fields and signature generation
Includes RQT (Raw Query Text) event generation for raw SQL collection
Uses timestamp-based filtering to avoid duplicates
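The ring buffer read and the timestamp-based filtering can be sketched roughly as follows. This is not the code from `XESessionBase`: it uses the standard library's `xml.etree.ElementTree` as a stand-in for whatever XML processing the PR uses, and the function names and the sample XML are assumptions. The sample follows the general shape of a ring buffer target document, with `event` elements carrying `name` and `timestamp` attributes and `data`/`action` children.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from typing import List, Optional, Tuple

SAMPLE_RING_BUFFER = """<RingBufferTarget>
  <event name="sql_batch_completed" package="sqlserver" timestamp="2025-05-06T20:16:33.123Z">
    <data name="duration"><value>1500</value></data>
    <data name="batch_text"><value>SELECT 1</value></data>
    <action name="session_id" package="sqlserver"><value>52</value></action>
  </event>
  <event name="rpc_completed" package="sqlserver" timestamp="2025-05-06T20:16:34.456Z">
    <data name="duration"><value>900</value></data>
    <data name="statement"><value>exec dbo.get_user @id=7</value></data>
    <action name="session_id" package="sqlserver"><value>53</value></action>
  </event>
</RingBufferTarget>"""


def parse_timestamp(value: str) -> datetime:
    # XE timestamps are UTC ISO 8601 strings with a trailing "Z".
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def read_new_events(
    ring_buffer_xml: str, last_seen: Optional[datetime]
) -> Tuple[List[dict], Optional[datetime]]:
    """Return events newer than last_seen, plus the updated checkpoint."""
    root = ET.fromstring(ring_buffer_xml)
    events = []
    newest = last_seen
    for event_el in root.iter("event"):
        ts = parse_timestamp(event_el.get("timestamp"))
        # Skip anything at or before the checkpoint to avoid re-emitting it.
        if last_seen is not None and ts <= last_seen:
            continue
        fields = {"event_name": event_el.get("name"), "timestamp": ts}
        for child in event_el:
            if child.tag in ("data", "action"):
                fields[child.get("name")] = child.findtext("value")
        events.append(fields)
        if newest is None or ts > newest:
            newest = ts
    return events, newest


first_batch, checkpoint = read_new_events(SAMPLE_RING_BUFFER, None)
second_batch, _ = read_new_events(SAMPLE_RING_BUFFER, checkpoint)
```

Reading the same buffer twice yields both events the first time and none the second. One trade-off of a strict `<=` comparison is that an event arriving later with exactly the checkpoint's timestamp would be skipped; whether the PR handles that case is not stated here.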

Events emitted

Currently collects three types of query completion events, emitted as dbm_type=query_completion:

SQL batch completions
RPC completions
Module/procedure completions
Eventually we will also collect sp_statement_completed and sql_statement_completed.

Collects two types of error events, emitted as dbm_type=query_error:

SQL query errors with severity >= 11
Attention signals (query cancellations)
Eventually we will bring deadlock monitoring into this as well, but out of scope for now
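To make the mapping from event type to `dbm_type` concrete, here is a small sketch. The `dbm_type` values and the severity threshold of 11 come from the description above. The XE event names (`sql_batch_completed`, `rpc_completed`, `module_end`, `error_reported`, `attention`) are standard SQL Server event names that I am assuming correspond to the categories listed; the function name and the `severity` field name are also assumptions. In practice the severity filter may live in the XE session definition rather than in the agent.

```python
from typing import Optional

# Assumed mapping of the listed categories to SQL Server XE event names.
COMPLETION_EVENTS = {"sql_batch_completed", "rpc_completed", "module_end"}
MIN_ERROR_SEVERITY = 11


def classify_event(event: dict) -> Optional[str]:
    """Return the dbm_type for a parsed XE event, or None if it is not collected."""
    name = event.get("event_name")
    if name in COMPLETION_EVENTS:
        return "query_completion"
    if name == "attention":
        # Attention signals (query cancellations) are reported as errors.
        return "query_error"
    if name == "error_reported":
        severity = int(event.get("severity") or 0)
        return "query_error" if severity >= MIN_ERROR_SEVERITY else None
    # sp_statement_completed, sql_statement_completed and deadlocks are out of scope for now.
    return None


examples = [
    {"event_name": "rpc_completed"},
    {"event_name": "error_reported", "severity": "16"},
    {"event_name": "error_reported", "severity": "10"},
    {"event_name": "attention"},
    {"event_name": "sql_statement_completed"},
]
classified = [classify_event(e) for e in examples]
```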

Emits RQT (Raw Query Text) events when collect_raw_query_statement.enabled is true:

Contains original unobfuscated SQL statements with proper rate limiting
Includes both obfuscated and raw query signatures for future query correlation
Collects metadata about tables, commands, and query structure
Available for both query_completion and query_error events
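The rate limiting mentioned above could look something like the following sketch, which allows one RQT emission per raw query signature within a time window. The class name, the window length, and the choice of the raw signature as the key are assumptions; the PR's actual limiter is not shown in this description.

```python
import time
from typing import Callable, Dict


class SignatureRateLimiter:
    """Allow at most one emission per key within ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._last_emitted: Dict[str, float] = {}

    def acquire(self, key: str) -> bool:
        """Return True if an event for this key may be emitted now."""
        now = self._clock()
        last = self._last_emitted.get(key)
        if last is not None and now - last < self._ttl:
            return False
        self._last_emitted[key] = now
        return True


# Demonstration with a fake clock so the behavior is deterministic.
fake_now = [0.0]
limiter = SignatureRateLimiter(ttl_seconds=60, clock=lambda: fake_now[0])
first = limiter.acquire("abc123")   # emitted
second = limiter.acquire("abc123")  # suppressed, still inside the window
fake_now[0] = 61.0
third = limiter.acquire("abc123")   # emitted again after the window
```

This sketch never evicts old keys, so a real implementation would also need a size bound or expiry sweep.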
The RQT events have a "statement" field, which represents the SQL that was executed. Some event types have multiple fields that can be interpreted as the SQL statement. See the _get_primary_sql_field implementations for how each event type chooses its primary SQL field, whose value is copied into the statement field.
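A rough sketch of the primary-field idea follows. It is not the PR's `_get_primary_sql_field` code: the candidate field names per event type (`batch_text`, `statement`, `sql_text`), the payload keys other than `statement`, the `"rqt"` type value, and the hash used as a raw signature are all assumptions for illustration.

```python
import hashlib
from typing import Optional

# Assumed candidate fields, in priority order, for each event type.
PRIMARY_SQL_FIELD_CANDIDATES = {
    "sql_batch_completed": ("batch_text", "sql_text"),
    "rpc_completed": ("statement", "sql_text"),
    "module_end": ("statement", "sql_text"),
    "error_reported": ("sql_text",),
    "attention": ("sql_text",),
}


def get_primary_sql_field(event: dict) -> Optional[str]:
    """Return the name of the first populated candidate field, if any."""
    for field in PRIMARY_SQL_FIELD_CANDIDATES.get(event.get("event_name"), ()):
        if event.get(field):
            return field
    return None


def raw_signature(sql: str) -> str:
    # Stand-in hash; the integration's real signature function is not shown here.
    return hashlib.sha256(sql.encode("utf-8")).hexdigest()[:16]


def build_rqt_event(event: dict, obfuscated_signature: str, collect_raw_enabled: bool) -> Optional[dict]:
    """Build an RQT payload, or None when raw collection is off or no SQL is present."""
    if not collect_raw_enabled:
        return None
    field = get_primary_sql_field(event)
    if field is None:
        return None
    statement = event[field]
    return {
        "dbm_type": "rqt",  # assumed type value
        "statement": statement,
        "query_signature": obfuscated_signature,
        "raw_query_signature": raw_signature(statement),
    }


batch_event = {"event_name": "sql_batch_completed", "batch_text": "SELECT * FROM users WHERE id = 7"}
rqt = build_rqt_event(batch_event, obfuscated_signature="deadbeef", collect_raw_enabled=True)
disabled = build_rqt_event(batch_event, obfuscated_signature="deadbeef", collect_raw_enabled=False)
```

Keeping the candidate lists in one table makes it easy to see, per event type, which field wins when more than one could hold the SQL text.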

Testing

Unit tests with XML fixtures covering the full XE collection pipeline
Integration tests verifying actual XE session interaction
Updated SQL scripts to create required XE sessions in test environment
Added validation of payload structure and field values

Motivation

Get query error and deobfuscated query visibility for sqlserver. This is a targeted feature for Rockstar, but greatly strengthens DBM's sqlserver offering.

Review checklist (to be filled by reviewers)

Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
Add the qa/skip-qa label if the PR doesn't need to be tested during QA.
If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged

codecov · 2025-05-06T20:30:05Z

Codecov Report

Attention: Patch coverage is 87.43268% with 140 lines in your changes missing coverage. Please review.

Project coverage is 91.04%. Comparing base (ea04835) to head (7750c81).
Report is 12 commits behind head on master.

Additional details and impacted files

Flag	Coverage Δ
sqlserver	`90.99% <87.43%> (+4.83%)`	⬆️

All other flags in the report (active_directory through zk) show `?`.

sethsamuel

Overall looks good, really appreciate the thorough commenting. A few questions/comments then LGTM

sethsamuel · 2025-05-09T15:08:05Z

sqlserver/assets/configuration/spec.yaml

@@ -997,6 +999,50 @@ files:
      value:
        example: false
        type: boolean
+    - name: xe_collection
+      description: |
+        Available for Agent 7.67 and newer.


nit: We generally don't have comments like this since the spec file is in Git

sethsamuel · 2025-05-09T15:09:02Z

sqlserver/assets/configuration/spec.yaml

+          description: |
+            Configure the collection of completed queries from the `datadog_query_completions` XE session.
+
+            Set `query_completions.enabled` to `true` to enable the collection of query completion events.


nit: You can set these as descriptions of the properties themselves

This is actually really annoying: we can't set descriptions on the properties of an "object". Descriptions can only be set on individual options. The reason I'm using an object here is to get this nested configuration:

xe_collection
  query_completions
    enabled
  query_errors
    enabled

We can't get this nesting with the "option" value. I'm going to keep this format unless we want to talk about this piece more.

sethsamuel · 2025-05-09T15:17:06Z