-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Distributed Tracing semantics #1066
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -109,12 +109,39 @@ namespace MAT_NS_BEGIN | |
/// <param name="deviceClass">Device class.</param> | ||
DECLARE_COMMONFIELD(DeviceClass, COMMONFIELDS_DEVICE_CLASS); | ||
|
||
/// <summary> | ||
/// <summary> | ||
/// Set the device orgId context information of telemetry event. | ||
/// </summary> | ||
/// <param name="deviceClass">Device orgId</param> | ||
DECLARE_COMMONFIELD(DeviceOrgId, COMMONFIELDS_DEVICE_ORGID); | ||
|
||
/// <summary> | ||
/// Set the W3C TraceContext TraceId information of telemetry event. | ||
/// </summary> | ||
/// <param name="traceId">TraceContext TraceId</param> | ||
DECLARE_COMMONFIELD(TraceId, COMMONFIELDS_DT_TRACEID); | ||
|
||
/// <summary> | ||
/// Set the W3C TraceContext SpanId information of telemetry event. | ||
/// </summary> | ||
/// <param name="spanId">TraceContext SpanId</param> | ||
DECLARE_COMMONFIELD(SpanId, COMMONFIELDS_DT_SPANID); | ||
|
||
/// <summary> | ||
/// Set the W3C TraceContext TraceFlags information of telemetry event. | ||
/// </summary> | ||
/// <param name="TraceFlags">TraceContext TraceFlags</param> | ||
virtual void SetTraceFlags(uint8_t traceFlags) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: the current spec only supports one flag (
(the last is probably best since it doesn't constrain the caller) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. W3C spec and Common Schema support An integer representation of the W3C TraceContext trace-flags. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes I am saying that we could have struct TraceFlags {
unsigned char sampled: 1,
unsigned char reserved: 7
};
...
virtual void SetTraceFlags(TraceFlags traceFlags)
...
TraceFlags flags;
flags.sampled = 1;
context.SetTraceFlags(flags); or rather enum TraceFlags {
Sampled: 1,
};
...
context.SetTraceFlags(TraceFlags::Sampled); rather than passing a raw uint8_t. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We could later on add any additional method overloads we want. Indeed, we could add yet another method what you're proposing. I would rather go with an additional method that just does
Thus, I followed the W3C TraceContext guidelines in this regard. You can see that other implementations, e.g. Elastic Node Traceparent package - did not elaborate on the meaning of the We can add more decoration later. It's add-ons, not the necessity. I'd like to avoid the semantics that could possibly be later on borrowed from elsewhere, e.g. these semantics all exists in finer detail in OpenTelemetry API. The goal of this PR is to cover the bare min set of transport level detail, in order to enable the POC work that we're doing.. Incremental changes could come later. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok sounds good not to over-specify. |
||
{ | ||
SetCommonField(COMMONFIELDS_DT_TRACEFLAGS, traceFlags); | ||
} | ||
|
||
/// <summary> | ||
/// Set the remote context (parent) SpanId information of telemetry event. | ||
/// </summary> | ||
/// <param name="ParentId">ParentId</param> | ||
DECLARE_COMMONFIELD(ParentId, COMMONFIELDS_DT_PARENTID); | ||
|
||
/// <summary> | ||
/// Set the network cost context information of telemetry event. | ||
/// </summary> | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that
parentId
is meant to be used in events with Span semantics. Here we are adding it to every single event instead, which looks a bit weird. Are we doing this because supporting explicit Span semantics is more complicated? Would it cause problems if we decide to support explicit Span semantics later?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two separate logical destinations here:
first three element: traceId, spanId, traceFlags - is our current TraceContext, what we would've sent in
traceparent
header. That's what spec is describing.the last one: parentId - is what we receive as our parent, describes current-to-parent relationship, i.e. what initiated us to run. Our span's parent. Developer focusing on client telemetry debugging would see this on one event. This is not part of
traceparent
sent to the remote end, but part of client telemetry that the our code emits. That field allows us to stitch together the caller (our parent) and our current span into tree view.. Without this information attached on event - the Trace View explorer UX won't be able to stitch the two operations on time ladder chart..So:
spanid
-- is our current span id. Part of W3C TraceContext spec.parentId
-- is our parent, our remote span id.parentId
(prev) ->spanId
(current) -> then remote service would generate its own$spanId(of service)
(next)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How you would use it:
Obtain the
ILogger
associated with logging some service information, i.e.auto apiLogger = GetLogger("CatalogServiceAPILogger")
This is similar to how
ILogger
gets allocated for a component in .NET.Then you
SetContext(...)
on that logger's semantic context. All events coming from that logger are associated with the service interaction. This is all that's necessary for the Distributed Trace Explorer view to show your events coming from that logger on a time ladder chart.When your client starts interacting with catalog service via REST API call, you'd initiate your transaction with some initial seed for the
traceparent
header: yourtraceId
,spanId
,traceFlags
- the ones you set on context... And you pass it to remote service.Your client emits events using
apiLogger->LogEvent
-- all of these events get indexed by correlation platform and contain the necessary semantics to render them in a trace view.If you need to re-initialize the session, i.e. user logged out / logged in, you'd have the new transaction -
traceId
, and you'd update the context on that logger. All further events now have a new differenttraceId
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not every single event, not at LogManager level. Only for events on specific component logger, that are relevant to specific service interaction. However, if a client controls the initial seed of
traceId
, then the sametraceId
could be stamped on different logger contexts, for example:auto loggerA = GetLogger("serviceA-REST-Logger")
auto loggerB = GetLogger("serviceB-REST-Logger")
auto loggerC = GetLogger("serviceC-REST-Logger")
.Then you use
SetContext
at logger-level to stamp the sametraceId
, and their client-side interaction events would show up in one distributed tracing call tree / time ladder chart. Correspondingly, the far end, e.g. C# service - always stamps the same set of fields on all ofLog
andSpan
events coming from C# land. That's how it works, and it does stamp it on most (all?) service events too.Second example in
APITest
actually shows how you can selectively append the context directly on just one event without usingSetContext
.. So both ways are possible, logger context - all events for specific named logger (this would be the preferred model). And local (event property) example - stamping on just one event.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed more offline. Having a
parentId
for an event is a valid model even if the system doesn't follow the span model (see also comment in TraceContext standard). Supporting explicit Span semantics is sort of overlapping with what OpenTelemetry does and out of scope for now.