Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Distributed Tracing semantics #1066

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
64 changes: 64 additions & 0 deletions lib/api/ContextFieldsProvider.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -244,6 +244,70 @@ namespace MAT_NS_BEGIN
record.extDevice[0].deviceClass = iter->second.as_string;
}

/* Distributed tracing fields.
*
* Part A extension:
* - ext.dt.traceId -> env_dt_traceId
* - ext.dt.spanId -> env_dt_spanId
* - ext.dt.traceFlags -> env_dt_traceFlags
*
* Part B mapping:
* - parentId -> parentId
*/
iter = m_commonContextFields.find(COMMONFIELDS_DT_TRACEID);

/* Only run the context mapping if TraceId field is set */
if (iter != m_commonContextFields.end())
{
/* Collector does not natively support ext.dt extension in CS4.0.
* When it does, fields could eventually be moved to ext.dt in
* CS4.1 or CS5.0. Currently we map fields to corresponding Geneva
* naming convention in order to enable the Distributed Trace View
* experience. This approach is future-proof, since it avoids the
* unnecessary remapping at 1DS Interchange.
*/
CsProtocol::Value strValue;
strValue.stringValue = iter->second.as_string;
// Map to Geneva Part A extension field name.
ext[COMMONFIELDS_DT_TRACEID] = strValue;

iter = m_commonContextFields.find(COMMONFIELDS_DT_SPANID);
if (iter != m_commonContextFields.end())
{
strValue.stringValue = iter->second.as_string;
// Map to Geneva Part A extension field name.
ext[COMMONFIELDS_DT_SPANID] = strValue;
}

iter = m_commonContextFields.find(COMMONFIELDS_DT_PARENTID);
if (iter != m_commonContextFields.end())
{
strValue.stringValue = iter->second.as_string;
// 1DS pipeline on export to Aria-Kusto does not differentiate
// between Part B and Part C properties. Since Part B (baseData)
// remains hidden below the radar - we populate the field in
// 'data' (Part C). That way the field can be consumed in Aria-Kusto,
// custom Kusto, and in Geneva.
//
// 1DS Interchange maps the field to the same 'parentId' column
// as populated by OpenTelemetry Geneva exporters. That allows us
// to stitch the client and service telemetry in one view.
ext[COMMONFIELDS_DT_PARENTID] = strValue;
}

iter = m_commonContextFields.find(COMMONFIELDS_DT_TRACEFLAGS);
if (iter != m_commonContextFields.end())
{
if (iter->second.type == EventProperty::TYPE_INT64)
{
CsProtocol::Value numValue;
numValue.type = CsProtocol::ValueKind::ValueInt64;
numValue.longValue = iter->second.as_int64;
ext[COMMONFIELDS_DT_TRACEFLAGS] = numValue;
}
}
}

iter = m_commonContextFields.find(COMMONFIELDS_COMMERCIAL_ID);
if (iter != m_commonContextFields.end())
{
Expand Down
9 changes: 9 additions & 0 deletions lib/include/public/CommonFields.h
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,15 @@
#define COMMONFIELDS_DEVICE_CLASS "DeviceInfo.Class"
#define COMMONFIELDS_DEVICE_ORGID "DeviceInfo.OrgId"

/* Ref. https://www.w3.org/TR/trace-context/#traceparent-header-field-values
- ParentId - parent of the current span received via remote context propagation.
- SpanId - current SpanId sent in 'traceparent' to remote service.
*/
#define COMMONFIELDS_DT_TRACEID "env_dt_traceId"
#define COMMONFIELDS_DT_SPANID "env_dt_spanId"
#define COMMONFIELDS_DT_TRACEFLAGS "env_dt_traceFlags"
#define COMMONFIELDS_DT_PARENTID "parentId"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that parentId is meant to be used in events with Span semantics. Here we are adding it to every single event instead, which looks a bit weird. Are we doing this because supporting explicit Span semantics is more complicated? Would it cause problems if we decide to support explicit Span semantics later?

Copy link
Contributor Author

@maxgolov maxgolov Nov 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two separate logical destinations here:

  • first three element: traceId, spanId, traceFlags - is our current TraceContext, what we would've sent in traceparent header. That's what spec is describing.

  • the last one: parentId - is what we receive as our parent, describes current-to-parent relationship, i.e. what initiated us to run. Our span's parent. Developer focusing on client telemetry debugging would see this on one event. This is not part of traceparent sent to the remote end, but part of client telemetry that the our code emits. That field allows us to stitch together the caller (our parent) and our current span into tree view.. Without this information attached on event - the Trace View explorer UX won't be able to stitch the two operations on time ladder chart..

So:

  • spanid -- is our current span id. Part of W3C TraceContext spec.
  • parentId -- is our parent, our remote span id.

parentId (prev) -> spanId (current) -> then remote service would generate its own $spanId(of service) (next)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How you would use it:

  • Obtain the ILogger associated with logging some service information, i.e.
    auto apiLogger = GetLogger("CatalogServiceAPILogger")
    This is similar to how ILogger gets allocated for a component in .NET.

  • Then you SetContext(...) on that logger's semantic context. All events coming from that logger are associated with the service interaction. This is all that's necessary for the Distributed Trace Explorer view to show your events coming from that logger on a time ladder chart.

  • When your client starts interacting with catalog service via REST API call, you'd initiate your transaction with some initial seed for the traceparent header: your traceId, spanId, traceFlags - the ones you set on context... And you pass it to remote service.

  • Your client emits events using apiLogger->LogEvent -- all of these events get indexed by correlation platform and contain the necessary semantics to render them in a trace view.

  • If you need to re-initialize the session, i.e. user logged out / logged in, you'd have the new transaction - traceId, and you'd update the context on that logger. All further events now have a new different traceId.

Copy link
Contributor Author

@maxgolov maxgolov Nov 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we are adding it to every single event instead

Not every single event, not at LogManager level. Only for events on specific component logger, that are relevant to specific service interaction. However, if a client controls the initial seed of traceId, then the same traceId could be stamped on different logger contexts, for example:

  • auto loggerA = GetLogger("serviceA-REST-Logger")
  • auto loggerB = GetLogger("serviceB-REST-Logger")
  • auto loggerC = GetLogger("serviceC-REST-Logger").

Then you use SetContext at logger-level to stamp the same traceId, and their client-side interaction events would show up in one distributed tracing call tree / time ladder chart. Correspondingly, the far end, e.g. C# service - always stamps the same set of fields on all of Log and Span events coming from C# land. That's how it works, and it does stamp it on most (all?) service events too.

Second example in APITest actually shows how you can selectively append the context directly on just one event without using SetContext.. So both ways are possible, logger context - all events for specific named logger (this would be the preferred model). And local (event property) example - stamping on just one event.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed more offline. Having a parentId for an event is a valid model even if the system doesn't follow the span model (see also comment in TraceContext standard). Supporting explicit Span semantics is sort of overlapping with what OpenTelemetry does and out of scope for now.


#define COMMONFIELDS_NETWORK_PROVIDER "DeviceInfo.NetworkProvider"
#define COMMONFIELDS_NETWORK_TYPE "DeviceInfo.NetworkType"
#define COMMONFIELDS_NETWORK_COST "DeviceInfo.NetworkCost"
Expand Down
29 changes: 28 additions & 1 deletion lib/include/public/ISemanticContext.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -109,12 +109,39 @@ namespace MAT_NS_BEGIN
/// <param name="deviceClass">Device class.</param>
DECLARE_COMMONFIELD(DeviceClass, COMMONFIELDS_DEVICE_CLASS);

/// <summary>
/// <summary>
/// Set the device orgId context information of telemetry event.
/// </summary>
/// <param name="deviceClass">Device orgId</param>
DECLARE_COMMONFIELD(DeviceOrgId, COMMONFIELDS_DEVICE_ORGID);

/// <summary>
/// Set the W3C TraceContext TraceId information of telemetry event.
/// </summary>
/// <param name="traceId">TraceContext TraceId</param>
DECLARE_COMMONFIELD(TraceId, COMMONFIELDS_DT_TRACEID);

/// <summary>
/// Set the W3C TraceContext SpanId information of telemetry event.
/// </summary>
/// <param name="spanId">TraceContext SpanId</param>
DECLARE_COMMONFIELD(SpanId, COMMONFIELDS_DT_SPANID);

/// <summary>
/// Set the W3C TraceContext TraceFlags information of telemetry event.
/// </summary>
/// <param name="TraceFlags">TraceContext TraceFlags</param>
virtual void SetTraceFlags(uint8_t traceFlags)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the current spec only supports one flag (sampled) so I wonder if we should rather:

  • add a SetSampled(bool) method, or
  • change this to take a bitfield, or
  • provide an enum or a TRACEFLAGS_SAMPLED constant to use here instead of '1'

(the last is probably best since it doesn't constrain the caller)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

W3C spec and Common Schema support traceFlags:

image

An integer representation of the W3C TraceContext trace-flags.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I am saying that we could have

struct TraceFlags {
  unsigned char sampled: 1,
  unsigned char reserved: 7
};
...
virtual void SetTraceFlags(TraceFlags traceFlags)
...
TraceFlags flags;
flags.sampled = 1;
context.SetTraceFlags(flags);

or rather

enum TraceFlags {
  Sampled: 1,
};
...
context.SetTraceFlags(TraceFlags::Sampled);

rather than passing a raw uint8_t.

Copy link
Contributor Author

@maxgolov maxgolov Nov 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could later on add any additional method overloads we want. Indeed, we could add yet another method what you're proposing. I would rather go with an additional method that just does SetTraceContext. My goal was to expose the bare min set of flags, as per spec:

  • in Common Schema spec this is int
  • in W3C TraceContext spec this described as byte / uint8_t

Thus, I followed the W3C TraceContext guidelines in this regard. You can see that other implementations, e.g. Elastic Node Traceparent package - did not elaborate on the meaning of the flags field. It's just flags, number eventually encoded as hex. No more no less than what the 1DS (being the transport layer) is expecting to get.

We can add more decoration later. It's add-ons, not the necessity. I'd like to avoid the semantics that could possibly be later on borrowed from elsewhere, e.g. these semantics all exists in finer detail in OpenTelemetry API. The goal of this PR is to cover the bare min set of transport level detail, in order to enable the POC work that we're doing.. Incremental changes could come later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok sounds good not to over-specify.

{
SetCommonField(COMMONFIELDS_DT_TRACEFLAGS, traceFlags);
}

/// <summary>
/// Set the remote context (parent) SpanId information of telemetry event.
/// </summary>
/// <param name="ParentId">ParentId</param>
DECLARE_COMMONFIELD(ParentId, COMMONFIELDS_DT_PARENTID);

/// <summary>
/// Set the network cost context information of telemetry event.
/// </summary>
Expand Down
21 changes: 21 additions & 0 deletions lib/shared/SemanticContextCX.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -117,6 +117,27 @@ namespace Microsoft {
{
m_semanticContextCore->SetUserTimeZone(FromPlatformString(userTimeZone));
}

void SemanticContextImpl::TraceId::set(String^ traceId)
{
m_semanticContextCore->SetTraceId(FromPlatformString(traceId));
}

void SemanticContextImpl::SpanId::set(String ^ spanId)
{
m_semanticContextCore->SetSpanId(FromPlatformString(spanId));
}

void SemanticContextImpl::ParentId::set(String ^ parentId)
{
m_semanticContextCore->SetParentId(FromPlatformString(parentId));
}

void SemanticContextImpl::TraceFlags::set(unsigned char traceFlags)
{
m_semanticContextCore->SetTraceFlags(static_cast<uint8_t>(traceFlags));
}

}
}
}
Expand Down
38 changes: 36 additions & 2 deletions lib/shared/SemanticContextCX.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -100,6 +100,23 @@ namespace Microsoft {
}

virtual void SetUserId(String^ userId, PiiKind piiKind) = 0;

property String^ TraceId {
virtual void set(String^ traceId) = 0;
}

property String^ SpanId {
virtual void set(String^ spanId) = 0;
}

property unsigned char TraceFlags
{
virtual void set(unsigned char traceFlags) = 0;
}

property String^ ParentId {
virtual void set(String^ parentId) = 0;
}
};

/// @cond INTERNAL_DOCS
Expand Down Expand Up @@ -206,8 +223,25 @@ namespace Microsoft {
{
virtual void set(String^ appExperimentIds);
}

virtual void SetEventExperimentIds(String^ eventName, String^ experimentIds);

virtual void SetEventExperimentIds(String^ eventName, String^ experimentIds);

property String^ TraceId {
virtual void set(String^ traceId);
}

property String^ SpanId {
virtual void set(String ^ spanId);
}

property unsigned char TraceFlags
{
virtual void set(unsigned char traceFlags);
}

property String^ ParentId {
virtual void set(String ^ parentId);
}

internal:
SemanticContextImpl(MAT::ISemanticContext* semanticContextCore);
Expand Down
43 changes: 43 additions & 0 deletions tests/functests/APITest.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1455,6 +1455,49 @@ TEST(APITest, Custom_Decorator)
config.AddModule(CFG_MODULE_DECORATOR, nullptr);
}

TEST(APITest, TraceContext)
{
auto& config = LogManager::GetLogConfiguration();
LogManager::Initialize(TEST_TOKEN, config);
auto logger = LogManager::GetLogger();
auto context = logger->GetSemanticContext();

// Information that is propagated in W3C TraceContext header.
// These fields must be formatted as strings conforming to W3C spec:
// https://www.w3.org/TR/trace-context/#traceparent-header-field-values
context->SetTraceId("0af7651916cd43dd8448eb211c80319c");
context->SetSpanId("b9c7c989f97918e1");
context->SetTraceFlags(1);

// Information that allows to stitch this span to its remote parent, e.g.
// SpanId='00f067aa0ba902b7' is parent of SpanId='b9c7c989f97918e1'.
// ParentId is not sent as part of W3C TraceContext header, but it is
// logged via 1DS pipeline in order to allow attaching local telemetry
// to remote context.
context->SetParentId("00f067aa0ba902b7");

// All events emitted by this logger contain the same trace context.
EventProperties myEvent("MyEvent.With.TraceContext",
{
{"message", "Hello, W3CTraceContext!"}
});
logger->LogEvent(myEvent);

{
// Verify that individual events are capable of overridding the context.
EventProperties myEvent2("MyEvent2.With.TraceContext",
{
{"message", "Hello again, W3CTraceContext!"}
});
// This event borrows the logger context TraceId, is manually
// parented to previous SpanId and carries its own SpanId.
myEvent2.SetProperty(COMMONFIELDS_DT_PARENTID, "b9c7c989f97918e1");
myEvent2.SetProperty(COMMONFIELDS_DT_SPANID, "b9c7c989f97918c2");
logger->LogEvent(myEvent2);
}
LogManager::FlushAndTeardown();
}

#endif // HAVE_MAT_DEFAULT_HTTP_CLIENT

// TEST_PULL_ME_IN(APITest)
Expand Down