Description
I've been working on integrating the opentelemetry sdk into a core support library for a set of microservices. I've run across an issue where the type selection for an interface seems unnecessarily restrictive. Originally I thought this was a design change that just didn't make it to documentation. But after some investigation I don't think that's it. I'll explain the case and then my reasoning.
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
otlp_exporter = OTLPSpanExporter(endpoint="http://localhost:4317")
span_processor = SimpleSpanProcessor(otlp_exporter)
trace.set_tracer_provider(TracerProvider(active_span_processor=span_processor))
This will execute just fine. But will throw errors in type checking because SimpleSpanProcessor
is not one of sdk.trace.SynchronousMultiSpanProcessor
or sdk.trace.ConcurrentMultiSpanProcessor
.
This was initially confusing to me. It's not in any of the supporting documentation. Those are the only two descendants of sdk.trace.SpanProcessor
in that module. Why would one choose to explicitly list descendants instead of just picking the base class for type checks? Especially when the implementation of sdk.trace.Tracer
sticks to the interface defined at sdk.trace.SpanProcessor
.
These changes were contributed in #594. There seems to be some conversation happening out-of-thread that directed some of this. I am not exactly sure about the original design intent. The type specification seems to force an interface.
I don't know the state of the ecosystem at that time. But these days, opentelemetry-collector
can do all the multiplexing you want. With the benefit of being optimized for its own case. And in its own entirely separate process. I'm going to configure mine exactly like that. Sending traces to Jaeger and X-Ray at the same time.
sdk.trace.TracerProvider.add_span_processor()
acts as a proxy for the underlying sdk.trace.SynchronousMultiSpanProcessor.add_span_processor()
or sdk.trace.ConcurrentMultiSpanProcessor.add_span_processor()
. It also provides an implementation for sdk.trace.TracerProvider.shutdown()
that allows for an optional exit handler.
I do not think that this part of the interface is necessary, and it is not part of the api specification. I personally only need one TracerProvider
for my needs, and would prefer a shorter call stack for the sake of simplicity and performance. If someone needs more than one TracerProvider
then the existing MultiSpanProcessor implementations will work as they are.
And there is no need to proxy add_span_processor()
, shutdown()
, or force_flush()
. In fact, these function calls are not part of the api spec. Instead, one should invoke those methods on the SpanProcessor of choice. Perhaps we could makesdk.trace.TracerPovider._active_span_processor
public, add a property if we really want to control it, use get_tracer().span_processor
, or save a reference to one's chosen SpanProcessor
at construction time.
So, to summarize:
- The type requirements for
sdk.trace.TracerProvider._active_span_processor
should be relaxed tosdk.trace.SpanProcessor
.
I also think:
sdk.trace.TracerProvider
should scale back its implementation to the interface defined inapi.trace.TracerProvider
. It does not need to act as a proxy for the underlying SpanProcessor.- Consider removing both of these MultiSpanProcessor-s. Since we now have BatchSpanProcessor.
But the OpenTelemetry docs say that public interfaces to the sdk must remain stable and I'm not sure where the 'public' line is drawn here.
I realize there are several different issues happening here. But I think they're closely related and want to at least launch a discussion from here.
I am absolutely willing to submit a PR of the changes. But I'm new to this project. I want to make sure I understand the design goals.