asynchronous resource providers execution #19

Closed
SylvainJuge opened this issue Oct 13, 2023 · 3 comments

@SylvainJuge
Member

The resource providers used in #4 are SDK providers, and the SDK requires those to be immutable by design (in the sense that it's not something we can easily change).

As a consequence, adding cloud resource providers like GCP or AWS should work, but this will further delay the agent (and likely the application) startup. These types of metadata endpoints often seem to have long response times (on the order of seconds), which makes the problem even worse when trying several of them in sequence for "automatic" cloud provider detection.

This comes on top of the challenge of having a suitable implementation to use: so far, only the AWS implementation is available in the opentelemetry-java-contrib repo.


implementation idea

  • Remove the cloud resource providers from the automatic service loader configuration, so that the SDK does not execute them automatically on agent startup.
  • Execute them explicitly and asynchronously when the agent starts. We might even start them in parallel, or apply heuristics (for example on environment variables) to prioritize detection.
  • Buffer all the data while the cloud resource providers execute.
  • Wrap all the SpanData at export time (in the span/metric exporter) so that it is sent with the updated resource once it's available. Given that the resource is expected to be immutable, it should be easily cacheable, which avoids too much allocation.
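
The explicit, parallel execution step could look roughly like the sketch below. It is only an illustration under assumptions: the detectors are plain `Supplier`s returning attribute maps rather than real SDK resource providers, and the `AsyncResourceDetection` class and `detectAll` method are hypothetical names, not part of the agent or the SDK.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;
import java.util.stream.Collectors;

// Hypothetical sketch: each Supplier stands in for one cloud resource provider (AWS, GCP, ...).
class AsyncResourceDetection {

    // Starts every detector on the executor and merges their attributes once all have finished.
    // A detector that throws contributes no attributes instead of failing the whole detection.
    static CompletableFuture<Map<String, String>> detectAll(
            List<Supplier<Map<String, String>>> detectors, ExecutorService executor) {
        List<CompletableFuture<Map<String, String>>> futures = detectors.stream()
                .map(detector -> CompletableFuture.supplyAsync(detector, executor)
                        .exceptionally(error -> Map.of()))
                .collect(Collectors.toList());
        return CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]))
                .thenApply(ignored -> {
                    Map<String, String> merged = new LinkedHashMap<>();
                    // join() does not block here: allOf only completes after every future has.
                    futures.forEach(future -> merged.putAll(future.join()));
                    return merged;
                });
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            List<Supplier<Map<String, String>>> detectors = List.of(
                    () -> Map.of("cloud.provider", "aws"),
                    () -> { throw new IllegalStateException("metadata endpoint unreachable"); });
            // The agent would not block like this; join() is only used to print the result.
            System.out.println(detectAll(detectors, executor).join());
        } finally {
            executor.shutdown();
        }
    }
}
```

Running the detectors in parallel bounds the total wait by the slowest endpoint rather than by the sum of all of them, which is the main concern with sequential "automatic" detection.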
@jackshirazi
Contributor

There's a question about whether to delay the first send of data until we have it all, or to make the first send with the data available at that time and fully populate later sends. The delay would be needed for a fully OTel-compliant agent, but I think we should be optimizing for Elastic, so we can send partially populated data and update it later.
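
To make the two options concrete, here is a minimal sketch of an exporter wrapper that supports both. It is not the OpenTelemetry `SpanExporter` API: items are plain strings, the resource is a plain attribute map, and `ResourceAwareExporter`, `create`, and the `sendPartial` flag are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.BiConsumer;

// Hypothetical sketch: wraps a delegate that sends a batch together with a resource.
class ResourceAwareExporter {
    private final Map<String, String> initialResource;
    private final BiConsumer<List<String>, Map<String, String>> delegate;
    private final boolean sendPartial;
    private final List<String> buffer = new ArrayList<>();
    private Map<String, String> fullResource; // null until detection has completed

    private ResourceAwareExporter(Map<String, String> initialResource,
            BiConsumer<List<String>, Map<String, String>> delegate, boolean sendPartial) {
        this.initialResource = initialResource;
        this.delegate = delegate;
        this.sendPartial = sendPartial;
    }

    static ResourceAwareExporter create(Map<String, String> initialResource,
            CompletableFuture<Map<String, String>> detected,
            BiConsumer<List<String>, Map<String, String>> delegate, boolean sendPartial) {
        ResourceAwareExporter exporter =
                new ResourceAwareExporter(initialResource, delegate, sendPartial);
        // A failed detection falls back to the initial resource instead of buffering forever.
        detected.exceptionally(error -> Map.of()).thenAccept(exporter::onDetected);
        return exporter;
    }

    private synchronized void onDetected(Map<String, String> detectedAttributes) {
        Map<String, String> merged = new HashMap<>(initialResource);
        merged.putAll(detectedAttributes);
        fullResource = Map.copyOf(merged); // built once, then reused for every export
        if (!buffer.isEmpty()) {
            delegate.accept(new ArrayList<>(buffer), fullResource);
            buffer.clear();
        }
    }

    synchronized void export(List<String> batch) {
        if (fullResource != null) {
            delegate.accept(batch, fullResource);    // detection done: full resource
        } else if (sendPartial) {
            delegate.accept(batch, initialResource); // send now, partially populated
        } else {
            buffer.addAll(batch);                    // delay until detection completes
        }
    }

    public static void main(String[] args) {
        CompletableFuture<Map<String, String>> detected = new CompletableFuture<>();
        ResourceAwareExporter exporter = create(Map.of("service.name", "app"), detected,
                (batch, resource) -> System.out.println(batch + " with " + resource), false);
        exporter.export(List.of("span-1")); // buffered, nothing printed yet
        detected.complete(Map.of("cloud.provider", "aws")); // flushes span-1
    }
}
```

With `sendPartial` set to `true`, the first batches go out immediately with only the initial resource, and later batches carry the full one; with `false`, nothing is sent until detection completes or fails. The buffer in this sketch is unbounded, which a real implementation would likely need to cap.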

@SylvainJuge
Member Author

I also wonder if this would be a case where we could leverage a unique ID generated by the agent, which could then be combined with some time-based buffering on the apm-server side. While a bit hack-ish, that definitely sounds doable; however, keeping the buffering in the agent would likely remain the simplest solution here.

@SylvainJuge
Member Author

We now have a simple async execution of resource providers in the agent, so we can consider this solved until further improvement is needed (for example, making it configurable).
