
[improvement] Correctly nest spans when tracing okhttp requests #1069

Closed
robert3005 wants to merge 3 commits

Conversation

robert3005 (Contributor):

Before this PR

OkHttp: `execute-enqueue` would measure time spent in the dispatcher pool and in `acquire-limiter`

After this PR

==COMMIT_MSG==
OkHttp tracing spans are now correctly nested: `acquire-limiter` and `okhttp: execute` don't overlap, and `okhttp: execute-enqueue` measures just the queuing time.
==COMMIT_MSG==

Possible downsides?

Current spans are not easy to understand - this attempts to at least make sure they measure the right thing.
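The intended ordering can be sketched with a minimal, self-contained model. This is not the real palantir tracing API; the span names come from this PR, and the timings are invented for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the span timeline this PR aims for: 'acquire-limiter'
// completes before 'okhttp: execute' begins, and 'okhttp: execute-enqueue'
// covers only the time spent queued in the dispatcher.
public final class SpanTimelineSketch {
    record Span(String name, long startNanos, long endNanos) {
        boolean overlaps(Span other) {
            return startNanos < other.endNanos && other.startNanos < endNanos;
        }
    }

    public static void main(String[] args) {
        List<Span> spans = new ArrayList<>();
        long t = 0;
        // 1. Wait for the concurrency-limiter permit.
        spans.add(new Span("acquire-limiter", t, t += 5));
        // 2. Sit in the dispatcher queue; this span ends when execution starts.
        spans.add(new Span("okhttp: execute-enqueue", t, t += 3));
        // 3. The actual HTTP call, starting only after the queue wait ends.
        spans.add(new Span("okhttp: execute", t, t += 20));

        // With correct nesting, the limiter and execute spans never overlap.
        System.out.println("overlap: " + spans.get(0).overlaps(spans.get(2)));
        // prints "overlap: false"
    }
}
```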

@robert3005 requested a review from a team as a code owner on April 18, 2019 12:57
```diff
-AsyncTracer tracerTag = chain.request().tag(AsyncTracer.class);
-if (tracerTag == null) {
+AsyncTracerTag tracerTag = chain.request().tag(AsyncTracerTag.class);
+if (tracerTag == null && tracerTag.asyncTracer() == null) {
```
Contributor:

npe - with `&&`, the second clause is only evaluated when `tracerTag == null` is true, so `tracerTag.asyncTracer()` dereferences null; presumably `||` was intended

robert3005 (Contributor, Author):

🤦‍♂️

Contributor:

I was surprised errorprone didn't catch this, but then I remembered we don't have uber's NullAway enabled!

@iamdanfox (Contributor):

I'm a bit late to this tracing game so I added some comments to try to capture the state of the world. Also here is the 'before this PR' state of the world:

(screenshot: trace span layout before this PR, taken 2019-04-18)

```java
enqueueInternal(callback);
// tracer.withTrace completes the 'acquire-limiter-enqueue' span
// tracer.withTrace starts the 'acquire-limiter-run' span
tracer.withTrace(() -> {
```
@iamdanfox (Contributor) on Apr 18, 2019:

I think before merging this it would be nice to come up with a solution that doesn't require us to include the `acquire-limiter-run` and `ignore` span hacks

robert3005 (Contributor, Author):

That was my thought as well - this PR is here to point at the intended behaviour
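The behaviour the inline comments in the snippet above describe can be sketched with a hypothetical stand-in for the async tracer (class and method names are assumed for illustration, not the real implementation): `withTrace` closes the pending `-enqueue` span and opens the matching `-run` span around the task.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Hypothetical sketch, not the real AsyncTracer: withTrace completes the
// pending '...-enqueue' span and opens the matching '...-run' span before
// invoking the task, so queue time and run time land in separate spans.
public final class AsyncTracerSketch {
    private final Deque<String> openSpans = new ArrayDeque<>();

    void startSpan(String name) {
        openSpans.push(name);
        System.out.println("start " + name);
    }

    void completeSpan() {
        System.out.println("complete " + openSpans.pop());
    }

    <T> T withTrace(Supplier<T> task) {
        completeSpan();                     // ends 'acquire-limiter-enqueue'
        startSpan("acquire-limiter-run");   // opens the run span
        try {
            return task.get();
        } finally {
            completeSpan();                 // ends 'acquire-limiter-run'
        }
    }

    public static void main(String[] args) {
        AsyncTracerSketch tracer = new AsyncTracerSketch();
        tracer.startSpan("acquire-limiter-enqueue");
        tracer.withTrace(() -> null);
        // the enqueue span closes before the run span opens
    }
}
```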


Contributor:

Why was the hack here needed?
It seems like leaving the old behavior would suffice to leave the `acquire-limiter-run` span empty and the other spans with the appropriate parents:

```java
tracer.withTrace(() -> null);
enqueueInternal(callback);
```

robert3005 (Contributor, Author):

It won't work - the callback is executed on a different thread and a different trace. You need to add spans to the original trace.
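The point here - that the callback runs on a dispatcher thread that has no trace state - is the standard thread-local propagation problem. A minimal sketch (hypothetical names, not the palantir Tracer API) of capturing the trace on the calling thread and restoring it before the callback runs:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch only: the real tracer stores more than an id, but the shape is the
// same - capture the thread-local trace state up front, restore it on the
// callback thread so spans created there attach to the original trace.
public final class TracePropagationSketch {
    static final ThreadLocal<String> currentTraceId = new ThreadLocal<>();

    static Runnable wrapWithTrace(Runnable task) {
        String traceId = currentTraceId.get(); // capture on the calling thread
        return () -> {
            currentTraceId.set(traceId);       // restore on the callback thread
            try {
                task.run();
            } finally {
                currentTraceId.remove();       // avoid leaking trace state
            }
        };
    }

    public static void main(String[] args) throws InterruptedException {
        currentTraceId.set("trace-1069");
        ExecutorService dispatcher = Executors.newSingleThreadExecutor();
        dispatcher.execute(wrapWithTrace(
                () -> System.out.println("callback sees: " + currentTraceId.get())));
        dispatcher.shutdown();
        dispatcher.awaitTermination(1, TimeUnit.SECONDS);
        // prints "callback sees: trace-1069"
    }
}
```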

@pkoenig10 (Member):

@iamdanfox in your example, the `execute-run` is not actually nested under `acquire-limiter-run`. The alignment occurs on the left side of the div rather than the left side of the text.

@pkoenig10 (Member) commented Apr 20, 2019:

This implementation still nests all async spans directly under the caller's span. This makes it difficult to group spans for a request together when multiple requests are being made simultaneously.

I've put up an alternative and (IMO) simpler implementation in #1073. In that PR I updated the test before making the change, to clearly show the span behavior before and after. I would appreciate it if someone could take a look.

@robert3005 (Contributor, Author):

Unfortunately the implementations aren't equivalent - this implementation tracks time spent in the dispatcher queue while the other doesn't. I am happy to add another top-level span, but we originally thought it wasn't helpful; it seems we were wrong.

@robert3005 force-pushed the rk/different-spans branch from 19d5511 to ed3f6f3 on May 23, 2019 15:10
@robert3005 closed this on May 31, 2019
@robert3005 (Contributor, Author):

If you feel this is useful, feel free to reopen - it seems we aren't concerned with this issue right now.
