This repository has been archived by the owner on Dec 6, 2024. It is now read-only.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Composite Samplers #250
Composite Samplers #250
Changes from 10 commits
bfd33d9
611a140
16005a5
772ef28
395bd0c
fcb6a4b
9495d33
b37f054
7bcd40b
c72b06a
7cb67c1
ab8e962
47084f2
a6d3705
cae3b74
0d3ce18
9a7afb3
129c99c
10ba0c1
1c15c2b
a0bfb1b
e544794
635b216
06cc32d
5cc2174
d3c82a3
bfc3081
ed2a314
424012e
9c47d0f
05943e9
4c40e85
577537e
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: While EachOf is fine, I am wondering if we can consider "AllOf" as an alternative name - it feels a bit stronger and contrasts well with the AnyOf model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interestingly, I had considered AllOf originally, but decided against it because it looks too much like AnyOf ... especially in small font and for tired eyes ... but definitely an option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If all delegated samplers are consistent, it still makes sense to set the
th
value even in the case of negative sampling decisions.One could perhaps say that, in general, for both
EachOf
andAnyOf
, theth
value must be set if and only if all delegated samplers are consistent (regardless of the sampling decision).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this idea, but didn't we expect that
th
is removed after a negative sampling decision? Similarly to how the p-values were handled?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. If we never put
th
to the trace state for negative sampling decisions, which I think makes sense, then again what is the purpose of an emptyth
field representing 0% probability? @jmacdThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to read-through the latest iteration of this document before coming back to this thread--
I don't think an "empty"
th
field should be meaningful except to represent unknown information, in which case the field should be absent, not present w/ an empty value.I guess the reason this might matter, the explicit unsetting of th-values for negative sampling decisions, is that for approach-1 spans where a
RECORD_ONLY
decision is reached, there will be a copy of the span for reading, in memory. When the span is accessed, both the formerth
value and the unsetth
value are somehow meaningful.I would like us to come back to the invariants we believe there are, including the
sampled
flag.If the
sampled
flag is NOT set andth
is non-empty, I believe we have an inconsistency. This is why I believe we should unsetth
.If the
sampled
flag is set and theth
is unset (or empty), I believe we have unknown sampling.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: Would the names Chain or Sequence better convey the intent here? Conjunction feels a bit too abstract (and seems to have more interpretations than the chaining/pipeline implied here).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both Chain and Sequence suggest, IMHO, a possibility to have more steps (delegates) than just two. And I do not want to deal with more than 2 delegates - this kind of composition is rarely needed for two, and I do not have a use case for more than two. It is a bit like IfThen construct (I'm not suggesting this name though, this could be too confusing).
I think Chain is acceptable, and I'm open to any new suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about just Junction?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Conditional
, maybe, orContingent
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There seems to be a lot of text here to describe what is essentially boolean/tri-state conditions. We want to be able to AND and OR together a set of conditions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think sampling configurations 1 and 2 are not equivalent. To demonstrate this, consider the following example:
Assume you have two types of root spans A and B, which should be sampled with probabilities of 40% and 20%, respectively. Overall, at most 750 spans should be sampled per minute. The corresponding configuration 1 for this would be
If the original rates are 1000/min for A and 2000/min for B, 325/min would be sampled for each of both, giving a total of exactly 750/min as desired.
Correspondingly, configuration 2 would be
This would give 250/min for A and 400/min for B, which gives a total of 650/min. Hence, the result differs from that of configuration 1.
To make configuration 2 equivalent, we must replace
EachOf
withConjunction
. Furthermore, we would need a weighted variant of theConsistentRateLimiting
sampler, which analyzes the recent distribution of theth
values given by the first sampler of the conjunction and which adapts theth
values in such a way that the desired sampling rate is met. Ideally, theConsistentWeightedRateLimiting
would increase the thresholds for A and B from 0.6 to 0.675 and from 0.8 to 0.8375, respectively. (Assuming thresholds are values from [0,1].) This would again give 325/min for both, like configuration 1.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The total of 650/min is a concern, I'm less worried about the individual distribution between type A and B. After all, Consistent samplers have very different behavior than leaky-bucket (balancing vs. proportional).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes,
Conjunction
will work better here. Which raises another question: do we needEachOf
at all (neither in Approach One nor Approach Two)?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the
th
values only makes sense if the given sampler is used as the Second delegate inConjunction
.So how about generalizing this idea by defining
Weighted
orProportional
sampler which analyzes the population of therv
values in the population to sample. We cannot use the sampling probability value if it is influenced by the span'srv
, but it does not mean that it cannot be influenced by therv
values from the previous spans. Keeping track of the distribution of therv
values at a given sampling stage allows us not only to define a proportional variant ofRateLimitingSampler
, but also something likeProportionalFixedRate
sampler. It would always reduce the volume of spans according to the defined factor. Not much use in head sampling perhaps, but I believe it would be useful in tail sampling.More specifically, such a sampler would track the average
rv
of the incoming spans. For an unsampled population this average is expected to be 0.5. If the population has been pre-sampled, the spans with smallerrv
would have been eliminated, and the observed average would be higher. This would allow the samplers with proportional behavior to adjust their sampling probability accordingly (just as you proposed in your example).