This repository has been archived by the owner on Dec 6, 2024. It is now read-only.

Composite Samplers #250

Closed
wants to merge 33 commits into from
Changes from 10 commits
bfd33d9
Creating "Composite Samplers" OTEP.
PeterF778 Feb 29, 2024
611a140
File rename
PeterF778 Feb 29, 2024
16005a5
Merge branch 'main' into composite_samplers
PeterF778 Feb 29, 2024
772ef28
Creating "Composite Samplers" OTEP.
PeterF778 Feb 29, 2024
395bd0c
File rename
PeterF778 Feb 29, 2024
fcb6a4b
Updated version, with examples
PeterF778 Mar 5, 2024
9495d33
Merge branch 'composite_samplers' of https://github.com/PeterF778/ote…
PeterF778 Mar 5, 2024
b37f054
Fixing markdown-lint errors
PeterF778 Mar 5, 2024
7bcd40b
Fixing more markdown-lint errors
PeterF778 Mar 5, 2024
c72b06a
Modfied the behavior of Conjunction sampler.
PeterF778 Mar 12, 2024
7cb67c1
Added description for Approach Two.
PeterF778 Mar 15, 2024
ab8e962
Added description of ConsistentRateLimited, another example, and a fe…
PeterF778 Mar 21, 2024
47084f2
Changing description of sampler as suggested
PeterF778 Apr 9, 2024
a6d3705
Update text/0250-Composite_Samplers.md
PeterF778 Apr 19, 2024
cae3b74
Update text/0250-Composite_Samplers.md
PeterF778 Apr 19, 2024
0d3ce18
Update text/0250-Composite_Samplers.md
PeterF778 Apr 19, 2024
9a7afb3
Update text/0250-Composite_Samplers.md
PeterF778 Apr 22, 2024
129c99c
Update text/0250-Composite_Samplers.md
PeterF778 Apr 22, 2024
10ba0c1
Describing a two-phase processing for Approach Two.
PeterF778 May 1, 2024
1c15c2b
Fixing a typo
PeterF778 May 1, 2024
a0bfb1b
Minor adjustments and corrections, introducing Composable Sampler.
PeterF778 May 3, 2024
e544794
Adding more references
PeterF778 May 3, 2024
635b216
Update text/0250-Composite_Samplers.md
PeterF778 Jun 6, 2024
06cc32d
Update text/0250-Composite_Samplers.md
PeterF778 Jun 6, 2024
5cc2174
Update text/0250-Composite_Samplers.md
PeterF778 Jun 6, 2024
d3c82a3
Changing SamplingAdvice to SamplingIntent.
PeterF778 Jun 11, 2024
bfc3081
Removed EachOf and the incorrect example using ConsistentRateLimiting…
PeterF778 Jun 19, 2024
ed2a314
Removing special handling of THRESHOLD from Approach One.
PeterF778 Jun 21, 2024
424012e
Merge branch 'main' into composite_samplers
PeterF778 Jul 26, 2024
9c47d0f
Minor changes, whitespace fixes.
PeterF778 Jul 26, 2024
05943e9
Fine tuning the behavior of GetAttributes for AnyOf sampler.
PeterF778 Aug 28, 2024
4c40e85
Added link for the PR with the prototype implementation for Java.
PeterF778 Aug 30, 2024
577537e
Updated the link to prototype implementation.
PeterF778 Sep 19, 2024
204 changes: 204 additions & 0 deletions text/0250-Composite_Samplers.md
@@ -0,0 +1,204 @@
# Composite Samplers Proposal

This proposal addresses head-based sampling as described by the [OpenTelemetry SDK](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md#sampling).
It introduces additional _composite samplers_. Composite samplers use other samplers (delegates) to make sampling decisions. A composite sampler invokes its delegate samplers, but makes the final sampling decision itself.

The new samplers proposed here are compatible with [Threshold propagation in trace state (OTEP 235)](https://github.com/open-telemetry/oteps/pull/235) as used by Consistent Probability samplers. Also see Draft PR 3910 [Probability Samplers based on W3C Trace Context Level 2](https://github.com/open-telemetry/opentelemetry-specification/pull/3910).

## Motivation

The need for configuring head sampling has been explicitly or implicitly indicated in several discussions, both within the [Sampling SIG](https://docs.google.com/document/d/1gASMhmxNt9qCa8czEMheGlUW2xpORiYoD7dBD7aNtbQ) and in the wider community. Some of these discussions go back a number of years; see, for example:

- issue [173](https://github.com/open-telemetry/opentelemetry-specification/issues/173): Way to ignore healthcheck traces when using automatic tracer across all languages?
- issue [1060](https://github.com/open-telemetry/opentelemetry-java-instrumentation/issues/1060): Exclude URLs from Tracing
- issue [1844](https://github.com/open-telemetry/opentelemetry-specification/issues/1844): Composite Sampler

## The Goal

The goal of this proposal is to help users create advanced sampling configurations from pre-defined building blocks. Consider the following example of sampling requirements; we expect that many users will have requirements following the same pattern. The most notable elements are trace classification based on the target URL, special handling for some spans, and a sanity cap on the total volume of exported spans.

### Example

Head-based sampling requirements:

- for root spans:
  - drop all `/healthcheck` requests
  - capture all `/checkout` requests
  - capture 25% of all other requests
- for non-root spans:
  - follow the parent sampling decision
  - however, capture all calls to service `/foo` (even if the trace will be incomplete)
- in any case, do not exceed 1000 spans/minute

## New Samplers

### AnyOf

`AnyOf` is a composite sampler which takes a non-empty list of Samplers (delegates) as the argument.

Upon invocation of its `shouldSample` method, it MUST go through the whole list and invoke `shouldSample` method on each delegate sampler, passing the same arguments as received.

`AnyOf` sampler MUST return a `SamplingResult` which is constructed as follows:

- The sampling Decision is based on the delegate sampling Decisions. If all of the delegate Decisions are `DROP`, the composite sampler MUST return `DROP` Decision as well.
If any of the delegate Decisions is `RECORD_AND_SAMPLE`, the composite sampler MUST return `RECORD_AND_SAMPLE` Decision.
Otherwise, if any of the delegate Decisions is `RECORD_ONLY`, the composite sampler MUST return `RECORD_ONLY` Decision.
- The set of span Attributes to be added to the `Span` is the union of the sets of Attributes provided by the delegate samplers within their `SamplingResult`s.
- The `TraceState` to be used with the new `Span` is obtained by cumulatively applying all the potential modifications of the parent `TraceState` by the delegate samplers, with special handling of the `th` sub-key (the sampling rejection `THRESHOLD`) for the `ot` entry as described below.

If the final sampling Decision is `DROP` or `RECORD_ONLY`, the `th` entry MUST be removed.
If the sampling Decision is `RECORD_AND_SAMPLE`, and there's no `th` entry in any of the `TraceState` provided by the delegates that decided to `RECORD_AND_SAMPLE`, the `th` entry MUST also be removed.
Otherwise, the resulting `TraceState` MUST contain `th` entry with the `THRESHOLD` value being the minimum of all the `THRESHOLD` values as reported by those delegates that decided to `RECORD_AND_SAMPLE`.

Each delegate sampler MUST be given a chance to participate in the sampling decision as described above and MUST see the same _parent_ state. The order of the delegate samplers does not matter, as long as there's no overlap in the Attribute Keys or the trace state keys (other than `th`) that they use.
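
The `AnyOf` rules above can be illustrated with a minimal Python sketch. It is not taken from any SDK: `Decision`, `SamplingResult`, `any_of` and the plain-dict `params` argument are hypothetical stand-ins, and the `TraceState` is reduced to a single optional numeric rejection threshold (`None` meaning "no `th` entry"). Other trace state keys are not modelled.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Callable, Dict, List, Optional


class Decision(Enum):
    DROP = "DROP"
    RECORD_ONLY = "RECORD_ONLY"
    RECORD_AND_SAMPLE = "RECORD_AND_SAMPLE"


@dataclass
class SamplingResult:
    decision: Decision
    attributes: Dict[str, str] = field(default_factory=dict)
    # Assumption: only the numeric rejection threshold is modelled here.
    threshold: Optional[int] = None


# Hypothetical delegate signature: a callable taking the sampling arguments.
Sampler = Callable[[Dict[str, Any]], SamplingResult]


def any_of(delegates: List[Sampler]) -> Sampler:
    if not delegates:
        raise ValueError("AnyOf requires a non-empty list of delegates")

    def should_sample(params: Dict[str, Any]) -> SamplingResult:
        # Every delegate is invoked and sees the same arguments.
        results = [delegate(params) for delegate in delegates]
        decisions = [r.decision for r in results]

        if Decision.RECORD_AND_SAMPLE in decisions:
            decision = Decision.RECORD_AND_SAMPLE
        elif Decision.RECORD_ONLY in decisions:
            decision = Decision.RECORD_ONLY
        else:
            decision = Decision.DROP

        # Attributes from all delegates are merged.
        attributes: Dict[str, str] = {}
        for r in results:
            attributes.update(r.attributes)

        # th is kept only for RECORD_AND_SAMPLE, as the minimum threshold
        # reported by the delegates that decided RECORD_AND_SAMPLE.
        threshold = None
        if decision is Decision.RECORD_AND_SAMPLE:
            reported = [
                r.threshold
                for r in results
                if r.decision is Decision.RECORD_AND_SAMPLE
                and r.threshold is not None
            ]
            if reported:
                threshold = min(reported)
        return SamplingResult(decision, attributes, threshold)

    return should_sample
```

Taking the minimum threshold corresponds to taking the highest sampling probability among the delegates that chose to sample, which is consistent with the "any delegate may keep the span" semantics.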

### EachOf

Suggestion: While EachOf is fine, I am wondering if we can consider "AllOf" as an alternative name - it feels a bit stronger and contrasts well with the AnyOf model.

Author

Interestingly, I had considered AllOf originally, but decided against it because it looks too much like AnyOf ... especially in small font and for tired eyes ... but definitely an option.


`EachOf` is a composite sampler which takes a non-empty list of Samplers (delegates) as the argument.

Upon invocation of its `shouldSample` method, it MUST go through the whole list and invoke `shouldSample` method on each delegate sampler, passing the same arguments as received.

`EachOf` sampler MUST return a `SamplingResult` which is constructed as follows:

- The sampling Decision is based on the delegate sampling Decisions. If all of the delegate Decisions are `RECORD_AND_SAMPLE`, the composite sampler MUST return `RECORD_AND_SAMPLE` Decision as well.
If any of the delegate Decisions is `DROP`, the composite sampler MUST return `DROP` Decision.
Otherwise, if any of the delegate Decisions is `RECORD_ONLY`, the composite sampler MUST return `RECORD_ONLY` Decision.
- The set of span Attributes to be added to the `Span` is the union of the sets of Attributes provided by the delegate samplers within their `SamplingResult`s.
- The `TraceState` to be used with the new `Span` is obtained by cumulatively applying all the potential modifications of the parent `TraceState` by the delegate samplers, with special handling of the `th` sub-key (the sampling rejection `THRESHOLD`) for the `ot` entry as described below.

If the final sampling Decision is `DROP` or `RECORD_ONLY`, the `th` entry MUST be removed.
Contributor

If all delegated samplers are consistent, it still makes sense to set the th value even in the case of negative sampling decisions.

One could perhaps say that, in general, for both EachOf and AnyOf, the th value must be set if and only if all delegated samplers are consistent (regardless of the sampling decision).

Author

I like this idea, but didn't we expect that th is removed after a negative sampling decision? Similarly to how the p-values were handled?

Contributor

I see. If we never put th to the trace state for negative sampling decisions, which I think makes sense, then again what is the purpose of an empty th field representing 0% probability? @jmacd

Contributor

@jmacd jmacd Apr 18, 2024

I had to read-through the latest iteration of this document before coming back to this thread--

I don't think an "empty" th field should be meaningful except to represent unknown information, in which case the field should be absent, not present w/ an empty value.

I guess the reason this might matter, the explicit unsetting of th-values for negative sampling decisions, is that for approach-1 spans where a RECORD_ONLY decision is reached, there will be a copy of the span for reading, in memory. When the span is accessed, both the former th value and the unset th value are somehow meaningful.

I would like us to come back to the invariants we believe there are, including the sampled flag.

If the sampled flag is NOT set and th is non-empty, I believe we have an inconsistency. This is why I believe we should unset th.

If the sampled flag is set and the th is unset (or empty), I believe we have unknown sampling.

If the sampling Decision is `RECORD_AND_SAMPLE`, and there's no `th` entry in any of the `TraceState` provided by the delegates that decided to `RECORD_AND_SAMPLE`, the `th` entry MUST also be removed.
Otherwise, the resulting `TraceState` MUST contain `th` entry with the `THRESHOLD` value being the maximum of all the `THRESHOLD` values as reported by those delegates that decided to `RECORD_AND_SAMPLE`.

Each delegate sampler MUST be given a chance to participate in the sampling decision as described above and MUST see the same _parent_ state. The order of the delegate samplers does not matter, as long as there's no overlap in the Attribute Keys or the trace state keys (other than `th`) that they use.
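
As a rough illustration of how `EachOf` differs from `AnyOf`, the following Python sketch implements only the Decision and `THRESHOLD` combination described above. The names `Outcome`, `each_of_decision` and `each_of_threshold` are hypothetical, and each delegate result is reduced to a (decision, threshold) pair, with `None` standing for a missing `th` entry.

```python
from enum import Enum
from typing import List, Optional, Tuple


class Decision(Enum):
    DROP = "DROP"
    RECORD_ONLY = "RECORD_ONLY"
    RECORD_AND_SAMPLE = "RECORD_AND_SAMPLE"


# Hypothetical delegate outcome: (decision, rejection threshold or None).
Outcome = Tuple[Decision, Optional[int]]


def each_of_decision(outcomes: List[Outcome]) -> Decision:
    if not outcomes:
        raise ValueError("EachOf requires a non-empty list of delegates")
    decisions = [decision for decision, _ in outcomes]
    # Any DROP wins; otherwise any RECORD_ONLY wins; otherwise all delegates
    # decided RECORD_AND_SAMPLE.
    if Decision.DROP in decisions:
        return Decision.DROP
    if Decision.RECORD_ONLY in decisions:
        return Decision.RECORD_ONLY
    return Decision.RECORD_AND_SAMPLE


def each_of_threshold(outcomes: List[Outcome]) -> Optional[int]:
    # th is removed unless the final decision is RECORD_AND_SAMPLE.
    if each_of_decision(outcomes) is not Decision.RECORD_AND_SAMPLE:
        return None
    reported = [th for _, th in outcomes if th is not None]
    # The maximum rejection threshold is the most restrictive one.
    return max(reported) if reported else None


# Example: a 50% delegate and a 25% delegate both decide to sample.
# Assumes a 56-bit threshold space, as in OTEP 235.
HALF = 0x80000000000000
QUARTER = 0xC0000000000000
combined = each_of_threshold(
    [(Decision.RECORD_AND_SAMPLE, HALF), (Decision.RECORD_AND_SAMPLE, QUARTER)]
)
```

Here `combined` equals `QUARTER`: the maximum rejection threshold corresponds to the lowest sampling probability among the delegates.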

### Conjunction

Suggestion: Would the names Chain or Sequence better convey the intent here? Conjunction feels a bit too abstract (and seems to have more interpretations than the chaining/pipeline implied here).

Author

Both Chain and Sequence suggest, IMHO, a possibility to have more steps (delegates) than just two. And I do not want to deal with more than 2 delegates - this kind of composition is rarely needed for two, and I do not have a use case for more than two. It is a bit like IfThen construct (I'm not suggesting this name though, this could be too confusing).
I think Chain is acceptable, and I'm open to any new suggestions.

Author

How about just Junction?

Contributor

Conditional, maybe, or Contingent?

There seems to be a lot of text here to describe what is essentially boolean/tri-state conditions. We want to be able to AND and OR together a set of conditions.


`Conjunction` is a composite sampler which takes two Samplers (delegates) as the arguments. These delegate samplers are referred to below as First and Second.

Upon invocation of its `shouldSample` method, the Conjunction sampler MUST invoke `shouldSample` method on the First sampler, passing the same arguments as received, and examine the received sampling Decision. Upon receiving `DROP` or `RECORD_ONLY` decision it MUST return the SamplingResult from the First sampler, and it MUST NOT proceed with querying the Second sampler. If the sampling decision from the First sampler is `RECORD_AND_SAMPLE`, the Conjunction sampler MUST invoke `shouldSample` method on the Second sampler, effectively passing the `TraceState` received from the First sampler as the parent trace state.
If the sampling Decision from the Second sampler is `RECORD_AND_SAMPLE`, the Conjunction sampler MUST return a `SamplingResult` which is constructed as follows:

- The sampling Decision is `RECORD_AND_SAMPLE`.
- The set of span Attributes to be added to the `Span` is the sum of the sets of Attributes as provided by the First and the Second sampler.
- The `TraceState` to be used with the new `Span` is obtained by cumulatively applying the potential modifications from the First and Second sampler, with special handling of the `th` sub-key (the sampling rejection `THRESHOLD`) for the `ot` entry as described below.

If both First and Second samplers provided a `th` entry in the returned `TraceState`, and the value of the `THRESHOLD` from the First sampler is `0`, then the resulting `TraceState` MUST contain a `th` entry with the `THRESHOLD` as provided by the Second sampler. Otherwise, the `th` entry MUST be removed.

If the sampling Decision from the Second sampler is `RECORD_ONLY` or `DROP`, the Conjunction sampler MUST return a `SamplingResult` which is constructed as follows:

- The sampling Decision is `DROP`.
- The set of span Attributes to be added to the `Span` is as provided by the First sampler.
- The `TraceState` to be used with the new `Span` is the `TraceState` provided by the First sampler, but with the `th` entry removed.
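
The control flow of `Conjunction` can be sketched in Python as follows. This is only an illustration under simplifying assumptions: `SamplingResult`, `conjunction` and the `parent_threshold` key are hypothetical, the `TraceState` is reduced to an optional numeric threshold, and passing the First sampler's trace state to the Second sampler is modelled by adding that threshold to the arguments dict.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Callable, Dict, Optional


class Decision(Enum):
    DROP = "DROP"
    RECORD_ONLY = "RECORD_ONLY"
    RECORD_AND_SAMPLE = "RECORD_AND_SAMPLE"


@dataclass
class SamplingResult:
    decision: Decision
    attributes: Dict[str, str] = field(default_factory=dict)
    # Assumption: None means "no th entry" in the returned trace state.
    threshold: Optional[int] = None


Sampler = Callable[[Dict[str, Any]], SamplingResult]


def conjunction(first: Sampler, second: Sampler) -> Sampler:
    def should_sample(params: Dict[str, Any]) -> SamplingResult:
        r1 = first(params)
        # DROP or RECORD_ONLY from First is final; Second is not queried.
        if r1.decision is not Decision.RECORD_AND_SAMPLE:
            return r1

        # Second sees the trace state produced by First as its parent state.
        r2 = second({**params, "parent_threshold": r1.threshold})

        if r2.decision is Decision.RECORD_AND_SAMPLE:
            # th survives only if First reported threshold 0 (always sample)
            # and Second reported a threshold of its own.
            keep_th = r1.threshold == 0 and r2.threshold is not None
            return SamplingResult(
                Decision.RECORD_AND_SAMPLE,
                {**r1.attributes, **r2.attributes},
                r2.threshold if keep_th else None,
            )

        # Second said RECORD_ONLY or DROP: drop, keep First's attributes,
        # and remove th.
        return SamplingResult(Decision.DROP, dict(r1.attributes), None)

    return should_sample
```

Because the Second sampler is only consulted for spans the First sampler accepted, a rate limiter used as Second only counts spans that passed the First stage.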

### RuleBased

`RuleBased` is a composite sampler which performs `Span` categorization (e.g. when sampling decision depends on Span attributes) and sampling.
The Spans can be grouped into separate categories, and each category can use a different Sampler.
Categorization of Spans is aided by `Predicates`.

#### Predicate

The Predicates represent logical expressions which can access Span Attributes (or anything else available when the sampling decision is to be made), and perform tests on the accessible values.
For example, one can test if the target URL for a SERVER span matches a given pattern.
`Predicate` interface allows users to create custom categories based on information that is available at the time of making the sampling decision.

##### SpanMatches

This is a routine/function/method for `Predicate`, which returns `true` if a given `Span` matches, i.e. belongs to the category described by the Predicate.

##### Required Arguments for Predicates

The arguments represent the values that are made available for `ShouldSample`.

- `Context` with parent Span.
- `TraceId` of the Span to be created.
- Name of the Span to be created.
- Initial set of Attributes of the Span to be created.
- Collection of links that will be associated with the Span to be created.

#### Required Arguments for RuleBased

- `SpanKind`
- list of pairs (`Predicate`, `Sampler`)

For making the sampling decision, if the `Span` kind matches the specified kind, the sampler goes through the list in the provided order and calls `SpanMatches` on `Predicate`s passing the `Span` as the argument. If a call returns `true`, the corresponding `Sampler` will be called to make the final sampling decision. If the `SpanKind` does not match, or none of the calls to `SpanMatches` yield `true`, the final decision is `DROP`.

The order of `Predicate`s is essential. If more than one `Predicate` matches a `Span`, only the Sampler associated with the first matching `Predicate` will be used.
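
A minimal Python sketch of the first-match behavior of `RuleBased` is given below. The `rule_based` factory, the dict-based `params` and the `target_is` predicate helper are hypothetical, and samplers are reduced to callables returning a Decision. The string `"ROOT"` mirrors the informal notation used in this document rather than an actual `SpanKind` value.

```python
from enum import Enum
from typing import Any, Callable, Dict, List, Tuple


class Decision(Enum):
    DROP = "DROP"
    RECORD_ONLY = "RECORD_ONLY"
    RECORD_AND_SAMPLE = "RECORD_AND_SAMPLE"


Params = Dict[str, Any]
Predicate = Callable[[Params], bool]  # plays the role of SpanMatches
Sampler = Callable[[Params], Decision]


def rule_based(span_kind: str, rules: List[Tuple[Predicate, Sampler]]) -> Sampler:
    def should_sample(params: Params) -> Decision:
        # A span of a different kind is never sampled by this sampler.
        if params.get("kind") != span_kind:
            return Decision.DROP
        # Rules are tried in order; only the first match is used.
        for span_matches, sampler in rules:
            if span_matches(params):
                return sampler(params)
        # No predicate matched.
        return Decision.DROP

    return should_sample


def always_on(params: Params) -> Decision:
    return Decision.RECORD_AND_SAMPLE


def always_off(params: Params) -> Decision:
    return Decision.DROP


def target_is(path: str) -> Predicate:
    # Hypothetical predicate: exact match on the http.target attribute.
    return lambda params: params.get("attributes", {}).get("http.target") == path


s1 = rule_based("ROOT", [
    (target_is("/healthcheck"), always_off),
    (target_is("/checkout"), always_on),
])
```

With this configuration, a root span for `/checkout` is sampled, while `/healthcheck`, any unmatched target, and any span of another kind are dropped.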

## Summary

### Example - sampling configuration 1

Going back to our example of sampling requirements, we can now configure the head sampler to support this particular case, using an informal notation of samplers and their arguments.
First, let's express the requirements for the ROOT spans as follows.

```
S1 = RuleBased(ROOT, {
    (http.target == /healthcheck) => AlwaysOff,
    (http.target == /checkout) => AlwaysOn,
    true => TraceIdRatioBased(0.25)
})
```

In the next step, we can build the sampler to handle non-root spans as well:

```
S2 = ParentBased(S1)
```

The special case of calling service `/foo` can now be supported by:

```
S3 = AnyOf(S2, RuleBased(CLIENT, { (http.url == /foo) => AlwaysOn }))
```

Finally, the last step is to put a limit on the stream of exported spans. One of the available rate limiting samplers that we can use is the Jaeger [RateLimitingSampler](https://github.com/open-telemetry/opentelemetry-java/blob/main/sdk-extensions/jaeger-remote-sampler/src/main/java/io/opentelemetry/sdk/extension/trace/jaeger/sampler/RateLimitingSampler.java):

```
S4 = Conjunction(S3, RateLimitingSampler(1000 * 60))
```

### Example - sampling configuration 2

Many users are interested in [Consistent Probability Sampling](https://opentelemetry.io/docs/specs/otel/trace/tracestate-probability-sampling/#consistent-probability-sampler) (CP), as it gives them a chance to calculate span-based metrics even when sampling is active. The configuration presented above uses the traditional samplers, which do not offer this benefit.

Here is how an equivalent configuration can be put together using [CP samplers](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/tracestate-probability-sampling.md#samplers). In this example, the following implementations are used:
Contributor

I think sampling configurations 1 and 2 are not equivalent. To demonstrate this, consider the following example:

Assume you have two types of root spans A and B, which should be sampled with probabilities of 40% and 20%, respectively. Overall, at most 750 spans should be sampled per minute. The corresponding configuration 1 for this would be

Conjunction(
  RuleBased(ROOT, {IsTypeA => TraceIdRatioBased(0.4), IsTypeB => TraceIdRatioBased(0.2)}), 
  RateLimitingSampler(750 * 60)
)

If the original rates are 1000/min for A and 2000/min for B, 325/min would be sampled for each of both, giving a total of exactly 750/min as desired.

Correspondingly, configuration 2 would be

EachOf(
  RuleBased(ROOT, {IsTypeA => ConsistentFixedThreshold(0.4), IsTypeB => ConsistentFixedThreshold(0.2)}), 
  ConsistentRateLimiting(750 * 60)
)

This would give 250/min for A and 400/min for B, which gives a total of 650/min. Hence, the result differs from that of configuration 1.

To make configuration 2 equivalent, we must replace EachOf with Conjunction. Furthermore, we would need a weighted variant of the ConsistentRateLimiting sampler, which analyzes the recent distribution of the th values given by the first sampler of the conjunction and which adapts the th values in such a way that the desired sampling rate is met. Ideally, the ConsistentWeightedRateLimiting would increase the thresholds for A and B from 0.6 to 0.675 and from 0.8 to 0.8375, respectively. (Assuming thresholds are values from [0,1].) This would again give 325/min for both, like configuration 1.

Conjunction(
  RuleBased(ROOT, {IsTypeA => ConsistentFixedThreshold(0.4), IsTypeB => ConsistentFixedThreshold(0.2)}), 
  ConsistentWeightedRateLimiting(750 * 60)
)

Author

> EachOf(
>   RuleBased(ROOT, {IsTypeA => ConsistentFixedThreshold(0.4), IsTypeB => ConsistentFixedThreshold(0.2)}),
>   ConsistentRateLimiting(750 * 60)
> )
>
> This would give 250/min for A and 400/min for B, which gives a total of 650/min. Hence, the result differs from that of configuration 1.

The total of 650/min is a concern, I'm less worried about the individual distribution between type A and B. After all, Consistent samplers have very different behavior than leaky-bucket (balancing vs. proportional).

Author

> To make configuration 2 equivalent, we must replace EachOf with Conjunction.

Yes, Conjunction will work better here. Which raises another question: do we need EachOf at all (neither in Approach One nor Approach Two)?

Author

> Furthermore, we would need a weighted variant of the ConsistentRateLimiting sampler, which analyzes the recent distribution of the th values given by the first sampler of the conjunction and which adapts the th values in such a way that the desired sampling rate is met. Ideally, the ConsistentWeightedRateLimiting would increase the thresholds for A and B from 0.6 to 0.675 and from 0.8 to 0.8375, respectively. (Assuming thresholds are values from [0,1].) This would again give 325/min for both, like configuration 1.

Looking at the th values only makes sense if the given sampler is used as the Second delegate in Conjunction.
So how about generalizing this idea by defining a Weighted or Proportional sampler which analyzes the distribution of the rv values in the population to be sampled? We cannot use the sampling probability value if it is influenced by the span's rv, but it does not mean that it cannot be influenced by the rv values from the previous spans. Keeping track of the distribution of the rv values at a given sampling stage allows us not only to define a proportional variant of RateLimitingSampler, but also something like a ProportionalFixedRate sampler. It would always reduce the volume of spans according to the defined factor. Not much use in head sampling perhaps, but I believe it would be useful in tail sampling.

More specifically, such a sampler would track the average rv of the incoming spans. For an unsampled population this average is expected to be 0.5. If the population has been pre-sampled, the spans with smaller rv would have been eliminated, and the observed average would be higher. This would allow the samplers with proportional behavior to adjust their sampling probability accordingly (just as you proposed in your example).


- [ConsistentAlwaysOffSampler](https://github.com/open-telemetry/opentelemetry-java-contrib/blob/main/consistent-sampling/src/main/java/io/opentelemetry/contrib/sampler/consistent56/ConsistentAlwaysOffSampler.java)
- [ConsistentAlwaysOnSampler](https://github.com/open-telemetry/opentelemetry-java-contrib/blob/main/consistent-sampling/src/main/java/io/opentelemetry/contrib/sampler/consistent56/ConsistentAlwaysOnSampler.java)
- [ConsistentFixedThresholdSampler](https://github.com/open-telemetry/opentelemetry-java-contrib/blob/main/consistent-sampling/src/main/java/io/opentelemetry/contrib/sampler/consistent56/ConsistentFixedThresholdSampler.java)
- [ConsistentParentBasedSampler](https://github.com/open-telemetry/opentelemetry-java-contrib/blob/main/consistent-sampling/src/main/java/io/opentelemetry/contrib/sampler/consistent56/ConsistentParentBasedSampler.java)
- [ConsistentRateLimitingSampler](https://github.com/open-telemetry/opentelemetry-java-contrib/blob/main/consistent-sampling/src/main/java/io/opentelemetry/contrib/sampler/consistent56/ConsistentRateLimitingSampler.java)

```
S = EachOf(
    AnyOf(
        ConsistentParentBased(
            RuleBased(ROOT, {
                (http.target == /healthcheck) => ConsistentAlwaysOff,
                (http.target == /checkout) => ConsistentAlwaysOn,
                true => ConsistentFixedThreshold(0.25)
            })
        ),
        RuleBased(CLIENT, {
            (http.url == /foo) => ConsistentAlwaysOn
        })
    ),
    ConsistentRateLimiting(1000 * 60)
)
```

### Limitations of composite samplers

Not all samplers can participate as components of composite samplers without undesired or unexpected effects. Some samplers require that they _see_ each `Span` being created, even if the span is going to be dropped. Some samplers update the trace state or maintain internal state, and for their correct behavior it is assumed that their sampling decisions will be honored by the tracer at face value in all cases. Good examples of this are rate limiting samplers, which have to keep track of the rate of created spans and/or the rate of positive sampling decisions.

Special attention is required for CP samplers. The sampling probability they record in trace-state is later used to calculate [_adjusted count_](https://opentelemetry.io/docs/specs/otel/trace/tracestate-probability-sampling/#adjusted-count), which, in turn, is used to calculate span-based metrics. While the composite samplers presented here are compatible with CP samplers, generally, mixing CP samplers with other types of samplers may lead to undefined or sometimes incorrect adjusted counts.
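
To make the relationship between the recorded threshold and the adjusted count concrete, here is a small worked sketch in Python. It assumes the `th` encoding proposed in OTEP 235: a hexadecimal rejection threshold of 1 to 14 digits with trailing zeros omitted, interpreted against a 56-bit randomness space. The function names are hypothetical.

```python
# Assumption: thresholds live in a 56-bit space, as proposed in OTEP 235.
MAX_THRESHOLD = 1 << 56


def th_to_threshold(th: str) -> int:
    """Expand a th value (trailing zeros omitted) to a 56-bit integer."""
    if not 1 <= len(th) <= 14:
        raise ValueError("th must have between 1 and 14 hex digits")
    return int(th.ljust(14, "0"), 16)


def sampling_probability(th: str) -> float:
    # A span is kept when its randomness value is not below the rejection
    # threshold, so the kept fraction is (2^56 - T) / 2^56.
    return (MAX_THRESHOLD - th_to_threshold(th)) / MAX_THRESHOLD


def adjusted_count(th: str) -> float:
    # Each sampled span represents 1 / probability spans of the population.
    return 1.0 / sampling_probability(th)


# th "0" keeps everything, "8" keeps one half, "c" keeps one quarter.
examples = {th: adjusted_count(th) for th in ("0", "8", "c")}
```

A span whose `th` entry was removed or was written by a sampler that did not actually sample with that probability has no meaningful adjusted count, which is why mixing CP samplers with other samplers requires care.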

### Prior art

A number of composite samplers are already available as independent contributions
([RuleBasedRoutingSampler](https://github.com/open-telemetry/opentelemetry-java-contrib/blob/main/samplers/src/main/java/io/opentelemetry/contrib/sampler/RuleBasedRoutingSampler.java),
[LinksBasedSampler](https://github.com/open-telemetry/opentelemetry-java-contrib/blob/main/samplers/src/main/java/io/opentelemetry/contrib/sampler/LinksBasedSampler.java)).
Also, historically, some Span categorization was introduced by [JaegerRemoteSampler](https://www.jaegertracing.io/docs/1.54/sampling/#remote-sampling).

This proposal aims to generalize these ideas and to provide a somewhat more formal specification for the behavior of the composite samplers.