Add exponential histogram support to CloudWatch PMD Exporter #1677

Open

dricross wants to merge 7 commits into main from dricross/exponentialhistogram

Conversation

@dricross (Contributor) commented May 8, 2025

Description of the issue

The CloudWatch/PMD exporter currently drops all exponential histogram metrics.

Description of changes

Adds support for exponential histograms to the CloudWatch/PMD exporter.

License

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Tests

Set up something similar to what our OTLP -> AMP test does (see test repo).

Agent config:

{
  "agent": {
    "metrics_collection_interval": 15,
    "run_as_user": "root",
    "debug": true
  },
  "metrics": {
    "metrics_collected": {
      "otlp": {
        "http_endpoint": "127.0.0.1:1234"
      }
    }
  }
}

Use OTLP metrics and OTLP pusher from test repo to generate exponential histogram metrics.
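Not the test repo's pusher, but a rough sketch of one way to generate exponential histogram metrics against the agent's OTLP endpoint using the OpenTelemetry Go SDK; the instrument name, recorded values, and aggregation settings are illustrative, and the package paths assume a recent SDK release:

package main

import (
    "context"
    "log"
    "math/rand"

    "go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetrichttp"
    sdkmetric "go.opentelemetry.io/otel/sdk/metric"
)

func main() {
    ctx := context.Background()

    // Push over OTLP/HTTP to the endpoint configured under metrics_collected.otlp.
    exporter, err := otlpmetrichttp.New(ctx,
        otlpmetrichttp.WithEndpoint("127.0.0.1:1234"),
        otlpmetrichttp.WithInsecure(),
    )
    if err != nil {
        log.Fatal(err)
    }

    // Force base2 exponential histogram aggregation for the test instrument.
    view := sdkmetric.NewView(
        sdkmetric.Instrument{Name: "test_latency"},
        sdkmetric.Stream{Aggregation: sdkmetric.AggregationBase2ExponentialHistogram{
            MaxSize:  160,
            MaxScale: 20,
        }},
    )

    provider := sdkmetric.NewMeterProvider(
        sdkmetric.WithReader(sdkmetric.NewPeriodicReader(exporter)),
        sdkmetric.WithView(view),
    )

    hist, err := provider.Meter("exph-test").Float64Histogram("test_latency")
    if err != nil {
        log.Fatal(err)
    }

    // Record a spread of values in [0, 5) so min/max/sum/count are easy to eyeball.
    for i := 0; i < 72; i++ {
        hist.Record(ctx, rand.Float64()*5)
    }

    // Flush the collected data points before exiting.
    if err := provider.ForceFlush(ctx); err != nil {
        log.Fatal(err)
    }
    _ = provider.Shutdown(ctx)
}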

In CloudWatch, we can see:

  • Min: 0
  • Max: 5
  • Sample count: 72
  • Sum: 240


Requirements

Before committing the code, please complete the following steps.

  1. Run make fmt and make fmt-sh
  2. Run make lint

@dricross dricross requested a review from a team as a code owner May 8, 2025 20:55

// ValuesAndCounts outputs two arrays representing the midpoints of each exponential histogram bucket and the
// count of data points within the corresponding exponential histogram buckets
func (d *ExpHistogramDistribution) ValuesAndCounts() ([]float64, []float64) {

Contributor:

Does the order of the values matter? Does it have to be positive to negative?

Contributor (Author):

I don't believe so. See PutMetricData API documentation here: https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_PutMetricData.html. The order of values just needs to match the order of counts.

I selected this order based on the behavior of the awsemfexporter for exponential histograms, e.g. https://github.com/amazon-contributing/opentelemetry-collector-contrib/blame/aws-cwa-dev/exporter/awsemfexporter/datapoint_test.go#L1168
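
For illustration only (not part of this PR), a minimal sketch of how paired values and counts could land in a PutMetricData MetricDatum using the aws-sdk-go v1 types; the metric name and numbers are made up, and only the pairing of Values[i] with Counts[i] matters:

package main

import (
    "fmt"

    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/service/cloudwatch"
)

func main() {
    // Hypothetical bucket midpoints and their counts, e.g. as returned by ValuesAndCounts().
    values := []float64{6.0, 3.0, 1.5}
    counts := []float64{4, 10, 2}

    // PutMetricData only requires that Values[i] pairs with Counts[i];
    // it does not impose an ordering on the entries themselves.
    datum := &cloudwatch.MetricDatum{
        MetricName: aws.String("my_exp_histogram"),
        Values:     aws.Float64Slice(values),
        Counts:     aws.Float64Slice(counts),
    }
    fmt.Println(datum)
}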

Comment on lines 80 to 85
posOffsetIndicies := make([]int, 0, len(d.positiveBuckets))
for offsetIndex := range d.positiveBuckets {
posOffsetIndicies = append(posOffsetIndicies, offsetIndex)
}
slices.Sort(posOffsetIndicies)
slices.Reverse(posOffsetIndicies)

Contributor:

Instead of sorting and then reversing, we can use maps.Keys, which returns an iterator, together with a comparison function that produces the descending order during the sort itself.

https://pkg.go.dev/maps#Keys
https://pkg.go.dev/slices#SortedFunc

Suggested change
posOffsetIndicies := make([]int, 0, len(d.positiveBuckets))
for offsetIndex := range d.positiveBuckets {
posOffsetIndicies = append(posOffsetIndicies, offsetIndex)
}
slices.Sort(posOffsetIndicies)
slices.Reverse(posOffsetIndicies)
posOffsetIndicies := slices.SortedFunc(maps.Keys(d.positiveBuckets), func(a, b int) int {
return cmp.Compare(b, a)
})

Contributor (Author):

very nice, I'll update
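
For reference, a standalone sketch of the suggested pattern (Go 1.23+ standard library; the bucket map below is made up):

package main

import (
    "cmp"
    "fmt"
    "maps"
    "slices"
)

func main() {
    // Hypothetical offset-indexed bucket counts, standing in for d.positiveBuckets.
    positiveBuckets := map[int]uint64{2: 5, -1: 3, 0: 7}

    // Collect the keys and sort them in descending order in a single pass.
    indices := slices.SortedFunc(maps.Keys(positiveBuckets), func(a, b int) int {
        return cmp.Compare(b, a) // reversed arguments give a descending sort
    })

    fmt.Println(indices) // [2 0 -1]
}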

// LowerBoundaryNegativeScale computes the lower boundary for index
// with scales <= 0.
func LowerBoundaryNegativeScale(index int, scale int) float64 {
return math.Ldexp(1, index<<-scale)

Contributor:

Why are we using math.Ldexp instead of just math.Exp2 if frac is always 1?

https://pkg.go.dev/math#Ldexp

func Ldexp(frac float64, exp int) float64

Ldexp is the inverse of Frexp. It returns frac × 2**exp.

Contributor (Author):

As you mentioned later, it was pulled directly from the OTLP documentation: https://opentelemetry.io/docs/specs/otel/metrics/data-model/#negative-scale-extract-and-shift-the-exponent
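
A quick standalone check (not part of the PR) that the two calls agree once frac is fixed at 1:

package main

import (
    "fmt"
    "math"
)

func main() {
    // With frac == 1, Ldexp(1, exp) == 1 * 2**exp, which matches Exp2(exp)
    // for integer exponents, so either call computes the same lower boundary.
    for _, exp := range []int{-4, -1, 0, 3, 10} {
        fmt.Println(math.Ldexp(1, exp) == math.Exp2(float64(exp))) // prints true
    }
}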

}

func (d *ExpHistogramDistribution) Resize(_ int) []*ExpHistogramDistribution {
// for now, do not split data points into separate PMD requests

Contributor:

When we say "for now" does that mean this should be a TODO?

Contributor (Author):

It's something that needs to be done later, yes. Does marking it as TODO specifically do something special?

}

// MapToIndexScale0 computes a bucket index at scale 0.
func MapToIndexScale0(value float64) int {

Contributor:

It looks like these were all taken from https://opentelemetry.io/docs/specs/otel/metrics/data-model/#negative-scale-extract-and-shift-the-exponent. Consider putting in a separate file in the package (like mapping.go or math.go) and indicating the source at the top.

Contributor (Author):

Sure, I can move those
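
For readers following along, an illustrative scale-0 mapping sketched from the bucket definition (index i of value v satisfies 2^i < v <= 2^(i+1)); this is not the PR's implementation, and it skips the subnormal handling the spec's bit-manipulation version covers:

package main

import (
    "fmt"
    "math"
)

// mapToIndexScale0Sketch is an illustrative scale-0 mapping; it ignores
// subnormal inputs, which the spec's bit-manipulation version handles.
func mapToIndexScale0Sketch(value float64) int {
    frac, exp := math.Frexp(value) // value = frac * 2^exp with frac in [0.5, 1)
    if frac == 0.5 {
        exp-- // exact powers of two belong to the lower bucket
    }
    return exp - 1
}

func main() {
    fmt.Println(mapToIndexScale0Sketch(3))   // 1:  2^1 < 3   <= 2^2
    fmt.Println(mapToIndexScale0Sketch(4))   // 1:  2^1 < 4   <= 2^2
    fmt.Println(mapToIndexScale0Sketch(0.5)) // -2: 2^-2 < 0.5 <= 2^-1
}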

@dricross dricross force-pushed the dricross/exponentialhistogram branch from 49857cf to 97c4271 on May 16, 2025 21:08
@dricross dricross force-pushed the dricross/exponentialhistogram branch from 97c4271 to 995902e on May 16, 2025 21:08
return d.AddEntryWithUnit(value, weight, "")
}

func (d *ExpHistogramDistribution) AddDistribution(other *ExpHistogramDistribution) {

Contributor:

Are we trying to match the Distribution interface? It doesn't look like this would satisfy the interface as it is now. We would have to make the Distribution interface generic.
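
Purely as an illustration of the generics point this comment raises (the interface name and method set below are hypothetical and do not mirror the agent's actual Distribution interface):

package sketch

// Distribution is a purely illustrative generic interface; the method set is
// hypothetical and does not mirror the agent's actual Distribution interface.
type Distribution[T any] interface {
    ValuesAndCounts() ([]float64, []float64)
    AddDistribution(other T)
}

// expHistogram is a stand-in showing how a concrete type would satisfy
// Distribution[*expHistogram] with a typed AddDistribution method.
type expHistogram struct {
    values, counts []float64
}

func (d *expHistogram) ValuesAndCounts() ([]float64, []float64) { return d.values, d.counts }

func (d *expHistogram) AddDistribution(other *expHistogram) {
    d.values = append(d.values, other.values...)
    d.counts = append(d.counts, other.counts...)
}

// Compile-time assertion that *expHistogram implements Distribution[*expHistogram].
var _ Distribution[*expHistogram] = (*expHistogram)(nil)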

@@ -82,7 +83,7 @@ func ConvertOtelNumberDataPoints(
unit string,
scale float64,
entity cloudwatch.Entity,
) []*aggregationDatum {
) []*aggregationDatum { //nolint:revive

Contributor:

nit: Instead of ignoring the lint, can we just not export the function? It doesn't look like it's used outside of this package.

}
// Assume function pointer is valid.
ad.expHistDistribution = exph.NewExpHistogramDistribution()
ad.expHistDistribution.ConvertFromOtel(dp, unit)

Contributor:

nit: If we store the unit in the MetricDatum already, why do we need to store it in the distribution?

return datums
}

func (c *CloudWatch) buildMetricDataumExph(metric *aggregationDatum, dimensionsList [][]*cloudwatch.Dimension) []*cloudwatch.MetricDatum {

Contributor:

nit: Typo

Suggested change
func (c *CloudWatch) buildMetricDataumExph(metric *aggregationDatum, dimensionsList [][]*cloudwatch.Dimension) []*cloudwatch.MetricDatum {
func (c *CloudWatch) buildMetricDatumExph(metric *aggregationDatum, dimensionsList [][]*cloudwatch.Dimension) []*cloudwatch.MetricDatum {


This PR was marked stale due to lack of activity.

@github-actions github-actions bot added the Stale label May 28, 2025