SWATCH-2306: Create grafana dashboard for PAYG metrics #4158

Open. Sgitario wants to merge 2 commits into main.

@Sgitario Sgitario commented Feb 5, 2025

Jira issue: SWATCH-2306

Description

The metrics to use for each phase are:

  • Metered (the first phase): swatch_metrics_ingested_usage_total, added by SWATCH-2297
  • Tallied (the second phase): swatch_tally_tallied_usage_total, added by SWATCH-2299
  • Covered by contract: swatch_contract_usage_total, added by SWATCH-2300
  • Billing pending: swatch_billable_usage_total, added by SWATCH-2301
  • Remitted: swatch_producer_metered_total, added by SWATCH-2302
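
For reference, the phase-to-metric mapping above can be written down as a small lookup table. This is only an illustrative Python sketch: the names PAYG_PHASE_METRICS and metric_for_phase are hypothetical and are not part of this PR. Only the metric names come from the list above.

```python
# Hypothetical lookup table: each PAYG phase, in pipeline order, mapped to the
# Prometheus counter named in the PR description.
PAYG_PHASE_METRICS = {
    "Metered": "swatch_metrics_ingested_usage_total",
    "Tallied": "swatch_tally_tallied_usage_total",
    "Covered by contract": "swatch_contract_usage_total",
    "Billing pending": "swatch_billable_usage_total",
    "Remitted": "swatch_producer_metered_total",
}


def metric_for_phase(phase: str) -> str:
    """Return the counter name for a phase, or raise KeyError listing the valid phases."""
    try:
        return PAYG_PHASE_METRICS[phase]
    except KeyError:
        valid = list(PAYG_PHASE_METRICS)
        raise KeyError(f"unknown phase {phase!r}; expected one of {valid}") from None
```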

The new "Metered end-to-end stats" panels look like this:

Screenshot From 2025-02-07 10-29-21

These panels display the values for the selected product(s), metric, and billing provider.

As part of this task, I found the following issues:

  • The metric "swatch_billable_usage_total" is being counted twice: SWATCH-3294
  • The metric "swatch_tally_tallied_usage_total" is always empty. I will dig further into this issue.

In addition to the panels above, I've added more panels that show this data by metric:

Screenshot From 2025-02-07 10-33-23

The main difference is that we can see the numbers for all the metrics at once, without having to pick a specific metric.

Testing

Extract the new dashboard json using oc extract -f .rhcicd/grafana/grafana-dashboard-subscription-watch-payg-metrics.configmap.yaml --confirm

Once you extract it from the .yaml that's checked into this repo, you can import it into the stage instance of grafana by going to Dashboards -> +Import from the left nav.

@Sgitario Sgitario added the "QE Unneeded" (pull request does not need QE approval) and "Dev" (pull requests that need developer review) labels Feb 5, 2025
@Sgitario Sgitario requested a review from kahowell February 5, 2025 06:40
@barnabycourt
Collaborator

Is there a way to create the panels such that we can hover with the mouse on each day and see the value? Also, is it worth breaking out to have a panel for each product/metric instead of having to select them from the drop-down?

@Sgitario
Contributor Author

Sgitario commented Feb 5, 2025

Is there a way to create the panels such that we can hover with the mouse on each day and see the value?

I like this idea. Wdyt @kahowell ?

Also, is it worth breaking out to have a panel for each product/metric instead of having to select them from the drop-down?

This is why I added an additional panel broken down by product and metric (see the second screenshot in the PR description). In that panel, selecting a metric from the drop-down has no effect.

Contributor

@kahowell kahowell left a comment

  • Instead of specifying a hard-coded range selector ([1d]), it's more useful to use the grafana global variable $__range, because then the time selection can be used to choose a specific time range (both the ending timestamp and the time period of the range).
  • Instead of hard-coding the product, metric_id, and billing_provider values, define them as Query variables, using ${datasource} and the metrics ingested metric e.g. (label_values(swatch_metrics_ingested_usage_total,product)).
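
A rough sketch of what these two suggestions amount to, written as Python that only assembles the query strings. The helper names are hypothetical, the dashboard variable names ($product, $metric_id, $billing_provider) are assumed to match the label names, and wrapping the counter in sum(increase(...)) is an assumption about how the panels aggregate. Only $__range, label_values(...), and the three label names come from the comments above.

```python
def variable_query(metric: str, label: str) -> str:
    """Build the Query-variable definition that lists the values of one label."""
    return f"label_values({metric}, {label})"


def panel_expr(metric: str,
               labels=("product", "metric_id", "billing_provider")) -> str:
    """Build a panel expression filtered by dashboard variables over $__range.

    Each label is matched with =~ against a dashboard variable of the same
    name, so multi-value selections keep working.
    """
    matchers = ", ".join(f'{label}=~"${label}"' for label in labels)
    # Assumption: increase() over the selected range, summed across all series.
    return f"sum(increase({metric}{{{matchers}}}[$__range]))"


if __name__ == "__main__":
    print(variable_query("swatch_metrics_ingested_usage_total", "product"))
    print(panel_expr("swatch_metrics_ingested_usage_total"))
```

With this shape, the dashboard time picker controls both the end timestamp and the length of the range, and the drop-downs are populated from whatever label values the ingested metric actually reports.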

@kahowell
Contributor

kahowell commented Feb 5, 2025

Is there a way to create the panels such that we can hover with the mouse on each day and see the value?

I like this idea. Wdyt @kahowell ?

You'll have to switch to the time series panel type for that. I'm fine with that.

@kartikshahc kartikshahc self-requested a review February 5, 2025 19:19
@kartikshahc
Contributor

kartikshahc commented Feb 6, 2025

@Sgitario I might be missing something here, because none of the dashboard metrics match what we are getting from the database. Please correct me if I am misunderstanding these metrics, as they were mostly implemented while I was out in December. Also, this issue could be with the cards that implemented capturing the metrics rather than with this card, but I just wanted to bring it up.

Question: Are the metrics shown in the dashboard for 1 day or for the month?

Example for Rosa on stage for a 1-day period:
For the Cores metric, I see 8 with a failed status, but the dashboard says 533.

Gabi results for 1 day and for this month are in the screenshot below:
gabi "select sum(remitted_pending_value),status,metric_id from billable_usage_remittance where product_id='rosa' and remittance_pending_date>'2025-02-05' group by status, metric_id"

Screenshot From 2025-02-06 16-29-19

Dashboard screenshot:
Screenshot From 2025-02-06 16-28-01

As you can see, none of the values in billing pending or remitted match the database values:
Screenshot From 2025-02-06 16-34-06

The same applies to the metered panel:
The Gabi query and screenshot below show 2464 Instance hours for Feb 6 and 4543 for this month, but neither matches the dashboard value of 5702.
gabi "select sum((data->'measurements'->0->'value')::float),data->'product_tag', data->'measurements'->0->>'metric_id' from events where timestamp>='2025-02-06T00:00:00Z' group by data->'product_tag', data->'measurements'->0->>'metric_id'"

Screenshot From 2025-02-06 16-51-24

Dashboard results:
Screenshot From 2025-02-06 16-54-50

@Sgitario Sgitario marked this pull request as draft February 7, 2025 04:21
@Sgitario
Contributor Author

Sgitario commented Feb 7, 2025

Yes, the numbers are very wrong. That's why I started digging into this and found issues such as double or triple counting of some metrics, counters being incremented using different units, and counters using different formats.

Still, thank you very much for the thorough review! This is really valuable. Once all these issues are addressed in #4159 and it is merged in stage, I will update the dashboard to incorporate the feedback from Kevin and Barnaby, and verify that everything looks good.

@Sgitario Sgitario force-pushed the jcarvaja/SWATCH-2306 branch from 3eb5f59 to 489ebe0 Compare February 7, 2025 09:20
@Sgitario
Contributor Author

Sgitario commented Feb 7, 2025

@barnabycourt @kahowell @kartikshahc, I've updated the dashboard with the following changes:

  • Use the Grafana global variable $__range in all the panels instead of the hard-coded range of "1d".
  • Define the product, metric_id, and billing_provider values as Query variables, using ${datasource} and the metrics ingested metric, instead of hard-coding them.
  • Change the panels to the time series type, so we see the plots by day.
  • Add a couple of new panels to check the remittance plots for the "unknown" and "subscription not found" failures.

I've updated the screenshots in the PR description.

Note, @kartikshahc, that the numbers cannot be trusted yet because #4159 has not been merged.

@Sgitario Sgitario force-pushed the jcarvaja/SWATCH-2306 branch from 489ebe0 to bd1a27e Compare February 7, 2025 09:32
@Sgitario Sgitario requested a review from kahowell February 7, 2025 09:34
@Sgitario Sgitario marked this pull request as ready for review February 7, 2025 09:35
@InsightsDroid
Collaborator

IQE Tests Summary Report

Labels: Dev (pull requests that need developer review), QE Unneeded (pull request does not need QE approval)
5 participants