Updates and fixes to recording rules subcommand of ceems_tool #397

Open: wants to merge 1 commit into base: main
2 changes: 1 addition & 1 deletion .github/workflows/step_images.yml
@@ -87,7 +87,7 @@ jobs:

       - name: Push README to registry
         uses: christian-korneck/update-container-description-action@d36005551adeaba9698d8d67a296bd16fa91f8e8 # v1
-        if: (github.ref == 'refs/heads/main' || (github.event_name == 'push' && contains(github.ref, 'refs/tags/'))) && github.repository_owner == 'ceems' # Don't run this workflow on forks.
+        if: (github.ref == 'refs/heads/main' || (github.event_name == 'push' && contains(github.ref, 'refs/tags/'))) && github.repository_owner == 'ceems-dev' # Don't run this workflow on forks.
         env:
           # For dockerhub registry
           DOCKER_USER: ${{ secrets.login }}
15 changes: 14 additions & 1 deletion CHANGELOG.md
@@ -1,11 +1,12 @@
# Changelog

-## 0.11.0 / 2025-*-*
+## 0.11.0 / 2025-*

### Breaking Changes

#### CEEMS Exporter

- Collector `rapl` is now disabled by default; to enable it, add `--collector.rapl` to the CLI arguments.
- Collector `ipmi_dcmi` has been renamed to `ipmi`, as functionality beyond DCMI has been added to the collector.
- The following metrics have been renamed to be more consistent with the Prometheus naming conventions:
* `ceems_ipmi_dcmi_current_watts` -> `ceems_ipmi_dcmi_power_current_watts`
@@ -17,6 +18,18 @@
* `ceems_redfish_max_watts` -> `ceems_redfish_power_max_watts`
* `ceems_redfish_avg_watts` -> `ceems_redfish_power_avg_watts`

#### CEEMS tool

- Several minor bugs in the recording rules have been fixed. Please regenerate the recording rules with the new version of `ceems_tool`.
- GPU profiling metrics have been renamed to include `prof` in the metric name. For instance, `uuid:ceems_gpu_sm_active:ratio` became
`uuid:ceems_gpu_prof_sm_active:ratio`.
- The suffix of the NVIDIA profiling metrics for NVLink and PCIe traffic has been corrected from `ratio` to `sum`. These metrics
have therefore been renamed as follows:
* `uuid:ceems_gpu_pcie_tx_bytes:ratio` -> `uuid:ceems_gpu_prof_pcie_tx_bytes:sum`
* `uuid:ceems_gpu_pcie_rx_bytes:ratio` -> `uuid:ceems_gpu_prof_pcie_rx_bytes:sum`
* `uuid:ceems_gpu_nvlink_tx_bytes:ratio` -> `uuid:ceems_gpu_prof_nvlink_tx_bytes:sum`
* `uuid:ceems_gpu_nvlink_rx_bytes:ratio` -> `uuid:ceems_gpu_prof_nvlink_rx_bytes:sum`

## 0.10.2 / 2025-08-07

- [BUGFIX] Fix bpf code to work with LLVM 20 [#393](https://github.com/mahendrapaipuri/ceems/pull/393) ([@mahendrapaipuri](https://github.com/mahendrapaipuri))
2 changes: 1 addition & 1 deletion cmd/ceems_tool/relabel.go
@@ -16,7 +16,7 @@ import (
 var gpuSeries = []string{
 	"DCGM_FI_DEV_POWER_USAGE_INSTANT",
 	"amd_gpu_power",
-	"gpu_power_usage",
+	"GPU_POWER_USAGE",
 }

// MetricRelabelConfig contains the Prometheus metric relabel config.
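For anyone updating dashboards or alert expressions after regenerating the rules, the renames listed in the CHANGELOG.md hunk could be captured as a simple lookup. The following is only a sketch: `renamedRules` and `migrateRuleName` are hypothetical names that do not exist in `ceems_tool`, and the map holds only the five renames spelled out in the changelog, not necessarily every rule affected by this PR.

```go
package main

import "fmt"

// renamedRules maps pre-0.11.0 recording rule names to their 0.11.0 names.
// Hypothetical helper data: only the renames quoted in the CHANGELOG hunk
// are listed here.
var renamedRules = map[string]string{
	"uuid:ceems_gpu_sm_active:ratio":       "uuid:ceems_gpu_prof_sm_active:ratio",
	"uuid:ceems_gpu_pcie_tx_bytes:ratio":   "uuid:ceems_gpu_prof_pcie_tx_bytes:sum",
	"uuid:ceems_gpu_pcie_rx_bytes:ratio":   "uuid:ceems_gpu_prof_pcie_rx_bytes:sum",
	"uuid:ceems_gpu_nvlink_tx_bytes:ratio": "uuid:ceems_gpu_prof_nvlink_tx_bytes:sum",
	"uuid:ceems_gpu_nvlink_rx_bytes:ratio": "uuid:ceems_gpu_prof_nvlink_rx_bytes:sum",
}

// migrateRuleName returns the 0.11.0 name for a recording rule and reports
// whether the name changed. Unknown names are returned unchanged.
func migrateRuleName(old string) (string, bool) {
	if renamed, ok := renamedRules[old]; ok {
		return renamed, true
	}
	return old, false
}

func main() {
	names := []string{
		"uuid:ceems_gpu_pcie_tx_bytes:ratio",
		"uuid:ceems_gpu_sm_active:ratio",
		"some:other_rule:ratio", // not in the map, so left untouched
	}
	for _, name := range names {
		renamed, changed := migrateRuleName(name)
		fmt.Printf("%s -> %s (changed: %t)\n", name, renamed, changed)
	}
}
```

A plain map keeps the mapping easy to audit against the changelog; names that are not listed pass through unchanged, so the helper is safe to run over a mixed set of expressions.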