
Allocation Metrics no longer emitted #24339

Open
dosera opened this issue Oct 31, 2024 · 23 comments
Comments

dosera commented Oct 31, 2024

Nomad version

Nomad v1.9.1
BuildDate 2024-10-21T09:00:50Z
Revision d9ec23f0c1035401e9df6c64d6ffb8bffc555a5e

Operating system and Environment details

AlmaLinux release 9.4 (Seafoam Ocelot)

/etc/nomad/base.hcl:

telemetry {
..
  prometheus_metrics = "true"
  publish_allocation_metrics = "true"
  publish_node_metrics = "true"

..
}

Issue

After upgrading Nomad from 1.8.4 to 1.9.1, allocation metrics like nomad_client_allocs_memory_usage no longer appear to be emitted properly (or only for a few seconds at a time; I am using Prometheus to scrape them).

Reproduction steps

  1. Enable telemetry
  2. Start any allocation
  3. Collect metrics

Expected Result

Allocation metrics are available.

Actual Result

Allocation metrics are available only sporadically (at best). See the attached screenshot; Nomad was upgraded on Oct 24th.

This is how it looks in Prometheus since then:
[screenshot: nomad no alloc metrics]


Himura2la commented Nov 1, 2024

We have the same issue with allocation metrics after updating from 1.8.2 to 1.9.1:
[screenshot]
Query: avg_over_time(nomad_client_allocs_memory_usage[10m]) / avg_over_time(nomad_client_allocs_memory_allocated[10m])

They are returned irregularly, and often values are zero.

our configs:

telemetry {
  collection_interval = "30s"
  disable_hostname = true
  prometheus_metrics = true
  publish_allocation_metrics = true
  publish_node_metrics = true
}
global:
  scrape_interval:     1m
  scrape_timeout:     20s
  evaluation_interval: 1m

scrape_configs:
  - job_name: nomad
    static_configs:
      - targets:
        # ...
    scheme: https
    metrics_path: /v1/metrics
    params:
      format: [prometheus]
    tls_config:
      insecure_skip_verify: true
    relabel_configs:
      - source_labels: [__address__]
        target_label: instance
      # ...

Operating system: Debian 12, amd64

jrasell (Member) commented Nov 4, 2024

Hi @Himura2la and @dosera, and thanks for raising this issue. I have been unable to reproduce this on macOS or Ubuntu 24.04 using the steps below. Could you please provide a minimal job specification that helps reproduce this, along with any other relevant information?

Example agent config:

telemetry {
  prometheus_metrics         = "true"
  publish_allocation_metrics = "true"
  publish_node_metrics       = "true"
}

Example job:

job "example" {

  group "cache" {
    network {
      mode = "bridge"
      port "db" {
        to = 6379
      }
    }

    task "redis" {
      driver = "docker"

      config {
        image          = "redis:7"
        ports          = ["db"]
        auth_soft_fail = true
      }

      identity {
        env  = true
        file = true
      }

      resources {
        cpu    = 500
        memory = 256
      }
    }
  }
}

Example test command:

while true; do
  curl -s localhost:4646/v1/metrics | jq '.Gauges | .[] | select(.Name | contains("nomad.client.allocs.memory.usage")) | .Value'
  curl -s 'localhost:4646/v1/metrics?format=prometheus' | grep 'nomad.client.allocs.memory.usage{' | awk '{print $2}'
  sleep 2
done


Himura2la commented Nov 4, 2024

I'm afraid your validation method is not relevant to the issue, because the problem is best visible over a long-term run.
I think you can simply follow this tutorial https://developer.hashicorp.com/nomad/tutorials/manage-clusters/prometheus-metrics and see how it goes with 5-10 jobs that consume something for several hours.


jedd commented Nov 5, 2024

I'm seeing the same thing with 4 boxes I upgraded from 1.9.0 to 1.9.1:
2024-10-18 - upgraded from 1.8.4 to 1.9.0 - no issues observed at this time
2024-10-25 - upgraded from 1.9.0 to 1.9.1 - from this point I am seeing a lot of gaps in the data - image attached

I'm using Debian testing on all four machines.

My Prometheus scrapes a telegraf (Prometheus export mode) on the same machines, and I'm seeing no drops in that data.

Scrape interval for both is set to 1m. Boxes had no other packages upgraded at that time.

More importantly, I'm not seeing CPU or memory data in the Nomad GUI in either Task or Allocation view. I will occasionally get a red line indicating what I presume is the 'current' X MiB / total memory used, but no history at all.
The CPU graph consistently shows 0 MHz (and no history either, of course).

[screenshot]

jrasell (Member) commented Nov 6, 2024

Hi @jedd, @Himura2la and @dosera. I ran two longer tests using Debian 12 and Ubuntu 24.04 and was still unable to reproduce the problem. Each test ran for several hours, included several jobs, and used Prometheus for metric collection. I'll include the two example graphs below.

I'll mark this issue for further investigation and add it to our backlog. If anyone is able to provide a minimal reproduction that would be greatly appreciated.

[screenshots: 2024-11-06 at 07:33:46, 2024-11-05 at 09:03:06]


jedd commented Nov 6, 2024

Thanks for diving into that, @jrasell -- I'll see if I can distil a Prometheus scrape config, but you've probably already matched it: it's just an interval of 1m, and 4 hosts on :4646 (no TLS or auth). As per my graphs, some data is being made available periodically, so it's not an auth issue.

Can you tell me what you saw on the task / allocation cpu & memory graphs?

Opening these up and letting them sit for more than a minute, I see no change - usually I'd expect to see the graph start to populate and scroll within a few seconds.

Also odd are the times shown on the X axis - it's 23:17 local here, but in one of these I see 00:00:00 for ALL x points, and in the other 08:44:00 to 08:48:00. These are taken on the same host (my 'master'), but the screenshots are from a job running on this host and another job running on one of my client-only Nomad hosts.

[screenshots]


dmclf commented Nov 7, 2024

I also seem to be suffering from this issue.

Development cluster consisting of 5 Ubuntu 22.04 machines: 3 servers, 2 clients.
Servers upgraded from 1.9.0-1 to 1.9.1-1 on October 30, 02:00 UTC.
Clients downgraded from 1.9.0-1 to 1.8.3-1 on Oct 20, upgraded to 1.9.1-1 on Oct 30.

nomad version

Nomad v1.9.1
BuildDate 2024-10-21T09:00:50Z
Revision d9ec23f0c1035401e9df6c64d6ffb8bffc555a5e

nomad config

telemetry {
  disable_hostname = true
  prometheus_metrics = true
  publish_allocation_metrics = true
  publish_node_metrics = true
}

tls {
  http = true
  rpc  = true
[..]
vault {
  enabled               = true
[..]
consul {
  allow_unauthenticated = true
  ssl                 = true
}

  • prometheus 2.53.1
  • docker 27.2.1
  • linux kernel 5.15.0-121-generic

Distorted picture for several Nomad allocation metrics

  • nomad_client_allocs_memory_usage
    [screenshot]

  • nomad_client_allocs_memory_allocated
    [screenshot]

  • nomad_client_allocs_cpu_allocated
    [screenshot]

  • nomad_client_allocs_cpu_user
    [screenshot]

Non-Nomad stats, i.e. cAdvisor / node-exporter, are clearly more consistent

(this also suggests the scraping process itself is mostly stable)

  • cadvisor - machine_memory_bytes
    [screenshot]

  • node-exporter - node_memory_Active_bytes
    [screenshot]

nomad-metrics process_virtual_memory_bytes / nomad_client_host_memory_free

Other metrics provided by the Nomad metrics endpoint also seem consistent:

  • process_virtual_memory_bytes
    [screenshot]

  • nomad_client_host_memory_free
    [screenshot]

So something seems odd with the per-task statistics.
Also, when graphing the count of occurrences of nomad_client_allocs_memory_usage for a specific alloc_id (to also show that it is not being restarted/stopped), one clearly sees decreases.

  • avg(count_over_time(nomad_client_allocs_memory_usage{job="nomad_metrics",task_group="demo",alloc_id="30265304-3d02-1b82-1023-de6f68af7816"}[10m]))
    [screenshot]

curl testing

This can also easily be tracked by just doing some curl'ing, e.g.:

curl -k "https://${x}:4646/v1/metrics?format=prometheus" 2>/dev/null | grep ^nomad_client_allocs_memory_usage

Sometimes there is no reply, sometimes some replies, but the full set is never consistently included.
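
To make the gaps easier to see over time, here is a rough sketch of a sampling loop (not from the original report; it reuses the ${x} client hostname variable from the curl above and assumes the HTTPS API on port 4646):

# Count the per-alloc memory metric lines every 10 seconds and log the result,
# so drops to zero or partial counts become visible over a longer run.
while true; do
  count=$(curl -ks "https://${x}:4646/v1/metrics?format=prometheus" | grep -c '^nomad_client_allocs_memory_usage')
  echo "$(date -u '+%Y-%m-%dT%H:%M:%SZ') nomad_client_allocs_memory_usage lines: ${count}"
  sleep 10
done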

curl comparison 1.9.1 vs 1.8.3

Doing checks like these, there are also clear inconsistencies between my dev cluster on 1.9.1
and the staging/prod clusters on 1.8.3:

export host=nomad-development-client-1
curl -k "https://${host}:4646/v1/metrics?format=prometheus" 2>/dev/null|grep -e ^nomad_client_tasks_running
echo "wc -l @ nomad_client_allocs_memory_usage" $(curl -k "https://${host}:4646/v1/metrics?format=prometheus" 2>/dev/null | grep -e "^nomad_client_allocs_memory_usage.*task=" | wc -l)"
  • development 1.9.1 (random client)
nomad_client_tasks_running{datacenter="nomad24339",node_class="none",node_id="20591b4f-586c-cf24-0706-b17640b6ea1b",node_pool="default",node_scheduling_eligibility="eligible",node_status="ready"} 23
wc -l @ nomad_client_allocs_memory_usage 2

= 23 vs 2 = inconsistent ❌

  • staging 1.8.3 (random client)
nomad_client_tasks_running{datacenter="nomad24339",node_class="none",node_id="64102b51-0ebc-893a-3c2d-08e538bb4a66",node_pool="default",node_scheduling_eligibility="eligible",node_status="ready"} 331
wc -l @ nomad_client_allocs_memory_usage 331

= 331 vs 331 = consistent ✅

  • production 1.8.3 (random client)
nomad_client_tasks_running{datacenter="nomad24339",node_class="none",node_id="95d3df8d-a82c-0357-60aa-fd6cb952fb2b",node_pool="default",node_scheduling_eligibility="eligible",node_status="ready"} 240
wc -l @ nomad_client_allocs_memory_usage 240

= 240 vs 240 = consistent ✅

dmclf added a commit to dmclf/nomad-debug that referenced this issue Nov 8, 2024
- ugly
- hopefully can help with hashicorp/nomad#24339

dmclf commented Nov 8, 2024

> Hi @jedd, @Himura2la and @dosera. I ran two longer tests using Debian 12 and Ubuntu 24.04 and was still unable to reproduce the problem. Each test ran for several hours, included several jobs, and used Prometheus for metric collection. I'll include the two example graphs below.
>
> I'll mark this issue for further investigation and add it to our backlog. If anyone is able to provide a minimal reproduction that would be greatly appreciated.

@jrasell, in a quick-and-dirty attempt to mimic what I'm seeing, I put something together:
https://github.com/dmclf/nomad-debug/tree/main/gh-24339

It's not exactly the most elegant, but hopefully it can help somehow on your end.
(I'm also not sure it is guaranteed to work 100% or whether it needs a rerun every now and then; it is very much quick, dirty and rough.)


jedd commented Nov 9, 2024

I have basically zero Go, and because it's an intermittent problem I'm not 100% sure my git bisect activities have narrowed things down, but I'm cautiously confident the problem first manifested with:

commit 6c3f222

Looking at git show 6c3f222fe154a66369e84789828f4a8d36a20c7b -- drivers/docker/stats.go

there's a comment about a move to the streaming stats API, and a bunch of changes to the collection mechanism.

I don't know how fragile that stream may be under various circumstances, and I'm really out of my depth here trying to guess whether this may be the root cause of an intermittent problem like we're seeing. But perhaps @dmclf, if you're able to follow the https://developer.hashicorp.com/nomad/docs/install#compiling-from-source process on one of your clients, you could validate that this commit is where things go pear-shaped on your system too?

I noted - but I think it's entirely coincidental - that the problem seemed to occur shortly after journalctl reported client.gc operation was skipped:

client.gc: garbage collection skipped because no terminal allocations: reason="disk usage of 97 is over gc threshold of 80"

(I'm pretty sure this is entirely coincidental, but it was the only log entry emitted by Nomad.)

I did the bisect manually, given the intermittent nature of the problem, and waited for 10 minutes with some regular curl'ing to see if the problem manifested - so I caught definite failures, but may have accidentally given a 'good' to some commits that just didn't fail fast enough.
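
For reference, a hedged sketch of how such a manual bisect can be driven (these are not jedd's exact commands; the good/bad tags follow the upgrade timeline above):

# Mark v1.9.0 as good and v1.9.1 as bad, then build and observe each step
# for ~10 minutes before marking it, since the failure is intermittent.
git clone https://github.com/hashicorp/nomad.git && cd nomad
git bisect start
git bisect bad v1.9.1    # allocation metrics drop out on this release
git bisect good v1.9.0   # metrics were still emitted reliably here
# at each commit bisect suggests:
#   make bootstrap && make dev
#   run bin/nomad as the client, curl /v1/metrics regularly for ~10 minutes,
#   then mark the commit with `git bisect good` or `git bisect bad`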


dmclf commented Nov 9, 2024

@jedd right... OK, I suppose I can try that (I'm not much of a Go person nor a developer either).

So if you suspect 6c3f222fe154a66369e84789828f4a8d36a20c7b,

you are asking me to build release/1.9.1 from a checkout of
e5e26c8eb34c40263d1557f7e6f948c92fc57ca8 (the one before)

and see if that works better?


dmclf commented Nov 9, 2024

@jedd to confirm, from my end at least.

git clone -b release/1.9.1 https://github.com/hashicorp/nomad.git && cd nomad
git checkout e5e26c8eb34c40263d1557f7e6f948c92fc57ca8 && make bootstrap && make dev && cp bin/nomad /usr/local/bin/

-> works fine ✅

git clone -b release/1.9.1 https://github.com/hashicorp/nomad.git && cd nomad
git checkout 6c3f222fe154a66369e84789828f4a8d36a20c7b && make bootstrap && make dev && cp bin/nomad /usr/local/bin/

-> gives this ticket's odd behaviour ❌

dmclf added a commit to dmclf/nomad-debug that referenced this issue Nov 10, 2024

dmclf commented Nov 11, 2024

@jedd for completeness:

I updated one of the two clients on the dev cluster too, to check behaviour there with the two versions.

Results:
[screenshot]

So initially the graph starts with

Nomad v1.9.1
BuildDate 2024-10-21T09:00:50Z
Revision d9ec23f0c1035401e9df6c64d6ffb8bffc555a5e

then after updating to

Nomad v1.9.1-dev
BuildDate 2024-10-17T09:13:46Z
Revision e5e26c8eb34c40263d1557f7e6f948c92fc57ca8

and then lastly changed to

# nomad version
Nomad v1.9.1-dev
BuildDate 2024-10-17T12:47:17Z
Revision 6c3f222fe154a66369e84789828f4a8d36a20c7b


dmclf commented Nov 11, 2024

Also confirming the issue still exists with the most recent v1.9.2:

Nomad v1.9.2
BuildDate 2024-11-08T08:58:16Z
Revision de8a30ba86a2ed8dfb7188da10c250056d165fea


jedd commented Nov 11, 2024

As @jrasell wasn't able to reproduce this in their lab - perhaps some other confounding factors are at play?

The last reference I could find to a non-trivial Docker stats API change was 2015, with the introduction of one-shot metrics rather than just streaming (which is the opposite direction to the change in that commit above).

FWIW these are the two systems where I'm seeing this problem present:

Debian bullseye / linux 5.10.0 amd64 / Xeon / docker 20.10.5 / containerd 1.4.13

Debian bookworm / linux 6.1.0 amd64 / Intel i5 / docker 20.10.24 / containerd 1.6.20


dmclf commented Nov 11, 2024

@jedd

Would you be willing to share the details of the systems used in the lab that were unable to reproduce the issue?
(OS/kernel/arch/docker/containerd versions / number of tasks running?)

From my end the problem is present on the following (❌ in the pass column indicates the problem is present):

| pass | OS | Version | Kernel / Arch | CPU | Dockerd | Containerd | machine | user |
|------|----|---------|---------------|-----|---------|------------|---------|------|
| ❌ | Alpine (container) | 3.20.3 | 6.8.0-45 amd64 | Intel i5 | 27.2.1 (host) | 1.7.22 (host) | vmware | dmclf |
| ❌ | Ubuntu (container) | 24.04 | 6.8.0-45 amd64 | Intel i5 | 27.2.1 (host) | 1.7.22 (host) | vmware | dmclf |
| ❌ | Ubuntu | 22.04 | 5.15.0-121 amd64 | Xeon | 27.2.1 | 1.7.22 | esxi | dmclf |
| ❌ | Ubuntu | 22.04 | 5.15.0-121 arm64 | Graviton2 | 27.2.1 | 1.7.22 | ec2 | dmclf |
| ❌ | RHEL | 8.5 (Ootpa) | 4.18.0-348 x86_64 | Xeon | 20.10.12 | 1.4.12 | physical | dmclf |
| ❌ | MacOS | 15.1 (Sequoia) host / Ubuntu 24.04 container | 24.1.0 arm64 host / 6.10.11-linuxkit container | Apple M1 Pro | 27.3.1 (MacOS host) | - | physical | dmclf |
| ❌ | Debian | bullseye | 5.10.0 amd64 | Xeon | 20.10.5 | 1.4.13 | ? | jedd |
| ❌ | Debian | bookworm | 6.1.0 amd64 | Intel i5 | 20.10.24 | 1.6.20 | ? | jedd |
| ❌ | AlmaLinux | 9.4 (Seafoam Ocelot) | ? | ? | ? | ? | ? | dosera |
| ❌ | Debian | 12 | ? amd64 | ? | ? | ? | ? | Himura2la |
| ✅ | macOS | ? | ? | ? | ? | ? | ? | jrasell |
| ✅ | Ubuntu | 24.04 | ? | ? | ? | ? | ? | jrasell |
| ✅ | Debian | 12 | ? | ? | ? | ? | ? | jrasell |
| ✅ | Ubuntu (nomad agent -dev) | 24.04 | 6.8.0-45 amd64 | Intel i5 | 27.2.1 (host) | 1.7.22 (host) | vmware | dmclf |


jedd commented Nov 11, 2024

@dmclf - so far I've not had any systems present metrics data with Nomad 1.9.1, but I'm not running a wide range of environments, as noted above.
We have a RHEL platform at work, and this has paused plans to upgrade it - I'll look at a throwaway dev environment there to see whether the problem presents.


dmclf commented Nov 11, 2024

@jedd OK, and the details of the system that @jrasell used to try (and fail) to reproduce this in the lab?

Perhaps an easy sample job @jrasell can try is below
(or increase the count to 50 if things work fine with 20).

It should be easy to see within 2-3 minutes whether stats are being returned properly after the containers are reported 'healthy' by Nomad.

I can potentially extend the scope to RHEL and ARM64 too, to see if those are immune, but it would be nice to get a sense of what kind of system @jrasell uses in that case.

job "example-job" {
  group "cache" {
    count = 20
    task "redis" {
      driver = "docker"
      config {
        image          = "redis:7"
        auth_soft_fail = true
      }

      identity {
        env = true
        file = true
      }

      resources {
        cpu    = 100
        memory = 64
      }
    }
  }
}
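
One way to exercise this job and watch the metric count (a sketch, assuming the job above is saved as example-job.nomad.hcl and the client API is plain HTTP on localhost:4646):

# With count = 20, a healthy client should expose roughly 20
# nomad_client_allocs_memory_usage lines on every scrape.
nomad job run example-job.nomad.hcl
watch -n 5 "curl -s 'localhost:4646/v1/metrics?format=prometheus' | grep -c '^nomad_client_allocs_memory_usage'"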


dmclf commented Nov 11, 2024

@jedd I updated the table in #24339 (comment)
with a just-performed RHEL test, plus ARM/Graviton2 and macOS/M1.

Tested with the official 1.9.1 binary (Revision d9ec23f):
all start failing to transmit reliable stats under a minute after starting the Nomad daemon.


dmclf commented Nov 12, 2024

As an extra observation: nomad agent -dev is not a valid method to troubleshoot this issue ⚠️
(I just tried, and nomad agent -dev seems to work fine and does not show this ticket's issue, but that is not how people normally run Nomad.)

You will need a minimal 'normal' cluster for this, like 3 servers + 1 client,
which has been the case in all my failed tests.

The server version does not seem to matter, as it happens with servers on 1.8.3 and 1.9.1, whilst a client on 1.9.1 is so far always faulty.


jedd commented Nov 12, 2024

Oh, that's interesting.

FWIW I did my bisect analysis on my workstation which is a single-node (server/client) system, and it was definitely seeing the problem manifest.

(I'll have to update later on whether that machine was in dev mode at the time.)


dmclf commented Nov 12, 2024

@jedd OK, so a single server with bootstrap_expect=1 and a single client may also work?

-- Update: confirming on my end that a single server + single client (but not in dev mode) also shows the same symptoms.
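
For anyone trying to reproduce this outside dev mode, a sketch of the minimal topology described above (file names and the server address are placeholders, not from the original report; other required settings such as data_dir are omitted):

# server.hcl - agent 1
server {
  enabled          = true
  bootstrap_expect = 1
}

# client.hcl - agent 2, pointed at the server
client {
  enabled = true
  servers = ["<server-address>:4647"]
}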

jrasell (Member) commented Nov 12, 2024

Hi all, just to circle back here and let you know that this issue has been prioritised internally and is waiting for an engineer to become available and pick it up. Once they do, we will use the comments and details to try to get a reproduction and a fix.


valeriansaliou commented Dec 6, 2024

I am also having the same issue here, Nomad latest in a single server/client setup.

EDIT: this issue seems to have been fixed, it is just not yet part of a Nomad release -> #24525

It works for a few minutes/hours and then stops reporting metrics for some allocations, if not all. Some allocations still have metrics, but they may stop working after some time. These are all using the Docker driver.

When looking at DEBUG logs in the Monitor section of the Nomad client in the Nomad web UI, I am seeing a flood of these log lines, which start occurring when the metrics are no longer reported:

2024-12-06T23:19:26.799+0100 [DEBUG] client.driver_mgr.docker: error decoding stats data from container: container_id=323eaaabe9155e561b6f7f685ee6050b6c274a0e9dd285740d91a8d86903ad1b driver=docker error="json: cannot unmarshal string into Go value of type container.Stats"
2024-12-06T23:19:26.800+0100 [DEBUG] client.driver_mgr.docker: error decoding stats data from container: container_id=7c19e49d5a185498f773e701140e0e73c7526bd84249cb463ccaba1a21f1712d driver=docker error="invalid character ',' looking for beginning of value"
2024-12-06T23:19:26.851+0100 [DEBUG] client.driver_mgr.docker: error decoding stats data from container: container_id=18ca409554162b014ded4fba5d74ccba0da2f4ec9af08c850684a3c93e375374 driver=docker error="invalid character 'm' looking for beginning of value"
2024-12-06T23:19:26.889+0100 [DEBUG] client.driver_mgr.docker: error decoding stats data from container: container_id=87825133c0788d7547df887418d4a62f8c53d0afa9eee24c5a0d6f0dcb434e73 driver=docker error="invalid character 'a' looking for beginning of value"

If the Nomad server is configured to report telemetry every 5s (with the telemetry configuration block in nomad.hcl), those logs can be seen immediately after restarting Nomad. With the default setting of 1s, they don't appear immediately. That may help with finding the cause of the issue.
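
For reference, a sketch of the telemetry block being described (the 5s collection interval is the value from this comment, not a recommendation):

telemetry {
  collection_interval        = "5s"
  prometheus_metrics         = true
  publish_allocation_metrics = true
  publish_node_metrics       = true
}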
