-
We are running multiple Pushgateway pods in our K8s clusters for redundancy and use a K8s Service to load-balance the requests. However, this makes querying metrics difficult, because each Pushgateway exposes only the metrics it has received, and a push updates the metrics on only one instance. Is there a recommended way to run multiple Pushgateway instances for redundancy and still get only the most recently pushed metrics, as when running a single Pushgateway instance?
Replies: 3 comments 1 reply
-
tl;dr: Sadly, there is no good way of running multiple Pushgateway instances in a proper HA mode.
Details:
I guess you could just push to n Pushgateway instances, scrape them all, and then do some PromQL magic to extract the relevant metric. But this is quite specific to what you are querying and quite cumbersome.
Generally, the Pushgateway is meant for things like infrequently occurring batch jobs (e.g. your daily DB backup), for which you would set up merely ticketing alerts and not pages that wake someone up. In those cases, it's also OK to not make a broken Pushgateway a page-worthy issue. Then it's kind of OK to rely on K8s to replace a dead instance of the Pushgateway eventually, and manually intervene during work hours if something breaks hard and brings the Pushgateway down for good.
This is of course not great. But the Pushgateway is fundamentally designed for a fairly niche use case in a fairly simplistic way. The many problems users run into with it are often caused by using the Pushgateway for more involved use cases for which it was never intended, including but not limited to monitoring serverless applications, pushing metrics with the expectation of true persistence/guaranteed delivery, turning Prometheus into a push-based metrics collection system, distributed counting, …
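To make the "push to n instances" idea concrete, here is a minimal Python sketch, not a recommended setup. It assumes the standard Pushgateway push path `/metrics/job/<job>` and a body in the text exposition format. The helper names (`push_to_all`, `latest_sample`), the gateway URLs, and the metric in the usage note are hypothetical. `latest_sample` only mimics on the client side what the PromQL would have to do: among the copies scraped from each instance, keep the one whose `push_time_seconds` is newest.

```python
import urllib.parse
import urllib.request
from typing import Callable, Dict, Iterable, Tuple


def push_url(base_url: str, job: str) -> str:
    """Build the push endpoint for a job on one Pushgateway instance."""
    return f"{base_url.rstrip('/')}/metrics/job/{urllib.parse.quote(job, safe='')}"


def http_put(url: str, body: bytes) -> int:
    """Send the metrics with PUT, which replaces the whole group for that job."""
    req = urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status


def push_to_all(
    base_urls: Iterable[str],
    job: str,
    body: bytes,
    send: Callable[[str, bytes], int] = http_put,
) -> Dict[str, bool]:
    """Push the same payload to every instance; a dead instance does not stop the rest."""
    results = {}
    for base in base_urls:
        try:
            status = send(push_url(base, job), body)
            results[base] = 200 <= status < 300
        except OSError:  # urllib's URLError and HTTPError are OSError subclasses
            results[base] = False
    return results


def latest_sample(samples: Dict[str, Tuple[float, float]]) -> Tuple[str, float]:
    """Given {gateway: (push_time_seconds, value)}, return the most recently pushed copy."""
    gateway = max(samples, key=lambda g: samples[g][0])
    return gateway, samples[gateway][1]
```

A batch job would call something like `push_to_all(["http://pushgateway-0:9091", "http://pushgateway-1:9091"], "db_backup", b"backup_last_success_timestamp_seconds 1700000000\n")` and treat the push as successful if at least one instance accepted it. Every job then has to know all instance addresses instead of one Service address, which is part of why this approach is cumbersome.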
-
Relevant discussion (which happened in an issue before we had GH discussions for this repo): #241
-
I think this can be achieved with some sorcery. What's needed is an NFS share to hold the persistence.file, mounted read/write by multiple instances, although you must prevent concurrent writes. You can run an active and an inactive Pushgateway instance and set up a failover where traffic gets routed to the inactive instance, technically making it active, when the active instance goes down. There are many tools out there that can achieve this. You won't be able to achieve 100% uptime, but you can get close depending on the failover capability. Since you are using K8s, here's a similar pattern that's already implemented: https://wdmartins.medium.com/active-passive-kubernetes-deployment-for-high-availability-stateful-applications-b7e6fa068944
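The routing decision in such an active/passive setup can be sketched as below. This is only an illustration of the selection logic, not of the linked K8s pattern or of any particular failover tool. It assumes each instance answers a health probe at `/-/healthy`; the function names and instance URLs are hypothetical, and the shared persistence file and the prevention of concurrent writes are out of scope here.

```python
import urllib.request
from typing import Callable, Optional, Sequence


def probe_healthy(base_url: str) -> bool:
    """Return True if the instance answers its health endpoint with a 2xx status."""
    try:
        with urllib.request.urlopen(f"{base_url.rstrip('/')}/-/healthy", timeout=2) as resp:
            return 200 <= resp.status < 300
    except OSError:  # connection refused, timeout, or HTTP error status
        return False


def pick_active(
    instances: Sequence[str],
    is_healthy: Callable[[str], bool] = probe_healthy,
) -> Optional[str]:
    """Return the first healthy instance in priority order, or None if all are down.

    The first entry is the preferred (active) instance; later entries are
    standbys that only receive traffic while everything before them is unhealthy.
    """
    for base_url in instances:
        if is_healthy(base_url):
            return base_url
    return None
```

With `["http://pushgateway-active:9091", "http://pushgateway-standby:9091"]` as the priority list, traffic goes to the standby only while the active instance fails its probe. Pushes that arrive between the failure and the next probe are lost, which is one reason this gets close to, but not all the way to, 100% uptime.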