receive: Why is load not evenly distributed across Thanos Receivers? #3794
-
This is actually good and expected behavior. The same time series should always hash to the same node in a hashring of a given size, which means that for a given set of time series, a hashring should always see roughly the same load distribution. The fact that one replica sees consistently higher load than the others is most likely due to some inherent lumpiness in the time series being sent.

Thanos decides which replica should ingest data by hashing the name and label-value pairs of a time series and picking the corresponding replica from the ring to handle that hash. It seems that the data being sent simply has more time series that hash to one replica. There is no guarantee that data will be distributed uniformly across replicas; however, with a good hash function, the greater the number of time series and the more random their names and labels, the closer we should statistically converge towards a uniform distribution.

Unfortunately, I don't think there is actually a bug here :/ Take, for example, the case where a Prometheus server produces only a single time series and remote-writes a million samples/second. We would rightfully expect very high load on a single hashring replica. This is essentially an extreme version of the case we see here.

One thing I could imagine would be a feature proposal to improve the randomness of replica selection, and thus drive load distribution towards uniformity even in the degenerate single-time-series case, by adding a random value to the hash that changes for every request. It would be important that replicas forward this metadata along with the request so that, once set, the value is fixed and replicas can agree on who should ultimately ingest a sample. WDYT? This would help solve load distribution for lumpy data.
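To make that concrete, here is a toy sketch in Go of the "hash the label pairs, pick a ring member" idea. This is not the actual Thanos implementation; the hash function, replica names, and label sets below are only illustrative:

```go
// Toy sketch of hash-based replica selection, not the actual Thanos code.
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// hashSeries derives a stable hash from a series' label name/value pairs.
// Sorting the label names first ensures the same series always produces
// the same hash, regardless of map iteration order.
func hashSeries(labels map[string]string) uint64 {
	names := make([]string, 0, len(labels))
	for name := range labels {
		names = append(names, name)
	}
	sort.Strings(names)

	h := fnv.New64a()
	for _, name := range names {
		h.Write([]byte(name))
		h.Write([]byte{0xff}) // separator so adjacent strings cannot blur together
		h.Write([]byte(labels[name]))
		h.Write([]byte{0xff})
	}
	return h.Sum64()
}

// pickReplica maps the series hash onto one ring member. For a fixed ring
// size, the same series always lands on the same replica.
func pickReplica(labels map[string]string, replicas []string) string {
	return replicas[hashSeries(labels)%uint64(len(replicas))]
}

func main() {
	replicas := []string{"thanos-receive-0", "thanos-receive-1", "thanos-receive-2"}

	series := []map[string]string{
		{"__name__": "http_requests_total", "job": "api", "instance": "a"},
		{"__name__": "http_requests_total", "job": "api", "instance": "b"},
		{"__name__": "up", "job": "api", "instance": "a"},
	}

	for _, s := range series {
		fmt.Printf("%v -> %s\n", s, pickReplica(s, replicas))
	}
}
```

Because the mapping depends only on the label set, a single very hot series always lands on the same replica no matter how many samples it sends; the proposal above would mix a per-request random value into the hash (forwarded along with the request) so that even such a series spreads across the ring.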
-
Thanos, Prometheus and Golang version used:
Thanos v0.15.0
Object Storage Provider:
S3
What happened:
I have installed 3 Thanos Receivers, 1 Compactor, 1 Store, and 2 Queriers.
Each pod runs on its own node.
When I re-installed the Thanos Receivers, 1 pod consumed 2x the memory of the other 2 pods. The problem is that, since these 3 Receiver pods are part of a single hashring, the 3rd pod (thanos-receive-2) gets OOMKilled immediately while the other 2 pods still have plenty of headroom, and no metrics are shown in Thanos Query.
Note: I have re-installed many times, but the behavior is the same: the 3rd pod still consumes twice the memory of the other 2 pods. I checked the pod itself; there is no other app running on that node and there are no errors in the logs.
What you expected to happen:
The load/memory should be distributed at least close to evenly across the Receivers.
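For context, a hashrings file that puts all three Receivers into one hashring (the file passed via --receive.hashrings-file) looks along these lines; the endpoint addresses here are placeholders, not my actual service names:

```json
[
  {
    "hashring": "default",
    "endpoints": [
      "thanos-receive-0.thanos-receive:10901",
      "thanos-receive-1.thanos-receive:10901",
      "thanos-receive-2.thanos-receive:10901"
    ]
  }
]
```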