Sometimes a new collector gets deployed and it doesn't work, or more commonly it only works on a small subset of hosts and it doesn't properly exit(13) on the hosts where it's not supposed to run. What would be nice is to have a dead-simple karma point system:
When the collector is first discovered and first started, it gets X karma points.
Each time the collector crashes, it loses C karma points.
Every N seconds that elapse, the collector gains G karma points, up to an upper bound of Gmax points.
Whenever a collector crashes, we check its karma; if it's negative, we mark it as dead and don't restart it anymore.
The idea is that if a collector crashes too often, we want to give up on it, instead of spamming the logs. But if a collector has been up for a while, and all of a sudden it starts crashing a few times in a row, it's worth trying some more before giving up on it.
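A minimal sketch of the bookkeeping, to make the rules above concrete. The class name `CollectorKarma`, its parameter names, and the numbers in the usage lines are all hypothetical. It also assumes that Gmax caps the collector's total karma (not just the points gained) and that karma is only brought up to date lazily, when a crash is recorded.

```python
import time


class CollectorKarma:
    """Hypothetical karma tracker for one collector (not existing code)."""

    def __init__(self, initial, crash_penalty, gain, interval, max_points, now=None):
        self.points = initial               # X: karma granted on first start
        self.crash_penalty = crash_penalty  # C: karma lost per crash
        self.gain = gain                    # G: karma gained per interval
        self.interval = interval            # N: seconds per gain interval
        self.max_points = max_points        # Gmax: assumed cap on total karma
        self.last_update = time.monotonic() if now is None else now
        self.dead = False

    def _accrue(self, now):
        # Credit G points for every full N-second interval since the last update.
        intervals = int((now - self.last_update) // self.interval)
        if intervals > 0:
            if self.points < self.max_points:
                gained = intervals * self.gain
                self.points = min(self.max_points, self.points + gained)
            self.last_update += intervals * self.interval

    def on_crash(self, now=None):
        """Record a crash and return True if the collector may be restarted."""
        now = time.monotonic() if now is None else now
        if not self.dead:
            self._accrue(now)
            self.points -= self.crash_penalty
            if self.points < 0:
                self.dead = True  # negative karma: give up on this collector
        return not self.dead


# Usage with explicit timestamps so the behavior is easy to follow.
fresh = CollectorKarma(initial=10, crash_penalty=5, gain=1, interval=60,
                       max_points=20, now=0)
restarts = [fresh.on_crash(now=t) for t in (1, 2, 3)]
```

With these made-up numbers, a freshly deployed collector is given up on at its third quick crash, while one that has been up for an hour has reached the cap and gets a few more attempts before being marked dead.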