On a real-world cloud-based stateful platform, the hosts underlying the Pinot containers are dynamic: host/node replacement happens frequently. Ideally, such an operation should be fully transparent to users, without even requiring the Pinot admins' attention.
However, Pinot today does not have a graceful way to handle node replacement. Although tables usually have multiple replicas, running in an under-replicated state leaves the system stressed and at risk. For example, if a table has 2 replicas, during node replacement:

- Many segments have only 1 replica available, so the query load on that replica doubles.
- For segments running with a single replica, there is a high risk of data loss or query failure if another node goes down due to a hardware or network issue.
We would face the same issue during a node restart, but node replacement is far slower than a restart, especially with a high data volume. For example, for a large node holding 5+ TB of data, replacing that single node might take 5+ hours to complete, whereas restarting it might take only minutes. The longer the node replacement takes, the longer the downtime we have to endure, and the higher the risk.
Hence, reducing node replacement downtime is crucial for smooth large-scale production maintenance.
Replacement is far slower than a node restart because the new node has to download all of the missing segment data from either the deep store or peer servers before loading it into memory.
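As a rough back-of-envelope check (the throughput figure below is an illustrative assumption, not a measured number), the download alone accounts for hours at this scale:

```python
# Back-of-envelope estimate of how long the segment download takes
# during node replacement. All numbers are illustrative assumptions.
data_volume_tb = 5            # data hosted on the node being replaced
throughput_mb_per_s = 300     # assumed effective download throughput

seconds = data_volume_tb * 1024 * 1024 / throughput_mb_per_s
print(f"~{seconds / 3600:.1f} hours of downloading")  # ~4.9 hours
```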
Therefore, a straightforward and effective way to reduce the downtime is to pre-download all required segments onto the new node (NN) before bringing down the old node (ON). Afterwards, shutting down the ON and starting up the NN is as lightweight as a plain node restart.
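A minimal sketch of that workflow is below, assuming a hypothetical `predownload` operation; the controller address and endpoint paths are placeholders for illustration, not existing Pinot APIs. A real implementation would resolve the segment list from the Helix ideal state and reuse Pinot's existing deep-store/peer segment fetchers:

```python
import requests

CONTROLLER = "http://pinot-controller:9000"  # assumed controller address


def replace_node(table: str, old_node: str, new_node: str) -> None:
    """Hypothetical pre-download-based node replacement flow."""
    # 1. Find the segments currently served by the old node (ON).
    #    Placeholder endpoint: a real implementation would read the
    #    Helix ideal state for the table instead.
    segments = requests.get(
        f"{CONTROLLER}/segments/{table}/instances/{old_node}"
    ).json()

    # 2. While ON keeps serving queries, pre-download every required
    #    segment onto the new node (NN) from deep store or peers.
    for segment in segments:
        requests.post(
            f"{CONTROLLER}/segments/{table}/{segment}/predownload",
            params={"targetInstance": new_node},
        )

    # 3. Only once NN holds all segments locally, swap: shut down ON and
    #    start NN. No download happens while the table is under-replicated,
    #    so the swap behaves like a lightweight node restart.
```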