Been there, and this is not the way to do it. What you enabled is host HA and out-of-band management, not VM HA, and host HA, to my knowledge, does not work as intended. What you want is for the VMs to start on a different host when the original host fails. For that, you need a compute offering with HA enabled and an NFS primary storage named HA, kept in disabled mode; a simple folder with that name shared via NFS is enough. That's all. In the event of a host failure, whether power-related, network-related, etc., the HA-enabled VMs on that host will power up on other hosts after some time. I have no idea where to configure these timings; for me the HA process starts a good 10-12 minutes after the host failure.
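To see which instances would even be candidates for a restart after a host failure, you can filter on the HA flag that comes from the offering. A minimal sketch, assuming the VM records are dicts shaped like entries of a `listVirtualMachines` API response with `name`, `hostid` and `haenable` fields; the helper name `ha_candidates` and the sample data are made up for illustration:

```python
def ha_candidates(vms, failed_host_id):
    """Return the names of VMs on the failed host whose offering has HA enabled."""
    return [
        vm["name"]
        for vm in vms
        if vm.get("hostid") == failed_host_id and vm.get("haenable", False)
    ]


# Hypothetical sample data, not real API output.
vms = [
    {"name": "web-1", "hostid": "host-a", "haenable": True},
    {"name": "db-1", "hostid": "host-a", "haenable": False},
    {"name": "web-2", "hostid": "host-b", "haenable": True},
]

print(ha_candidates(vms, "host-a"))
```

Anything this filter leaves out (here `db-1`) would stay down until someone starts it by hand, no matter how host HA is configured.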
-
We ran into the same situation and did not understand HA in CloudStack correctly, either. It is really hard for CS to be sure that a host is really down. There are many situations where the management server is unable to connect to the host but all VMs are still running. If CS then tries to start the VMs on other hosts, it will end in a mess.
Even if it were possible, we are not going to automate this. We want control over it, and one big reason is that we also run SDS (LINSTOR) on all hosts, so it would impact our primary storage, too. Hope that helps!
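The "unreachable is not the same as down" problem can be sketched as a small decision function. This only illustrates the reasoning above and is not CloudStack's actual algorithm; the three input signals and the names `HostState` and `classify_host` are assumptions chosen for the example:

```python
from enum import Enum


class HostState(Enum):
    UP = "up"
    SUSPECT = "suspect"  # unreachable, but not proven dead
    DOWN = "down"        # confirmed dead, VMs can be restarted elsewhere


def classify_host(agent_reachable, power_on, storage_heartbeat_fresh):
    """Combine independent signals into one verdict.

    power_on is True/False from out-of-band management, or None if unknown.
    """
    if agent_reachable:
        return HostState.UP
    # Only call the host dead when power is confirmed off AND it has
    # stopped writing its storage heartbeat.
    if power_on is False and not storage_heartbeat_fresh:
        return HostState.DOWN
    # Otherwise the VMs may still be running; restarting them elsewhere
    # risks two copies writing to the same disk.
    return HostState.SUSPECT


print(classify_host(False, None, True))
print(classify_host(False, False, False))
```

A host that lands in `SUSPECT` is exactly the case we prefer to handle by hand.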
-
To ensure the availability of virtual machines and enable the virtual machines on a physical host to be automatically recovered on other hosts in case of a physical host failure, I conducted the following tests:
1. compute offer config
Enabled HA in the compute offer, and now "Offer HA" is set to true
2. cluster config
Enabled HA in the cluster, and the result is as shown.
3. host config
For the host, configured out-of-band management, ensured the power status showing on the web is on, configured "Configure HA" and selected "KVMHAProvider" as the Provider, and then enabled HA. The result is as follows.
4. do the test
I shut down the host by powering it off via iDRAC. After that, I could no longer log in to the instance on that host, but its status remained "Running". Approximately 5 minutes later, its status was still "Running" and I still could not log in. The instance shows the message: "The Control Plane Status of this Instance is Offline. Some actions on this Instance will fail, if so please wait a while and retry".
result
In my opinion, this instance should have been restarted on another host, but it was not.
Are there any issues with my configuration?
Thanks a lot.
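For reference, steps 1 to 3 can be written down as an ordered list of API calls. This is a sketch only: the command and parameter names are what I believe the CloudStack API uses and should be checked against the API reference for your version, the lowercase provider string is an assumption (the UI shows "KVMHAProvider"), the offering name and IDs are placeholders, and the out-of-band credentials and other required offering fields are left out:

```python
def ha_setup_plan(cluster_id, host_id, provider="kvmhaprovider"):
    """Return (command, params) pairs mirroring steps 1-3, in order."""
    return [
        # Step 1: compute offering with HA enabled (placeholder name/text).
        ("createServiceOffering",
         {"name": "ha-offering", "displaytext": "HA offering", "offerha": "true"}),
        # Step 2: HA on the cluster.
        ("enableHAForCluster", {"clusterid": cluster_id}),
        # Step 3: out-of-band management, HA provider, then HA on the host.
        ("enableOutOfBandManagementForHost", {"hostid": host_id}),
        ("configureHAForHost", {"hostid": host_id, "provider": provider}),
        ("enableHAForHost", {"hostid": host_id}),
    ]


for command, params in ha_setup_plan("CLUSTER-UUID", "HOST-UUID"):
    print(command, params)
```

Only the first entry concerns VM HA; the rest configure host HA.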