hanging LVMoISCSISR processes during snapshot operations #285
Comments
Are you snapshotting all the VMs at once? We have avoided this in Xen Orchestra for a while, because all LVM-based SRs (local LVM, HBA, LVMoiSCSI) are prone to race conditions, so XO takes snapshots two at a time. I would suggest either using XO or changing your script to avoid this. Another alternative is a file-based SR, which is far more stable under those conditions.
We loop over all running VMs and only do one at a time, which should avoid any race conditions. The script has been in use since XenServer 6.x, but we have only noticed this behaviour since we upgraded from XenServer 7.3 to XCP-ng 8. Could this be a bug?
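For readers who want to compare against their own script, here is a minimal sketch of a strictly serial snapshot, export and uninstall loop. It is not the reporter's script: the `XE` and `BACKUP_DIR` variables, the function names and the snapshot label are hypothetical, and the `template-param-set` step is an assumption (snapshots are commonly flagged as templates and may need that flag cleared before `vm-export`).

```shell
# Sketch only: back up running VMs strictly one at a time.
# XE and BACKUP_DIR are hypothetical knobs, not taken from the original script.
XE="${XE:-xe}"
BACKUP_DIR="${BACKUP_DIR:-/mnt/backup}"

backup_vm() {
    local vm_uuid stamp snap_uuid rc
    vm_uuid="$1"
    stamp="$(date +%Y%m%d-%H%M%S)"
    # vm-snapshot prints the UUID of the newly created snapshot.
    snap_uuid="$("$XE" vm-snapshot uuid="$vm_uuid" new-name-label="backup-$stamp")" || return 1
    # Assumption: clear the template flag so the snapshot can be exported like a VM.
    "$XE" template-param-set is-a-template=false uuid="$snap_uuid" || return 1
    "$XE" vm-export vm="$snap_uuid" filename="$BACKUP_DIR/$vm_uuid-$stamp.xva"
    rc=$?
    # Remove the snapshot whether or not the export succeeded.
    "$XE" vm-uninstall uuid="$snap_uuid" force=true || rc=1
    return "$rc"
}

backup_all_running() {
    local uuids uuid
    # --minimal returns the UUIDs as one comma-separated line.
    uuids="$("$XE" vm-list power-state=running is-control-domain=false params=uuid --minimal)" || return 1
    for uuid in $(printf '%s' "$uuids" | tr ',' ' '); do
        # Each VM is fully processed before the next one starts.
        backup_vm "$uuid" || echo "backup failed for $uuid" >&2
    done
}
```

Calling `backup_all_running` from dom0 runs the whole sequence. Because every `xe` call blocks until it returns, no two snapshot operations issued by this script overlap, although coalescing triggered by the uninstall may still be running in the background when the next snapshot starts.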
We aren't aware of a behavior change. It might be related to your iSCSI configuration having been reset, for example timeouts or max requests. Double-check those values and adjust them. We strongly suggest switching to NFS if you can.
Changing from iSCSI to NFS does not seem like a solution for the problem, no offence :) Our storage subsystem does not support it anyway. I found a similar issue here: I still suspect it to be a bug.
Switching to NFS is a way to get rid of the mess caused by how SMAPIv1 is written and how it deals with iSCSI. We have almost zero support tickets in XCP-ng Support related to NFS, unlike the steady flow of people having problems on LVMoiSCSI. They probably introduced an even worse bug, but as this shitty code is full of race conditions everywhere, it might be anything. If @BogdanRudas wants to share the fix he got from Citrix with us, we could probably apply the patch on your XCP-ng install and see the result.
Hi! There is XS71ECU2019 for the zombies, however coalescing is still bad. I guess that every zombie is caused by a failed coalescing attempt.
XS71ECU2019 is already on the bugtracker. There are at least two of us for whom the patch isn't helping!
Watching the ticket, let's see how it goes.
@nagilum99 I had some hosts die because zombies exhausted the PID limit in dom0 (32k), so a patch is better than nothing. However, I still have to do vm-pause for certain VMs to let coalescing do its job on an underloaded SSD RAID. I'm nearly sure that things were good up to the level of XS71ECU2007.
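To see how close a host is to the PID exhaustion described above, something like the following sketch can count defunct processes by name. The `count_zombies` helper is hypothetical; it only assumes that `ps` reports a state beginning with `Z` for zombie processes.

```shell
# Count zombie (defunct) processes whose command line matches a pattern.
# Reads "STAT COMMAND" lines on stdin, so it can be fed directly from ps.
count_zombies() {
    awk -v pat="$1" '$1 ~ /^Z/ && $0 ~ pat { n++ } END { print n + 0 }'
}
```

In dom0 it could be used as `ps -e -o stat= -o args= | count_zombies LVMoISCSISR`, and the result compared with the limit in `/proc/sys/kernel/pid_max`.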
It did not help at all over here.
@nagilum99 I have a call with a Citrix support rep regarding the coalescing issues and will get another case number for that. As I understand it, the current implementation can't handle a moderate write workload during coalescing. So for my current 7.6 installations (both Citrix and XCP-ng) I'm going to stay on the current version unless some improvement comes from either side.
Can you confirm this issue is only on 8.0 and not on 7.6?
See also #298. We released an update for 8.0 which solves most of the issue, though the last element of the leaf chain remains. It's based on this patch, which is the same patch that Citrix released in the latest 7.1 LTS hotfix.
After installing the latest XCP-ng 8.0 RPMs I can confirm we have no more LVMoISCSISR zombies.
Closing as per last comment.
We use a simple shell script to create snapshots of our VMs and export them as VM image backups (basically xe vm-snapshot, xe vm-export, xe vm-uninstall). Since migrating to XCP-ng 8 we observe hanging LVMoISCSISR processes and lots of defunct processes of the same name.