Bug in drbd 9.1.5 on CentOS 7 #26
With 9.1.4 everything is fine, as before: "Online verify done" without errors. modinfo drbd |
I can confirm this problem with kmod-drbd90-9.1.5-1.el7_9.elrepo.x86_64 on both kernel-3.10.0-1160.53.1 and kernel-3.10.0-1160.49.1, and it occurs only for raid10 mdraid devices. The previous 9.1.4 kmod works with both of those kernels plus the latest kernel-3.10.0-1160.59.1. No errors are logged for the raid10 device when using the 9.1.4 kmod, only when using the 9.1.5 version. |
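For anyone comparing versions, a minimal sketch for checking which drbd kmod is installed and which module is actually loaded (exact field names in the modinfo output may vary slightly between versions):
# Check installed drbd packages, running kernel, and the loaded module version
rpm -qa | grep -i drbd
uname -r
modinfo drbd | grep -E '^(filename|version)'
cat /proc/drbd | head -n 1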
Does this issue also occur with 9.1.6? Does it occur when you build DRBD from a release tarball instead of using the elrepo packages? |
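For what it's worth, a rough sketch of such a tarball build on CentOS 7 (the version number is only an example; depending on the kernel, the compat layer may additionally need coccinelle/spatch):
# Build the drbd kernel module from a release tarball (version is an example)
yum install -y gcc make "kernel-devel-$(uname -r)"
tar xzf drbd-9.1.6.tar.gz
cd drbd-9.1.6
make KDIR=/lib/modules/$(uname -r)/build
make install
# then reload the module (or reboot) and re-test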
TL;DR: DRBD 9.1.7 is a no-go on CentOS 7 if the underlying DRBD disk is on a raid10 md array. Just want to state I have the exact same issue. My cluster failed after upgrading from 9.0.x (I don't know the exact version) to 9.1.7, running on the 3.10.0-1160.66.1.el7.x86_64 kernel.
[ 1996.269915] drbd storage: Committing cluster-wide state change 2875711901 (0ms)
I also tried compiling it from source; same issue. Tried on Rocky Linux 8, and it works like a charm. Writing this in case anyone encounters the same issue, in the hope they won't lose 20 hours of debugging as I did. |
I'm still getting the "make_request bug: can't convert block across chunks or bigger than 512k" error with CentOS 7 kernel 3.10.0-1160.71.1, using the kernel module from DRBD 9.1.7, when the backing store is an LVM LV whose PV is an md RAID device. The 9.1.7 kmod shows the same error with several previous versions of the CentOS kernel. By contrast, the 9.1.4 kmod works with all CentOS kernels since at least 3.10.0-1160.49.1. Btw, other LVs in that same VG (also on the same md RAID10 PV) have no issues; only the LVs being used as DRBD backing storage, and only with DRBD > 9.1.4. Something in the newer DRBD kmods breaks I/O to md devices. I see no LVM messages, only the md error, and of course that leads to the DRBD messages about the disk going from UpToDate to Failed to Diskless. |
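To confirm the exact stack, something along these lines shows which md array backs the affected LVs and the array's chunk size (device and VG names are only examples from this thread):
# Map LVs to their backing PV / md array and show the chunk size (names are examples)
lvs -o lv_name,vg_name,devices
pvs /dev/md125
cat /proc/mdstat    # shows e.g. "512k chunks" for raid0/raid10 arrays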
Update: This issue still persists in 9.1.12. The issue always starts with an md raid10 error. Info on the DRBD device (and the underlying LVM and md raid10 devices) causing the issue for me is below. Note that other LVs on this same raid10 PV that are locally mounted or used for iSCSI (i.e. not used for DRBD backing storage) work just fine. Also note that the raid10 device chunk size is 512k.
[root@cnode3 drbd.d]# uname -r
[root@cnode3 drbd.d]# cat r13_access_home.res
[root@cnode3 ~]# lvdisplay /dev/vg_b/lv_access_home
open 2
LV Size 350.00 GiB
[root@cnode3 ~]# pvdisplay /dev/md125
[root@cnode3 ~]# mdadm -D /dev/md125
Working Devices : 4
Consistency Policy : bitmap
|
This bug still persists in 9.1.13, with a caveat: as of 9.1.13 it works with an md raid10 backing device as long as the DRBD device stays secondary, and resync works at startup. However, when the DRBD device is made primary, the same errors persist. Tested on CentOS 7 with the latest kernel, 3.10.0-1160.83.1.el7. Kernel log messages:
Mar 1 08:43:12 cnode2 kernel: drbd drbd_access_home: Preparing cluster-wide state change 3028602266 (1->-1 3/1) |
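Assuming, as above, that the error only appears once the device is promoted, the failure can be provoked with something like the following sketch; the resource name is taken from the log line above and should be adjusted to your config:
# Promote the resource and watch the kernel log for the raid10 make_request error
drbdadm status drbd_access_home
drbdadm primary drbd_access_home
dmesg | tail -n 30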
I faced a similar issue with 9.1.16. |
Hello.
I'm not sure it's a drbd problem, but after I upgraded the packages kmod-drbd90-9.1.4 -> kmod-drbd90-9.1.5 from elrepo, I get an error in my messages log on an md raid device.
My block stack is:
mdraid -> lvm -> drbd -> vdo -> lvm
I only have trouble with raid levels that use chunks (usually 512K in size), i.e. raid0 and raid10. With raid1 there is no problem.
Please, could you give me a hint as to where the error could be?
Feb 11 02:48:58 arh kernel: md/raid10:md124: make_request bug: can't convert block across chunks or bigger than 512k 2755544 32
Feb 11 02:48:58 arh kernel: drbd r1/0 drbd2: disk( UpToDate -> Failed )
Feb 11 02:48:58 arh kernel: drbd r1/0 drbd2: Local IO failed in drbd_request_endio. Detaching...
Feb 11 02:48:58 arh kernel: drbd r1/0 drbd2: local READ IO error sector 2752472+64 on dm-3
Feb 11 02:48:58 arh kernel: drbd r1/0 drbd2: sending new current UUID: 3E82544B6FC832F1
Feb 11 02:48:59 arh kernel: drbd r1/0 drbd2: disk( Failed -> Diskless )
Feb 11 02:48:59 arh kernel: drbd r1/0 drbd2: Should have called drbd_al_complete_io(, 4294724168, 4096), but my Disk seems to have failed :(
After this, the primary ran in diskless mode. If the primary is on raid1, everything works normally and the secondary stays UpToDate, even if the secondary is on raid0.
drbd90-utils-9.19.1-1.el7.elrepo.x86_64
kmod-drbd90-9.1.5-1.el7_9.elrepo.x86_64
I haven't tried reverting to kmod-9.1.4 yet, but with the previous kernel and 9.1.5 I get the same error.
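For anyone hitting the same thing, a sketch of comparing the raid chunk size with the request-size limits at each layer of the stack (device names are taken from the log above; the sysfs paths assume a reasonably standard kernel):
# Compare the md chunk size with the request-size limits of each layer
mdadm --detail /dev/md124 | grep -i chunk
cat /sys/block/md124/md/chunk_size          # in bytes
cat /sys/block/md124/queue/max_sectors_kb
cat /sys/block/dm-3/queue/max_sectors_kb    # the LV under drbd2, per the log
cat /sys/block/drbd2/queue/max_sectors_kb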