-
Notifications
You must be signed in to change notification settings - Fork 100
/
ChangeLog
349 lines (325 loc) · 16.6 KB
/
ChangeLog
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
Latest:
------
For even more detail, use "git log" or visit https://github.com/LINBIT/drbd/commits/master.
9.1.23 (api:genl2/proto:86-101,118-121/transport:18)
--------
* Fix a corner case that can happen when DRBD establishes multiple
connections in parallel, which could lead one connection to end up in
an inconsistent replication state of WFBitMapT/Established
* Fix a corner case in which a reconciliation resync ends up in
WFBitMapT/Established
* Restrict protocol compatibility to the most recent 8.4 and 9.0 releases
* Fix a corner case causing a module ref leak on drbd_transport_tcp;
if it hits, you can not rmmod it
* rate-limit resync progress while resync is paused
* resync-target inherits history UUIDs when resync finishes,
this can prevent unexpected "unrelared data" events later
* Updated compatibility code for Linux 6.11 and 6.12
9.1.22 (api:genl2/proto:86-121/transport:18)
--------
* Upgrade from partial resync to a full resync if necessary when the
user manually resolves a split-brain situation
* Fix a potential NULL deref when a disk fails while doing a
forget-peer operation.
* Fix a rcu_read_lock()/rcu_read_unlock() imbalance
* Restart the open() syscall when a process auto promoting a drbd device gets
interrupted by a signal
* Remove a deadlock that caused DRBD to connect sometimes
exceptionally slow
* Make detach operations interruptible
* Added dev_is_open to events2 status information
* Improve log readability for 2PC state changes and drbd-threads
* Updated compability code for Linux 6.9
9.1.21 (api:genl2/proto:86-121/transport:18)
--------
* fix a deadlock that can trigger when deleting a connection and
another connection going down in parallel. This is a regression of
9.1.20
* Fix an out-of-bounds access when scanning the bitmap. It leads to a
crash when the bitmap ends on a page boundary, and this is also a
regression in 9.1.20.
9.1.20 (api:genl2/proto:86-121/transport:18)
--------
* Fix a kernel crash that is sometimes triggered when downing drbd
resources in a specific, unusual order (was triggered by the
Kubernetes CSI driver)
* Fix a rarely triggering kernel crash upon adding paths to a
connection by rehauling the path lists' locking
* Fix the continuation of an interrupted initial resync
* Fix the state engine so that an incapable primary does not outdate
indirectly reachable secondary nodes
* Fix a logic bug that caused drbd to pretend that a peer's disk is
outdated when doing a manual disconnect on a down connection; with
that cured impact on fencing and quorum.
* Fix forceful demotion of suspended devices
* Rehaul of the build system to apply compatibility patches out of
place that allows one to build for different target kernels from a
single drbd source tree
* Updated compability code for Linux 6.8
9.1.19 (api:genl2/proto:86-121/transport:18)
--------
* Fix a resync decision case where drbd wrongly decided to do a full
resync, where a partial resync was sufficient; that happened in a
specific connect order when all nodes were on the same data
generation (UUID)
* Fix the online resize code to obey cached size information about
temporal unreachable nodes
* Fix a rare corner case in which DRBD on a diskless primary node
failed to re-issue a read request to another node with a backing
disk upon connection loss on the connection where it shipped the
read request initially
* Make timeout during promotion attempts interruptible
* No longer write activity-log updates on the secondary node in a
cluster with precisely two nodes with backing disk; this is a
performance optimization
* Reduce CPU usage of acknowledgment processing
9.1.18 (api:genl2/proto:86-121/transport:18)
--------
* Fixed connecting nodes with differently sized backing disks,
specifically when the smaller node is primary, before establishing
the connections
* Fixed thawing a device that has I/O frozen after loss of quorum
when a configuration change eases its quorum requirements
* Properly fail TLS if requested (only available in drbd-9.2)
* Fixed a race condition that can cause auto-demote to trigger right
after an explicit promote
* Fixed a rare race condition that could mess up the handshake result
before it is committed to the replication state.
* Preserve "tiebreaker quorum" over a reboot of the last node (3-node
clusters only)
* Update compatibility code for Linux 6.6
9.1.17 (api:genl2/proto:86-121/transport:18)
--------
* fix a potential crash when configuring drbd to bind to a
non-existent local IP address (this is a regression of drbd-9.1.8)
* Cure a very seldom triggering race condition bug during
establishing connections; when you triggered it, you got an OOPS
hinting to list corruption
* fix a race condition regarding operations on the bitmap while
forgetting a bitmap slot and a pointless warning
* Fix handling of unexpected (on a resource in secondary role) write
requests
* Fix a corner case that can cause a process to hang when closing the
DRBD device, while a connection gets re-established
* Correctly block signal delivery during auto-demote
* Improve the reliability of establishing connections
* Do not clear the transport with `net-options --set-defaults`. This
fix avoids unexpected disconnect/connect cycles upon an `adjust`
when using the 'lb-tcp' or 'rdma' transports in drbd-9.2.
* New netlink packet to report path status to drbdsetup
* Improvements to the content and rate-limiting of many log messages
* Update compatibility code and follow Linux upstream development
until Linux 6.5
9.1.16 (api:genl2/proto:86-121/transport:18)
--------
* shorten times DRBD keeps IRQs on one CPU disabled. Could lead
to connection interruption under specific conditions
* fix a corner case where resync did not start after resync-pause
state flapped
* fix online adding of volumes/minors to an already connected resource
* fix a possible split-brain situation with quorum enabled with
ping-timeout set to (unusual) high value
* fix a locking problem that could lead to kernel OOPS
* ensure resync can continue (bitmap-based) after interruption
also when it started as a full-resync first
* correctly handle meta-data when forgetting diskless peers
* fix a possibility of getting a split-brain although quorum enabled
* correctly propagate UUIDs after resync following a resize operation.
Consequence could be a full resync instead of a bitmap-based one
* fix a rare race condition that can cause a drbd device to end up
with WFBitMapS/Established replication states
9.1.15 (api:genl2/proto:86-121/transport:18)
--------
* fix how flush requests are marked when submitted to the Linux IO
stack on the secondary node
* when establishing a connection failed with a two-pc timeout, a
receiver thread deadlocked, causing drbdsetup calls to block on
that resource (difficult to trigger)
* fixed a NULL-ptr deref (a OOPS) caused by a rare race condition
while taking a resource down
* fix a possible hard kernel-lockup, can only be triggerd when a
CPU-mask is configured
* updated kernel compatibility to at least Linux head and also fixed
a bug in the compat checks/rules that caused OOPSes of the previous
drbd releases when compiled with Linux-6.2 (or on RHEL 9.2 kernel).
* fix an aspect of the data-generation (UUID) handling where DRBD
failed to do a resync when a diskless node in the remaining
partition promotes and demotes while a diskful node is isolated
* fix an aspect of the data-generation (UUID) handling where DRBD
considered a node to have unrelated data; this bug was triggered by
a sequence involving removing two nodes from a cluster and readding
one with the "day-0" UUIDs.
* do not block specific state changes (promote, demote, attach, and
detach) when only some nodes add a new minor
9.1.14 (api:genl2/proto:86-121/transport:18)
--------
* fix a race with concurrent promotion and demotion, which can
lead to an unexpected "split-brain" later on
* fix a specific case where promotion was allowed where it should not
* fix a race condition between auto-promote and a second two-phase
commit that can lead to a DRBD thread locking up in an endless loop
* fix several bugs with "resync-after":
- missing resync-resume when minor numbers run in opposite
direction as the resync-after dependencies
- a race that might lead to an OOPS in add_timer()
* fix an OOPS when reading from in_flight_summary in debugfs
* fix a race that might lead to an endless loop of printing
"postponing start_resync" while starting a resync
* fix diskless node with a diskfull with a 4KiB backend
* simplify remembering two-pc parents, maybe fixing a one-time-seen bug
* derive abort_local_transaction timeout from ping-timeout
9.1.13 (api:genl2/proto:86-121/transport:18)
--------
* when calculating if a partition has quorum, take into account if
the missing nodes might have quorum
* fix forget-peer for diskless peers
* clear the resync_again counter upon disconnect
* also call the unfence handler when no resync happens
* do not set bitmap bits when attaching to an up-to-date disk (late)
* work on bringing the out-of-tree DRBD9 closer to DRBD in the upstream
kernel; Use lru_cahche.ko from the installed kernel whenever possible
9.1.12 (api:genl2/proto:86-121/transport:18)
--------
* fix a race that could result in connection attempts getting aborted
with the message "sock_recvmsg returned -11"
* rate limit messages in case the peer can not write the backing storage
and it does not finish the necessary state transitions
* reduced the receive timeout during connecting to the intended 5 seconds
(ten times ping-ack timeout)
* losing the connection at a specific point in time during establishing
a connection could cause a transition to StandAlone; fixed that, so
that it keeps trying to connect
* fix a race that could lead to a fence-peer handler being called
unexpectedly when the fencing policy is changed at the moment before
promoting
9.1.11 (api:genl2/proto:86-121/transport:18)
--------
* The change introduced with 9.1.10 created another problem that might
lead to premature request completion (kernel crash); reverted that
change and fix it in another way
9.1.10 (api:genl2/proto:86-121/transport:18)
--------
* fix a regression introduced with 9.1.9; using protocol A on SMP
with heavy IO can might cause kernel crash
9.1.9 (api:genl2/proto:86-121/transport:18)
--------
* fix a mistake in the compat generation code; it broke DRBD on
partitions on kernel older than linux 5.10 (this was introduced
with drbd-9.1.8; not affected: logical volumes)
* fix for a bug (introduced with drbd-9.0.0), that caused possible
inconsistencies in the mirror when using the 'resync-after' option
* fix a bug that could cause a request to get stuck after an unlucky
timing with a loss of connection
* close a very small timing window between connect and promote that
could lead to the new-current-uuid not being transmitted to the
concurrently connecting peer, which might lead to denied connections
later on
* fix a recently introduced OOPS when adding new volumes to a
connected resource
* fix online attach when the connection to a 3rd node is down
9.1.8 (api:genl2/proto:86-121/transport:18)
--------
* restore protocol compatibility with drbd-8.4
* detect peers that died silently when starting a two-phase-commit
* correctly abort two-phase-commits when a connection breaks between
phases 1 and 2
* allow re-connect to a node that was forced into secondary role and
where an opener is still present from the last time it was primary
* fix a race condition that allowed to configure two peers with the
same node id
* ensure that an open() call fails within the auto-promote timeout
if it can not succeed
* build fixes for RHEL9
* following upstream changes to DRBD up to Linux 5.17 and updated compat
9.1.7 (api:genl2/proto:110-121/transport:18)
--------
* avoid deadlock upon trying to down an io-frozen DRBD device that
has a file system mounted
* fix DRBD to not send too large resync requests at multiples of 8TiB
* fix for a "forgotten" resync after IO was suspended due to lack of quorum
* refactored IO suspend/resume fixing several bugs; the worst one could
lead to premature request completion
* disable discards on diskless if diskful peers do not support it
* make demote to secondary a two-phase state transition; that guarantees that
after demotion, DRBD will not write to any meta-data in the cluster
* enable "--force primary" in for no-quorum situations
* allow graceful recovery of primary lacking quorum and therefore
have forzen IO requests; that includes "--force secondary"
* following upstream changes to DRBD up to Linux 5.15 and updated compat
9.1.6 (api:genl2/proto:110-121/transport:17)
--------
* fix IO to internal meta-data for backing device larger than 128TB
* fix resending requests towards diskless peers, this is relevant when
fencing is enabled, but the connection is re-established before fencing
succeeds; when the bug triggered it lead to "stuck" requests
* remove lockless buffer pages handling; it still contained very hard to
trigger bugs
* make sure DRBD's resync does not cause unnecessary allocation in
a thinly provisioned backing device on a resync target node
* avoid unnecessary resync (or split-brain) due to a wrongly generated
new current UUID when an already IO frozen DBRD gets new writes
* small fix to autopromote, when an application tries a read-only open
before it does a read-write open immediately after the peer primary
vanished ungracefully
* split out the secure boot key into a package on its own, that is
necessary to allow installation of multiple drbd kernel module packages
* Support for building containers for flacar linux
9.1.5 (api:genl2/proto:110-121/transport:17)
--------
* merged all changes from drbd-9.0.32
- fix a read-access-after-free, that could cause an OOPs; triggers with
an unusual configuration with a secondary having a smaller AL than
the primary or a diskless primary and heavy IO
- avoid a livelock that can cause long IO delays during resync on a
primary resync-target node
- following upstream changes to DRBD up to Linux 5.14 and updated compat
(including RHEL9-beta)
- fix module override for Oracle-Linux
* fixed a locking regression of the 9.1 branch, only relevant in
the moment a local backing device delivers an IO error to drbd
* removed compat support for kernel older than Linux-3.10 (RHEL7)
* code cleanups and refactoring
9.1.4 (api:genl2/proto:110-121/transport:17)
--------
* merged all changes from drbd-9.0.31
* enabled dynamic debug on some additional log messages
* remove (broken) write conflict resolution, replace it with warning
about the fact
* debugfs entry for the interval tree
9.1.3 (api:genl2/proto:110-120/transport:17)
--------
* merged all fixes from drbd-9.0.30-0rc1
* fix a corner-case NULL deref in the lockless buffer pages handling; the
regression was introduced with 9.1.0 (released Feb 2021); To my knowledge
it took 6 months until someone triggered it for the first time
* fix sending a P_PEERS_IN_SYNC packet into a fresh connection (before
handshake packets); this problem was introduced when the drbd-8.x
compatibility code was removed
* remove sending a DRBD-barrier packet when processing a REQ_PREFLUSH
request, that improves IO-depth and improves performance with that
9.1.2 (api:genl2/proto:110-120/transport:17)
--------
* merged all fixes from drbd-9.0.29; other than that no changes in this branch
9.1.1 (api:genl2/proto:110-119/transport:17)
--------
* fix a temporal deadlock you could trigger when you exercise promotion races
and mix some read-only openers into the test case
* fix for bitmap-copy operation in a very specific and unlikely case where
two nodes do a bitmap-based resync due to disk-states
* fix size negotiation when combining nodes of different CPU architectures
that have different page sizes
* fix a very rare race where DRBD reported wrong magic in a header
packet right after reconnecting
* fix a case where DRBD ends up reporting unrelated data; it affected
thinly allocated resources with a diskless node in a recreate from day0
event
* changes to socket buffer sizes get applied to established connections immediately;
before it was applied after a re-connect
* add exists events for path objects
* fix a merge-mistake that broke compatibility with 5.10 kernels
9.1.0 (api:genl2/proto:110-119/transport:16)
--------
* was forked off from drbd 9.0.19
* has all changes up to 9.0.28-1
* locking in the IO-submit code path was considerably improved,
allowing multiple CPU to submit in parallel