|
| 1 | +.. contents:: Table of contents |
| 2 | + :depth: 3 |
| 3 | + |
| 4 | +Overview |
| 5 | +======== |
| 6 | + |
| 7 | +The ploop library provides an API to manage image files in **ploop** or **qcow2** format. |
| 8 | +The device mapper layer is used to create a block device and work with the image as a device. |
| 9 | + |
| 10 | +Mount |
| 11 | +================= |
| 12 | + |
| 13 | +The mount action maps an image to a block device. |
| 14 | + |
| 15 | +ploop |
| 16 | +----- |
| 17 | +* Create block device on ploop image |
| 18 | + |
| 19 | + $ dmsetup create <DEV> --table "0 <size> ploop <block_size> [falloc_new_clu] <fd> [... <fd>]" |
| 20 | + |
| 21 | +* Load CBT if present (see `Set CBT for device`_) |
| 22 | + |
| 23 | +qcow2 |
| 24 | +----- |
| 25 | +* Create block device on qcow2 image |
| 26 | + |
| 27 | + $ dmsetup create <DEV> --table "0 <size> qcow2 <fd>" |
| 28 | + |
| 29 | +* Load CBT if present (see `Store/load dirty bitmap to/from qcow2 image`_) |
| 30 | + |
| 31 | +Unmount |
| 32 | +======= |
| 33 | + |
| 34 | +Sync data to the image file and remove the block device. |
| 35 | + |
| 36 | +ploop |
| 37 | +----- |
| 38 | + |
| 39 | +* Store CBT if present (see `Get CBT from device`_) |
| 40 | +* Remove device |
| 41 | + |
| 42 | + $ dmsetup remove <DEV> |
| 43 | + |
| 44 | +qcow2 |
| 45 | +----- |
| 46 | + |
| 47 | +* Store CBT if present (see `Move bitmap from ploop to qcow2`_) |
| 48 | +* Remove device |
| 49 | + |
| 50 | + $ dmsetup remove <DEV> |
| 51 | + |
| 52 | +Resize |
| 53 | +====== |
| 54 | + |
| 55 | +Grow |
| 56 | +---- |
| 57 | + |
| 58 | +* Grow device |
| 59 | +* Resize the GPT partition if it exists |
| 60 | +* Resize the file system |
| 61 | + |
| 62 | +Shrink |
| 63 | +------ |
| 64 | +* Get balloon file fd |
| 65 | + |
| 66 | + fd = ioctl(fd, XFS_IOC_OPEN_BALLOON, 0) |
| 67 | + |
| 68 | + fd = ioctl(fd, EXT4_IOC_OPEN_BALLOON, 0) |
| 69 | + |
| 70 | +* Inflate balloon file |
| 71 | + |
| 72 | + fallocate(fd, size) |
| 73 | + |
| 74 | + |
| 75 | +Create snapshot |
| 76 | +=============== |
| 77 | + |
| 78 | +Create a checkpoint and start new changes from that point. |
| 79 | +This makes it possible to revert to that point in time. |
| 80 | + |
| 81 | +ploop |
| 82 | +----- |
| 83 | + |
| 84 | +The create snapshot action adds a new image on top of the active |
| 85 | +image and sets it as active; the previously active image becomes 'ro'. |
| 86 | + |
| 87 | +* Create new image |
| 88 | +* Suspend device |
| 89 | + |
| 90 | + $ dmsetup suspend <DEV> |
| 91 | + |
| 92 | +* Reload device with new image |
| 93 | + |
| 94 | + $ dmsetup reload <DEV> --table "0 <size> ploop <block_size> <fd> ... <new_fd>" |
| 95 | + |
| 96 | +* Resume device |
| 97 | + |
| 98 | +qcow2 |
| 99 | +----- |
| 100 | +* Suspend device |
| 101 | + |
| 102 | + $ dmsetup suspend <DEV> |
| 103 | + |
| 104 | +* Create image snapshot |
| 105 | + |
| 106 | + $ qemu-img snapshot -c <UUID> driver=qcow2,file.driver=file,file.filename=<IMAGE>,file.locking=off |
| 107 | + |
| 108 | +* Reload device to apply new changes |
| 109 | + |
| 110 | + $ dmsetup reload <DEV> --table "0 <size> qcow2 <fd>" |
| 111 | + |
| 112 | +* Resume |
| 113 | + |
| 114 | + $ dmsetup resume <DEV> |
| 115 | + |
| 116 | +Delete snapshot |
| 117 | +=============== |
| 118 | + |
| 119 | +ploop |
| 120 | +----- |
| 121 | + |
| 122 | +The delete snapshot action merges data from child to parent image. |
| 123 | +There are three cases of online snapshot deletion: |
| 124 | + |
| 125 | +1. Child and parent images are 'ro' |
| 126 | + |
| 127 | + * Copy changed blocks from child to parent. |
| 128 | + * Reload device without child image |
| 129 | + * Remove child image |
| 130 | + |
| 131 | +2. Child is TOP image and there are more than two images. |
| 132 | + |
| 133 | + * Merge TOP image |
| 134 | + |
| 135 | + $ dmsetup message <DEV> 0 merge |
| 136 | + |
| 137 | + * Remove TOP image |
| 138 | + |
| 139 | +3. There are only two images: the BASE and the TOP. |
| 140 | + |
| 141 | + * Switch the BASE image to 'rw' mode |
| 142 | + * Set deny to resume flag on device |
| 143 | + |
| 144 | + $ dmsetup message <DEV> 0 set_noresume 1 |
| 145 | + |
| 146 | + * Suspend device |
| 147 | + * Mark the BASE image as being in the 'zeroed' transition state |
| 148 | + * Zero the clusters in the BAT of the BASE image that are present in the TOP image |
| 149 | + * Swap images, BASE will be TOP |
| 150 | + |
| 151 | + $ dmsetup message <DEV> 0 flip_upper_deltas |
| 152 | + |
| 153 | + * Drop deny to resume flag |
| 154 | + |
| 155 | + $ dmsetup message <DEV> 0 set_noresume 0 |
| 156 | + |
| 157 | + * Resume device |
| 158 | + * Merge TOP image |
| 159 | + |
| 160 | + $ dmsetup message <DEV> 0 merge |
| 161 | + |
| 162 | + * Remove TOP image |
| 163 | + |
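The three ploop deletion cases above can be told apart from the position of the child image in the chain. The following is a minimal sketch only: the function name `ploop_delete_case` and the indexing scheme (0 for BASE, `image_count - 1` for TOP) are assumptions made for illustration and are not part of the library.

```python
def ploop_delete_case(image_count, child_index):
    """Return 1, 2 or 3: which online deletion case applies.

    Assumed (hypothetical) indexing: images run from 0 (BASE) to
    image_count - 1 (TOP); child_index is the image merged into its parent.
    """
    if image_count < 2:
        raise ValueError("a snapshot chain needs at least two images")
    if not 1 <= child_index < image_count:
        raise ValueError("child_index must refer to an image above BASE")

    top = image_count - 1
    if child_index != top:
        # Case 1: neither child nor parent is the active image, both are 'ro'.
        return 1
    if image_count > 2:
        # Case 2: child is TOP and there are more than two images.
        return 2
    # Case 3: only BASE and TOP exist.
    return 3
```

For a chain of three images, `ploop_delete_case(3, 1)` selects case 1 and `ploop_delete_case(3, 2)` selects case 2; a two-image chain always ends up in case 3.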
| 164 | +qcow2 |
| 165 | +----- |
| 166 | + |
| 167 | +* Suspend device |
| 168 | + |
| 169 | + $ dmsetup suspend <DEV> |
| 170 | + |
| 171 | +* Delete image snapshot |
| 172 | + |
| 173 | + $ qemu-img snapshot -d <ID> <IMAGE> |
| 174 | + |
| 175 | +* Reload device |
| 176 | + |
| 177 | + $ dmsetup reload <DEV> --table "0 <size> qcow2 <fd>" |
| 178 | + |
| 179 | +* Resume device |
| 180 | + |
| 181 | + $ dmsetup resume <DEV> |
| 182 | + |
| 183 | +Switch snapshot |
| 184 | +=============== |
| 185 | + |
| 186 | +Revert to a previously created snapshot. |
| 187 | + |
| 188 | +ploop |
| 189 | +------ |
| 190 | + |
| 191 | +* Suspend device |
| 192 | + |
| 193 | + $ dmsetup suspend <DEV> |
| 194 | + |
| 195 | +* Switch image snapshot |
| 196 | + |
| 197 | + 1. Create a new TOP image |
| 198 | + 2. Add the TOP image on top of the image whose snapshot ID we are switching to |
| 199 | + |
| 200 | +* Reload device |
| 201 | + |
| 202 | + $ dmsetup reload <DEV> --table "0 <size> ploop <block_size> <fd> [... <top_fd>]" |
| 203 | + |
| 204 | +* Resume device |
| 205 | + |
| 206 | + $ dmsetup resume <DEV> |
| 207 | + |
| 208 | +qcow2 |
| 209 | +----- |
| 210 | + |
| 211 | +* Suspend device |
| 212 | + |
| 213 | + $ dmsetup suspend <DEV> |
| 214 | + |
| 215 | +* Switch image snapshot |
| 216 | + |
| 217 | + $ qemu-img snapshot -a <ID> <IMAGE> |
| 218 | + |
| 219 | +* Reload device |
| 220 | + |
| 221 | + $ dmsetup reload <DEV> --table "0 <size> qcow2 <fd>" |
| 222 | + |
| 223 | +* Resume device |
| 224 | + |
| 225 | + $ dmsetup resume <DEV> |
| 226 | + |
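Most of the online snapshot operations above share one pattern: suspend the device, change the image, reload the table, resume. The sketch below only builds the command lines in that order and does not run them. The helper name `reload_plan` and its parameters are hypothetical; the `dmsetup` subcommands are the ones used in the steps above.

```python
import shlex


def reload_plan(dev, table, image_cmd=None):
    """Return argv lists for suspend, optional image step, reload, resume."""
    plan = [["dmsetup", "suspend", dev]]
    if image_cmd is not None:
        # Optional step executed while the device is suspended,
        # for example a qemu-img snapshot command.
        plan.append(list(image_cmd))
    plan.append(["dmsetup", "reload", dev, "--table", table])
    plan.append(["dmsetup", "resume", dev])
    return plan


if __name__ == "__main__":
    # Device name, size and fd below are made-up illustration values.
    steps = reload_plan(
        "mydev",
        "0 2097152 qcow2 7",
        ["qemu-img", "snapshot", "-a", "1", "disk.qcow2"],
    )
    for argv in steps:
        print(shlex.join(argv))
```

Keeping the plan as data makes it easy to log or review the sequence before anything touches the device.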
| 227 | + |
| 228 | +Store/load dirty bitmap to/from qcow2 image |
| 229 | +=========================================== |
| 230 | +qemu-kvm is used to manage the dirty bitmap in a qcow2 image. |
| 231 | + |
| 232 | +Start QEMU |
| 233 | +---------- |
| 234 | + |
| 235 | +Start QEMU with two block devices: the raw ploop device, so that QEMU can get the CBT by ioctl, and the qcow2 node, so that QEMU can store bitmaps to it. The ploop device is backed by the same qcow2 file, but QEMU does not know this and treats them as different files. |
| 236 | + |
| 237 | +To pass different files we define two different fd sets. |
| 238 | + |
| 239 | +qemu-kvm -add-fd fd=10,set=1,opaque="qcow2-path" -add-fd fd=11,set=2,opaque="ploop" |
| 240 | +:: |
| 241 | + |
| 242 | + qemu-kvm -S -nodefaults -nographic \ |
| 243 | + -add-fd fd=14,set=1,opaque="ro:/path/to/ploop/device" \ # FD of the ploop device. It is used only to call ioctl to get the CBT |
| 244 | + -add-fd fd=15,set=2,opaque="rw:/path/to/disk.qcow2" \ # FD of the qcow2 file. It is used to store the CBT |
| 245 | + -blockdev '{"node-name": "vz-ploop", "driver": "host_device", "filename": "/dev/fdset/1"}' \ # block node of ploop |
| 246 | + -blockdev '{"node-name": "vz-protocol-node", "driver": "file", "filename": "/dev/fdset/2", "locking": "off"}' \ # protocol node of the qcow2 file. Note locking=off, as the lock is held by the ploop utility. It is used only to create the qcow2 node on top of it; the protocol node is not manipulated directly |
| 247 | + -blockdev '{"node-name": "vz-qcow2-node", "driver": "qcow2", "file": "vz-protocol-node", "__vz_keep-dirty-bit": true}' # format node of the qcow2 file |
| 248 | + |
| 249 | +Note: |
| 250 | + |
| 251 | +* We disable locking on the qcow2 file. |
| 252 | +* We use __vz_keep-dirty-bit=true so that QEMU does not touch the qcow2 dirty bit: it does not check it on start and does not reset it on start or on stop. |
| 253 | +* The host_device driver is used to open the device, rather than the file driver that is used for regular files. |
| 254 | + |
| 255 | +Move bitmap from ploop to qcow2 |
| 256 | +------------------------------- |
| 257 | + |
| 258 | +`start QEMU`_ |
| 259 | + |
| 260 | +Move the CBT with a QMP command |
| 261 | +:: |
| 262 | + |
| 263 | + qmp transaction { |
| 264 | + block-dirty-bitmap-add {"node": "vz-qcow2-node", "name": "UUID", "persistent": true} |
| 265 | + block-dirty-bitmap-merge { "node": "vz-qcow2-node", "target": "UUID", "bitmaps": [{"node": "vz-ploop", "name": "UUID", "__vz_pull": true}]} |
| 266 | + } |
| 267 | + |
| 268 | +Note: |
| 269 | + |
| 270 | +* persistent=true means that the bitmap is saved when QEMU stops. |
| 271 | + |
| 272 | +Move bitmap from qcow2 to ploop node |
| 273 | +------------------------------------ |
| 274 | + |
| 275 | +`Start QEMU`_ |
| 276 | + |
| 277 | +Start CBT and set it with the command: |
| 278 | +:: |
| 279 | + |
| 280 | + qmp: block-dirty-bitmap-merge { "node": "vz-ploop", "target": "name-of-dirty-bitmap", "__vz_push": true, "bitmaps": [{"node": "vz-qcow2-node", "name": "UUID"}]} |
| 281 | + |
| 282 | +Kernel interface to manage CBT |
| 283 | +============================== |
| 284 | + |
| 285 | +Set CBT for device |
| 286 | +------------------ |
| 287 | + |
| 288 | +1. Start CBT |
| 289 | +:: |
| 290 | + |
| 291 | + ioctl(fd, BLKCBTSTART, struct blk_user_cbt_info *ci) |
| 292 | + ci.ci_blksize is the block size (usually 64K). |
| 293 | + ci.ci_uuid is the CBT UUID. |
| 294 | + The rest of the ci fields have to be zeroed. |
| 295 | + |
| 296 | + ERRORS: Any error is critical. |
| 297 | + |
| 298 | +2. Load CBT mask |
| 299 | +:: |
| 300 | + |
| 301 | + ioctl(fd, BLKCBTSET, struct blk_user_cbt_info *ci) |
| 302 | + ci.ci_extent_count = CBT_MAX_EXTENTS (ci.ci_extent_count is number of passed extents) |
| 303 | + ci.ci_mapped_extents is equal to ci.ci_extent_count |
| 304 | + ci.ci_extents is the array of dirty extents you want to pass |
| 305 | + ci.ci_uuid is the same as in BLKCBTSTART |
| 306 | + The rest of the fields have to be zeroed. |
| 307 | + |
| 308 | + ERRORS: Any error is critical (we should either drop CBT from image or break start). |
| 309 | + |
| 310 | +Get CBT from device |
| 311 | +------------------- |
| 312 | + |
| 313 | +1. Merge the CBT snapshot back. It exists if a previous backup failed. |
| 314 | +:: |
| 315 | + |
| 316 | + ioctl(fd, BLKCBTMISC, struct blk_user_cbt_misc_info *cmi) |
| 317 | + cmi.action = CBT_SNAP_MERGE_BACK; |
| 318 | + cmi.uuid = uuid; |
| 319 | + |
| 320 | + ERRORS: |
| 321 | + |
| 322 | + -ENODEV is not critical; it means there is no snapshot. |
| 323 | + |
| 324 | + All other errors are critical (we stop the CT without saving CBT). |
| 325 | + |
| 326 | +2. Get CBT mask. |
| 327 | +:: |
| 328 | + |
| 329 | + ioctl(fd, BLKCBTGET, struct blk_user_cbt_info *ci): |
| 330 | + ci.ci_extent_count is number of extents (max is CBT_MAX_EXTENTS == 512) |
| 331 | + ci.ci_start is the start of the range you are interested in, in bytes |
| 332 | + ci.ci_length is length of that range |
| 333 | + |
| 334 | + On exit the ioctl returns extents in ci.ci_extents and populates ci.ci_uuid. |
| 335 | + |
| 336 | + ERRORS: Any error is critical. |
| 337 | + |
| 338 | +3. Stop CBT |
| 339 | +:: |
| 340 | + |
| 341 | + ioctl(fd, BLKCBTSTOP, NULL) |
| 342 | + |
| 343 | + ERRORS: Errors are not critical |
| 344 | + |
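The error rules listed for each CBT step above can be collected in one table, so that callers decide in a single place whether to abort. This is a sketch under assumptions: the step names are simply the ioctl and action names from this section used as dictionary keys, `is_critical` is a hypothetical helper, and `err` is taken to be a positive errno value with 0 meaning success.

```python
import errno

# Step name -> errno values that are tolerated.
# None means every error of that step is tolerated.
TOLERATED = {
    "BLKCBTSTART": frozenset(),
    "BLKCBTSET": frozenset(),
    "CBT_SNAP_MERGE_BACK": frozenset({errno.ENODEV}),  # no snapshot to merge
    "BLKCBTGET": frozenset(),
    "BLKCBTSTOP": None,
}


def is_critical(step, err):
    """Return True if a failure of `step` with errno `err` must abort."""
    if err == 0:
        return False  # success is never critical
    tolerated = TOLERATED[step]
    if tolerated is None:
        return False
    return err not in tolerated
```

With this table, `is_critical("CBT_SNAP_MERGE_BACK", errno.ENODEV)` is false, while any failure of `BLKCBTGET` is reported as critical.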
| 345 | + |
| 346 | +Online image migration |
| 347 | +====================== |
| 348 | + |
| 349 | +Online image migration logic consists of three stages: |
| 350 | + |
| 351 | + 1. start tracking and copy allocated blocks |
| 352 | + 2. iteratively copy changed blocks |
| 353 | + 3. suspend device and copy changed blocks |
| 354 | + |
| 355 | +Block allocation information is taken from the image header. |
| 356 | +Changed block tracking is based on the dm-tracking driver. |
| 357 | + |
| 358 | +Tracking API: |
| 359 | + |
| 360 | + * create tracking device |
| 361 | + |
| 362 | + dmsetup create tracking_dev --table "0 <device_size_secs> tracking <clu_size_secs> DEV" |
| 363 | + |
| 364 | + * start tracking |
| 365 | + |
| 366 | + dmsetup suspend tracking_dev |
| 367 | + |
| 368 | + dmsetup message tracking_dev 0 tracking_start |
| 369 | + |
| 370 | + dmsetup resume tracking_dev |
| 371 | + |
| 372 | + * stop tracking |
| 373 | + |
| 374 | + dmsetup message tracking_dev 0 tracking_stop |
| 375 | + |
| 376 | + * get next changed cluster |
| 377 | + |
| 378 | + dmsetup message tracking_dev 0 tracking_get_next |
| 379 | + |
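The three migration stages can be sketched as a loop over injected callbacks. This is an illustration, not the library's implementation: `migrate` and all of its parameters are hypothetical names, tracking is assumed to be already started by the caller, and `get_next_changed` is assumed to return `None` once no changed cluster is left (the actual reply of `tracking_get_next` is not described here).

```python
def _drain(get_next_changed):
    """Collect changed clusters until the source reports none left."""
    changed = []
    while True:
        clu = get_next_changed()
        if clu is None:  # assumed end-of-list marker
            return changed
        changed.append(clu)


def migrate(allocated, get_next_changed, copy_cluster, suspend,
            max_passes=8, small_enough=16):
    """Copy clusters in three stages; return how many copies were made."""
    copied = 0
    # Stage 1: copy every allocated cluster while tracking is running.
    for clu in allocated:
        copy_cluster(clu)
        copied += 1
    # Stage 2: iteratively copy clusters changed in the meantime,
    # stopping early once a pass finds only a few of them.
    for _ in range(max_passes):
        changed = _drain(get_next_changed)
        for clu in changed:
            copy_cluster(clu)
        copied += len(changed)
        if len(changed) <= small_enough:
            break
    # Stage 3: suspend the device and copy the last changes.
    suspend()
    for clu in _drain(get_next_changed):
        copy_cluster(clu)
        copied += 1
    return copied
```

The `max_passes` and `small_enough` limits are arbitrary illustration values; they bound stage 2 so that a busy device cannot keep the loop running forever before the final suspend.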
| 380 | + |
| 381 | + |
| 382 | + |
| 383 | + |