Commit c7723fc

author
Igor Sukhih
committed
docs/overview.rst: add documentation
Describe ploop actions Signed-off-by: Igor Sukhih <[email protected]>
1 parent 5371115 commit c7723fc

File tree

1 file changed

+383
-0
lines changed

docs/overview.rst

Lines changed: 383 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,383 @@
1+
.. contents:: Table of contents
2+
:depth: 3
3+
4+
Overview
5+
========
6+
7+
The ploop library provides an API to manage image files in **ploop** or **qcow2** format.
8+
The device mapper layer is used to create a block device and work with the image as a device.
9+
10+
Mount
11+
=================
12+
13+
The mount action maps an image to a block device.
14+
15+
ploop
16+
-----
17+
* Create block device on ploop image
18+
19+
$ dmsetup create <DEV> --table "0 <size> ploop <block_size> [falloc_new_clu] <fd> [... <fd>]"
20+
21+
* Load CBT if present (see `Set CBT for device`_)
22+
23+
qcow2
24+
-----
25+
* Create block device on qcow2 image
26+
27+
$ dmsetup create <DEV> --table "0 <size> qcow2 <fd>"
28+
29+
* Load CBT if present (see `Store/load dirty bitmap to/from qcow2 image`_)
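The table strings passed to ``dmsetup create`` can be sketched as follows. This is only an illustration that prints the strings and never calls dmsetup. The helper names ``ploop_table`` and ``qcow2_table`` and the sample values are hypothetical. The sketch assumes ``<size>`` is given in 512-byte sectors, as device-mapper tables expect, and that the ploop table follows the ``0 <size> ploop <block_size> <fd> ...`` layout used in the snapshot reload command.

```shell
#!/bin/sh
# Hypothetical helpers: they only print table strings, nothing is executed.

# ploop_table SIZE_SECTORS BLOCK_SIZE FD [FD...]
# The fd of the top (active) image comes last, as in the reload examples.
ploop_table() {
    size=$1
    block_size=$2
    shift 2
    echo "0 $size ploop $block_size $*"
}

# qcow2_table SIZE_SECTORS FD
qcow2_table() {
    echo "0 $1 qcow2 $2"
}

# Placeholder values, chosen only for illustration.
ploop_table 2097152 11 5 6
qcow2_table 2097152 7
```

The printed string would then be passed as the ``--table`` argument of ``dmsetup create <DEV>``.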
30+
31+
Unmount
32+
=======
33+
34+
Sync data to image file and remove block device.
35+
36+
ploop
37+
-----
38+
39+
* Store CBT if present (see `Get CBT from device`_)
40+
* Remove device
41+
42+
$ dmsetup remove <DEV>
43+
44+
qcow2
45+
-----
46+
47+
* Store CBT if present (see `Move bitmap from ploop to qcow2`_)
48+
* Remove device
49+
50+
$ dmsetup remove <DEV>
51+
52+
Resize
53+
======
54+
55+
Grow
56+
----
57+
58+
* Grow device
59+
* Resize the GPT partition if it exists
60+
* Resize the file system
61+
62+
Shrink
63+
------
64+
* Get balloon file fd
65+
66+
fd = ioctl(fd, XFS_IOC_OPEN_BALLOON, 0)
67+
68+
fd = ioctl(fd, EXT4_IOC_OPEN_BALLOON, 0)
69+
70+
* Inflate balloon file
71+
72+
fallocate(fd, size)
73+
74+
75+
Create snapshot
76+
===============
77+
78+
Create a checkpoint and start new changes from that point.
79+
This makes it possible to revert to that point in time.
80+
81+
ploop
82+
-----
83+
84+
The create snapshot action adds an extra image on top of the active
85+
image and sets it as active; the previously active image becomes 'ro'.
86+
87+
* Create new image
88+
* Suspend device
89+
90+
$ dmsetup suspend <DEV>
91+
92+
* Reload device with new image
93+
94+
$ dmsetup reload <DEV> --table "0 <size> ploop <block_size> <fd> ... <new_fd>"
95+
96+
* Resume device
97+
98+
qcow2
99+
-----
100+
* Suspend device
101+
102+
$ dmsetup suspend <DEV>
103+
104+
* Create image snapshot
105+
106+
$ qemu-img snapshot -c <UUID> driver=qcow2,file.driver=file,file.filename=<IMAGE>,file.locking=off
107+
108+
* Reload device to apply new changes
109+
110+
$ dmsetup reload <DEV> --table "0 <size> qcow2 <fd>"
111+
112+
* Resume
113+
114+
$ dmsetup resume <DEV>
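The qcow2 snapshot steps above can be sketched as a dry run that prints the commands in order instead of executing them. The helper name ``qcow2_snapshot_plan`` and the sample arguments are hypothetical; the command lines themselves are taken from the steps above.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands for a qcow2 snapshot, in order.
# Nothing is executed. qcow2_snapshot_plan is a hypothetical helper.
# Usage: qcow2_snapshot_plan DEV SIZE_SECTORS FD UUID IMAGE
qcow2_snapshot_plan() {
    dev=$1
    size=$2
    fd=$3
    uuid=$4
    image=$5
    # I/O is quiesced first so the image is stable while qemu-img edits it.
    echo "dmsetup suspend $dev"
    echo "qemu-img snapshot -c $uuid driver=qcow2,file.driver=file,file.filename=$image,file.locking=off"
    # The table is reloaded so the device picks up the changed image metadata.
    echo "dmsetup reload $dev --table \"0 $size qcow2 $fd\""
    echo "dmsetup resume $dev"
}

# Placeholder values, chosen only for illustration.
qcow2_snapshot_plan ploop1 2097152 7 my-snap /images/disk.qcow2
```

Printing the plan first makes it easy to check the ordering: the device stays suspended for the whole time the image is modified.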
115+
116+
Delete snapshot
117+
===============
118+
119+
ploop
120+
-----
121+
122+
The delete snapshot action merges data from child to parent image.
123+
There are three cases of online snapshot deletion:
124+
125+
1. Child and parent images are 'ro'
126+
127+
* Copy changed blocks from child to parent.
128+
* Reload device without child image
129+
* Remove child image
130+
131+
2. Child is TOP image and there are more than two images.
132+
133+
* Merge TOP image
134+
135+
$ dmsetup message <DEV> 0 merge
136+
137+
* Remove TOP image
138+
139+
3. There are only two images: the BASE and the TOP.
140+
141+
* Switch the BASE image to 'rw' mode
142+
* Set the deny-resume flag on the device
143+
144+
$ dmsetup message <DEV> 0 set_noresume 1
145+
146+
* Suspend device
147+
* Mark the BASE image as being in the 'zeroed' transition state
148+
* Zero the clusters in the BAT of the BASE image that are present in the TOP image
149+
* Swap the images so that BASE becomes TOP
150+
151+
$ dmsetup message <DEV> 0 flip_upper_deltas
152+
153+
* Clear the deny-resume flag
154+
155+
$ dmsetup message <DEV> 0 set_noresume 0
156+
157+
* Resume device
158+
* Merge TOP image
159+
160+
$ dmsetup message <DEV> 0 merge
161+
162+
* Remove TOP image
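The ordering of the third case (only BASE and TOP) can be sketched as a dry run. The helper name ``ploop_merge_two_images_plan`` is hypothetical. Steps that are not dmsetup calls are printed as ``#`` comments because this document does not say how the library performs them. Nothing is executed.

```shell
#!/bin/sh
# Dry-run sketch of the two-image (BASE + TOP) online merge sequence.
# ploop_merge_two_images_plan is a hypothetical helper; it only prints.
# Usage: ploop_merge_two_images_plan DEV
ploop_merge_two_images_plan() {
    dev=$1
    echo "# switch the BASE image to 'rw' mode"
    # The flag keeps the device from being resumed mid-transition.
    echo "dmsetup message $dev 0 set_noresume 1"
    echo "dmsetup suspend $dev"
    echo "# mark BASE as 'zeroed'; zero BAT clusters of BASE present in TOP"
    echo "dmsetup message $dev 0 flip_upper_deltas"
    echo "dmsetup message $dev 0 set_noresume 0"
    echo "dmsetup resume $dev"
    echo "dmsetup message $dev 0 merge"
    echo "# remove the TOP image"
}

# Placeholder device name, chosen only for illustration.
ploop_merge_two_images_plan ploop1
```

The point of the sketch is the bracketing: ``set_noresume 1`` and ``set_noresume 0`` enclose the suspend and the image swap, and the merge runs only after the device has been resumed.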
163+
164+
qcow2
165+
-----
166+
167+
* Suspend device
168+
169+
$ dmsetup suspend <DEV>
170+
171+
* Delete image snapshot
172+
173+
$ qemu-img snapshot -d <ID> <IMAGE>
174+
175+
* Reload device
176+
177+
$ dmsetup reload <DEV> --table "0 <size> qcow2 <fd>"
178+
179+
* Resume device
180+
181+
$ dmsetup resume <DEV>
182+
183+
Switch snapshot
184+
===============
185+
186+
Revert to a previously created snapshot.
187+
188+
ploop
189+
------
190+
191+
* Suspend device
192+
193+
$ dmsetup suspend <DEV>
194+
195+
* Switch image snapshot
196+
197+
1. Create a new TOP image
198+
2. Add the TOP image on top of the image whose snapshot ID we switched to
199+
200+
* Reload device
201+
202+
$ dmsetup reload <DEV> --table "0 <size> ploop <block_size> <fd> [... <top_fd>]"
203+
204+
* Resume device
205+
206+
$ dmsetup resume <DEV>
207+
208+
qcow2
209+
-----
210+
211+
* Suspend device
212+
213+
$ dmsetup suspend <DEV>
214+
215+
* Switch image snapshot
216+
217+
$ qemu-img snapshot -a <ID> <IMAGE>
218+
219+
* Reload device
220+
221+
$ dmsetup reload <DEV> --table "0 <size> qcow2 <fd>"
222+
223+
* Resume device
224+
225+
$ dmsetup resume <DEV>
226+
227+
228+
Store/load dirty bitmap to/from qcow2 image
229+
===========================================
230+
qemu-kvm is used to manage the dirty bitmap in a qcow2 image.
231+
232+
Start QEMU
233+
----------
234+
235+
Start QEMU with two block devices: the raw ploop device, so that QEMU can get the CBT via ioctl, and a qcow2 node, so that QEMU can store bitmaps in it. We know that the ploop device is backed by the same qcow2 file, but QEMU does not and treats them as different files.
236+
237+
To pass different files we define two different fd sets.
238+
239+
qemu-kvm -add-fd fd=10,set=1,opaque="qcow2-path" -add-fd fd=11,set=2,opaque="ploop"
240+
::
241+
242+
qemu-kvm -S -nodefaults -nographic \
243+
-add-fd fd=14,set=1,opaque="ro:/path/to/ploop/device" \ # FD of the ploop device. It is used only to call ioctl to get the CBT
244+
-add-fd fd=15,set=2,opaque="rw:/path/to/disk.qcow2" \ # FD of the qcow2 file. It is used to store the CBT
245+
-blockdev '{"node-name": "vz-ploop", "driver": "host_device", "filename": "/dev/fdset/1"}' \ # block node of the ploop device
246+
-blockdev '{"node-name": "vz-protocol-node", "driver": "file", "filename": "/dev/fdset/2", "locking": "off"}' \ # protocol node of the qcow2 file. Note locking=off, as the lock is held by the ploop utility. It is used only to create the qcow2 node on top of it; the protocol node is not manipulated directly
247+
-blockdev '{"node-name": "vz-qcow2-node", "driver": "qcow2", "file": "vz-protocol-node", "__vz_keep-dirty-bit": true}' # format node of the qcow2 file
248+
249+
Note:
250+
251+
* We disable locking on the qcow2 file.
252+
* We use __vz_keep-dirty-bit=true so that QEMU does not touch the qcow2 dirty bit: it neither checks it on start nor resets it on start or stop.
253+
* driver: host_device is used to open the device, rather than driver: file, which is used for regular files.
254+
255+
Move bitmap from ploop to qcow2
256+
-------------------------------
257+
258+
`start QEMU`_
259+
260+
Move the CBT with a QMP command
261+
::
262+
263+
qmp transaction {
264+
block-dirty-bitmap-add {"node": "vz-qcow2-node", "name": "UUID", "persistent": true}
265+
block-dirty-bitmap-merge { "node": "vz-qcow2-node", "target": "UUID", "bitmaps": [{"node": "vz-ploop", "name": "UUID", "__vz_pull": true}]}
266+
}
267+
268+
Note:
269+
270+
* persistent=true means that the bitmap is saved when QEMU stops.
271+
272+
Move bitmap from qcow2 to ploop node
273+
------------------------------------
274+
275+
`Start QEMU`_
276+
277+
Start CBT and set it with the command:
278+
::
279+
280+
qmp: block-dirty-bitmap-merge { "node": "my-ploop", "target": "name-of-dirty-bitmap", "__vz_push": true, "bitmaps": [{"node": "my-qcow2-node", "name": "UUID"}]}
281+
282+
Kernel interface to manage CBT
283+
==============================
284+
285+
Set CBT for device
286+
------------------
287+
288+
1. Start CBT
289+
::
290+
291+
ioctl(fd, BLKCBTSTART, struct blk_user_cbt_info *ci)
292+
ci.ci_blksize is the block size (usually 64K).
293+
ci.ci_uuid is the CBT UUID.
294+
The remaining ci fields must be zeroed.
295+
296+
ERRORS: Any error is critical.
297+
298+
2. Load CBT mask
299+
::
300+
301+
ioctl(fd, BLKCBTSET, struct blk_user_cbt_info *ci)
302+
ci.ci_extent_count = CBT_MAX_EXTENTS (ci.ci_extent_count is number of passed extents)
303+
ci.ci_mapped_extents is equal to ci.ci_extent_count
304+
ci.ci_extents is the array of dirty extents you want to pass
305+
ci.ci_uuid is the same as in BLKCBTSTART
306+
The remaining fields must be zeroed.
307+
308+
ERRORS: Any error is critical (we should either drop CBT from image or break start).
309+
310+
Get CBT from device
311+
-------------------
312+
313+
1. Merge the CBT snapshot back. A snapshot exists if a previous backup failed.
314+
::
315+
316+
ioctl(fd, BLKCBTMISC, struct blk_user_cbt_misc_info *cmi)
317+
cmi.action = CBT_SNAP_MERGE_BACK;
318+
cmi.uuid = uuid;
319+
320+
ERRORS:
321+
322+
-ENODEV is not critical; it means there is no snapshot.
323+
324+
All other errors are critical (we stop the CT without saving the CBT).
325+
326+
2. Get CBT mask.
327+
::
328+
329+
ioctl(fd, BLKCBTGET, struct blk_user_cbt_info *ci):
330+
ci.ci_extent_count is number of extents (max is CBT_MAX_EXTENTS == 512)
331+
ci.ci_start is the start of the range you are interested in, in bytes
332+
ci.ci_length is length of that range
333+
334+
On exit the ioctl returns extents in ci.ci_extents and populates ci.ci_uuid.
335+
336+
ERRORS: Any error is critical
337+
338+
3. Stop CBT
339+
::
340+
341+
ioctl(fd, BLKCBTSTOP, NULL)
342+
343+
ERRORS: Errors are not critical
344+
345+
346+
Online image migration
347+
======================
348+
349+
Online image migration consists of three stages:
350+
351+
1. start tracking and copy allocated blocks
352+
2. iteratively copy changed blocks
353+
3. suspend device and copy changed blocks
354+
355+
Block allocation information is taken from image header.
356+
Changed block tracking is based on the dm-tracking driver.
357+
358+
Tracking API:
359+
360+
* create tracking device
361+
362+
dmsetup create tracking_dev --table "0 <device_size_secs> tracking <clu_size_secs> DEV"
363+
364+
* start tracking
365+
366+
dmsetup suspend tracking_dev
367+
368+
dmsetup message tracking_dev 0 tracking_start
369+
370+
dmsetup resume tracking_dev
371+
372+
* stop tracking
373+
374+
dmsetup message tracking_dev 0 tracking_stop
375+
376+
* get next changed cluster
377+
378+
dmsetup message tracking_dev 0 tracking_get_next
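The create and start steps of the tracking API can be sketched as a dry run. The helper names ``tracking_table`` and ``tracking_start_plan`` and the sample values are hypothetical. Sizes are assumed to be in sectors, as the ``_secs`` suffixes in the table template suggest. The output format of ``tracking_get_next`` is not described here, so the sketch does not attempt to parse it.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands that create a tracking device and
# start tracking. Helper names are hypothetical; nothing is executed.

# tracking_table DEVICE_SIZE_SECS CLU_SIZE_SECS DEV
tracking_table() {
    echo "0 $1 tracking $2 $3"
}

# tracking_start_plan NAME DEV DEVICE_SIZE_SECS CLU_SIZE_SECS
tracking_start_plan() {
    name=$1
    dev=$2
    size=$3
    clu=$4
    echo "dmsetup create $name --table \"$(tracking_table "$size" "$clu" "$dev")\""
    # Tracking is started while the device is suspended, then I/O resumes.
    echo "dmsetup suspend $name"
    echo "dmsetup message $name 0 tracking_start"
    echo "dmsetup resume $name"
}

# Placeholder values, chosen only for illustration.
tracking_start_plan tracking_dev /dev/mapper/ploop1 2097152 2048
```

After this sequence, stage 2 of the migration would repeatedly issue ``tracking_get_next`` and copy the reported clusters, and stage 3 would do a final pass with the device suspended.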
379+
380+
381+
382+
383+
