- # Automatic conversion of docker images into thin format
+ # Automatic conversion of docker images into the thin format

This utility will automatically convert normal docker images into the thin
format.

## Vocabulary

- There are several concept to keep track in this process, and none of them are
- very common, so before to dive in we can agree on a share vocabulary.
+ There are several concepts to keep track of in this process, and none of them is
+ very common, so before diving in we should agree on a shared vocabulary.

**Registry** refers to the docker image registry, with protocol extensions;
common examples are:

* https://registry.hub.docker.com
* https://gitlab-registry.cern.ch

- **Repository** This specify a containers of images, each image will be indexed,
+ **Repository** specifies a class of images; each image is then indexed
by tag or digest. Common examples are:

* library/redis
@@ -26,16 +26,16 @@ and may change in a feature. Common examples are:
* 4
* 3-alpine

- **Digest** is another way to identify images inside a repository, digest are
- **immutable**, since they are the result of an hash function to the content of
- the image. Thanks to this technique the images are content addreassable.
+ **Digest** is another way to identify images inside a repository; digests are
+ **immutable**, since they are the result of a hash function applied to the content of
+ the image. Thanks to this technique the images are content addressable.
Common examples are:

* sha256:2aa24e8248d5c6483c99b6ce5e905040474c424965ec866f7decd87cb316b541
* sha256:d582aa10c3355604d4133d6ff3530a35571bd95f97aadc5623355e66d92b6d2c

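As a rough illustration of why digests are immutable and make images content addressable, the sketch below derives a `sha256:` identifier from raw bytes. It is only a sketch: the helper name `content_digest` is hypothetical, and exactly which bytes a registry hashes is not covered here.

```python
import hashlib


def content_digest(content: bytes) -> str:
    """Return a digest string in the `sha256:<hex>` form used in the examples above."""
    # The identifier depends only on the bytes, so the same content always
    # yields the same digest, and any change in content yields a new one.
    return "sha256:" + hashlib.sha256(content).hexdigest()
```

Because the name is computed from the content, a digest can never silently start pointing at different content, which is not true of a tag.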

- An **image** belong to a repository -- which in turns belongs to a registry --
+ An **image** belongs to a repository -- which in turn belongs to a registry --
and it is identified by a tag, a digest, or both; if you can choose, it is always
better to identify the image using at least the digest.

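To make the vocabulary concrete, here is a minimal Python sketch that splits an image reference, written like the ones in this document, into registry, repository, tag and digest. The `ImageRef` type and the `parse_image` function are hypothetical names, not part of the utility, and the `name:tag@digest` form is assumed from the usual docker reference syntax.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit


@dataclass(frozen=True)
class ImageRef:
    registry: str
    repository: str
    tag: Optional[str] = None
    digest: Optional[str] = None


def parse_image(reference: str) -> ImageRef:
    """Split e.g. https://registry.hub.docker.com/library/redis:4 into its parts."""
    parts = urlsplit(reference)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"missing scheme or registry in {reference!r}")
    # Everything after '@' is the digest (assumed syntax, see the lead-in).
    path, _, digest = parts.path.lstrip("/").partition("@")
    # The tag follows the ':' in the last path component.
    head, slash, last = path.rpartition("/")
    name, _, tag = last.partition(":")
    repository = f"{head}/{name}" if slash else name
    if not repository:
        raise ValueError(f"missing repository in {reference!r}")
    return ImageRef(parts.netloc, repository, tag or None, digest or None)
```

With this sketch, `parse_image("https://registry.hub.docker.com/library/redis:4")` yields the registry `registry.hub.docker.com`, the repository `library/redis` and the tag `4`, with no digest.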
@@ -75,7 +75,7 @@ ideally specifying both the tag and the digest.
On the other hand, you cannot be so specific for the output image, simply
because it is impossible to know the digest before generating the image itself.

- Finally we use model the repository as an append only structure, deleting
+ Finally, we model the repository as an append-only structure: deleting
layers could break some images that are currently running.

## Commands
@@ -92,7 +92,7 @@ add-desiderata --input-image $INPUT_IMAGE --output-image $OUTPUT_IMAGE --reposit
Will add a new `desiderata` to the internal database, then it will try to
convert the regular image into a thin image.

- The users are the one that will try tpo log into the registry, you can add
+ The users are the ones that will try to log into the registry; you can add
users (that is, username, password and registry) using the `add-user` command.

### add-image
@@ -128,8 +128,7 @@ migrate-database
Applies all the migrations to the database up to the newest version of the
software.

- As first run is necessary to run this function and to run it as root since it
- will create the necessary directory for the database in `/var/lib/`
+ On the first run it is necessary to run this command.

### download-manifest

@@ -156,7 +155,7 @@ This command will try to convert all the desiderata in the internal database.
loop
```

- This command is equivalent to call `convert` in an infinite loop, usefull to
+ This command is equivalent to calling `convert` in an infinite loop, useful to
make sure that all the images are up to date.


@@ -166,7 +165,7 @@ This section will go into the detail of what happens when you try to add a
desiderata.

The very first step is parsing both the input and the output image; if any of
- those parse fails the whole command fail and we immediately return an error.
+ those parses fails, the whole command fails and we immediately return an error.

Then we check if the desiderata we are trying to add is already in the
database; if it is, we are not going to add it again and we simply return an
@@ -175,27 +174,27 @@ error.
The next step is trying to download the input image manifest; if we are not
able to access the input manifest we return an error.

- Finally if every check completely successfully we add the desiderata to the
+ Finally, if every check completed successfully, we add the desiderata to the
internal database.

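The sequence of checks described above can be sketched as follows. This is an illustration under stated assumptions, not the utility's implementation: the database is modelled as a plain Python `set`, and `check_reference`, `add_desiderata`, `DesiderataError` and the injected `fetch_manifest` callable are all hypothetical names.

```python
from urllib.parse import urlsplit


class DesiderataError(Exception):
    """Raised when a desiderata cannot be added."""


def check_reference(reference):
    # Hypothetical minimal parse: require a scheme, a registry host and an image path.
    parts = urlsplit(reference)
    if not (parts.scheme and parts.netloc and parts.path.strip("/")):
        raise DesiderataError(f"cannot parse image reference: {reference!r}")


def add_desiderata(database, input_image, output_image, cvmfs_repo, fetch_manifest):
    # 1. Parse both images; any failure aborts the whole command.
    check_reference(input_image)
    check_reference(output_image)

    # 2. Refuse to add the same desiderata twice.
    desiderata = (input_image, output_image, cvmfs_repo)
    if desiderata in database:
        raise DesiderataError("desiderata already in the database")

    # 3. The input manifest must be reachable.
    try:
        fetch_manifest(input_image)
    except Exception as error:
        raise DesiderataError("cannot access the input manifest") from error

    # 4. Every check passed: store the desiderata.
    database.add(desiderata)
    return desiderata
```

Passing `fetch_manifest` as an argument keeps the sketch independent of any registry client; nothing is written to the database unless all the earlier checks succeed.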
## convert workflow

The goal of convert is to actually create the thin images starting from the
- regurlar one.
+ regular ones.

In order to convert we iterate over every desiderata.

- In general some desiderata will be already converted while others will need to
+ In general, some desiderata will already be converted while others will need to
be converted ex-novo.

The first step is then to check if the desiderata has already been converted.
- In order to do this check we download the input image manifest and check
+ In order to do this check, we download the input image manifest and check
against the internal database if the input image digest has already been
converted; if it has, we can safely skip the conversion.

- Then, every image is made of different layers, some of them could already been
+ Then, every image is made of different layers, some of which could already be
on the repository.
- In order to avoid expensive CVMFS transaction, before to downloand and ingest
+ In order to avoid expensive CVMFS transactions, before downloading and ingesting
the layer we check if it is already in the repository; if it is, we neither
download nor ingest the layer.

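The two skip checks described above (already converted digests, and layers already in the repository) can be sketched like this. All names are hypothetical, the manifest is assumed to be a dictionary with a `digest` and a list of `layers`, and the creation and upload of the output image are left out.

```python
def convert_all(desideratas, converted_digests, repository_layers,
                fetch_manifest, ingest_layer):
    """Convert every desiderata, skipping work that has already been done.

    Returns the list of layers that were actually ingested.
    """
    ingested = []
    for input_image in desideratas:
        manifest = fetch_manifest(input_image)

        # Skip the whole image if this exact digest was converted before.
        if manifest["digest"] in converted_digests:
            continue

        for layer in manifest["layers"]:
            # Skip layers already in the repository: no download and no
            # expensive transaction for them.
            if layer in repository_layers:
                continue
            ingest_layer(layer)
            repository_layers.add(layer)
            ingested.append(layer)

        converted_digests.add(manifest["digest"])
    return ingested
```

Running the sketch twice over the same inputs ingests nothing the second time, which is the property that makes repeated conversions cheap.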
@@ -206,4 +205,113 @@ Such images can be used by docker with the plugins.

## General workflow

- TODO
+ This section explains how this utility is intended to be used.
+
+ Internally this utility invokes `cvmfs_server` and `docker` commands, so it
+ must be used on a stratum0 that also has docker installed.
+
+ The docker dependency can be dropped, but it would require some amount of work,
+ so for this first release, as long as it is not a big hurdle, we are going to
+ keep it.
+
+ The first time the utility is launched it is necessary to create the SQLite
+ database; to do so you can call the command `migrate-database` or its alias,
+ `init`.
+
+ This command creates a SQLite database called `docker2cvmfs_archive.sqlite`.
+ The utility requires this file to always be in `.`, the directory from
+ which you are calling the utility itself; this requirement will be dropped in
+ future releases.
+
+ Once the database has been created we can start adding users, images and
+ desideratas.
+
+ The conversion is quite straightforward: we first download the input image, we
+ store each layer on the cvmfs repository, we create the output image and
+ finally we upload the output image to the registry.
+
+ For downloading an image the credentials may not be necessary, while for
+ uploading it they are mandatory.
+
+ Also, you may want to have different users upload different images to the same
+ docker registry, maybe even one user per image.
+
+ The first step is therefore to call `add-user`.
+
+ ```
+ $ ./daemon init
+ INFO[0000] Made migrations n=2
+ $ ./daemon add-user --username foo --password secret --registry docker.foo.bar.com
+ $ ./daemon list-users
+ +------+--------------------+
+ | USER | REGISTRY |
+ +------+--------------------+
+ | foo | docker.foo.bar.com |
+ +------+--------------------+
+ ```
+
+ I wasn't able to figure out a reliable way to get authentication tokens and so
+ avoid storing the password as clear text in the database. The suggestion at
+ the moment is to use disposable users with very limited capabilities, so that
+ if the database gets compromised (a third party gains access to it) we are able to
+ limit the threat.
+
+ The next step is to add a desiderata; to do so:
+
+ ```
+ $ ./daemon add-desiderata \
+ --input-image https://registry.hub.docker.com/library/redis:4 \
+ --output-image https://gitlab-registry.cern.ch/smosciat/containerd/thin/redis:4 \
+ --repository cd.cern.ch \
+ --user-output smosciat
+ WARN[0000] Unable to retrieve the password, trying to get the manifest anonymously. error="sql: no rows in result set"
+ Auth to: Bearer realm="https://auth.docker.io/token",service="registry.docker.io",scope="repository:library/redis:pull"
+ https://auth.docker.io/token?scope=repository%3Alibrary%2Fredis%3Apull&service=registry.docker.io
+
+ $ ./daemon list-desideratas
+ +----+----------------+-------------------------------------------------+------------+-----------------+------------------------------------------------------------------+
+ | ID | INPUT IMAGE ID | INPUT IMAGE NAME | CVMFS REPO | OUTPUT IMAGE ID | OUTPUT IMAGE NAME |
+ +----+----------------+-------------------------------------------------+------------+-----------------+------------------------------------------------------------------+
+ | 1 | 1 | https://registry.hub.docker.com/library/redis:4 | cd.cern.ch | 2 | https://gitlab-registry.cern.ch/smosciat/containerd/thin/redis:4 |
+ +----+----------------+-------------------------------------------------+------------+-----------------+------------------------------------------------------------------+
+ ```
+
+ Of course you can add as many desideratas as you wish.
+
+ Now that all the desideratas are in place you can simply start converting them:
+
+ ```
+ $ ./daemon convert
+ ```
+
+ The above command should provide enough logs to infer what is
+ happening and to debug any error.
+
+ Make sure that the user is able to start a cvmfs transaction and is able
+ to communicate with docker; in any case, these errors should be pretty self-evident
+ in the logs.
+
+ The above command is quite cheap: it avoids converting an image that has
+ already been converted and it avoids downloading layers that have already been
+ downloaded. Command line flags can change this behaviour if necessary.
+
+ You may want to keep the above command running in a loop, so that it
+ automatically picks up changes in the input images and starts the conversion.
+
+ We are basically polling the registries for changes in the input image; again,
+ there was no reliable and easy way to get updates from the registry, not
+ even from the one inside CERN that we manage.
+
+ In order to run the conversion in a loop you can simply use:
+
+ ```
+ $ ./daemon loop
+ ```
+
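The polling approach described above amounts to repeating one conversion pass over and over. The sketch below shows that shape only; the `poll` function, the pause between passes and the `max_iterations` escape hatch are assumptions made for illustration, since the text only states that `loop` is equivalent to calling `convert` in an infinite loop.

```python
import time


def poll(convert_once, interval_seconds, max_iterations=None, sleep=time.sleep):
    """Call convert_once repeatedly, pausing between passes.

    With max_iterations=None the loop never ends; a finite value is
    handy for testing. Returns the number of passes performed.
    """
    passes = 0
    while max_iterations is None or passes < max_iterations:
        convert_once()
        passes += 1
        if max_iterations is not None and passes >= max_iterations:
            break
        # Hypothetical pause so the registries are not queried back to back.
        sleep(interval_seconds)
    return passes

# One possible convert_once would shell out to the command shown earlier:
#   subprocess.run(["./daemon", "convert"], check=False)
```

Each pass is cheap when nothing changed, because images and layers that were already handled are skipped.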
+ While the daemon is running in a loop you should be able to interact with the
+ utility without any issue, so you should be able to add users, images and even
+ desideratas.
+
+ Just be careful not to leave the CVMFS repository in an inconsistent state
+ (for example by aborting the program with Ctrl-C while it is doing a transaction).
+