Aggregate an image tile set into a single SQLite database for easier and faster access.
Querying several million small files is inefficient. This script creates a single SQLite database containing the metadata and the image tiles as binary data to speed up file handling and tile querying.
This script can aggregate tile sets from the Gigapan Downloader out of the box and the created database can be imported into HiGlass Server to be viewed in HiGlass.
Prerequisites:
- Python v3.6
git clone https://github.com/flekschas/image-tiles-to-sqlite && cd image-tiles-to-sqlite
mkvirtualenv -a $(pwd) -p python3 im2db  # Not necessary but recommended
pip install --upgrade -r ./requirements.txt
usage: im2db.py [-h] [-o OUTPUT] [-i INFO] [-t {jpg,png,gif}] [-v] dir
positional arguments:
dir directory of image tiles to be converted
optional arguments:
-h, --help show this help message and exit
-o OUTPUT, --output OUTPUT
name of the sqlite database to be generated
-i INFO, --info INFO name of the tile set info file
-t {jpg,png,gif}, --imtype {jpg,png,gif}
image tile data type
-v, --verbose increase output verbosity
Example:
./im2db.py test/54825
# -> 54825.imtiles
Tests:
This runs an end-to-end test on the test data (test/54825):
./run_test.sh
Take a look at im2db.py; trust me, it's a short file. Under the hood, the script creates a SQLite database holding the following two tables:
- tileset_info
- tiles
tileset_info is an extension of clodius's metadata table and holds the following columns:
- zoom_step [INT]: not used
- max_length [INT]: not used
- assembly [TEXT]: not used
- chrom_names [TEXT]: not used
- chrom_sizes [TEXT]: not used
- tile_size [INT]: Size in pixel of the tiles
- max_zoom [INT]: Max. zoom level.
- max_size [INT]: Max. width, i.e., tile_size * 2^max_zoom.
- width [INT]: Width of the image
- height [INT]: Height of the image
- dtype [TEXT]: Data type of the images. Either jpg, png, or gif.
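As a worked example of the max_size relation listed above, here is a minimal sketch. The function name and the input values are illustrative only and are not taken from im2db.py or the test data.

```python
def max_size(tile_size: int, max_zoom: int) -> int:
    """Side length in pixels of the full tile grid at the deepest zoom level."""
    return tile_size * 2 ** max_zoom


# Hypothetical values: 256 px tiles and a maximum zoom level of 8
print(max_size(256, 8))  # 65536
```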
tiles stores each tile's binary image data and position and consists of the following columns. The primary key is composed of z, y, and x.
- z [INT]: Z position of the tile.
- y [INT]: Y position of the tile.
- x [INT]: X position of the tile.
- image [BLOB]: The binary image data of a tile.
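The tiles table described above can be sketched with Python's built-in sqlite3 module as follows. This is an illustration under assumptions, not the code from im2db.py: the exact DDL used by the script may differ, and the helper names create_tiles_table, put_tile, and get_tile are hypothetical.

```python
import sqlite3


def create_tiles_table(conn):
    # Composite primary key (z, y, x), as described for the tiles table
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tiles ("
        " z INT NOT NULL, y INT NOT NULL, x INT NOT NULL,"
        " image BLOB,"
        " PRIMARY KEY (z, y, x))"
    )


def put_tile(conn, z, y, x, data):
    # Store the raw image bytes of one tile as a BLOB
    conn.execute(
        "INSERT INTO tiles (z, y, x, image) VALUES (?, ?, ?, ?)",
        (z, y, x, sqlite3.Binary(data)),
    )


def get_tile(conn, z, y, x):
    # Look up a single tile by its position; returns None if it is missing
    row = conn.execute(
        "SELECT image FROM tiles WHERE z = ? AND y = ? AND x = ?",
        (z, y, x),
    ).fetchone()
    return None if row is None else bytes(row[0])


conn = sqlite3.connect(":memory:")  # in-memory database for the example
create_tiles_table(conn)
put_tile(conn, 0, 0, 0, b"\xff\xd8 placeholder bytes, not a real JPEG")
tile = get_tile(conn, 0, 0, 0)
```

Because the lookup uses the full primary key, fetching one tile is a single index lookup instead of a file system access per tile.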
Import into HiGlass Server:
./manage.py ingest_tileset \
--filename imtiles/<IMTILES-NAME>.imtiles \
--filetype imtiles \
--datatype <jpg,png,gif> \
--coordSystem pixel \
--coordSystem2 pixel \
--uid <IMTILES-NAME> \
--name '<IMTILES-NAME>' \
--no-upload
usage: snapshots2db.py [-h] [-o OUTPUT] [-i INFO] [-m MAX] [-p]
[--pre-fetch-file PRE_FETCH_FILE]
[--pre-fetch-zoom-from PRE_FETCH_ZOOM_FROM]
[--pre-fetch-zoom-to PRE_FETCH_ZOOM_TO]
[--pre_fetch_max_size PRE_FETCH_MAX_SIZE]
[--from-x FROM_X] [--to-x TO_X] [--from-y FROM_Y]
[--to-y TO_Y] [--xlim-rel] [--ylim-rel] [--limit-excl]
[-w] [-v]
file
positional arguments:
file snapshots file to be converted
optional arguments:
-h, --help show this help message and exit
-o OUTPUT, --output OUTPUT
name of the sqlite database to be generated
-i INFO, --info INFO name of the tile set info file
-m MAX, --max MAX maximum number of annotations per tile
-p, --pre-fetch preload and store an image pyramind for every
annotation
--pre-fetch-file PRE_FETCH_FILE
imtiles files to preload the annotations from
--pre-fetch-zoom-from PRE_FETCH_ZOOM_FROM
initial zoom of for preloading (farthest zoomed out)
--pre-fetch-zoom-to PRE_FETCH_ZOOM_TO
final zoom of for preloading (farthest zoomed in)
--pre_fetch_max_size PRE_FETCH_MAX_SIZE
max size (in pixel) for preloading a snapshot
--from-x FROM_X only include tiles which end-x is greater than this
value
--to-x TO_X only include tiles which start-x is smaller than this
value
--from-y FROM_Y only include tiles which end-y is greater than this
value
--to-y TO_Y only include tiles which start-y is smaller than this
value
--xlim-rel x limits, defined via `--from-x` etc., are in
percentage relative to the full size
--ylim-rel y limits, defined via `--from-y` etc., are in
percentage relative to the full size
--limit-excl if limits are defined via `--from-x` etc. elements
have to be fully inside them
-w, --overwrite overwrite output if exist
-v, --verbose increase output verbosity
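To make the limit options in the help text above more concrete, here is a rough sketch of how such a filter could work along the x axis. It is not the implementation from snapshots2db.py. The function name in_x_limits and its parameters are hypothetical, and two details are assumptions: percentages are taken to be in the range 0 to 100, and "fully inside" is taken to include the boundaries.

```python
def in_x_limits(start_x, end_x, from_x=None, to_x=None,
                full_width=None, relative=False, exclusive=False):
    """Decide whether an element spanning [start_x, end_x] passes the x limits."""
    if relative:
        # Mirrors --xlim-rel: limits are percentages of the full width (assumed 0-100)
        if full_width is None:
            raise ValueError("full_width is required for relative limits")
        if from_x is not None:
            from_x = from_x / 100 * full_width
        if to_x is not None:
            to_x = to_x / 100 * full_width

    if exclusive:
        # Mirrors --limit-excl: the element has to be fully inside the limits
        lower_ok = from_x is None or start_x >= from_x
        upper_ok = to_x is None or end_x <= to_x
    else:
        # Default: any overlap counts (end-x > from-x and start-x < to-x)
        lower_ok = from_x is None or end_x > from_x
        upper_ok = to_x is None or start_x < to_x
    return lower_ok and upper_ok


overlapping = in_x_limits(10, 20, from_x=15, to_x=30)
fully_inside = in_x_limits(10, 20, from_x=15, to_x=30, exclusive=True)
```

Here the element from 10 to 20 overlaps the range 15 to 30, so overlapping is True, while fully_inside is False because the element starts before the lower limit. The same logic would apply to the y axis.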
Take a look at snapshots2db.py. Under the hood, the script creates a SQLite database holding the following three tables:
- tileset_info
- intervals
- position_index
tileset_info is an extension of clodius's metadata table and holds the following columns:
- zoom_step [INT]: not used
- max_length [INT]: not used
- assembly [TEXT]: not used
- chrom_names [TEXT]: not used
- chrom_sizes [TEXT]: not used
- tile_size [INT]: Size in pixel of the tiles
- max_zoom [INT]: Max. zoom level.
- max_size [INT]: Max. width, i.e., tile_size * 2^max_zoom.
- width [INT]: Width of the image
- height [INT]: Height of the image
intervals stores the snapshot annotations with their position and consists of the following columns. The primary key is id.
- id [INT]: Primary key
- zoomLevel [INT]: Zoom level
- importance [REAL]: Number of views
- fromX [INT]: Start x position
- toX [INT]: End x position
- fromY [INT]: Start y position
- toY [INT]: End y position
- chrOffset [INT]: not used
- uid [TEXT]: Random uuid
- fields [TEXT]: Other fields; currently holding the snapshot description
position_index stores the x and y extent of each entry in intervals and consists of the following columns. The primary key is id.
- id [INT]: Primary key
- rFromX [INT]: Start x position
- rToX [INT]: End x position
- rFromY [INT]: Start y position
- rToY [INT]: End y position
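The following sketch shows how the intervals and position_index tables described above could be combined to find the snapshots that overlap a region. It rests on several assumptions: both tables are created here as plain tables for simplicity (the real database may define position_index differently), the two tables share the same id, and the helpers add_snapshot and snapshots_in_region are hypothetical names that do not appear in snapshots2db.py.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE intervals (
    id INTEGER PRIMARY KEY, zoomLevel INT, importance REAL,
    fromX INT, toX INT, fromY INT, toY INT,
    chrOffset INT, uid TEXT, fields TEXT
);
CREATE TABLE position_index (
    id INTEGER PRIMARY KEY, rFromX INT, rToX INT, rFromY INT, rToY INT
);
""")


def add_snapshot(conn, id_, zoom, importance, x0, x1, y0, y1, uid, fields):
    # One row per table, linked by the shared id (an assumption of this sketch)
    conn.execute(
        "INSERT INTO intervals VALUES (?, ?, ?, ?, ?, ?, ?, 0, ?, ?)",
        (id_, zoom, importance, x0, x1, y0, y1, uid, fields),
    )
    conn.execute(
        "INSERT INTO position_index VALUES (?, ?, ?, ?, ?)",
        (id_, x0, x1, y0, y1),
    )


def snapshots_in_region(conn, x0, x1, y0, y1, limit=10):
    # Rectangles overlap if they overlap on both axes; most important first
    return conn.execute(
        "SELECT i.uid, i.importance, i.fields"
        " FROM intervals AS i JOIN position_index AS p ON p.id = i.id"
        " WHERE p.rToX > ? AND p.rFromX < ? AND p.rToY > ? AND p.rFromY < ?"
        " ORDER BY i.importance DESC LIMIT ?",
        (x0, x1, y0, y1, limit),
    ).fetchall()


add_snapshot(conn, 1, 0, 5.0, 0, 100, 0, 100, "a", "first")
add_snapshot(conn, 2, 1, 9.0, 50, 150, 50, 150, "b", "second")
add_snapshot(conn, 3, 1, 1.0, 500, 600, 500, 600, "c", "far away")
hits = snapshots_in_region(conn, 40, 120, 40, 120)
```

With the three example rows, hits contains the snapshots with uid "b" and "a" in that order, since both overlap the queried region and "b" has the higher importance.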
Import into HiGlass Server:
./manage.py ingest_tileset \
--filename imtiles/<IMTILES-NAME>.snapshots.db \
--filetype 2dannodb \
--datatype 2d-rectangle-domains \
--coordSystem pixel \
--coordSystem2 pixel \
--uid <IMTILES-NAME>-snapshots \
--name '<IMTILES-NAME> Snapshots' \
--no-upload