Internals for Developers
Since version 0.4, LSD implements transactions. Transactions are used to:
- prevent database corruption in case of failed updates
- provide a consistent view of the database to readers while updates are ongoing
Any operation that modifies table data or metadata can only be performed from within a transaction. This is done with the DB.transaction() context manager, which automatically commits on exit. For example:
import lsd, lsd.smf
db = lsd.DB("test")
# The transaction commits automatically when the with-block exits
with db.transaction():
    db.create_table('ps1_det', lsd.smf.det_table_def)
As part of the commit process, the table's neighbor cache is automatically updated to keep it in a consistent state, as is its catalog (a list of which datafiles make up the table data).
LSD implements transactions using a variant of the [http://en.wikipedia.org/wiki/Snapshot_isolation snapshot isolation] technique. Each LSD table has a 'snapshots' directory, with subdirectories storing snapshot data. A snapshot is either open or committed; a committed snapshot contains a special file, '.committed', as a marker of its state.
The data logically contained in the table is the union of the contents of all committed snapshot directories, taken from the oldest to the newest committed snapshot, where files in newer snapshots override files of the same name in older ones. For example, imagine a table 'table1', with two snapshots, '0001' and '0002', containing the following:
table1/snapshots/0001/tablets/+0.5+0.5/T55555/main.h5
table1/snapshots/0001/tablets/+0.5+0.5/T55556/main.h5
table1/snapshots/0002/tablets/+0.5+0.5/T55556/main.h5
table1/snapshots/0002/tablets/+0.5+0.5/T55557/main.h5
Logically, this table is equivalent to the one having:
table1/tablets/+0.5+0.5/T55555/main.h5 # file from 0001
table1/tablets/+0.5+0.5/T55556/main.h5 # file from 0002
table1/tablets/+0.5+0.5/T55557/main.h5 # file from 0002
LSD does this "directory merging" automatically, and caches the results for fast lookup in {{{catalog.pkl}}} files stored in each snapshot's directory. Also, actual "snapshot IDs" (the 0001 and 0002 in the example above) are times when the transaction was created, formatted as "YYYYMMDDHHmmss.ssssss".
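The merging rule can be illustrated with a minimal Python sketch. It is not LSD's actual code: the helper names `new_snapshot_id` and `logical_view` are hypothetical, and the sketch assumes only what is described above (a `snapshots` directory, a `.committed` marker, and time-based IDs that sort chronologically as strings). It rebuilds the 'table1' example in a temporary directory and prints the logical view.

```python
import os
import tempfile
from datetime import datetime


def new_snapshot_id(now=None):
    """Format a creation time as YYYYMMDDHHmmss.ssssss (hypothetical helper)."""
    now = now or datetime.now()
    return now.strftime("%Y%m%d%H%M%S.%f")


def logical_view(table_dir):
    """Map each relative file path to the committed snapshot that provides it."""
    snap_root = os.path.join(table_dir, "snapshots")
    view = {}
    # Fixed-width IDs sort chronologically when compared as plain strings.
    for snap_id in sorted(os.listdir(snap_root)):
        snap_dir = os.path.join(snap_root, snap_id)
        if not os.path.isfile(os.path.join(snap_dir, ".committed")):
            continue  # open or failed snapshot: ignored
        for dirpath, _dirs, files in os.walk(snap_dir):
            for name in files:
                # Skip bookkeeping files at the top of the snapshot directory.
                if dirpath == snap_dir and name in (".committed", "catalog.pkl"):
                    continue
                rel = os.path.relpath(os.path.join(dirpath, name), snap_dir)
                view[rel] = snap_id  # newer snapshots overwrite older entries
    return view


def _touch(path):
    os.makedirs(os.path.dirname(path), exist_ok=True)
    open(path, "w").close()


# Rebuild the 'table1' example from the text in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    table = os.path.join(tmp, "table1")
    for snap, cell in [("0001", "T55555"), ("0001", "T55556"),
                       ("0002", "T55556"), ("0002", "T55557")]:
        snap_dir = os.path.join(table, "snapshots", snap)
        _touch(os.path.join(snap_dir, "tablets", "+0.5+0.5", cell, "main.h5"))
        _touch(os.path.join(snap_dir, ".committed"))
    view = logical_view(table)
    for rel in sorted(view):
        print(rel, "# file from", view[rel])
```

The sketch recomputes the merge on every call; caching the result, as LSD does with {{{catalog.pkl}}}, avoids walking every snapshot directory on each lookup.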
As a consequence of this implementation:
- Rolling back to an older snapshot can be achieved by removing directories containing newer snapshots. Actually, in principle the directories don't even have to be removed -- LSD just needs to be told to look for a specific snapshot -- but this is not implemented yet.
- To read a given snapshot, all older snapshots must be present. You can view each snapshot as a "diff" between the current and previous state of the database, going back to the beginning; all diffs have to be present to construct the current state.
- If anything goes wrong in a transaction, the snapshot directory created by the transaction will be left in the snapshots/ subdirectory, but it won't have a '.committed' file and will therefore be ignored by LSD. Such directories can be safely removed, either manually ({{{'rm -rf'}}}), or using {{{lsd-vacuum}}}.
- Queries to the database don't see the data added by the current transaction; they see the database state as it was when the transaction was started. For example, if you have a table with 10 rows, begin a transaction, add or modify some rows and, without committing, query that table again, you will get the original 10 rows as a result. Only after the transaction has been committed (by calling db.commit(), or on exit from the db.transaction() block) will your queries begin returning the new data.
- Upon commit, LSD will do the necessary housekeeping, including updating the table catalogs ({{{catalog.pkl}}}) and intelligently updating the neighbor caches for the cells that were modified by the transaction.
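The commit-marker behavior described in the list above can be sketched as a toy context manager. This is an illustration under stated assumptions, not LSD's implementation: `toy_transaction` and `uncommitted_snapshots` are hypothetical names, the snapshot ID defaults to the time-based format mentioned earlier, and no catalog or neighbor-cache housekeeping is modeled. The sketch shows that a snapshot directory gets its '.committed' file only when the block exits cleanly, so a failed update leaves behind a directory that readers would ignore.

```python
import os
import tempfile
from contextlib import contextmanager
from datetime import datetime


@contextmanager
def toy_transaction(table_dir, snap_id=None):
    """Create a snapshot directory; mark it committed only on clean exit."""
    if snap_id is None:
        # Assumed ID format: creation time as YYYYMMDDHHmmss.ssssss
        snap_id = datetime.now().strftime("%Y%m%d%H%M%S.%f")
    snap_dir = os.path.join(table_dir, "snapshots", snap_id)
    os.makedirs(snap_dir)
    yield snap_dir  # the caller writes new files here
    # Reached only if the body of the with-block raised no exception.
    open(os.path.join(snap_dir, ".committed"), "w").close()


def uncommitted_snapshots(table_dir):
    """List snapshot directories that lack the '.committed' marker."""
    snap_root = os.path.join(table_dir, "snapshots")
    return sorted(
        s for s in os.listdir(snap_root)
        if os.path.isdir(os.path.join(snap_root, s))
        and not os.path.exists(os.path.join(snap_root, s, ".committed"))
    )


with tempfile.TemporaryDirectory() as tmp:
    table = os.path.join(tmp, "table1")

    # A successful transaction: the marker is written on exit.
    with toy_transaction(table, "0001") as snap:
        open(os.path.join(snap, "data.txt"), "w").close()

    # A failed transaction: the directory stays, but without a marker.
    try:
        with toy_transaction(table, "0002") as snap:
            raise RuntimeError("simulated failed update")
    except RuntimeError:
        pass

    leftovers = uncommitted_snapshots(table)
    print(leftovers)
```

In this toy model, the directories reported by `uncommitted_snapshots` are the ones that would be candidates for manual removal or for {{{lsd-vacuum}}}.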