-
Notifications
You must be signed in to change notification settings - Fork 2
BETL Advanced manual for Datavault
Suppose you want to refresh a very big hub or link. For performance reasons you might want to refresh only the records that have changed since your last refresh. I will show you how to do this using delta loading. What the datavault templates do is the following:
- insert new hub or link keys (found in source and not in target).
- insert new and changed records in satelite (found in source and target, but with at least one different attribute). End date old satelites.
- delete detection. Insert deleted records in satelite. End date old satelites. Deleted records are records found in target, but not in source.
- Modify your source (view or table) to contain only your delta. This is custom code. For example for your integrated datavault you could decide to include only changed records in your raw datavault (etl_load_dt > max(etl_load_dt) in your target). Step 1 and 2 above will just work fine. Step 3 however will give a problem because now all records not present in your source will be marked as deleted.
- Of course you could just simply disable delete detection by setting property 'delete_detection' = 1, for this specific source table. But now you are missing the deletes.
- A better solution is to set the property 'filter_delete_detection' for this source. This property can contain a filter definition in TSQL that is added to the delete detection template. It limits the scope of the delete detection and should be identical to the delta filter applied on your source, but in a form that can be applied on your target so that it can contain deleted records.
insert into <trg_full_obj_name>(<col_mapping_trg_str>, etl_deleted_flg)
insert into <trg_full_obj_name>(<col_mapping_trg_str>, etl_deleted_flg)
select <col_mapping_src_str_delete_detectie>, 1 etl_deleted_flg
from <trg_full_obj_name> trg
left join (select src.* <key_domain_id>
from <full_object_name> src
<key_domain_sql> ) src
on <nat_prim_key_match> <key_domain_match_upd_old_sat>
where
trg.etl_active_flg = 1 and trg.etl_deleted_flg = 0
<filter_delete_detection>
and src.<nat_prim_key1> is null -- key does not exist anymore in src
- note that AND is added to the filter_delete_detection property when it is not empty.
The tricky part is to define this filter. For example you could create a list of all customers that have changed, or all employees with changes, etc. It should be defined on your target not on your source or else you would not detect any deletes.