Skip to content

Toolkit for DWH modeling and Data loading using highly Normalized hybrid Model

Notifications You must be signed in to change notification settings

pixl4tech/anchor-click

 
 

Repository files navigation

codecov Release Code style: black Downloads Downloads Visitors

hNhM(highly Normalized hybrid Model) – data modeling methodology based on Anchor modeling and Data Vault. Implementation of this package is based on report "How we implemented our data storage model — highly Normalized hybrid Model" by Evgeny Ermakov and Nikolai Grebenshchikov. 1) Yandex Report, habr.com. 2) SmartData Conference, youtube.com

Documentation

https://marchinho11.github.io/hnhm

Basic hNhM concepts

Logical level

  • Entity: business entity (User, Review, Order, Booking)
  • Link: relationship between Entities (UserOrder, UserBooking)
  • Flow: helps to load data from the stage layer to Entities and Links

Physical level - tables

  • Hub: hub table contains Entity's Business Keys and Surrogate Key(MD5 hash of concatenated business keys)
  • Attribute: attribute table contains FK to Entity's surrogate key, history of attribute changes, and the valid_from column
  • Group: group table contains FK to Entity's surrogate key, history of changes to group attributes, and the valid_from column
  • Link: link table contains FKs to Entities surrogate keys. Historicity by SCD2

Change types of Attributes and Groups

  • IGNORE: insert the latest new data, ignore updates
  • UPDATE: insert the latest new data, update
  • NEW: full history using SCD2. Adds the valid_to column

Quick Start

Install hnhm library

pip install hnhm

Create a directory with the name dwh and put the __hnhm__.py file there with the following contents:

# dwh/__hnhm__.py
from hnhm import (
    Layout,
    LayoutType,
    String,
    Integer,
    ChangeType,
    HnHm,
    HnhmEntity,
    HnhmRegistry,
    FileState,
    PostgresPsycopgSql
)


class User(HnhmEntity):
    """User data."""

    __layout__ = Layout(name="user", type=LayoutType.HNHM)

    user_id = String(comment="User ID.", change_type=ChangeType.IGNORE)
    age = Integer(comment="Age.", change_type=ChangeType.UPDATE)
    first_name = String(comment="First name.", change_type=ChangeType.NEW, group="name")
    last_name = String(comment="Last name.", change_type=ChangeType.NEW, group="name")

    __keys__ = [user_id]

sql=PostgresPsycopgSql(database="hnhm", user="postgres")

registry = HnhmRegistry(
    entities=[User()],
    hnhm=HnHm(
        state=FileState("state.json"),
        sql=sql,
    ),
)

Apply the changes to your DWH:

$ hnhm apply dwh

Plan:

+ entity 'HNHM.user'
  + view 'user'
  + hub 'user'
  + attribute 'age'
  + group 'name'
    |attribute 'first_name'
    |attribute 'last_name'

Apply migrations? [y/N]: y
Applied!

The physical result of applied changes:

view: entity__user
┌────────────────────────────────────────────────────────────────┐
│┌───────────────────┐   ┌────────────────┐   ┌─────────────────┐│
 │ group__user__name │   │ hub__user      │   │ attr__user__age │
 │                   │   │                │   │                 │
 │ + user_sk (FK)    ├──►│ + user_sk (PK) │◄──┤ + user_sk (FK)  │
 │ + first_name      │   │ + user_id_bk   │   │ + age           │
 │ + last_name       │   │ + valid_from   │   │ + valid_from    │
 │ + valid_from      │   │ + _hash        │   │ + _hash         │
 │ + valid_to        │   │ + _source      │   │ + _source       │
 │ + _hash           │   │ + _loaded_at   │   │ + _loaded_at    │
 │ + _source         │   └────────────────┘   └─────────────────┘
 │ + _loaded_at      │
 └───────────────────┘

About

Toolkit for DWH modeling and Data loading using highly Normalized hybrid Model

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 94.3%
  • PLpgSQL 5.5%
  • Makefile 0.2%