All notable changes to this project are documented in this file.
The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
- Update dlt to v0.5.2, with associated duckdb client update
- Pin duckdb==0.9.2 to avoid breaking changes
- POSSIBLE BREAKING CHANGE: Update dbt to accommodate DuckDB v1.0. Requires newer versions of DuckDB.
- Create new pipeline version/name
- Release version for sentry.io
- Improve query parameters to identify dates for which data exists
- Make some date logic more efficient
- Inline test for date functionality
- Added logging of the dlt job to sentry.io. An env variable needs to be specified
- Switched off loading of trace data into the database, as this starts to slow down the jobs significantly
- Correct missing dates with flexible start and end dates (add where clause to query)
- Date conversion to integer and back
- Functionality to get a list of dates for which no data exists in the target
- Sample/WIP code to load this data
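
The date entries above can be illustrated with a minimal sketch. It assumes the integer form is a YYYYMMDD key, which this changelog does not state. The function names `date_to_int`, `int_to_date` and `missing_dates` are hypothetical and are not taken from the package.

```python
from datetime import date, timedelta


def date_to_int(d: date) -> int:
    """Encode a date as a YYYYMMDD integer (assumed encoding)."""
    return d.year * 10000 + d.month * 100 + d.day


def int_to_date(n: int) -> date:
    """Decode a YYYYMMDD integer back into a date."""
    return date(n // 10000, (n // 100) % 100, n % 100)


def missing_dates(start: date, end: date, loaded: set[int]) -> list[date]:
    """Return every date in [start, end] whose integer key is not in `loaded`.

    `loaded` stands in for the date keys already present in the target.
    """
    days = (end - start).days
    return [
        start + timedelta(days=i)
        for i in range(days + 1)  # empty when end is before start
        if date_to_int(start + timedelta(days=i)) not in loaded
    ]
```

With flexible start and end dates, the result could then drive which days are requested from the source, so that only the gaps are loaded.
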
- BREAKING CHANGE: Update various package versions, including dlt.
- Update and pin duckdb to v0.9.2 (poetry add duckdb==0.9.2)
- Setting duckdb version using venv.run_module("pip", "install", "duckdb==0.9.2")
- Either the dlt CLI and/or filesystem extra, or the new dlt version, results in a higher memory footprint.
- Store data in Cloudflare R2 destination as parquet format (note: dbt transformations will result in an error)
- duckdb CLI but only for development (duckdb file in main folder)
- dlt CLI is installed - not sure if we want to keep this
- dlt filesystem extra is installed
- Version/patch bump for dlt, fixing MotherDuck/DuckDB destination (v0.9.1)
- Version/patch updates for all required packages
- Use duckdb version >=0.9.1
- Undo pinning duckdb to v0.8.1 (did not give the desired result)
- Setting duckdb version using venv.run_module("pip", "install", "duckdb==0.8.1")
- Cleanup logging messages.
- Pin DuckDB to v0.8.1 (v0.9 breaks motherduck)
- Code cleanup.
- Additional logging, more frequent.
- Additional logging in landing and reporting pipeline
- Forcing reporting pipeline to run
- Logging entry for reporting pipeline
- dbt transform files are now included with the distribution package
- Initial version. Loads data from GIE REST API into motherduck.
- API key and motherduck token need to be set in the environment variables ENV_GIE_XKEY and DESTINATION__MOTHERDUCK__CREDENTIALS, respectively.
- Published on PyPI as ternyxmimosa: pip install ternyxmimosa
- Within Python code, use (for example): import mimosa.cli as mimosa
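
Because both environment variables must be set before the pipeline can run, a pre-flight check is one way to fail early. This is a minimal sketch and not part of the package: the helper `missing_env_vars` is hypothetical, and only the two variable names come from the notes above.

```python
import os
from typing import Mapping, Optional

# Variable names as listed in the initial-version notes.
REQUIRED_ENV_VARS = ("ENV_GIE_XKEY", "DESTINATION__MOTHERDUCK__CREDENTIALS")


def missing_env_vars(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the required variable names that are unset or empty.

    `env` defaults to the process environment; passing a plain dict
    makes the check easy to exercise without touching os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]
```

A caller could run `missing_env_vars()` before starting the pipeline and stop with a clear message if the returned list is not empty.
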
- Set up core dependencies:
- poetry add dlt
- poetry add dlt[duckdb]
- poetry add python-dotenv
- poetry add loguru
- poetry add streamlit
- Temporary conflict with DuckDB version. Solved for now by fixing pandas to the latest (DuckDB) compatible version:
- poetry add pandas=2.0.3
- poetry add streamlit
- Check back at the end of Sep 2023, as the latest version of DuckDB should address this issue.
- Temporary conflict with DuckDB version. Solved for now by fixing pandas to the latest (DuckDB) compatible version:
- poetry add dlt[motherduck]
- dlt pipeline to load GIE EU gas data into DuckDB
- load load_info (lineage related) data into destination database
- materialize as table in dbt_project.yml
- Re-create as class structure
- Adding motherduck as a destination.
- dbt structure for data transformations
- Loading from source.yml loads all data in stage
- loading from stage_gas_staging loads just the data from the last dlt load (as it is a dbt full load, just the new data is loaded)
- adding country reporting table - for annual consumption data
- Consider loading from stage_gas directly: should load everything again.
- Consider using dbt incremental load
- In pyproject.toml, set tool.poetry.name different from the package name (add 'ternyx.' prefix)
- LINKING NOTES:
- stage_gas._load_info__loads_ids.value = storage._dlt_load_id
- PIPELINE DOES NOT LOAD ANY DATA ANYMORE. No code changes at all since last night, at least none that I am aware of.
- remove this logic: created_at=dlt.sources.incremental("gasDayStart", initial_value=
- Removing it solved the problem, but changing the initial_value date did not seem to have any positive impact.
- Mostly done using VS Code devcontainer and Poetry
- Separately:
- pipx install dbt-duckdb --include-deps
- None
- None
- None