feat: [SQL Query transformer] Improve attribute definitions in yaml templates #528

Open
wants to merge 45 commits into
base: main

Conversation

OnkarVO7
Collaborator

Changelog

Improve attribute definitions in yaml templates

Previously, attribute definitions in the YAML templates had to use a flat structure, with a dot separator for nested columns.
E.g.:

- name: attributes.name
  source_query: database_name
- name: attributes.qualifiedName
  source_query: concat(connection_qualified_name, '/', database_name)
  source_columns: [connection_qualified_name, database_name]
- name: attributes.connectionQualifiedName
  source_query: connection_qualified_name

After this change, nested attributes can be defined using regular nested YAML structures.
E.g.:

attributes:
  name:
    source_query: database_name
  qualifiedName:
    source_query: concat(connection_qualified_name, '/', database_name)
    source_columns: [connection_qualified_name, database_name]
  connectionQualifiedName:
    source_query: connection_qualified_name
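
To illustrate how the two forms relate, here is a minimal sketch that flattens a nested definition into the old dotted-name form. It is not the SDK's implementation: the function name `flatten_columns` is hypothetical, the template is shown as an already-parsed dict, and a leaf is assumed to be any mapping that carries a `source_query` key.

```python
from typing import Any, Dict, List


def flatten_columns(node: Dict[str, Any], prefix: str = "") -> List[Dict[str, Any]]:
    """Flatten nested column definitions into dotted-name entries.

    Hypothetical helper for illustration only. A mapping that contains
    a `source_query` key is treated as a leaf column definition.
    """
    flat: List[Dict[str, Any]] = []
    for key, value in node.items():
        name = f"{prefix}.{key}" if prefix else key
        if not isinstance(value, dict):
            raise ValueError(f"Unexpected value for '{name}': {value!r}")
        if "source_query" in value:
            # Leaf: emit one flat entry, keeping source_query / source_columns.
            flat.append({"name": name, **value})
        else:
            # Intermediate node: recurse with the extended dotted prefix.
            flat.extend(flatten_columns(value, name))
    return flat


# The nested example from this PR, as a parsed dict.
nested = {
    "attributes": {
        "name": {"source_query": "database_name"},
        "qualifiedName": {
            "source_query": "concat(connection_qualified_name, '/', database_name)",
            "source_columns": ["connection_qualified_name", "database_name"],
        },
        "connectionQualifiedName": {"source_query": "connection_qualified_name"},
    }
}

flat = flatten_columns(nested)
for column in flat:
    print(column["name"], "<-", column["source_query"])
```

Each flattened entry has the same `name`, `source_query` and `source_columns` fields as the old flat template, so the nested form carries no less information than the dotted one.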

Additional context (e.g. screenshots, logs, links)

https://atlanhq.atlassian.net/browse/APP-6727

Checklist

  • Additional tests added
  • All CI checks passed
  • Relevant documentation updated

Copyleft License Compliance

  • Have you used any code that is subject to a Copyleft license (e.g., GPL, AGPL, LGPL)?
  • If yes, have you modified the code in the context of this project? please share additional details.

abhishekagrawal-atlan and others added 30 commits May 5, 2025 09:56
Added the following features to logger_adaptor :
- convert file to parquet
- store to objectstore as parquet
Updated logger_adaptor logic
Added a route in application-sdk called observability, which decodes the logs and displays them as parquet.
updated Dapr client to use objectstore binding
- Added logic to check whether duckdb is already up and running. If it is, connect to the running server and use that connection for the duckdb UI.
- Added batching logic which flushes logs to the dapr objectstore asynchronously, either every 100 log records or every 10 seconds.
- These constants are defined in constants.py and are configurable.
- The /observability API endpoint redirects to the duckdb UI automatically, which by default is accessible on 0.0.0.0:9000.
- Added purge logic which runs once a day on a rolling basis. It purges all log records older than the configured LOG_RETENTION_DAYS constant.
- Redirected to 4213 instead of 9000, which is the correct UI port for duckdb.
- Added Unit tests to check if the logs are going to the dapr's object store or not
- Added documentation for the enhanced functionality of LoggerAdaptor
- Created an Observability superclass to be inherited by all the adaptors. It provides functions to push to the objectstore, log retention capabilities, etc.
Updated documentation
Following changes are made :
- used env defined in const.py throughout
- update env names
- created a pydantic model for logRecords being sent to objectstore
- Added a MetricsAdaptor along the lines of LoggerAdaptor
- It uses functionality from Observability as its super class
- Added tests for this
- Added sample metrics in __init__.py for testing
- Added relevant documentation for this
- Made observability a generic superclass which can be used by any adaptor
- Added TraceAdaptor
- Added corresponding tests
- Added corresponding documentation
- Added the following metrics to application-sdk :

1. Metrics which calculate time for whole workflow run
2. Metrics to calculate number of records / chunks written and in what format
3. Metrics to calculate time of steps like pre-flight check, authentication etc.
- added hierarchical file structure for observability
- added signal capturing and flushing buffer capability in case any error is encountered.
feat: add logging adapter with objectstore sink
- fixed redeclaration of methods
- Updated log directory to /tmp/observability
- Updated extra fields model
feat : Add minimal set of metrics for all apps
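
One of the commits above describes batching that flushes logs every 100 records or every 10 seconds. The sketch below shows that flush rule only. It is an assumption-laden illustration, not the code from these commits: the class name `LogBuffer`, the `sink` callable and the injectable `clock` are all hypothetical, and the flush runs synchronously on `add` rather than asynchronously.

```python
import time
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


class LogBuffer:
    """Buffer records and flush on size or elapsed time (illustrative only)."""

    def __init__(
        self,
        sink: Callable[[List[Record]], None],
        batch_size: int = 100,
        flush_interval_s: float = 10.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._sink = sink  # hypothetical stand-in for the object store writer
        self._batch_size = batch_size
        self._flush_interval_s = flush_interval_s
        self._clock = clock
        self._records: List[Record] = []
        self._last_flush = clock()

    def add(self, record: Record) -> None:
        """Append a record, then flush if either threshold is reached."""
        self._records.append(record)
        too_many = len(self._records) >= self._batch_size
        too_old = self._clock() - self._last_flush >= self._flush_interval_s
        if too_many or too_old:
            self.flush()

    def flush(self) -> None:
        """Send buffered records to the sink and reset the timer."""
        if self._records:
            self._sink(self._records)
            self._records = []  # new list, so the flushed batch is not mutated
        self._last_flush = self._clock()


# Usage: collect batches in memory instead of writing to an object store.
written: List[List[Record]] = []
buffer = LogBuffer(written.append, batch_size=100, flush_interval_s=10.0)
for i in range(250):
    buffer.add({"message": f"log line {i}"})
buffer.flush()  # flush whatever is left, e.g. on shutdown
```

Because the time check in this sketch only runs when a record is added, an idle buffer would not flush on its own. A periodic background task would be needed for a true "every 10 seconds" guarantee.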

snyk-io-us bot commented May 14, 2025

🎉 Snyk checks have passed. No issues have been found so far.

@OnkarVO7 OnkarVO7 changed the title from "feat: [SQL based transformer] Improve attribute definitions in yaml templates" to "feat: [SQL Query transformer] Improve attribute definitions in yaml templates" May 14, 2025

github-actions bot commented May 14, 2025

📜 Docstring Coverage Report

RESULT: PASSED (minimum: 30.0%, actual: 72.9%)

Detailed Coverage Report
======= Coverage for /home/runner/work/application-sdk/application-sdk/ ========
----------------------------------- Summary ------------------------------------
| Name                                                                              | Total | Miss | Cover | Cover% |
|-----------------------------------------------------------------------------------|-------|------|-------|--------|
| application_sdk/__init__.py                                                       |     1 |    0 |     1 |   100% |
| application_sdk/constants.py                                                      |     1 |    0 |     1 |   100% |
| application_sdk/version.py                                                        |     1 |    0 |     1 |   100% |
| application_sdk/worker.py                                                         |     4 |    0 |     4 |   100% |
| application_sdk/activities/__init__.py                                            |     8 |    0 |     8 |   100% |
| application_sdk/activities/common/__init__.py                                     |     1 |    1 |     0 |     0% |
| application_sdk/activities/common/models.py                                       |     2 |    0 |     2 |   100% |
| application_sdk/activities/common/utils.py                                        |     5 |    1 |     4 |    80% |
| application_sdk/activities/metadata_extraction/__init__.py                        |     1 |    1 |     0 |     0% |
| application_sdk/activities/metadata_extraction/rest.py                            |     1 |    1 |     0 |     0% |
| application_sdk/activities/metadata_extraction/sql.py                             |    15 |    1 |    14 |    93% |
| application_sdk/activities/query_extraction/__init__.py                           |     1 |    1 |     0 |     0% |
| application_sdk/activities/query_extraction/sql.py                                |    10 |    1 |     9 |    90% |
| application_sdk/application/__init__.py                                           |     8 |    1 |     7 |    88% |
| application_sdk/application/metadata_extraction/sql.py                            |     7 |    1 |     6 |    86% |
| application_sdk/clients/__init__.py                                               |     4 |    0 |     4 |   100% |
| application_sdk/clients/sql.py                                                    |    14 |    0 |    14 |   100% |
| application_sdk/clients/temporal.py                                               |    19 |    1 |    18 |    95% |
| application_sdk/clients/utils.py                                                  |     2 |    1 |     1 |    50% |
| application_sdk/clients/workflow.py                                               |     9 |    2 |     7 |    78% |
| application_sdk/common/__init__.py                                                |     1 |    1 |     0 |     0% |
| application_sdk/common/aws_utils.py                                               |     4 |    1 |     3 |    75% |
| application_sdk/common/logger_adaptors.py                                         |    30 |   11 |    19 |    63% |
| application_sdk/common/metrics_adaptor.py                                         |    16 |    2 |    14 |    88% |
| application_sdk/common/observability.py                                           |    25 |    4 |    21 |    84% |
| application_sdk/common/traces_adaptor.py                                          |    24 |    2 |    22 |    92% |
| application_sdk/common/utils.py                                                   |    12 |    2 |    10 |    83% |
| application_sdk/docgen/__init__.py                                                |     5 |    2 |     3 |    60% |
| application_sdk/docgen/exporters/__init__.py                                      |     1 |    1 |     0 |     0% |
| application_sdk/docgen/exporters/mkdocs.py                                        |     7 |    3 |     4 |    57% |
| application_sdk/docgen/models/__init__.py                                         |     1 |    1 |     0 |     0% |
| application_sdk/docgen/models/export/__init__.py                                  |     1 |    1 |     0 |     0% |
| application_sdk/docgen/models/export/page.py                                      |     2 |    1 |     1 |    50% |
| application_sdk/docgen/models/manifest/__init__.py                                |     2 |    1 |     1 |    50% |
| application_sdk/docgen/models/manifest/customer.py                                |     3 |    1 |     2 |    67% |
| application_sdk/docgen/models/manifest/internal.py                                |     2 |    1 |     1 |    50% |
| application_sdk/docgen/models/manifest/metadata.py                                |     2 |    1 |     1 |    50% |
| application_sdk/docgen/models/manifest/page.py                                    |     2 |    1 |     1 |    50% |
| application_sdk/docgen/models/manifest/section.py                                 |     2 |    1 |     1 |    50% |
| application_sdk/docgen/parsers/__init__.py                                        |     1 |    1 |     0 |     0% |
| application_sdk/docgen/parsers/directory.py                                       |    13 |    2 |    11 |    85% |
| application_sdk/docgen/parsers/manifest.py                                        |     6 |    1 |     5 |    83% |
| application_sdk/handlers/__init__.py                                              |     6 |    1 |     5 |    83% |
| application_sdk/handlers/sql.py                                                   |    16 |    3 |    13 |    81% |
| application_sdk/inputs/__init__.py                                                |     6 |    1 |     5 |    83% |
| application_sdk/inputs/iceberg.py                                                 |     7 |    3 |     4 |    57% |
| application_sdk/inputs/json.py                                                    |     8 |    2 |     6 |    75% |
| application_sdk/inputs/objectstore.py                                             |     4 |    1 |     3 |    75% |
| application_sdk/inputs/parquet.py                                                 |     8 |    1 |     7 |    88% |
| application_sdk/inputs/secretstore.py                                             |     3 |    1 |     2 |    67% |
| application_sdk/inputs/sql_query.py                                               |    10 |    1 |     9 |    90% |
| application_sdk/inputs/statestore.py                                              |     4 |    1 |     3 |    75% |
| application_sdk/outputs/__init__.py                                               |    10 |    0 |    10 |   100% |
| application_sdk/outputs/eventstore.py                                             |    10 |    0 |    10 |   100% |
| application_sdk/outputs/iceberg.py                                                |     5 |    1 |     4 |    80% |
| application_sdk/outputs/json.py                                                   |     8 |    1 |     7 |    88% |
| application_sdk/outputs/objectstore.py                                            |     4 |    1 |     3 |    75% |
| application_sdk/outputs/parquet.py                                                |     7 |    1 |     6 |    86% |
| application_sdk/outputs/secretstore.py                                            |     3 |    1 |     2 |    67% |
| application_sdk/outputs/statestore.py                                             |     4 |    1 |     3 |    75% |
| application_sdk/server/__init__.py                                                |     4 |    0 |     4 |   100% |
| application_sdk/server/fastapi/__init__.py                                        |    24 |    6 |    18 |    75% |
| application_sdk/server/fastapi/models.py                                          |    18 |   18 |     0 |     0% |
| application_sdk/server/fastapi/utils.py                                           |     2 |    0 |     2 |   100% |
| application_sdk/server/fastapi/middleware/logmiddleware.py                        |     4 |    4 |     0 |     0% |
| application_sdk/server/fastapi/routers/__init__.py                                |     1 |    1 |     0 |     0% |
| application_sdk/server/fastapi/routers/server.py                                  |     8 |    2 |     6 |    75% |
| application_sdk/test_utils/__init__.py                                            |     1 |    1 |     0 |     0% |
| application_sdk/test_utils/workflow_monitoring.py                                 |     3 |    0 |     3 |   100% |
| application_sdk/test_utils/e2e/__init__.py                                        |    14 |    2 |    12 |    86% |
| application_sdk/test_utils/e2e/base.py                                            |    16 |    2 |    14 |    88% |
| application_sdk/test_utils/e2e/client.py                                          |    10 |    2 |     8 |    80% |
| application_sdk/test_utils/e2e/conftest.py                                        |     1 |    1 |     0 |     0% |
| application_sdk/test_utils/e2e/utils.py                                           |     3 |    1 |     2 |    67% |
| application_sdk/test_utils/hypothesis/__init__.py                                 |     1 |    1 |     0 |     0% |
| application_sdk/test_utils/hypothesis/strategies/__init__.py                      |     1 |    1 |     0 |     0% |
| application_sdk/test_utils/hypothesis/strategies/sql_client.py                    |     1 |    1 |     0 |     0% |
| application_sdk/test_utils/hypothesis/strategies/temporal.py                      |     6 |    1 |     5 |    83% |
| application_sdk/test_utils/hypothesis/strategies/clients/__init__.py              |     1 |    1 |     0 |     0% |
| application_sdk/test_utils/hypothesis/strategies/clients/sql.py                   |     1 |    1 |     0 |     0% |
| application_sdk/test_utils/hypothesis/strategies/common/__init__.py               |     1 |    1 |     0 |     0% |
| application_sdk/test_utils/hypothesis/strategies/common/logger.py                 |     3 |    0 |     3 |   100% |
| application_sdk/test_utils/hypothesis/strategies/handlers/__init__.py             |     1 |    1 |     0 |     0% |
| application_sdk/test_utils/hypothesis/strategies/handlers/sql/__init__.py         |     1 |    1 |     0 |     0% |
| application_sdk/test_utils/hypothesis/strategies/handlers/sql/sql_metadata.py     |     1 |    1 |     0 |     0% |
| application_sdk/test_utils/hypothesis/strategies/handlers/sql/sql_preflight.py    |     1 |    1 |     0 |     0% |
| application_sdk/test_utils/hypothesis/strategies/inputs/__init__.py               |     1 |    1 |     0 |     0% |
| application_sdk/test_utils/hypothesis/strategies/inputs/json_input.py             |     1 |    1 |     0 |     0% |
| application_sdk/test_utils/hypothesis/strategies/outputs/__init__.py              |     1 |    1 |     0 |     0% |
| application_sdk/test_utils/hypothesis/strategies/outputs/json_output.py           |     2 |    1 |     1 |    50% |
| application_sdk/test_utils/hypothesis/strategies/outputs/statestore.py            |     3 |    1 |     2 |    67% |
| application_sdk/test_utils/hypothesis/strategies/server/__init__.py               |     1 |    1 |     0 |     0% |
| application_sdk/test_utils/hypothesis/strategies/server/fastapi/__init__.py       |     1 |    1 |     0 |     0% |
| application_sdk/test_utils/scale_data_generator/__init__.py                       |     1 |    1 |     0 |     0% |
| application_sdk/test_utils/scale_data_generator/config_loader.py                  |    10 |    4 |     6 |    60% |
| application_sdk/test_utils/scale_data_generator/data_generator.py                 |    10 |    3 |     7 |    70% |
| application_sdk/test_utils/scale_data_generator/driver.py                         |     3 |    3 |     0 |     0% |
| application_sdk/test_utils/scale_data_generator/output_handler/__init__.py        |     1 |    1 |     0 |     0% |
| application_sdk/test_utils/scale_data_generator/output_handler/base.py            |     7 |    3 |     4 |    57% |
| application_sdk/test_utils/scale_data_generator/output_handler/csv_handler.py     |     5 |    5 |     0 |     0% |
| application_sdk/test_utils/scale_data_generator/output_handler/json_handler.py    |     5 |    5 |     0 |     0% |
| application_sdk/test_utils/scale_data_generator/output_handler/parquet_handler.py |     6 |    6 |     0 |     0% |
| application_sdk/transformers/__init__.py                                          |     3 |    1 |     2 |    67% |
| application_sdk/transformers/atlas/__init__.py                                    |     6 |    1 |     5 |    83% |
| application_sdk/transformers/atlas/sql.py                                         |    25 |    4 |    21 |    84% |
| application_sdk/transformers/common/__init__.py                                   |     1 |    1 |     0 |     0% |
| application_sdk/transformers/common/utils.py                                      |     6 |    0 |     6 |   100% |
| application_sdk/transformers/query/__init__.py                                    |    11 |    2 |     9 |    82% |
| application_sdk/workflows/__init__.py                                             |     4 |    0 |     4 |   100% |
| application_sdk/workflows/metadata_extraction/__init__.py                         |     2 |    2 |     0 |     0% |
| application_sdk/workflows/metadata_extraction/sql.py                              |     7 |    0 |     7 |   100% |
| application_sdk/workflows/query_extraction/__init__.py                            |     2 |    2 |     0 |     0% |
| application_sdk/workflows/query_extraction/sql.py                                 |     4 |    0 |     4 |   100% |
| examples/application_custom_fastapi.py                                            |    14 |   14 |     0 |     0% |
| examples/application_fastapi.py                                                   |     9 |    9 |     0 |     0% |
| examples/application_hello_world.py                                               |     8 |    8 |     0 |     0% |
| examples/application_sql.py                                                       |     5 |    4 |     1 |    20% |
| examples/application_sql_miner.py                                                 |     5 |    4 |     1 |    20% |
| examples/application_sql_with_custom_pyatlan_transformer.py                       |    11 |    9 |     2 |    18% |
| examples/application_sql_with_custom_transformer.py                               |     9 |    8 |     1 |    11% |
| examples/application_subscriber.py                                                |    14 |   14 |     0 |     0% |
| examples/run_examples.py                                                          |     2 |    1 |     1 |    50% |
| tests/__init__.py                                                                 |     1 |    1 |     0 |     0% |
| tests/unit/__init__.py                                                            |     1 |    1 |     0 |     0% |
| tests/unit/worker.py                                                              |     5 |    5 |     0 |     0% |
| tests/unit/activities/__init__.py                                                 |     1 |    1 |     0 |     0% |
| tests/unit/clients/__init__.py                                                    |     1 |    1 |     0 |     0% |
| tests/unit/clients/test_async_sql_client.py                                       |    13 |   13 |     0 |     0% |
| tests/unit/clients/test_sql_client.py                                             |    26 |    6 |    20 |    77% |
| tests/unit/clients/test_temporal_client.py                                        |    17 |   11 |     6 |    35% |
| tests/unit/common/test_logger_adapter.py                                          |    19 |    1 |    18 |    95% |
| tests/unit/common/test_metrics_adapter.py                                         |    14 |    1 |    13 |    93% |
| tests/unit/common/test_traces_adaptor.py                                          |    10 |    1 |     9 |    90% |
| tests/unit/common/test_utils.py                                                   |    25 |    5 |    20 |    80% |
| tests/unit/docgen/parsers/test_directory_parser.py                                |    14 |    3 |    11 |    79% |
| tests/unit/docgen/parsers/test_manifest_parser.py                                 |    12 |   12 |     0 |     0% |
| tests/unit/handlers/__init__.py                                                   |     1 |    1 |     0 |     0% |
| tests/unit/handlers/sql/test_auth.py                                              |    10 |    4 |     6 |    60% |
| tests/unit/handlers/sql/test_check_schemas_and_databases.py                       |    13 |    4 |     9 |    69% |
| tests/unit/handlers/sql/test_extract_allowed_schemas.py                           |    11 |    3 |     8 |    73% |
| tests/unit/handlers/sql/test_metadata.py                                          |    27 |   10 |    17 |    63% |
| tests/unit/handlers/sql/test_preflight_check.py                                   |    16 |   15 |     1 |     6% |
| tests/unit/handlers/sql/test_prepare_metadata.py                                  |    14 |    4 |    10 |    71% |
| tests/unit/handlers/sql/test_tables_check.py                                      |     9 |    6 |     3 |    33% |
| tests/unit/handlers/sql/test_validate_filters.py                                  |    12 |    4 |     8 |    67% |
| tests/unit/inputs/test_json_input.py                                              |     4 |    4 |     0 |     0% |
| tests/unit/outputs/test_iceberg.py                                                |    11 |    4 |     7 |    64% |
| tests/unit/outputs/test_json_output.py                                            |     7 |    6 |     1 |    14% |
| tests/unit/outputs/test_objectstore.py                                            |     9 |    8 |     1 |    11% |
| tests/unit/outputs/test_output.py                                                 |    19 |    6 |    13 |    68% |
| tests/unit/outputs/test_statestore.py                                             |    11 |   11 |     0 |     0% |
| tests/unit/server/__init__.py                                                     |     1 |    1 |     0 |     0% |
| tests/unit/server/fastapi/test__init__.py                                         |    11 |    6 |     5 |    45% |
| tests/unit/server/fastapi/routers/__init__.py                                     |     1 |    1 |     0 |     0% |
| tests/unit/server/fastapi/routers/server.py                                       |     1 |    1 |     0 |     0% |
| tests/unit/transformers/__init__.py                                               |     1 |    1 |     0 |     0% |
| tests/unit/transformers/atlas/__init__.py                                         |     1 |    1 |     0 |     0% |
| tests/unit/transformers/atlas/test_column.py                                      |    17 |    6 |    11 |    65% |
| tests/unit/transformers/atlas/test_database.py                                    |     8 |    6 |     2 |    25% |
| tests/unit/transformers/atlas/test_function.py                                    |     9 |    5 |     4 |    44% |
| tests/unit/transformers/atlas/test_procedure.py                                   |     7 |    6 |     1 |    14% |
| tests/unit/transformers/atlas/test_schema.py                                      |     8 |    6 |     2 |    25% |
| tests/unit/transformers/atlas/test_table.py                                       |    13 |    6 |     7 |    54% |
| tests/unit/transformers/query/test_sql_transformer.py                             |    14 |    4 |    10 |    71% |
| tests/unit/transformers/query/test_sql_transformer_output_validation.py           |     5 |    2 |     3 |    60% |
| tests/unit/workflows/sql/__init__.py                                              |     1 |    1 |     0 |     0% |
| tests/unit/workflows/sql/test_workflow.py                                         |     9 |    4 |     5 |    56% |
|-----------------------------------------------------------------------------------|-------|------|-------|--------|
| TOTAL                                                                             |  1173 |  458 |   715 |  61.0% |
---------------- RESULT: PASSED (minimum: 30.0%, actual: 61.0%) ----------------

@OnkarVO7 OnkarVO7 marked this pull request as draft May 14, 2025 13:26

github-actions bot commented May 14, 2025

🛠 Docs available at: https://k.atlan.dev/application-sdk/sql_tranformer_impr_attr


github-actions bot commented May 14, 2025

📦 Trivy Vulnerability Scan Results

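| Schema Version | Created At | Artifact | Type |
|----------------|------------|----------|------|
| 2 | 2025-05-16T16:03:03.298752063Z | . | filesystem |

Report Summary

| Target | Type | Vulnerabilities |
|--------|------|-----------------|
| . | filesystem | ✅ None found |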

Scan Result Details

✅ No vulnerabilities found during the scan for ..


github-actions bot commented May 14, 2025

📦 Trivy Secret Scan Results

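| Schema Version | Created At | Artifact | Type |
|----------------|------------|----------|------|
| 2 | 2025-05-16T16:03:14.76713274Z | . | filesystem |

Report Summary

| Target | Type | Secrets |
|--------|------|---------|
| . | filesystem | ✅ None found |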

Scan Result Details

✅ No secrets found during the scan for ..

@OnkarVO7 OnkarVO7 added the e2e-test and run-examples labels May 14, 2025
Collaborator

atlan-ci commented May 14, 2025

☂️ Python Coverage

current status: ✅

Overall Coverage

| Lines | Covered | Coverage | Threshold | Status |
|-------|---------|----------|-----------|--------|
| 4983 | 2654 | 53% | 0% | 🟢 |

New Files

No new covered files...

Modified Files

| File | Coverage | Status |
|------|----------|--------|
| application_sdk/transformers/common/utils.py | 94% | 🟢 |
| application_sdk/transformers/query/__init__.py | 90% | 🟢 |
| TOTAL | 92% | 🟢 |

updated for commit: 6b21273 by action🐍


github-actions bot commented May 14, 2025

🛠 Full Test Coverage Report: https://k.atlan.dev/coverage/application-sdk/pr/528


github-actions bot commented May 14, 2025

📦 Example workflows test results

  • This workflow runs all the examples in the examples directory.

| Example | Status | Time Taken |
|---------|--------|------------|
| application_sql | COMPLETED 🟢 | 18.09 seconds |
| application_sql_with_custom_transformer | COMPLETED 🟢 | 13.04 seconds |
| application_sql_miner | COMPLETED 🟢 | 173.29 seconds |
| application_hello_world | COMPLETED 🟢 | 5.03 seconds |

This is an automatically generated file. Please do not edit directly.

@OnkarVO7 OnkarVO7 marked this pull request as ready for review May 14, 2025 14:01
@OnkarVO7 OnkarVO7 marked this pull request as draft May 14, 2025 21:25
inishchith and others added 8 commits May 14, 2025 16:57
Bumps [astral-sh/setup-uv](https://github.com/astral-sh/setup-uv) from 5 to 6.
- [Release notes](https://github.com/astral-sh/setup-uv/releases)
- [Commits](astral-sh/setup-uv@v5...v6)

---
updated-dependencies:
- dependency-name: astral-sh/setup-uv
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
- added _send_to_otel method to logger_adaptor
- fixed pre-commit related errors
* fix diff issues

* view defn fixes
@OnkarVO7 OnkarVO7 marked this pull request as ready for review May 16, 2025 16:08
@inishchith inishchith changed the base branch from develop to main May 22, 2025 08:09
Labels
e2e-test, run-examples
5 participants