
feat: Add column description support for BigQuery adapter #2179

Open
K-Oxon opened this issue Dec 25, 2024 · 0 comments
K-Oxon commented Dec 25, 2024

Feature description

I'd like to suggest adding support for column descriptions in the BigQuery adapter. This would let users define and manage column-level metadata when loading data into BigQuery tables, enhancing the data documentation and governance capabilities of the dlt BigQuery integration.

Are you a dlt user?

Yes, I'm already a dlt user.

Use case

When loading data into BigQuery, clear and accurate column descriptions are critical for data governance and documentation. Currently, dlt doesn't provide a way to define column descriptions during data loading, making it difficult to maintain metadata and properly document data structures.

Problems solved:

  • Data teams must manually add column descriptions after data is loaded
  • Risk of inconsistent or missing column documentation
  • Additional overhead of maintaining separate data dictionaries
  • Difficulty in tracking and versioning changes to column metadata

Proposed solution

Implement column description support in the BigQuery adapter by:

  • Extending the BigQuery adapter interface to accept column descriptions during schema definition
  • Modifying dlt/destinations/impl/bigquery/bigquery_adapter.py and dlt/destinations/impl/bigquery/bigquery.py to add column description parameters

In dlt/destinations/impl/bigquery/bigquery.py:

class BigQueryClient(SqlJobClientWithStagingDataset, SupportsStagingDestination):
    ...
    def _get_column_def_sql(self, column: TColumnSchema, table: PreparedTableSchema = None) -> str:
        column_def_sql = super()._get_column_def_sql(column, table)
        if column.get(ROUND_HALF_EVEN_HINT, False):
            column_def_sql += " OPTIONS (rounding_mode='ROUND_HALF_EVEN')"
        if column.get(ROUND_HALF_AWAY_FROM_ZERO_HINT, False):
            column_def_sql += " OPTIONS (rounding_mode='ROUND_HALF_AWAY_FROM_ZERO')"
        
        # new: emit the column description as an OPTIONS clause;
        # escape backslashes and single quotes so the generated DDL stays valid
        if "description" in column:
            escaped = column["description"].replace("\\", "\\\\").replace("'", "\\'")
            column_def_sql += f" OPTIONS (description='{escaped}')"
        
        return column_def_sql
    ...
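
For illustration, here is a minimal standalone sketch of the DDL fragment this change would emit, including the escaping (column_def_with_description is a hypothetical helper for this sketch, not dlt code):

def column_def_with_description(name: str, bq_type: str, description: str) -> str:
    # Build a BigQuery column definition carrying a description OPTIONS clause.
    # Backslashes and single quotes are escaped so the generated DDL stays valid.
    escaped = description.replace("\\", "\\\\").replace("'", "\\'")
    return f"{name} {bq_type} OPTIONS (description='{escaped}')"

print(column_def_with_description("name", "STRING", "User Name"))
# name STRING OPTIONS (description='User Name')

And in dlt/destinations/impl/bigquery/bigquery_adapter.py: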
def bigquery_adapter(
    data: Any,
    partition: TColumnNames = None,
    cluster: TColumnNames = None,
    round_half_away_from_zero: TColumnNames = None,
    round_half_even: TColumnNames = None,
    table_description: Optional[str] = None,
    table_expiration_datetime: Optional[str] = None,
    insert_api: Optional[Literal["streaming", "default"]] = None,
    autodetect_schema: Optional[bool] = None,
    partition_expiration_days: Optional[int] = None,
    column_descriptions: Optional[Dict[str, str]] = None,  # Added parameter
) -> DltResource:
    ...
    # new: merge column descriptions into the column-level hints
    if column_descriptions:
        for column_name, description in column_descriptions.items():
            if column_name in column_hints:
                column_hints[column_name]["description"] = description
            else:
                column_hints[column_name] = {"name": column_name, "description": description}
    
    if partition:
        ...
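
To show what the merging above produces, here is a standalone run with plain dicts standing in for dlt's column hint structures (the values are illustrative):

from typing import Dict

column_hints: Dict[str, dict] = {"id": {"name": "id"}}  # pre-existing hint
column_descriptions = {"id": "User ID", "name": "User Name"}

for column_name, description in column_descriptions.items():
    if column_name in column_hints:
        # existing hints keep their keys; the description is layered on top
        column_hints[column_name]["description"] = description
    else:
        column_hints[column_name] = {"name": column_name, "description": description}

print(column_hints)
# {'id': {'name': 'id', 'description': 'User ID'},
#  'name': {'name': 'name', 'description': 'User Name'}}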

The implementation would match BigQuery's native support for column descriptions while maintaining dlt's current functionality and performance.

Example usage could look like this:

import dlt
from dlt.destinations.adapters import bigquery_adapter

data = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
column_descriptions = {"id": "User ID", "name": "User Name"}

pipeline = dlt.pipeline(
    pipeline_name="description_test",
    destination="bigquery",
    dataset_name="description_test",
    export_schema_path="dlt_schemas/export",
)
load_info = pipeline.run(
    bigquery_adapter(data, column_descriptions=column_descriptions),
    table_name="users",
    write_disposition="replace",
    refresh="drop_data",
)

print(load_info)
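
After the load, the descriptions could be verified with the official google-cloud-bigquery client. A sketch, assuming credentials are configured and using the dataset name from the pipeline above:

from google.cloud import bigquery

client = bigquery.Client()
# "dataset.table" is resolved against the client's default project
table = client.get_table("description_test.users")
for field in table.schema:
    print(f"{field.name}: {field.description}")
# Expected:
# id: User ID
# name: User Name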

Related issues

No related issues found.
