[Bug] dbt-redshift microbatch materialization strategy does not use unique_key in config block #927

Open
hamzaamjad opened this issue Mar 20, 2025 · 0 comments
Labels
feature:microbatch Issues related to the microbatch incremental strategy pkg:dbt-redshift Issue affects dbt-redshift type:bug Something isn't working as documented


hamzaamjad commented Mar 20, 2025

Is this a new bug?

  - [x] I believe this is a new bug
  - [x] I have searched the existing issues, and I could not find an existing issue for this bug

Which packages are affected?

  - [ ] dbt-adapters
  - [ ] dbt-tests-adapter
  - [ ] dbt-athena
  - [ ] dbt-athena-community
  - [ ] dbt-bigquery
  - [ ] dbt-postgres
  - [x] dbt-redshift
  - [ ] dbt-snowflake
  - [ ] dbt-spark

Current Behavior

Microbatch incremental models built on Redshift do not respect the unique_key provided in the config block. When a microbatch incremental model is run, the unique_key is ignored: it is not reflected in the DML (Data Manipulation Language) generated by the microbatch incremental strategy to identify which records to delete from the existing model before inserting new records. As a result, microbatch incremental models can contain duplicate values of a unique_key.

Expected Behavior

Microbatch incremental models built on Redshift respect the unique_key provided in the config block. When a microbatch incremental model is run, the unique_key is reflected in the DML generated by the microbatch incremental strategy to identify which records to delete from the existing model before inserting new records. As a result, microbatch incremental models cannot contain duplicate values of a unique_key.
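To illustrate the difference, assume a hypothetical model analytics.orders configured with unique_key='order_id' and event_time='updated_at' (all relation, column, and timestamp values here are made up). The current DELETE bounds only on the batch window, while the expected DML would also match on the key:

```sql
-- Current behavior (sketch): only the batch window bounds the delete,
-- so an updated row whose updated_at moved out of its original batch
-- survives alongside its re-inserted copy.
delete from analytics.orders
where updated_at >= timestamp '2025-01-01 00:00:00'
  and updated_at < timestamp '2025-01-02 00:00:00';

-- Expected behavior (sketch): rows matching an incoming unique_key
-- are removed before the insert, preventing duplicates.
delete from analytics.orders
where order_id in (
    select distinct order_id
    from orders__dbt_tmp
);
```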

Steps To Reproduce

  1. Configure a dbt project in dbt Cloud running on Latest.
  2. Identify an existing model, materialized as a table or incremental model, that you would like to run using the microbatch materialization strategy.
  3. Define the configuration in the model similar to the below:

```sql
{{ config(
    materialized='incremental',
    incremental_strategy='microbatch',
    unique_key='model_unique_key',
    event_time='model_event_time',
    begin='2025-01-01',
    batch_size='model_batch_size'
) }}
```

  4. Ensure that your model has a uniqueness test configured on model_unique_key.
  5. Ensure that you have a set of records whose model_event_time places them in the latest batch. Ideally, model_event_time points at a timestamp that can change, such as record_updated_at.
  6. Run `dbt build --select your_model`.
  7. The model should run successfully, but the test for uniqueness on model_unique_key will fail.

Relevant log output

Environment

- OS: dbt Cloud
- Python: dbt Cloud Latest
- dbt-adapters: dbt Cloud Latest
- dbt-redshift: dbt Cloud Latest

Additional Context

Expected Adapter-specific Behavior based on Documentation

The Adapter-specific Behavior section in the incremental microbatch documentation specifies that:

dbt-redshift: Uses the delete+insert strategy, which "inserts" or "replaces."

This implies to a user that the microbatch materialization strategy for Redshift will mirror the expected behavior of the delete+insert materialization strategy.

The Incremental materialization strategies section in the Redshift configurations documentation specifies that:

delete+insert (default when unique_key is defined)

This implies to a user that delete+insert relies on a unique_key and will respect that unique_key.
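For reference, the default delete+insert implementation combines the key match and any incremental predicates with AND in the DELETE, roughly as follows (a sketch only; relation and column names are illustrative, not the macro's verbatim output):

```sql
-- Default delete+insert shape (sketch): key match AND-ed with predicates,
-- so only rows matching BOTH conditions are deleted.
delete from target_table
where (model_unique_key) in (
    select distinct model_unique_key
    from target_table__dbt_tmp
)
and model_event_time >= timestamp '2025-01-01 00:00:00'
and model_event_time < timestamp '2025-01-02 00:00:00';
```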

Issue in Source Code for dbt-redshift adapter

The issue is found in the incremental_merge.sql file in the dbt-adapters repository. The specific section to reference begins on Line 57.

It appears that handling for the unique_key config was intentionally removed from this microbatch strategy. I don't think this is correct -- unique_key should be an optional config value that can be provided to microbatch incremental models that are run in Redshift.

Furthermore, I'll note that the function cannot simply call out to the default implementation of delete+insert. Calling the default__get_delete_insert_merge_sql macro would generate DML that uses AND conditions in the WHERE clause of the DELETE statement, which in my testing can cause issues with microbatch models that use this strategy. Instead, I had to roll a custom microbatch incremental strategy, which is as follows:

```sql
{% macro get_incremental_microbatch_sql(arg_dict) %}

  {% if arg_dict["unique_key"] %}
    {%- set target = arg_dict["target_relation"] -%}
    {%- set unique_key = arg_dict["unique_key"] -%}
    {%- set source = arg_dict["temp_relation"] -%}
    {%- set dest_columns = arg_dict["dest_columns"] -%}
    {%- set predicates = [] -%}

    {#-- Rewrite user-supplied predicates so they reference the target relation --#}
    {%- set incremental_predicates = [] if arg_dict.get('incremental_predicates') is none else arg_dict.get('incremental_predicates') -%}
    {%- for pred in incremental_predicates -%}
        {% if "DBT_INTERNAL_DEST." in pred %}
            {%- set pred = pred | replace("DBT_INTERNAL_DEST.", target ~ ".") -%}
        {% endif %}
        {% if "dbt_internal_dest." in pred %}
            {%- set pred = pred | replace("dbt_internal_dest.", target ~ ".") -%}
        {% endif %}
        {% do predicates.append(pred) %}
    {% endfor %}

    {% if not model.batch or (not model.batch.event_time_start or not model.batch.event_time_end) -%}
        {% do exceptions.raise_compiler_error('dbt could not compute the start and end timestamps for the running batch') %}
    {% endif %}

    {#-- Add additional incremental_predicates to filter for the batch window --#}
    {% do predicates.append(model.config.event_time ~ " >= TIMESTAMP '" ~ model.batch.event_time_start ~ "'") %}
    {% do predicates.append(model.config.event_time ~ " < TIMESTAMP '" ~ model.batch.event_time_end ~ "'") %}
    {% do arg_dict.update({'incremental_predicates': predicates}) %}

    {%- set dest_cols_csv = get_quoted_csv(dest_columns | map(attribute="name")) -%}

    {#-- Normalize unique_key to a list so single- and multi-column keys work alike --#}
    {% if unique_key is string %}
        {% set unique_key = [unique_key] %}
    {% endif %}
    {%- set unique_key_str = unique_key | join(', ') -%}

    {#-- Delete rows matching an incoming unique_key, OR-ing in any user predicates --#}
    delete from {{ target }}
    where ({{ unique_key_str }}) in (
        select distinct {{ unique_key_str }}
        from {{ source }}
    )
    {%- if incremental_predicates %}
    OR (
        {% for predicate in incremental_predicates %}
            {%- if not loop.first %}and {% endif -%} {{ predicate }}
        {% endfor %}
    )
    {%- endif -%};

    insert into {{ target }} ({{ dest_cols_csv }})
    (
        select {{ dest_cols_csv }}
        from {{ source }}
    )

  {% else %}
    {#-- No unique_key configured: fall back to the adapter's default microbatch strategy --#}
    {% do return(redshift__get_incremental_microbatch_sql(arg_dict)) %}
  {% endif %}

{% endmacro %}
```
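If the macro above is saved in your dbt project's macros/ directory, it should shadow the built-in entry point (root-project macros take precedence over package macros of the same name). The compiled DML for a batch then looks roughly like this (a sketch with illustrative relation and column names; when incremental_predicates are configured, they are additionally OR'd into the DELETE):

```sql
-- Sketch of the compiled output for one batch
delete from analytics.my_model
where (model_unique_key) in (
    select distinct model_unique_key
    from my_model__dbt_tmp
);

insert into analytics.my_model (id, model_unique_key, model_event_time)
(
    select id, model_unique_key, model_event_time
    from my_model__dbt_tmp
)
```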
@hamzaamjad hamzaamjad added triage:product In Product's queue type:bug Something isn't working as documented labels Mar 20, 2025
@amychen1776 amychen1776 added feature:microbatch Issues related to the microbatch incremental strategy pkg:dbt-redshift Issue affects dbt-redshift and removed triage:product In Product's queue labels Mar 21, 2025