[dagster-pipes-rust] User should only need to specify `asset_key` for multi-assets #63

christeefy · 2024-12-18T15:50:48Z

Issue

Currently, users have to specify the asset_key when reporting data back to Dagster.

community-integrations/libraries/pipes/implementations/rust/example-dagster-pipes-rust-project/rust_processing_jobs/src/main.rs

Line 13 in 3ee24de

    
           context.report_asset_materialization("example_rust_subprocess_asset", asset_metadata)?;

community-integrations/libraries/pipes/implementations/rust/example-dagster-pipes-rust-project/rust_processing_jobs/src/main.rs

Lines 19 to 20 in 3ee24de

    
           context.report_asset_check( 
        
               "example_rust_subprocess_check",

This adds unnecessary friction for the user because:

They have to redundantly specify the asset key. (Python's implementation uses optional arguments with defaults, and doesn't have this issue).
If an invalid asset key is specified, we currently report back that invalid key to Dagster 🐛

Proposal

Ideally, users won't have to specify this key. This can be achieve in two ways:

Approach 1: Use Option<&str>

This is similar to Python's implementation. However, Rust doesn't allow default values in function arguments. So this will mean that users still have to write None.

context.report_asset_materialization(asset_metadata, None)?;

Nothing's prevents the user from including an asset key in a single-asset use case, which means we still need validation internally.

Approach 2: Different `PipesContext` APIs for single vs multi-assets

Different APIs can be created for the single vs multi-asset case:

impl PipesContext {
    // Note: No `asset_key` in API below
    fn report_asset_materialization(metadata: ...) { ... }

    // Same signature for reporting asset checks
}

impl MultiAssetPipesContext {
    // Mandatory `asset_key` argument
    fn report_asset_materialization(metadata: ..., asset_key: &str) { ... }

    // Similar signature for reporting asset checks
}

The downside of this approach is the creation of a separate class. This may be okay, if we do not reach for this technique again — we do not expect similar redundant arguments on the user's journey / hot path. Also, the two structs can be hidden (like open_dagster_pipes hiding PipesContext) behind a builder pattern, which is quite common in Rust. (See examples below)

Single Asset Case

let context = PipesContext
        ::builder()
        .with_params_loader(...)  // Users can optionally customize their components
        .build();
let metadata = ...;
context.report_asset_materialization(metadata)?;  // Internally uses `context.data` for asset key

Multi-Asset Case

let context = PipesContext
        ::builder()
        .with_context_loader(...)  // Users can customize their components
        .build_multi_asset();  // A multi-asset context is built
let metadata = ...;
context.report_asset_materialization(metadata, "asset1")?;  // Validates "asset1" against list of assets in `context.data`

I'm leaning towards Approach 2.

The text was updated successfully, but these errors were encountered:

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[dagster-pipes-rust] User should only need to specify `asset_key` for multi-assets #63

[dagster-pipes-rust] User should only need to specify `asset_key` for multi-assets #63

christeefy commented Dec 18, 2024 •

edited

Loading

[dagster-pipes-rust] User should only need to specify asset_key for multi-assets #63

[dagster-pipes-rust] User should only need to specify asset_key for multi-assets #63

Comments

christeefy commented Dec 18, 2024 • edited Loading

Issue

Proposal

Approach 1: Use Option<&str>

Approach 2: Different PipesContext APIs for single vs multi-assets

Single Asset Case

Multi-Asset Case

[dagster-pipes-rust] User should only need to specify `asset_key` for multi-assets #63

[dagster-pipes-rust] User should only need to specify `asset_key` for multi-assets #63

christeefy commented Dec 18, 2024 •

edited

Loading

Approach 2: Different `PipesContext` APIs for single vs multi-assets