Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[dagster-pipes-rust] User should only need to specify asset_key for multi-assets #63

Open
christeefy opened this issue Dec 18, 2024 · 0 comments

Comments

@christeefy
Copy link
Collaborator

christeefy commented Dec 18, 2024

Issue

Currently, users have to specify the asset_key when reporting data back to Dagster.

context.report_asset_materialization("example_rust_subprocess_asset", asset_metadata)?;

This adds unnecessary friction for the user because:

  1. They have to redundantly specify the asset key. (Python's implementation uses optional arguments with defaults, and doesn't have this issue).
  2. If an invalid asset key is specified, we currently report back that invalid key to Dagster 🐛

Proposal

Ideally, users won't have to specify this key. This can be achieve in two ways:

Approach 1: Use Option<&str>

This is similar to Python's implementation. However, Rust doesn't allow default values in function arguments. So this will mean that users still have to write None.

context.report_asset_materialization(asset_metadata, None)?;

Nothing's prevents the user from including an asset key in a single-asset use case, which means we still need validation internally.

Approach 2: Different PipesContext APIs for single vs multi-assets

Different APIs can be created for the single vs multi-asset case:

impl PipesContext {
    // Note: No `asset_key` in API below
    fn report_asset_materialization(metadata: ...) { ... }

    // Same signature for reporting asset checks
}

impl MultiAssetPipesContext {
    // Mandatory `asset_key` argument
    fn report_asset_materialization(metadata: ..., asset_key: &str) { ... }

    // Similar signature for reporting asset checks
}

The downside of this approach is the creation of a separate class. This may be okay, if we do not reach for this technique again — we do not expect similar redundant arguments on the user's journey / hot path. Also, the two structs can be hidden (like open_dagster_pipes hiding PipesContext) behind a builder pattern, which is quite common in Rust. (See examples below)

Single Asset Case

let context = PipesContext
        ::builder()
        .with_params_loader(...)  // Users can optionally customize their components
        .build();
let metadata = ...;
context.report_asset_materialization(metadata)?;  // Internally uses `context.data` for asset key

Multi-Asset Case

let context = PipesContext
        ::builder()
        .with_context_loader(...)  // Users can customize their components
        .build_multi_asset();  // A multi-asset context is built
let metadata = ...;
context.report_asset_materialization(metadata, "asset1")?;  // Validates "asset1" against list of assets in `context.data`

I'm leaning towards Approach 2.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant