Help needed with single run backfill and dynamic-mapping inside graph-backed asset #17779
Unanswered
Replies: 2 comments 1 reply
-
Are you able to provide a code snippet that reproduces this issue? A guess here is that the issue is caused by the IO manager that stores the output of the mapped ops (pre-collect).
-
In the following code, `values_to_run` yields a list of values, each of which is passed to `get_reports` to make API calls that return a dataframe.

```python
from dagster import BackfillPolicy, Out, TimeWindowPartitionsDefinition, graph_asset, op

# values_to_run, get_reports, and concat_dataframes are defined elsewhere.

@op(out={"data_frame_out": Out(io_manager_key="sn_io_manager", metadata={"partition_expr": "recent_update_ts"})})
def return_final_df(response_metrics):
    return response_metrics

partition_1 = TimeWindowPartitionsDefinition(cron_schedule="0 */4 * * *", start="2023-07-11-00:00", fmt="%Y-%m-%d-%H:%M")

@graph_asset(name="dagster_api_results", key_prefix=["DAGSTER_ASSETS"], group_name="AWS_META_DATA", partitions_def=partition_1, backfill_policy=BackfillPolicy.single_run())
def dag_asset_from_api_call():
    api_call_results = values_to_run().map(get_reports).collect()  # Works correctly: runs only once for all values in the list.
    results_collected = return_final_df(concat_dataframes(api_call_results))  # Also works correctly.
    # Problem: the asset gets materialized as many times as there are partitions,
    # with the same start/end dates and the same data each time -- it deletes and
    # inserts the same rows over and over again.
    return results_collected
```
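The symptom described in the comments — the same combined dataframe being deleted and re-inserted once per partition — can be illustrated outside Dagster. The sketch below is a hypothetical stand-in, not Dagster code: `write_partition` and the in-memory `store` are invented to model delete-and-insert semantics. A naive writer stores the full combined result for every partition, while a partition-filtered writer stores only the rows whose timestamp falls inside each partition's 4-hour window.

```python
from datetime import datetime, timedelta

# Hypothetical in-memory "table": partition start -> rows currently stored for it.
store = {}

def write_partition(key, rows):
    """Delete-and-insert semantics: replace whatever the partition held."""
    store[key] = list(rows)

# Combined result of the single backfill run: one row per 4-hour window.
rows = [{"recent_update_ts": datetime(2023, 7, 11, h), "value": h} for h in (0, 4, 8)]
partitions = [
    (datetime(2023, 7, 11, h), datetime(2023, 7, 11, h) + timedelta(hours=4))
    for h in (0, 4, 8)
]

# Naive writer: every partition receives the full combined dataset.
for start, end in partitions:
    write_partition(start, rows)
naive_sizes = [len(v) for v in store.values()]  # [3, 3, 3] -- duplicated writes

# Partition-filtered writer: each partition gets only its own rows.
for start, end in partitions:
    write_partition(start, [r for r in rows if start <= r["recent_update_ts"] < end])
filtered_sizes = [len(v) for v in store.values()]  # [1, 1, 1]
```

The naive loop is a model of the reported behavior; the filtered loop is what a partition-aware write (e.g. via the IO manager's `partition_expr`) should effectively produce.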
-
Hello, I was wondering if I could get some help. I am using
`BackfillPolicy.single_run()`
on a `graph_asset`. I have a dynamic op that maps to another op and then collects the results. I then pass the collected results down to the last op, which creates a single dataframe and returns it to be materialized. When running a single-run backfill, the initial step of capturing all the data from the dynamic ops works: it executes only once. My problem is that in the materialization step, it deletes and inserts the same data as many times as there are partitions. Am I doing something wrong? Is there any way of avoiding this? The question was originally asked in Dagster Slack.