Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[dagster-airlift] mark dags as migrating #23370

Merged
merged 1 commit into from
Aug 11, 2024
Merged

Conversation

dpeng817
Copy link
Contributor

@dpeng817 dpeng817 commented Aug 1, 2024

Function that allows users to mark airflow dags as "migrating", and injects a tag into the dag with information about the migration.
When placing the tag in the dag, there are two options:

  1. Construct a new dag using essentially a shallow copy of the old dag, and then inject this into global scope.
  2. Inject a tag into the existing dag object you find in global scope using the mutability of airflow dag's data structures.

I prefer (2) in this approach, because the surface area is way lower than (1), and it should be relatively resistant to changes in airflow's API other than this one tiny surface area (which hasn't changed since 1.10, when tags were first introduced). Unless we can figure out a reliable way to create an arbitrary copy constructor for dags across any airflow version, we're likely to run into brittleness with trying to reconstruct a dag from the pure object, I think. In general, taking advantage of airflow's mutability when we can seems like a good approach to injecting migration state.

Another point of discussion; what to do when there are no dags in scope which the state migration object has reference to. For now I throw an exception, but wondering if this is too harsh.

Finally, there's the question of how these tags show up in airflow's UI. It's pretty ugly to see this json blob appear in the airflow UI after setting the tag, but there doesn't seem to be any other data structures we can use for this (except maybe params? But I feel more hesitant hooking into that since it's significantly more complex implementation wise). So might be the best we can do for now.

@dpeng817 dpeng817 marked this pull request as ready for review August 1, 2024 22:20
@dpeng817 dpeng817 requested a review from schrockn August 1, 2024 22:28
Copy link
Member

@schrockn schrockn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor changes

@dpeng817 dpeng817 force-pushed the dpeng817/airflow_blueprints branch from 4ff80f5 to de7ca4a Compare August 3, 2024 23:59
@dpeng817 dpeng817 force-pushed the dpeng817/airflow_migrating branch from 466fff1 to e1e11e2 Compare August 3, 2024 23:59
@dpeng817 dpeng817 force-pushed the dpeng817/airflow_blueprints branch from de7ca4a to a57a8c1 Compare August 4, 2024 00:04
@dpeng817 dpeng817 force-pushed the dpeng817/airflow_migrating branch from e1e11e2 to ac1ddf1 Compare August 4, 2024 00:04
@dpeng817 dpeng817 force-pushed the dpeng817/airflow_blueprints branch from a57a8c1 to 7c7c8b4 Compare August 5, 2024 02:08
@dpeng817 dpeng817 force-pushed the dpeng817/airflow_migrating branch from ac1ddf1 to 277a9ae Compare August 5, 2024 02:09
@dpeng817 dpeng817 force-pushed the dpeng817/airflow_blueprints branch from 1c2343a to e745d9a Compare August 5, 2024 03:13
@dpeng817 dpeng817 force-pushed the dpeng817/airflow_migrating branch from 277a9ae to a69dca9 Compare August 5, 2024 03:13
@dpeng817 dpeng817 force-pushed the dpeng817/airflow_blueprints branch from e745d9a to 15efcd4 Compare August 5, 2024 16:22
@dpeng817 dpeng817 force-pushed the dpeng817/airflow_migrating branch from a69dca9 to f212aea Compare August 5, 2024 16:22
@dpeng817 dpeng817 force-pushed the dpeng817/airflow_blueprints branch from 15efcd4 to 1ad114f Compare August 5, 2024 16:40
@dpeng817 dpeng817 force-pushed the dpeng817/airflow_migrating branch from f212aea to c0b5cea Compare August 5, 2024 16:40
@dpeng817 dpeng817 force-pushed the dpeng817/airflow_blueprints branch from 1ad114f to a70502a Compare August 5, 2024 17:54
@dpeng817 dpeng817 force-pushed the dpeng817/airflow_migrating branch from c0b5cea to 0a4a7bd Compare August 5, 2024 17:54
schrockn
schrockn previously approved these changes Aug 5, 2024
Copy link
Member

@schrockn schrockn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can make the change now or one of us can do it in a follow up

Copy link
Member

@schrockn schrockn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

q mgmt. I think we should pass globals in explicitly

@dpeng817 dpeng817 force-pushed the dpeng817/move_airflow_dep branch from 5ccfbeb to 90fb17a Compare August 9, 2024 20:32
@dpeng817 dpeng817 force-pushed the dpeng817/airflow_migrating branch from 4fc7a51 to cb04863 Compare August 9, 2024 20:32
Base automatically changed from dpeng817/move_airflow_dep to master August 9, 2024 21:08
@dpeng817 dpeng817 force-pushed the dpeng817/airflow_migrating branch from cb04863 to 2b11d64 Compare August 9, 2024 21:21
@dpeng817 dpeng817 requested a review from schrockn August 9, 2024 21:21
@dpeng817
Copy link
Contributor Author

dpeng817 commented Aug 9, 2024

Screenshot 2024-08-09 at 9 42 12 AM
@schrockn not terrible at small scale but for massive dags, not sure exactly what's going to happen. Seems worth testing.

@dpeng817 dpeng817 force-pushed the dpeng817/airflow_migrating branch from 2b11d64 to 5279253 Compare August 9, 2024 21:48
*,
global_vars: Dict[str, Any],
migration_state: AirflowMigrationState,
logger: logging.Logger = logging.getLogger("dagster_airlift"),
Copy link
Member

@schrockn schrockn Aug 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be None and then default to logging.getLogger("dagster_airlift") in function the body given the way that Python does mutable parameters

Should only ever be the last line in a dag file.

Args:
global_vars (Dict[str, Any]): The global variables in the current context. In most cases, retrieved with `globals()` (no import required).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth a pointer to https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/dags.html#loading-dags to show how we are just doing what airflow does

Copy link
Member

@schrockn schrockn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. Please heed final comments.

@dpeng817 dpeng817 force-pushed the dpeng817/airflow_migrating branch from 5279253 to 1fccf5a Compare August 11, 2024 18:10
Copy link

vercel bot commented Aug 11, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
dagster-docs-next ❌ Failed (Inspect) Aug 11, 2024 6:11pm

@dpeng817 dpeng817 merged commit 6e511c6 into master Aug 11, 2024
2 of 3 checks passed
@dpeng817 dpeng817 deleted the dpeng817/airflow_migrating branch August 11, 2024 19:06
PedramNavid pushed a commit that referenced this pull request Aug 14, 2024
Function that allows users to mark airflow dags as "migrating", and
injects a tag into the dag with information about the migration.
When placing the tag in the dag, there are two options:
1. Construct a new dag using essentially a shallow copy of the old dag,
and then inject this into global scope.
2. Inject a tag into the existing dag object you find in global scope
using the mutability of airflow dag's data structures.

I prefer (2) in this approach, because the surface area is way lower
than (1), and it should be relatively resistant to changes in airflow's
API other than this one tiny surface area ([which hasn't changed since
1.10, when tags were first
introduced)](https://airflow.apache.org/docs/apache-airflow/1.10.10/_modules/airflow/models/dag.html#DAG).
Unless we can figure out a reliable way to create an arbitrary copy
constructor for dags across any airflow version, we're likely to run
into brittleness with trying to reconstruct a dag from the pure object,
I think. In general, taking advantage of airflow's mutability when we
can seems like a good approach to injecting migration state.

Another point of discussion; what to do when there are no dags in scope
which the state migration object has reference to. For now I throw an
exception, but wondering if this is too harsh.

Finally, there's the question of how these tags show up in airflow's UI.
It's pretty ugly to see this json blob appear in the airflow UI after
setting the tag, but there doesn't seem to be any other data structures
we can use for this (except maybe params? But I feel more hesitant
hooking into that since it's significantly more complex implementation
wise). So might be the best we can do for now.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants