Doc 302 new etl tutorial - part 1 #25320
I've been looking at pyproject.toml, setup.cfg, and setup.py, and I think many projects could use pyproject.toml alone, especially beginner-level tutorials.
@neverett I think the first portion of the tutorial is ready for your review. Once this section is in good shape, I can continue with the rest of the tutorial.
Overall, I think this is the right level of granularity and selection of topics for people looking to move beyond the Quickstart (esp. if they're somewhat experienced data engineers), and the pacing is good. I left fairly detailed feedback on pages 1 and 2, and high-level feedback for page 3 (asset dependencies and checks), since I think that one is worth splitting into two pages. Once you've taken another pass at the content, I'm happy to re-review and give more feedback on the downstream asset and asset checks content, and other content where it makes sense.
docs/docs-beta/docs/tutorial/03-asset-dependencies-and-checks.md
We recently added an “excludes” parameter to the `dagster project scaffold` command. It could be used to create a project from the scaffold without tests, for simplicity.
Thanks,
Daniel
________________________________
From: Alex Noonan ***@***.***>
Sent: Saturday, November 16, 2024 5:03:29 AM
To: dagster-io/dagster ***@***.***>
Cc: Daniel Bartley ***@***.***>; Comment ***@***.***>
Subject: Re: [dagster-io/dagster] Doc 302 new etl tutorial - part 1 (PR #25320)
@C00ldudeNoonan commented on this pull request.
________________________________
In docs/docs-beta/docs/tutorial/01-etl-tutorial-introduction.md:
+ pip install dagster dagster-webserver pandas dagster-duckdb
+ ```
+
+## Step 2: Copying Project Scaffold
+
+Next we will get the raw data for the project. As well as the project scaffold, Dagster has several pre-built scaffolds you can install depending on your use case. You can see the full up to date list by running. `dagster project list-examples`
+
+Use the project scaffold command for this project.
+ ```bash title="ETL Project Scaffold"
+ dagster project from-example --example getting_started_etl_tutorial
+ ```
+
+The project should have this structure.
+<!-- vale off -->
+```
+dagster-etl-tutorial/
For this tutorial, we were trying to keep things as simple as possible. By excluding the test folder and the other specific file, we can have the user focus on the ETL pipeline and on getting familiar with the Dagster primitives. The last lesson in the tutorial will also cover refactoring the project into separate files for assets, resources, etc.
Signed-off-by: nikki everett <[email protected]>
This is looking great! There's so much good material, and it's really well-paced. If you have any questions about my feedback, I'm happy to answer them in the PR or have a quick Zoom chat, whichever is easier for you.
Summary & Motivation
I'm a little way into this and would like to get feedback from @PedramNavid and @cmpadden on the structure and general flow. This isn't done at this point, but I figure we could collaborate here and iterate from there.
I made some changes to the reference file to make it more concise regarding metadata output. The new code example function works great.
Main Questions I have at this point:
How I Tested These Changes
Changelog