[docs] Hello, Dagster becomes Quickstart (#19981)
## Summary & Motivation

We would like to reduce the barrier to entry by replacing Hello, Dagster
with a Quickstart project. This project leverages GitHub Codespaces, so
that users can try Dagster without installing any dependencies on their
local machine.

This is part of a larger initiative to simplify the entry points
into Dagster and our documentation.

### Checklist

- [ ] Make the https://github.com/dagster-io/dagster-quickstart repo
public

### Outstanding Questions

- We have a few _quickstart_ templates that can already be scaffolded
with the `dagster` command. Where do we want to draw the line on this
quickstart vs those?

## How I Tested These Changes

- Ran Next.js project locally

---------

Co-authored-by: hyperlint-ai[bot] <154288675+hyperlint-ai[bot]@users.noreply.github.com>
Co-authored-by: Erin Cochran <[email protected]>
3 people authored Feb 28, 2024
1 parent 5f7a542 commit 25e8e82
Showing 18 changed files with 263 additions and 201 deletions.
4 changes: 2 additions & 2 deletions docs/content/_navigation.json
@@ -10,8 +10,8 @@
"path": "/getting-started/what-why-dagster"
},
{
-"title": "Hello, Dagster!",
-"path": "/getting-started/hello-dagster"
+"title": "Quickstart",
+"path": "/getting-started/quickstart"
},
{
"title": "Installation",
6 changes: 3 additions & 3 deletions docs/content/getting-started.mdx
@@ -8,12 +8,12 @@ Dagster is an orchestrator that's designed for developing and maintaining data a

You declare functions that you want to run and the data assets that those functions produce or update. Dagster then helps you run your functions at the right time and keep your assets up-to-date.

-Dagster is built to be used at every stage of the data development lifecycle - local development, unit tests, integration tests, staging environments, all the way up to production.
+Dagster is designed to be used at every stage of the data development lifecycle, including local development, unit tests, integration tests, staging environments, and production.

-**New to Dagster**? Check out the **Hello Dagster example**, learn with some hands-on **Tutorials**, or dive into **Concepts**. For an in-depth learning experience, enroll in **Dagster University**.
+**New to Dagster**? Check out the **Quickstart**, learn with some hands-on **Tutorials**, or dive into **Concepts**. For an in-depth learning experience, enroll in **Dagster University**.

<div className="inline-flex flex-row space-x-4">
-<Button link="/getting-started/hello-dagster">Run Hello, Dagster!</Button>
+<Button link="/getting-started/quickstart">Quickstart</Button>
<Button link="/tutorial" style="secondary">
View Tutorials
</Button>
147 changes: 0 additions & 147 deletions docs/content/getting-started/hello-dagster.mdx

This file was deleted.

196 changes: 196 additions & 0 deletions docs/content/getting-started/quickstart.mdx
@@ -0,0 +1,196 @@
---
title: Quickstart | Dagster Docs
description: Run Dagster for the first time
---

# Quickstart

<Note>
Looking to scaffold a new project? Check out the{" "}
<Link href="/getting-started/create-new-project">Creating a new project</Link>{" "}
guide!
</Note>

Welcome to Dagster! This guide will help you quickly run the [Dagster Quickstart](https://github.com/dagster-io/dagster-quickstart) project, showcasing Dagster's capabilities and serving as a foundation for exploring its features.

The [Dagster Quickstart](https://github.com/dagster-io/dagster-quickstart) project can be used without installing anything on your machine by using the pre-configured [GitHub Codespace](https://github.com/features/codespaces). If you prefer to run things on your own machine, however, we've got you covered.

<TabGroup>
<TabItem name="Option 1: Running locally">

### Option 1: Running Locally

<DagsterVersion />

Ensure you have one of the supported Python versions installed before proceeding.

To install Python, refer to Python's official <a href="https://www.python.org/about/gettingstarted/">getting started guide</a>, or use <a href="https://github.com/pyenv/pyenv?tab=readme-ov-file#installation">pyenv</a>, which we recommend.

1. Clone the Dagster Quickstart repository by executing:

```bash
git clone https://github.com/dagster-io/dagster-quickstart && cd dagster-quickstart
```

2. Install the necessary dependencies using the following command:

We use `-e` to install dependencies in ["editable mode"](https://pip.pypa.io/en/latest/topics/local-project-installs/#editable-installs). This allows changes to be automatically applied when we modify code.

```bash
pip install -e ".[dev]"
```

3. Run the project!

```bash
dagster dev
```

4. Navigate to <a href="http://localhost:3000">localhost:3000</a> in your web browser.

5. **Success!**
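
If the page does not load, it can help to check whether anything is answering on that port at all. The following is a minimal sketch using only Python's standard library. The function name `dagster_ui_is_up` is hypothetical, and the default URL assumes the `localhost:3000` address from the step above. Note that it only confirms that some HTTP server responds, not that the server is Dagster.

```python
import urllib.error
import urllib.request


def dagster_ui_is_up(url="http://localhost:3000", timeout=2.0):
    """Return True if an HTTP server answers at `url`, False otherwise.

    Hypothetical helper: the default URL is an assumption based on the
    address used in this guide.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded, just not with a success status.
        return True
    except OSError:
        # Connection refused, timeout, DNS failure, and so on.
        # (urllib.error.URLError is a subclass of OSError.)
        return False
```

Calling `dagster_ui_is_up()` in a second terminal while `dagster dev` is running should return `True` once the webserver has finished starting.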

</TabItem>
<TabItem name="Option 2: Using GitHub Codespaces">

### Option 2: Using GitHub Codespaces

1. Fork the [Dagster Quickstart](https://github.com/dagster-io/dagster-quickstart) repository

2. Select **Create codespace on main** from the **Code** dropdown menu.

<Image
width={400}
height={400}
alt="Create codespace"
src="/images/getting-started/quickstart/github-codespace-create.png"
/>

3. After the codespace loads, start Dagster by running `dagster dev` in the terminal:

```bash
dagster dev
```

4. Click **Open in Browser** when prompted.

<Image
width={400}
height={300}
alt="Codespace Open In Browser"
src="/images/getting-started/quickstart/github-codespace-open-in-browser.png"
/>

5. **Success!**

</TabItem>
</TabGroup>

## Navigating the User Interface

You should now have a running instance of Dagster! From here, we can run our data pipeline.

To run the pipeline, click the **Materialize All** button in the top right. In Dagster, _materialization_ refers to executing the code associated with an asset to produce an output.

<Image
alt="HackerNews assets in Dagster's Asset Graph, unmaterialized"
src="/images/getting-started/quickstart/quickstart-unmaterialized.png"
width={2000}
height={816}
/>

Congratulations! You have successfully materialized two Dagster assets:

<Image
alt="HackerNews asset graph"
src="/images/getting-started/quickstart/quickstart.png"
width={2000}
height={1956}
/>

But wait - there's more. Because the `hackernews_top_stories` asset returned some `metadata`, you can view the metadata right in the UI:

1. Click the asset
2. In the sidebar, click the **Show Markdown** link in the **Materialization in Last Run** section. This opens a preview of the pipeline result, allowing you to view the top 10 HackerNews stories:

<Image
alt="Markdown preview of HackerNews top 10 stories"
src="/images/getting-started/quickstart/hn-preview.png"
width={2000}
height={1754}
/>
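
The preview shown in the UI is ordinary Markdown text attached to the materialization as metadata. The Quickstart code builds it with pandas' `to_markdown()`. As a rough, dependency-free sketch of what such a preview string looks like, the hypothetical helper below (`records_to_markdown`, not part of the Quickstart project) builds a Markdown table from a list of story records. The sample record is made up for illustration, and the exact spacing differs from what pandas produces.

```python
def records_to_markdown(records, columns):
    """Render a list of dicts as a Markdown table with the given columns."""
    header = "| " + " | ".join(columns) + " |"
    divider = "| " + " | ".join("---" for _ in columns) + " |"
    rows = [
        # Missing keys are rendered as empty cells.
        "| " + " | ".join(str(record.get(col, "")) for col in columns) + " |"
        for record in records
    ]
    return "\n".join([header, divider, *rows])


# Made-up sample record using the same columns as the preview.
sample_stories = [
    {"title": "Example story", "by": "someone", "url": "https://example.com"},
]
preview = records_to_markdown(sample_stories, ["title", "by", "url"])
```

A string like `preview` is what gets wrapped in `MetadataValue.md(...)` so that the UI renders it as a table rather than as plain text.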

## Understanding the Code

The Quickstart project defines two **Assets** using the <PyObject object="asset" decorator /> decorator:

- `hackernews_top_story_ids` retrieves the IDs of the top stories from the Hacker News API and saves them as a JSON file.
- `hackernews_top_stories` builds on the first asset: it retrieves the data for each story, saves it as a CSV file, and returns a `MaterializeResult` with a Markdown preview of the top stories.

```python file=/getting-started/quickstart/assets.py
import json

import pandas as pd
import requests

from dagster import (
    MaterializeResult,
    MetadataValue,
    asset,
)

from .configurations import HNStoriesConfig


@asset
def hackernews_top_story_ids(config: HNStoriesConfig):
    """Get top stories from the HackerNews top stories endpoint."""
    top_story_ids = requests.get(
        "https://hacker-news.firebaseio.com/v0/topstories.json"
    ).json()

    with open(config.hn_top_story_ids_path, "w") as f:
        json.dump(top_story_ids[: config.top_stories_limit], f)


@asset(deps=[hackernews_top_story_ids])
def hackernews_top_stories(config: HNStoriesConfig) -> MaterializeResult:
    """Get items based on story ids from the HackerNews items endpoint."""
    with open(config.hn_top_story_ids_path, "r") as f:
        hackernews_top_story_ids = json.load(f)

    results = []
    for item_id in hackernews_top_story_ids:
        item = requests.get(
            f"https://hacker-news.firebaseio.com/v0/item/{item_id}.json"
        ).json()
        results.append(item)

    df = pd.DataFrame(results)
    df.to_csv(config.hn_top_stories_path)

    return MaterializeResult(
        metadata={
            "num_records": len(df),
            "preview": MetadataValue.md(str(df[["title", "by", "url"]].to_markdown())),
        }
    )
```
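
The two assets do not pass data to each other directly: the first writes the story IDs to a file, and the second reads that file back. The sketch below isolates that handoff without Dagster or network access. `StoriesConfigSketch` is a hypothetical stand-in for `HNStoriesConfig`, whose definition is not shown here. Its field names come from the code above, but the default values are assumptions, and the story IDs are made up.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class StoriesConfigSketch:
    """Hypothetical stand-in for HNStoriesConfig; defaults are assumptions."""

    top_stories_limit: int = 10
    hn_top_story_ids_path: str = "hackernews_top_story_ids.json"
    hn_top_stories_path: str = "hackernews_top_stories.csv"


def write_top_story_ids(all_ids, config):
    """Upstream step: keep the first N IDs and persist them as JSON."""
    with open(config.hn_top_story_ids_path, "w") as f:
        json.dump(all_ids[: config.top_stories_limit], f)


def read_top_story_ids(config):
    """Downstream step: load the IDs that the upstream step wrote."""
    with open(config.hn_top_story_ids_path, "r") as f:
        return json.load(f)


def demo_handoff():
    """Run both steps against a temporary directory and return the IDs read."""
    with tempfile.TemporaryDirectory() as tmp:
        config = StoriesConfigSketch(
            top_stories_limit=3,
            hn_top_story_ids_path=str(Path(tmp) / "ids.json"),
        )
        # Made-up IDs in place of the real API response.
        write_top_story_ids([101, 102, 103, 104, 105], config)
        return read_top_story_ids(config)
```

Because the handoff goes through a file, the second asset declares its dependency with `deps=[hackernews_top_story_ids]` rather than taking the first asset's output as a function argument.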

---

## Next steps

Congratulations on successfully running your first Dagster pipeline! In this example, we used [assets](/tutorial), which are a cornerstone of Dagster projects. They empower data engineers to:

- Think in the same terms as stakeholders
- Answer questions about data quality and lineage
- Work with the modern data stack (dbt, Airbyte/Fivetran, Spark)
- Create declarative freshness policies instead of task-driven cron schedules

Dagster also offers [ops and jobs](/guides/dagster/intro-to-ops-jobs), but we recommend starting with assets.

To create your own project, consider the following options:

- Scaffold a new project using our [new project guide](/getting-started/create-new-project).
- Begin with an official example, like the [dbt + Dagster project](/integrations/dbt/using-dbt-with-dagster), and explore [all examples on GitHub](https://github.com/dagster-io/dagster/tree/master/examples).
2 changes: 1 addition & 1 deletion docs/content/getting-started/what-why-dagster.mdx
@@ -53,7 +53,7 @@ Additionally, Dagster is accompanied by a sleek, modern, [web-based UI](/concept

## How does it work?

-If you want to try running Dagster yourself, check out the [Hello, Dagster!](/getting-started/hello-dagster) quickstart.
+If you want to try running Dagster yourself, check out the Dagster [Quickstart](/getting-started/quickstart).

---

4 changes: 2 additions & 2 deletions docs/content/integrations/pandas.mdx
@@ -8,9 +8,9 @@ description: The dagster-pandas library provides the ability to perform data val
<Note>
This page describes the <code>dagster-pandas</code> library, which is used for
performing data validation. To simply use pandas with Dagster, start with the{" "}
-<a href="/getting-started/hello-dagster" target="new">
+<a href="/getting-started/quickstart" target="new">
{" "}
-Hello Dagster example.
+Dagster Quickstart example.
</a>{" "}
Dagster makes it easy to use pandas code to manipulate data and then store
that data in other systems such as{" "}
Binary file not shown.
Binary file not shown.
Binary file not shown.