Commit

Clean up metadata docs
petehunt committed Aug 26, 2024
1 parent 23beba5 commit 02b793c
Showing 1 changed file with 36 additions and 23 deletions.
59 changes: 36 additions & 23 deletions docs/docs-beta/docs/guides/data-modeling/metadata.md
@@ -1,8 +1,8 @@
---
title: "Adding tags and metadata to assets"
description: "Learn how to add tags and metadata to assets to improve observability in Dagster"
title: 'Adding tags and metadata to assets'
description: 'Learn how to add tags and metadata to assets to improve observability in Dagster'
sidebar_position: 40
sidebar_label: "Enriching assets with metadata"
sidebar_label: 'Enriching assets with metadata'
---

Assets feature prominently in the Dagster UI. It is often helpful to attach information to assets to understand where they are stored, what they contain, and how they should be organized.
@@ -30,30 +30,34 @@ To follow the steps in this guide, you'll need:

## Adding owners to your assets

In a large organization, it's important to know who is responsible for a given data asset. With `owners` it's straightforward to add individuals and teams as owners for your asset:
In a large organization, it's important to know which individuals and teams are responsible for a given data asset:

<CodeExample filePath="guides/data-modeling/metadata/owners.py" language="python" title="Using owners" />

`owners` must either be an email address, or a team name prefixed by `team:`.
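
The rule above can be sketched as a plain-Python check. This is only an illustration: `is_valid_owner` is a hypothetical helper, not part of Dagster, and the email pattern is deliberately loose rather than a copy of whatever validation Dagster performs.

```python
import re

# Loose email pattern for illustration only; an assumption, not Dagster's own validation.
_EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def is_valid_owner(owner: str) -> bool:
    """Hypothetical helper: True if `owner` is an email address or a `team:`-prefixed name."""
    if owner.startswith("team:"):
        # Require a non-empty team name after the prefix.
        return len(owner) > len("team:")
    return bool(_EMAIL_RE.match(owner))


# Example values are made up for this sketch.
owners = ["richard.hendricks@hooli.com", "team:data-eng"]
assert all(is_valid_owner(o) for o in owners)
```

A bare team name such as `"data-eng"` fails this check because it is missing the `team:` prefix.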

> With Dagster+ Pro, you can create asset-based alerts that will automatically notify an asset's owners when triggered. Refer to the [Dagster+ alert documentation](/dagster-plus/deployment/alerts) for more information.
:::tip

With Dagster+ Pro, you can create asset-based alerts that will automatically notify an asset's owners when triggered. Refer to the [Dagster+ alert documentation](/dagster-plus/deployment/alerts) for more information.

:::

## Choosing between tags or metadata for custom information

In Dagster, you can attach custom information to assets in two ways: **tags** and **metadata**.

**Tags** are a simple way to organize assets in Dagster. You can attach several tags to an asset when it is defined, and they will appear in the UI. You can also use tags to search and filter for assets in the [asset catalog](/todo). They are structured as key-value pairs of strings.
**Tags** are the primary way to organize assets in Dagster. You can attach several tags to an asset when it is defined, and they will appear in the UI. You can also use tags to search and filter for assets in the [Asset catalog](/todo). They're structured as key-value pairs of strings.

Here's an example of some tags one might apply to an asset:

```python
{"domain": "marketing", "pii": "true"}
```

**Metadata** allows you to attach rich information to the asset, like a Markdown description, a table schema, or a time series. Metadata is more flexible than tags, as it can store more complex information. Metadata can be attached to an asset at definition time (i.e. when the code is first imported) or at runtime (every time an asset is materialized).

Here's an example of some metadata one might apply to an asset:

```python
{
"link_to_docs": MetadataValue.url("https://..."),
@@ -70,7 +74,6 @@ Like `owners`, just pass a dictionary of tags to the `tags` argument when defini

Keep in mind that tags must contain only strings as keys and values. Additionally, the Dagster UI will render tags with the empty string as a "label" rather than a key-value pair.
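
As a rough sketch of the strings-only rule, the check below flags any non-string key or value before the dictionary is passed to the `tags` argument. `check_tags` is a hypothetical helper written for this example, not a Dagster API, and it covers only the rule stated here, not any further constraints Dagster may enforce.

```python
def check_tags(tags: dict) -> list[str]:
    """Hypothetical helper: list the string-only violations in `tags` (empty list means OK)."""
    problems = []
    for key, value in tags.items():
        if not isinstance(key, str):
            problems.append(f"key {key!r} is not a string")
        if not isinstance(value, str):
            problems.append(f"value {value!r} for key {key!r} is not a string")
    return problems


# Booleans must be written as strings, as in the guide's own example.
good_tags = {"domain": "marketing", "pii": "true"}
bad_tags = {"domain": "marketing", "pii": True}

assert check_tags(good_tags) == []
assert len(check_tags(bad_tags)) == 1
```

Returning a list of problems rather than raising on the first one makes it easier to report every offending tag at once.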


## Attaching metadata to an asset at definition time

Attaching metadata at definition time is quite similar to how you attach tags.
@@ -83,7 +86,7 @@ Some metadata keys will be given special treatment in the Dagster UI. See the [S

## Attaching metadata to an asset at runtime

Metadata becomes very powerful when it is attached when an asset is materialized. This allows you to update metadata when information about an asset changes and track historical metadata such as execution time and row counts as a time series.
Metadata becomes powerful when it's attached when an asset is materialized. This allows you to update metadata when information about an asset changes and track historical metadata such as execution time and row counts as a time series.

<CodeExample filePath="guides/data-modeling/metadata/runtime-metadata.py" language="python" title="Using metadata at runtime" />

@@ -93,16 +96,15 @@ Any numerical metadata will be treated as a time series in the Dagster UI.

Some metadata keys will be given special treatment in the Dagster UI.

| Key | Description |
|------------------------------|---------------------------------------------------------------------------------------------------------------|
| `dagster/uri` | **Type:** `str` <br/><br/> The URI for the asset, e.g. "s3://my_bucket/my_object" |
| `dagster/column_schema` | **Type:** [`TableSchema`](/todo) <br/><br/> For an asset that's a table, the schema of the columns in the table. Refer to the [Table and column metadata](#table-and-column-metadata) secton for details. |
| `dagster/column_lineage` | **Type:** [`TableColumnLineage`](/todo) <br/><br/> For an asset that's a table, the lineage of column inputs to column outputs for the table. Refer to the [Table and column metadata](#table-and-column-metadata) secton for details. |
| `dagster/row_count` | **Type:** `int` <br/><br/> For an asset that's a table, the number of rows in the table. Refer to the Table metadata documentation for details. |
| `dagster/partition_row_count` | **Type:** `int` <br/><br/> For a partition of an asset that's a table, the number of rows in the partition. |
| `dagster/relation_identifier` | **Type:** `str` <br/><br/> A unique identifier for the table/view, typically fully qualified. For example, my_database.my_schema.my_table |
| `dagster/code_references` | **Type:** [`CodeReferencesMetadataValue`](/todo) <br/><br/> A list of code references for the asset, such as file locations or references to Github URLs. Refer to the [Linking your assets with their source code](#linking-your-assets-with-their-source-code) section for details. Should only be provided in definition-level metadata, not materialization metadata. |

| Key | Description |
| ----------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `dagster/uri` | **Type:** `str` <br/><br/> The URI for the asset, e.g. "s3://my_bucket/my_object" |
| `dagster/column_schema` | **Type:** [`TableSchema`](/todo) <br/><br/> For an asset that's a table, the schema of the columns in the table. Refer to the [Table and column metadata](#table-and-column-metadata) section for details. |
| `dagster/column_lineage` | **Type:** [`TableColumnLineage`](/todo) <br/><br/> For an asset that's a table, the lineage of column inputs to column outputs for the table. Refer to the [Table and column metadata](#table-and-column-metadata) section for details. |
| `dagster/row_count` | **Type:** `int` <br/><br/> For an asset that's a table, the number of rows in the table. Refer to the Table metadata documentation for details. |
| `dagster/partition_row_count` | **Type:** `int` <br/><br/> For a partition of an asset that's a table, the number of rows in the partition. |
| `dagster/relation_identifier` | **Type:** `str` <br/><br/> A unique identifier for the table/view, typically fully qualified. For example, my_database.my_schema.my_table |
| `dagster/code_references` | **Type:** [`CodeReferencesMetadataValue`](/todo) <br/><br/> A list of code references for the asset, such as file locations or references to GitHub URLs. Refer to the [Linking your assets with their source code](#linking-your-assets-with-their-source-code) section for details. Should only be provided in definition-level metadata, not materialization metadata. |

## Table and column metadata

@@ -118,18 +120,29 @@ Note that there are several data types and constraints available on [`TableColum

### Column lineage metadata

> Many integrations such as [dbt](https://docs.dagster.io/integrations/dbt/reference) automatically attach this metadata out-of-the-box.
:::tip

Many integrations such as [dbt](https://docs.dagster.io/integrations/dbt/reference) automatically attach this metadata out-of-the-box.

:::

Column lineage metadata is a powerful way to track how columns in a table are derived from other columns. Here is how you can manually attach this metadata:

<CodeExample filePath="guides/data-modeling/metadata/table-column-lineage-metadata.py" language="python" title="Table column lineage metadata" />

> Dagster+ provides rich visualization and navigation of column lineage in the asset catalog. Refer to the [Dagster+ documentation](/dagster-plus) for more information.
:::tip

Dagster+ provides rich visualization and navigation of column lineage in the asset catalog. Refer to the [Dagster+ documentation](/dagster-plus) for more information.

:::

## Linking your assets with their source code

> This feature is considered experimental and is under active development. This guide will be updated as we roll out new features.
:::warning

This feature is considered experimental and is under active development. This guide will be updated as we roll out new features.

:::

Attaching code reference metadata to your Dagster asset definitions allows you to easily view those assets' source code from the Dagster UI both in local development and in production.

@@ -149,7 +162,7 @@ You can manually add the `dagster/code_references` metadata to your asset defini

### Attaching code references in production (Dagster+)

Dagster+ can automatically annotate your assets with code references to source control such as GitHub or GitLab.
Dagster+ can automatically annotate your assets with code references to source control such as GitHub or Gitlab.

<CodeExample filePath="guides/data-modeling/metadata/plus-references.py" language="python" title="Production source code references (Dagster+)" />

@@ -159,4 +172,4 @@ If you aren't using Dagster+, you can annotate your assets with code references

<CodeExample filePath="guides/data-modeling/metadata/oss-references.py" language="python" title="Production source code references (OSS)" />

[`link_code_references_to_git`](/todo) currently supports GitHub and GitLab repositories. It also supports customization of how file paths are mapped; see the [`AnchorBasedFilePathMapping`](/todo) API docs for more information.
[`link_code_references_to_git`](/todo) currently supports GitHub and Gitlab repositories. It also supports customization of how file paths are mapped; see the [`AnchorBasedFilePathMapping`](/todo) API docs for more information.
