integration docs beta 1/ replicate all integration pages from mkt site to beta docs (#24330)

## Summary & Motivation

Copy over from https://dagster.io/integrations (note: this content is up to date per PR stack dagster-io/dagster-website#1280 from a few weeks ago).

This PR made the following changes:

1. Updated titles to "Dagster & <name>".
2. Added `sidebar_label: <name>` so the left nav won't show a wall of "Dagster &".
3. Fixed all Vale errors and warnings, including a lot of Vale accept additions.
4. Renamed files from `dagster-<name>.mdx` to just `<name>.md`.

Next steps in a later stack:

- Move code to Python files.
- Improve navigation: an index page and/or left nav to bucket integrations into categories, and differentiate community-owned integrations.

Later steps:

- Improve doc content page by page (e.g. template guides; reuse the good ones from the current docs site).

**Open discussion:** Figure out the relationship between the docs and marketing site regarding integrations.

* Option 1: drop dagster.io/integrations and redirect it to docs.dagster.io/integrations.
  * Yuhan's pick: I'm leaning towards this to completely consolidate all integration content into the docs site, for simplicity and ease of navigation, so there won't be similar content on two different sites; but I'd need to check the SEO implications of this option.
* Option 2: keep both dagster.io/integrations and docs.dagster.io/integrations; no code on the marketing site (SEO purposes only), while the docs pages focus on more technical guides/references.

## How I Tested These Changes

**See in preview: https://dagster-docs-beta-211qncb7r-elementl.vercel.app/integrations**

## Changelog

`NOCHANGELOG`

---------

Co-authored-by: colton <[email protected]>
Showing 56 changed files with 3,061 additions and 13 deletions.
```diff
@@ -1,5 +1,6 @@
 ---
-title: "Integrations"
+title: 'Integrations'
+displayed_sidebar: 'integrations'
 ---

 # Integrations
```
@@ -0,0 +1,52 @@

---
layout: Integration
status: published
name: Airbyte
title: Dagster & Airbyte
sidebar_label: Airbyte
excerpt: Orchestrate Airbyte connections and schedule syncs alongside upstream or downstream dependencies.
date: 2022-11-07
apireflink: https://docs.dagster.io/_apidocs/libraries/dagster-airbyte
docslink: https://docs.dagster.io/integrations/airbyte
partnerlink: https://airbyte.com/tutorials/orchestrate-data-ingestion-and-transformation-pipelines
logo: /integrations/airbyte.svg
categories:
  - ETL
enabledBy:
enables:
---

### About this integration

Using this integration, you can trigger Airbyte syncs and orchestrate your Airbyte connections from within Dagster, making it easy to chain an Airbyte sync with upstream or downstream steps in your workflow.

### Installation

```bash
pip install dagster-airbyte
```

### Example

```python
from dagster import EnvVar
from dagster_airbyte import AirbyteResource, load_assets_from_airbyte_instance

# Connect to your OSS Airbyte instance
airbyte_instance = AirbyteResource(
    host="localhost",
    port="8000",
    # If using basic auth, include username and password:
    username="airbyte",
    password=EnvVar("AIRBYTE_PASSWORD"),
)

# Load all assets from your Airbyte instance
airbyte_assets = load_assets_from_airbyte_instance(airbyte_instance)
```
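The loaded assets can then be registered with Dagster. A minimal sketch of the wiring (an addition for illustration, using the standard `Definitions` API rather than anything from the original example):

```python
from dagster import Definitions

# Register the Airbyte-derived assets so they appear in the asset graph;
# `airbyte_assets` comes from load_assets_from_airbyte_instance above.
defs = Definitions(assets=[airbyte_assets])
```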
### About Airbyte

**Airbyte** is an open source data integration engine that helps you consolidate your SaaS application and database data into your data warehouses, lakes, and databases.
@@ -0,0 +1,48 @@

---
layout: Integration
status: published
name: AWS Athena
title: Dagster & AWS Athena
sidebar_label: AWS Athena
excerpt: This integration allows you to connect to AWS Athena and analyze data in Amazon S3 using standard SQL within your Dagster pipelines.
date: 2024-06-21
apireflink: https://docs.dagster.io/_apidocs/libraries/dagster-aws
docslink:
partnerlink: https://aws.amazon.com/
logo: /integrations/aws-athena.svg
categories:
  - Storage
enabledBy:
enables:
---

### About this integration

This integration allows you to connect to AWS Athena, a serverless interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Using this integration, you can issue queries to Athena, fetch results, and handle query execution states within your Dagster pipelines.

### Installation

```bash
pip install dagster-aws
```

### Examples

```python
from dagster import Definitions, asset
from dagster_aws.athena import AthenaClientResource


@asset
def example_athena_asset(athena: AthenaClientResource):
    return athena.get_client().execute_query("SELECT 1", fetch_results=True)


defs = Definitions(
    assets=[example_athena_asset], resources={"athena": AthenaClientResource()}
)
```
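To try the asset outside of a full deployment, it can be executed directly in a script. This is a minimal sketch (not part of the original page) using Dagster's standard `materialize` helper; it assumes valid AWS credentials are available in the environment:

```python
from dagster import materialize

# Executes the asset in-process; the Athena query runs against your account.
result = materialize(
    [example_athena_asset],
    resources={"athena": AthenaClientResource()},
)
assert result.success
```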
### About AWS Athena

AWS Athena is a serverless, interactive query service that allows you to analyze data directly in Amazon S3 using standard SQL. Athena is easy to use: point to your data in Amazon S3, define the schema, and start querying with standard SQL. Most results are delivered within seconds. With Athena, there is no infrastructure to set up, and you pay only for the queries you run. It scales automatically, executing queries in parallel, so results are fast even with large datasets and complex queries.
@@ -0,0 +1,63 @@

---
layout: Integration
status: published
name: AWS CloudWatch
title: Dagster & AWS CloudWatch
sidebar_label: AWS CloudWatch
excerpt: This integration allows you to send Dagster logs to AWS CloudWatch, enabling centralized logging and monitoring of your Dagster jobs.
date: 2024-06-21
apireflink: https://docs.dagster.io/_apidocs/libraries/dagster-aws
docslink:
partnerlink: https://aws.amazon.com/
logo: /integrations/aws-cloudwatch.svg
categories:
  - Monitoring
enabledBy:
enables:
---

### About this integration

This integration allows you to send Dagster logs to AWS CloudWatch, enabling centralized logging and monitoring of your Dagster jobs. By using AWS CloudWatch, you can take advantage of its powerful log management features, such as real-time log monitoring, log retention policies, and alerting capabilities.

Using this integration, you can configure your Dagster jobs to log directly to AWS CloudWatch, making it easier to track and debug your workflows. This is particularly useful for production environments where centralized logging is essential for maintaining observability and operational efficiency.

### Installation

```bash
pip install dagster-aws
```

### Examples

```python
import dagster as dg
from dagster_aws.cloudwatch import cloudwatch_logger


@dg.asset
def my_asset(context: dg.AssetExecutionContext):
    context.log.info("Hello, CloudWatch!")
    context.log.error("This is an error")
    context.log.debug("This is a debug message")


defs = dg.Definitions(
    assets=[my_asset],
    loggers={
        "cloudwatch_logger": cloudwatch_logger,
    },
)
```
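At launch time the logger still needs to know where to write. A hedged sketch of the run config you might supply (for example in the Launchpad); the group and stream names below are hypothetical, and the config keys are assumptions based on the `cloudwatch_logger` config schema in `dagster-aws`:

```python
# Run config directing the logger at a CloudWatch destination.
# Names below are placeholders; substitute your own log group/stream.
run_config = {
    "loggers": {
        "cloudwatch_logger": {
            "config": {
                "log_group_name": "/dagster/example",  # hypothetical log group
                "log_stream_name": "example-stream",  # hypothetical log stream
                "aws_region": "us-west-1",
            }
        }
    }
}
```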
### About AWS CloudWatch

AWS CloudWatch is a monitoring and observability service provided by Amazon Web Services (AWS). It allows you to collect, access, and analyze performance and operational data from a variety of AWS resources, applications, and services. With AWS CloudWatch, you can set up alarms, visualize logs and metrics, and gain insights into your infrastructure and applications to ensure they're running smoothly.

AWS CloudWatch provides features such as:

- Real-time monitoring: Track the performance of your applications and infrastructure in real time.
- Log management: Collect, store, and analyze log data from various sources.
- Alarms and notifications: Set up alarms to automatically notify you of potential issues.
- Dashboards: Create custom dashboards to visualize metrics and logs.
- Integration with other AWS services: Seamlessly integrate with other AWS services for a comprehensive monitoring solution.
@@ -0,0 +1,58 @@

---
layout: Integration
status: published
name: AWS ECR
title: Dagster & AWS ECR
sidebar_label: AWS ECR
excerpt: This integration allows you to connect to AWS Elastic Container Registry (ECR), enabling you to manage your container images more effectively in your Dagster pipelines.
date: 2024-06-21
apireflink: https://docs.dagster.io/_apidocs/libraries/dagster-aws
docslink:
partnerlink: https://aws.amazon.com/
logo: /integrations/aws-ecr.svg
categories:
  - Other
enabledBy:
enables:
---

### About this integration

This integration allows you to connect to AWS Elastic Container Registry (ECR). It provides resources to interact with AWS ECR, enabling you to manage your container images.

Using this integration, you can seamlessly integrate AWS ECR into your Dagster pipelines, making it easier to manage and deploy containerized applications.

### Installation

```bash
pip install dagster-aws
```

### Examples

```python
from dagster import asset, Definitions
from dagster_aws.ecr import ECRPublicResource


@asset
def get_ecr_login_password(ecr_public: ECRPublicResource):
    return ecr_public.get_client().get_login_password()


defs = Definitions(
    assets=[get_ecr_login_password],
    resources={
        "ecr_public": ECRPublicResource(
            region_name="us-west-1",
            aws_access_key_id="your_access_key_id",
            aws_secret_access_key="your_secret_access_key",
            aws_session_token="your_session_token",
        )
    },
)
```
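The login password is typically handed to a container client. A minimal sketch of that handoff (an illustration, not part of the original example); it assumes the Docker CLI is installed, ambient AWS credentials are available, and `public.ecr.aws` is the target registry:

```python
import subprocess

from dagster_aws.ecr import ECRPublicResource

# Fetch a temporary ECR Public password and pipe it to `docker login`.
password = ECRPublicResource(region_name="us-west-1").get_client().get_login_password()
subprocess.run(
    ["docker", "login", "--username", "AWS", "--password-stdin", "public.ecr.aws"],
    input=password.encode(),
    check=True,
)
```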
### About AWS ECR

AWS Elastic Container Registry (ECR) is a fully managed Docker container registry that makes it easy for developers to store, manage, and deploy Docker container images. AWS ECR is integrated with Amazon Elastic Kubernetes Service (EKS), simplifying your development-to-production workflow. With ECR, you can securely store and manage your container images and easily integrate with your existing CI/CD pipelines. AWS ECR provides high availability and scalability, ensuring that your container images are always available when you need them.
@@ -0,0 +1,96 @@

---
layout: Integration
status: published
name: AWS EMR
title: Dagster & AWS EMR
sidebar_label: AWS EMR
excerpt: The AWS EMR integration allows you to seamlessly integrate AWS EMR into your Dagster pipelines for petabyte-scale data processing using open source tools like Apache Spark, Hive, Presto, and more.
date: 2024-06-21
apireflink: https://docs.dagster.io/_apidocs/libraries/dagster-aws
docslink:
partnerlink: https://aws.amazon.com/
logo: /integrations/aws-emr.svg
categories:
  - Compute
enabledBy:
enables:
---

### About this integration

The `dagster-aws` integration provides ways of orchestrating data pipelines that leverage AWS services, including AWS EMR (Elastic MapReduce). This integration allows you to run and scale big data workloads using open source tools such as Apache Spark, Hive, Presto, and more.

Using this integration, you can:

- Seamlessly integrate AWS EMR into your Dagster pipelines.
- Utilize EMR for petabyte-scale data processing.
- Easily manage and monitor EMR clusters and jobs from within Dagster.
- Leverage Dagster's orchestration capabilities to handle complex data workflows involving EMR.

### Installation

```bash
pip install dagster-aws
```

### Examples

```python
from pathlib import Path
from typing import Any

from dagster import Definitions, ResourceParam, asset
from dagster_aws.emr import emr_pyspark_step_launcher
from dagster_aws.s3 import S3Resource
from dagster_pyspark import PySparkResource
from pyspark.sql import DataFrame, Row
from pyspark.sql.types import IntegerType, StringType, StructField, StructType

emr_pyspark = PySparkResource(spark_config={"spark.executor.memory": "2g"})


@asset
def people(
    pyspark: PySparkResource, pyspark_step_launcher: ResourceParam[Any]
) -> DataFrame:
    schema = StructType(
        [StructField("name", StringType()), StructField("age", IntegerType())]
    )
    rows = [
        Row(name="Thom", age=51),
        Row(name="Jonny", age=48),
        Row(name="Nigel", age=49),
    ]
    return pyspark.spark_session.createDataFrame(rows, schema)


@asset
def people_over_50(
    pyspark_step_launcher: ResourceParam[Any], people: DataFrame
) -> DataFrame:
    return people.filter(people["age"] > 50)


defs = Definitions(
    assets=[people, people_over_50],
    resources={
        "pyspark_step_launcher": emr_pyspark_step_launcher.configured(
            {
                "cluster_id": {"env": "EMR_CLUSTER_ID"},
                "local_pipeline_package_path": str(Path(__file__).parent),
                "deploy_local_pipeline_package": True,
                "region_name": "us-west-1",
                "staging_bucket": "my_staging_bucket",
                "wait_for_logs": True,
            }
        ),
        "pyspark": emr_pyspark,
        "s3": S3Resource(),
    },
)
```
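Note that the step launcher resolves the target cluster from the `EMR_CLUSTER_ID` environment variable (via the `{"env": "EMR_CLUSTER_ID"}` config above). A minimal sketch, with a hypothetical cluster ID:

```python
import os

# Hypothetical cluster ID; the configured step launcher reads
# EMR_CLUSTER_ID from the environment when the run starts.
os.environ["EMR_CLUSTER_ID"] = "j-1ABCDEFGHIJKL"
```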
### About AWS EMR

**AWS EMR** (Elastic MapReduce) is a cloud big data platform for processing vast amounts of data using open source tools such as Apache Spark, Apache Hive, Apache HBase, Apache Flink, Apache Hudi, and Presto. It simplifies running big data frameworks, allowing you to process and analyze large datasets quickly and cost-effectively. AWS EMR provides the scalability, flexibility, and reliability needed to handle complex data processing tasks, making it an ideal choice for data engineers and scientists.
Deploy preview for dagster-docs-beta ready!

✅ Preview: https://dagster-docs-beta-2locj6pyt-elementl.vercel.app
https://dagster-docs-beta.dagster-docs.io

Built with commit 6dca8c2. This pull request is being automatically deployed with vercel-action.
Deploy preview for dagster-docs ready!

✅ Preview: https://dagster-docs-8no15cmtx-elementl.vercel.app
https://master.dagster.dagster-docs.io

Built with commit 6dca8c2. This pull request is being automatically deployed with vercel-action.