Guide for connecting to APIs #23920

Merged: 11 commits, Aug 29, 2024
78 changes: 77 additions & 1 deletion docs/docs-beta/docs/guides/external-systems/apis.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,80 @@
---
title: Connecting to APIs
sidebar_position: 20
---

@schrockn schrockn Aug 29, 2024

@jamiedemaria This is great.

@erinkcochran87 @PedramNavid curious what you think is in scope for a "how-to" guide. I just want it to be clear that resources are opt-in for this use case.

I think we should include (abridged) language about when it is good to use resources. E.g.:


Accessing an API through a Dagster resource is useful if you want to:

  • Parameterize how to access the service at runtime (either through API or UI) via config.
  • Surface that configuration in Dagster.
  • Centralize configuration and implementation of this API access.
  • Plug in different implementations of resources in different environments (local dev versus production, for example).

If you don't want any of these features, you should just invoke the external service directly.

When building a data pipeline, you'll likely need to connect to several external APIs, each with its own specific configuration and behavior. This guide demonstrates how to standardize your API connections and customize their configuration using Dagster resources.


## What you'll learn

- How to connect to an API using a Dagster resource
- How to use that resource in an asset
- How to configure a resource
- How to source configuration values from environment variables

<details>
<summary>Prerequisites</summary>

To follow the steps in this guide, you'll need:

- Familiarity with [asset definitions](/concepts/assets)
- Familiarity with [resources](/concepts/resources)
- The `requests` library installed:
```bash
pip install requests
```

</details>

## Step 1: Write a resource to connect to an API

This example fetches the sunrise time for a given location from a REST API.

Begin by defining a Dagster resource with a method to return the sunrise time for a location. In the first version of this resource, the location will be hard-coded to San Francisco International Airport.


<CodeExample filePath="guides/external-systems/apis/minimal_resource.py" language="python" title="Resource to connect to the Sunrise API" />
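
To make the request and response shapes concrete without calling the network, here is a minimal standard-library sketch. `build_query_url`, `extract_sunrise`, and the sample payload are hypothetical and hand-written for illustration; they are not part of Dagster or of the example files. Unlike the resource in this guide, which formats the URL with an f-string, this sketch uses `urlencode`, so the `/` in the time zone is percent-encoded.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://api.sunrise-sunset.org/json"


def build_query_url(latitude: str, longitude: str, time_zone: str) -> str:
    """Return the request URL with each query parameter percent-encoded."""
    params = {"lat": latitude, "lng": longitude, "date": "today", "tzid": time_zone}
    return f"{BASE_URL}?{urlencode(params)}"


def extract_sunrise(payload: str) -> str:
    """Read the sunrise time from a JSON response body."""
    return json.loads(payload)["results"]["sunrise"]


# San Francisco International Airport, the location used in this step.
url = build_query_url("37.615223", "-122.389977", "America/Los_Angeles")

# Hand-written sample body (an assumption), trimmed to the one field the resource reads.
sample_payload = '{"results": {"sunrise": "6:38:12 AM"}}'
sunrise = extract_sunrise(sample_payload)
```

Keeping URL construction and response parsing in small functions like these is one way to unit test the logic separately from the HTTP call.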


## Step 2: Use the resource in an asset

To use the resource written in Step 1, include it in the `Definitions` object, then add it as a parameter to an asset:

<CodeExample filePath="guides/external-systems/apis/use_minimal_resource_in_asset.py" language="python" title="Use the SunResource in an asset" />

When you materialize `sfo_sunrise`, Dagster will provide an initialized `SunResource` to the `sun_resource` parameter.


## Step 3: Configure your resource
Many APIs expose settings that let you customize how you use them. Here is an updated version of the resource from Step 1, with configuration that lets you set the query location:

<CodeExample filePath="guides/external-systems/apis/use_configurable_resource_in_asset.py" language="python" title="Use the configurable SunResource in an asset" />

The configurable resource can be provided to an asset exactly as before. When you initialize the resource, pass a value for each configuration option.

When you materialize `sfo_sunrise`, Dagster will provide a `SunResource` initialized with the configuration values to the `sun_resource` parameter.
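
Because the configuration values in this guide are plain strings, a mistyped or swapped coordinate would otherwise surface only when the API responds. The sketch below shows one possible sanity check you could run before constructing the resource. `check_location` is a hypothetical helper, not a Dagster API, and the ranges are the usual geographic bounds.

```python
def check_location(latitude: str, longitude: str) -> None:
    """Raise ValueError if the coordinate strings are not plausible."""
    lat, lng = float(latitude), float(longitude)
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {latitude}")
    if not -180.0 <= lng <= 180.0:
        raise ValueError(f"longitude out of range: {longitude}")


# The SFO coordinates from this guide pass silently.
check_location("37.615223", "-122.389977")

# Swapping the two values is caught, because -122.389977 is not a valid latitude.
try:
    check_location("-122.389977", "37.615223")
    swapped_rejected = False
except ValueError:
    swapped_rejected = True
```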


## Step 4: Source configuration values from environment variables
Resources can also be configured with environment variables. You can use Dagster's built-in `EnvVar` class to source configuration values from environment variables at materialization time.

In this example, there is a new `home_sunrise` asset. Rather than hard-coding the location of your home, you can set it in environment variables, and configure the `SunResource` by reading those values:

<CodeExample filePath="guides/external-systems/apis/env_var_configuration.py" language="python" title="Configure the resource with values from environment variables" />

When you materialize `home_sunrise`, Dagster will read the values set for the `HOME_LATITUDE`, `HOME_LONGITUDE`, and `HOME_TIMEZONE` environment variables and initialize a `SunResource` with those values.

The initialized `SunResource` will be provided to the `sun_resource` parameter.

:::note
You can also fetch environment variables using the `os` library. Dagster treats each approach to fetching environment variables differently, such as when they're fetched or how they display in the UI. Refer to the [Environment variables guide](/todo) for more information.
:::
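
One way to picture the timing difference the note describes is the plain-Python sketch below, which contrasts reading a variable immediately with storing only its name and reading it later. `DeferredEnv` is a hypothetical stand-in written for this illustration; it is not how `EnvVar` is implemented.

```python
import os


class DeferredEnv:
    """Hypothetical stand-in that stores a variable name and reads it on demand."""

    def __init__(self, name: str) -> None:
        self.name = name

    def resolve(self) -> str:
        # The lookup happens only when resolve() is called.
        return os.environ[self.name]


os.environ["HOME_LATITUDE"] = "37.0"

eager_value = os.environ["HOME_LATITUDE"]  # value is captured right now
deferred = DeferredEnv("HOME_LATITUDE")    # only the name is stored

os.environ["HOME_LATITUDE"] = "40.0"       # the environment changes afterwards

deferred_value = deferred.resolve()        # sees the updated value
```

Here `eager_value` keeps the value from the moment it was read, while `deferred_value` reflects the environment at the moment `resolve()` is called.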


## Next steps

- [Authenticate to a resource](/guides/external-systems/authentication.md)
- [Use different resources in different execution environments](/todo)
- [Set environment variables in Dagster+](/todo)
- Learn what [Dagster-provided resources](/todo) are available to use
@@ -0,0 +1,38 @@
import requests

import dagster as dg


class SunResource(dg.ConfigurableResource):
    latitude: str
    longitude: str
    time_zone: str

    @property
    def query_string(self) -> str:
        return f"https://api.sunrise-sunset.org/json?lat={self.latitude}&lng={self.longitude}&date=today&tzid={self.time_zone}"

    def sunrise(self) -> str:
        data = requests.get(self.query_string, timeout=5).json()
        return data["results"]["sunrise"]


# highlight-start
@dg.asset
def home_sunrise(context: dg.AssetExecutionContext, sun_resource: SunResource) -> None:
    sunrise = sun_resource.sunrise()
    context.log.info(f"Sunrise at home is at {sunrise}.")


defs = dg.Definitions(
    assets=[home_sunrise],
    resources={
        "sun_resource": SunResource(
            latitude=dg.EnvVar("HOME_LATITUDE"),
            longitude=dg.EnvVar("HOME_LONGITUDE"),
            time_zone=dg.EnvVar("HOME_TIMEZONE"),
        )
    },
)

# highlight-end
@@ -0,0 +1,16 @@
import requests

import dagster as dg


class SunResource(dg.ConfigurableResource):
    @property
    def query_string(self) -> str:
        # Hard-coded to San Francisco International Airport.
        latitude = "37.615223"
        longitude = "-122.389977"
        time_zone = "America/Los_Angeles"
        return f"https://api.sunrise-sunset.org/json?lat={latitude}&lng={longitude}&date=today&tzid={time_zone}"

    def sunrise(self) -> str:
        data = requests.get(self.query_string, timeout=5).json()
        return data["results"]["sunrise"]
@@ -0,0 +1,41 @@
import requests

import dagster as dg


class SunResource(dg.ConfigurableResource):
    # highlight-start
    latitude: str
    longitude: str
    time_zone: str

    @property
    def query_string(self) -> str:
        return f"https://api.sunrise-sunset.org/json?lat={self.latitude}&lng={self.longitude}&date=today&tzid={self.time_zone}"

    # highlight-end

    def sunrise(self) -> str:
        data = requests.get(self.query_string, timeout=5).json()
        return data["results"]["sunrise"]


@dg.asset
def sfo_sunrise(context: dg.AssetExecutionContext, sun_resource: SunResource) -> None:
    sunrise = sun_resource.sunrise()
    context.log.info(f"Sunrise in San Francisco is at {sunrise}.")


# highlight-start
defs = dg.Definitions(
    assets=[sfo_sunrise],
    resources={
        "sun_resource": SunResource(
            latitude="37.615223",
            longitude="-122.389977",
            time_zone="America/Los_Angeles",
        )
    },
)

# highlight-end
@@ -0,0 +1,28 @@
import requests

import dagster as dg


class SunResource(dg.ConfigurableResource):
    @property
    def query_string(self) -> str:
        # Hard-coded to San Francisco International Airport.
        latitude = "37.615223"
        longitude = "-122.389977"
        time_zone = "America/Los_Angeles"
        return f"https://api.sunrise-sunset.org/json?lat={latitude}&lng={longitude}&date=today&tzid={time_zone}"

    def sunrise(self) -> str:
        data = requests.get(self.query_string, timeout=5).json()
        return data["results"]["sunrise"]


# highlight-start
@dg.asset
def sfo_sunrise(context: dg.AssetExecutionContext, sun_resource: SunResource) -> None:
    sunrise = sun_resource.sunrise()
    context.log.info(f"Sunrise in San Francisco is at {sunrise}.")


defs = dg.Definitions(assets=[sfo_sunrise], resources={"sun_resource": SunResource()})

# highlight-end