rest_api: simplify dependent resource relationship configuration #2190

Open
burnash opened this issue Jan 6, 2025 · 1 comment · May be fixed by #2210
Labels
enhancement New feature or request

Comments

@burnash
Collaborator

burnash commented Jan 6, 2025

Background

The current syntax for defining relationships between resources in the rest_api source is verbose and requires explicit configuration of parameter resolution. This makes the configuration more complex than necessary, especially for common parent-child relationships where only a simple field reference is needed.

Current syntax:

{
    "resources": [
        {
            "name": "issues",
            "endpoint": {
                "path": "issues",
            },
        },
        {
            "name": "issue_comments",
            "endpoint": {
                "path": "issues/{issue_number}/comments",
                "params": {
                    "issue_number": {
                        "type": "resolve",
                        "resource": "issues",
                        "field": "number",
                    }
                },
            },
        },
    ],
}

Proposal

  1. Introduce a simplified reference syntax: resources.<resource_name>.<field_name>
  2. Allow direct field references in path templates and query parameters
  3. Support an incremental context by exposing the incremental object to template strings
  4. Keep the existing resolve mechanism for backward compatibility
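A legacy resolve parameter carries the same information (resource name and field) as the proposed reference, which is what makes backward compatibility straightforward. The sketch below uses a hypothetical helper, resolve_to_reference (not part of dlt), to rewrite the path from the "Current syntax" example into the proposed form:

```python
from typing import Any, Dict


def resolve_to_reference(param: Dict[str, Any]) -> str:
    # Hypothetical helper: turns a legacy resolve parameter into the proposed
    # "{resources.<resource_name>.<field_name>}" reference string.
    if param.get("type") != "resolve":
        raise ValueError("expected a parameter with type 'resolve'")
    return "{resources.%s.%s}" % (param["resource"], param["field"])


legacy_path = "issues/{issue_number}/comments"
legacy_params = {
    "issue_number": {"type": "resolve", "resource": "issues", "field": "number"},
}

# Inline every resolved parameter directly into the path template.
new_path = legacy_path
for name, param in legacy_params.items():
    new_path = new_path.replace("{%s}" % name, resolve_to_reference(param))

print(new_path)  # issues/{resources.issues.number}/comments
```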

New syntax:

For path templates:

source = rest_api_source({
    "client": {
        "base_url": "https://example.com/api",
    },
    "resources": [
        {
            "name": "issues",
            "endpoint": {
                "path": "issues",
            },
        },
        {
            "name": "issue_comments",
            "endpoint": {
                "path": "issues/{resources.issues.number}/comments",  # reference parent field directly
            },
        },
    ],
})
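One way to picture path-template resolution is with Python's built-in str.format, whose replacement fields already support dotted attribute access. This is only a sketch under that assumption: render_path is a hypothetical helper, and the parent row is wrapped in types.SimpleNamespace objects so that {resources.issues.number} resolves to the current issue's number:

```python
from types import SimpleNamespace
from typing import Any, Dict


def render_path(template: str, parent_resource: str, parent_item: Dict[str, Any]) -> str:
    # Expose the parent row as resources.<parent_resource>.<field> so that
    # str.format can resolve the dotted placeholder via attribute access.
    resources = SimpleNamespace(**{parent_resource: SimpleNamespace(**parent_item)})
    return template.format(resources=resources)


issues = [{"number": 123, "title": "Bug"}, {"number": 124, "title": "Feature"}]
for issue in issues:
    print(render_path("issues/{resources.issues.number}/comments", "issues", issue))
# issues/123/comments
# issues/124/comments
```

A real implementation would probably use a dedicated parser instead, since str.format treats every brace pair in the template as a replacement field.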

For query parameters:

source = rest_api_source({
    "client": {
        "base_url": "https://example.com/api",
    },
    "resources": [
        "issues", # using the short form here
        {
            "name": "issue_comments",
            "endpoint": {
                "path": "issue/comments",
                "params": {
                    "issue_number": "{resources.issues.number}",  # reference parent field in query params
                },
            },
        },
    ],
})
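For query parameters, the same substitution would run once per parent row, producing one child request per issue. A minimal sketch, assuming templates are rendered with Python's str.format (attribute access on types.SimpleNamespace); render_params and child_requests are hypothetical names:

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable, Iterator
from urllib.parse import urlencode


def render_params(params: Dict[str, Any], context: Dict[str, Any]) -> Dict[str, Any]:
    # String values are treated as templates; other values are passed through.
    return {
        key: value.format(**context) if isinstance(value, str) else value
        for key, value in params.items()
    }


def child_requests(
    path: str,
    params: Dict[str, Any],
    parent_resource: str,
    parent_items: Iterable[Dict[str, Any]],
) -> Iterator[str]:
    # One request (path + query string) per parent row.
    for item in parent_items:
        context = {"resources": SimpleNamespace(**{parent_resource: SimpleNamespace(**item)})}
        yield f"{path}?{urlencode(render_params(params, context))}"


params = {"issue_number": "{resources.issues.number}"}
issues = [{"number": 123}, {"number": 124}]
for url in child_requests("issue/comments", params, "issues", issues):
    print(url)
# issue/comments?issue_number=123
# issue/comments?issue_number=124
```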

For query parameters with incremental sync:

source = rest_api_source({
    "client": {
        "base_url": "https://example.com/api",
    },
    "resources": [
        "issues",
        {
            "name": "issue_comments",
            "endpoint": {
                "path": "issue/comments",
                "params": {
                    "issue_number": "{resources.issues.number}",  # reference parent field in query params
                    "since": "{incremental.last_value}", # the incremental config is defined below
                },
                "incremental": {
                    "cursor_path": "created_at",
                    "initial_value": "2025-01-12",
                },
            },
        },
    ],
})

So, on the first run and for a parent issue with number 123, the HTTP request for the issue_comments resource would be:

GET https://example.com/api/issue/comments?issue_number=123&since=2025-01-12

(This should fix #1978, but with a different syntax.)
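The request for the incremental example can be reproduced by rendering both placeholders from one template context that holds the current parent row and the incremental state. In this sketch the context objects are SimpleNamespace stand-ins (not dlt's actual incremental class), and last_value is assumed to equal initial_value on the first run:

```python
from types import SimpleNamespace
from urllib.parse import urlencode

params = {
    "issue_number": "{resources.issues.number}",
    "since": "{incremental.last_value}",
}

# Stand-in template context: the current parent row plus the incremental state.
# On the first run, last_value is assumed to equal initial_value.
context = {
    "resources": SimpleNamespace(issues=SimpleNamespace(number=123)),
    "incremental": SimpleNamespace(last_value="2025-01-12"),
}

query = urlencode({key: value.format(**context) for key, value in params.items()})
print(f"issue/comments?{query}")  # issue/comments?issue_number=123&since=2025-01-12
```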

Benefits:

  1. More intuitive configuration of resource relationships
  2. Reduced boilerplate code
  3. Consistent syntax for both path and query parameters (the same syntax could also be used in headers; see Enhanced parameter resolution for RESTAPIConfig and REST API source #2071)

Additional Examples:

Multiple field references:

{
    "name": "user_details",
    "endpoint": {
        "path": "groups/{resources.users.group_id}/users/{resources.users.id}/details",
    },
}
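When a template contains several references, they presumably all have to point at the same parent resource, because each child request is built from a single parent row. A hypothetical check for that, using a regular expression to collect the references (the pattern and the helper names are assumptions, not existing dlt code):

```python
import re
from typing import List, Tuple

# Matches {resources.<resource_name>.<field_path>}; field_path may contain dots.
REFERENCE = re.compile(r"\{resources\.(\w+)\.([\w.]+)\}")


def find_references(template: str) -> List[Tuple[str, str]]:
    # Returns (resource_name, field_path) pairs in order of appearance.
    return REFERENCE.findall(template)


def parent_of(template: str) -> str:
    # Hypothetical rule: all references in one template must share a parent.
    names = {name for name, _ in find_references(template)}
    if len(names) != 1:
        raise ValueError(f"expected exactly one parent resource, found {sorted(names)}")
    return names.pop()


path = "groups/{resources.users.group_id}/users/{resources.users.id}/details"
print(find_references(path))  # [('users', 'group_id'), ('users', 'id')]
print(parent_of(path))  # users
```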

Nested field access (TBD: how JSONPath expressions should be defined):

{
    "name": "user_details",
    "endpoint": {
        "path": "groups/{resources.users.group.id}/users/{resources.users.id}/details",
    },
}
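Until the JSONPath question is settled, a plain dot-separated path is one possible interpretation: each segment after the resource name is a key into the (possibly nested) parent row. The sketch below assumes that interpretation; it does not support JSONPath features such as wildcards, filters, or array indices, and render and get_by_dotted_path are hypothetical names:

```python
import re
from functools import reduce
from typing import Any, Mapping

# Matches {resources.<resource_name>.<dotted.field.path>}.
REFERENCE = re.compile(r"\{resources\.(\w+)\.([\w.]+)\}")


def get_by_dotted_path(item: Mapping[str, Any], field_path: str) -> Any:
    # One key per dot-separated segment: "group.id" -> item["group"]["id"].
    return reduce(lambda obj, key: obj[key], field_path.split("."), item)


def render(template: str, parent_resource: str, item: Mapping[str, Any]) -> str:
    def substitute(match: "re.Match[str]") -> str:
        resource, field_path = match.group(1), match.group(2)
        if resource != parent_resource:
            raise ValueError(f"unknown resource in reference: {resource}")
        return str(get_by_dotted_path(item, field_path))

    return REFERENCE.sub(substitute, template)


user = {"id": 42, "group": {"id": 7, "name": "admins"}}
print(render("groups/{resources.users.group.id}/users/{resources.users.id}/details", "users", user))
# groups/7/users/42/details
```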
burnash added the enhancement (New feature or request) label Jan 6, 2025
burnash changed the title from "rest_api: simplify dependent resource relationship configuration with parent field" to "rest_api: simplify dependent resource relationship configuration" Jan 9, 2025
burnash self-assigned this Jan 13, 2025
rudolfix moved this from Todo to In Progress in dlt core library Jan 13, 2025
@francescomucio
Contributor

This is a nice idea, but I wonder whether it could also be extended to handle "aggregates".

An example of an aggregate is an endpoint that returns details for multiple resources (up to some limit) when they are passed as an array in a JSON request body.

One idea could look something like this (to fetch user details in chunks of 100 user IDs):

{
    "name": "user_details",
    "parent": "users",
    "endpoint": {
        "path": "users/details",
        "json": {
            "user_ids": ["{list(parent.id, 100)}"]
        }
    },
}
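One reading of {list(parent.id, 100)} is "collect the id of every parent row and send them in batches of at most 100, one request per batch". A sketch of that batching, assuming this reading, with hypothetical helper names and the user_ids body key taken from the example above:

```python
from typing import Any, Dict, Iterator, List


def chunked(values: List[Any], size: int) -> Iterator[List[Any]]:
    # Consecutive slices of at most `size` elements; the last one may be shorter.
    for start in range(0, len(values), size):
        yield values[start:start + size]


def aggregate_bodies(
    parent_items: List[Dict[str, Any]], field: str, body_key: str, size: int
) -> Iterator[Dict[str, Any]]:
    # One JSON body per batch, e.g. {"user_ids": [1, 2, ..., 100]}.
    values = [item[field] for item in parent_items]
    for batch in chunked(values, size):
        yield {body_key: batch}


users = [{"id": i} for i in range(1, 251)]
bodies = list(aggregate_bodies(users, "id", "user_ids", 100))
print([len(body["user_ids"]) for body in bodies])  # [100, 100, 50]
```

Unlike per-row substitution, this requires buffering parent rows until a batch is full (here, all of them are collected up front).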
