Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Conditionally required settings #1519

Open
aaronsteers opened this issue Mar 22, 2023 · 7 comments · May be fixed by #2789
Open

feat: Conditionally required settings #1519

aaronsteers opened this issue Mar 22, 2023 · 7 comments · May be fixed by #2789

Comments

@aaronsteers
Copy link
Contributor

Notes from office hours:

  • Settings group validation today is not very helpful in many common use cases.
  • Example case, "if setting A, then require setting B"
  • Per setting requirements: certain settings can require other settings.
  • Without something like that in Meltano, the only way to tell if config is wrong is to run config plugin test.
  • Example is 'tap-github'.
  • Proposal for SDK to have a feature to validate config.
  • Pat: some plugins use JSON Schema for validation.

We want to be able to tell if the config is valid:

  • Meltano option for setting validation.

Reuben use cases:

  1. When a tap supports multiple authorization flows, we want to validate that the flow is fully valid.
    • E.g. anything with OAuth option as well as other username/password or other credential options.
  2. Mutually exclusive input options.
    • E.g. tap-github (MeltanoLabs) - With list of repositories or list of organizations or list of user IDs.
  3. Sometimes want to validate based on a value, not just the presence of a specific key.
    • E.g. If "auth_flow" value is "oauth", then certain params would be required and/or disallowed.
@aaronsteers aaronsteers converted this from a draft issue Mar 22, 2023
@aaronsteers aaronsteers changed the title Reuben: Config validation Advanced config validation requirements for SDK-based taps and targets Mar 22, 2023
@aaronsteers aaronsteers moved this from Up Next to To Discuss in Office Hours Mar 22, 2023
@ReubenFrankel
Copy link
Contributor

ReubenFrankel commented Mar 23, 2023

Initial proposal from office hours 2023-03-22 (tap-spotify mock example):

tap_spotify.tap

# ...

class TapSpotify(Tap):
    # ...

    # config_jsonschema as a callable to evaluate at runtime
    config_jsonschema = lambda cls, config: th.PropertiesList(
        th.Property(
            "client_id",
            th.StringType,
            required=config.get("auth_flow") == "client_credentials",
            description="App client ID",
        ),
        th.Property(
            "client_secret",
            th.StringType,
            required=config.get("auth_flow") == "client_credentials",
            description="App client secret",
        ),
        th.Property(
            "refresh_token",
            th.StringType,
            required=config.get("flow") == "client_credentials",
            description="Refresh token",
        ),
        th.Property(
            "username",
            th.StringType,
            required=config.get("flow") in ["password", None],
            description="Username",
        ),
        th.Property(
            "password",
            th.StringType,
            required=config.get("flow") in ["password", None],
            description="Password",
        ),
        th.Property(
            "oauth2_flow",
            th.StringType,
            required=False,
            default="client_credentials"
            description="OAuth2 flow",
        ),
    ).to_dict()

# ...

singer_sdk.plugin_base

# ...

class PluginBase(metaclass=abc.ABCMeta):
    # ...

    config_jsonschema: dict | Callable[[type[PluginBase], Mapping[str, Any]], dict] = {}

    # ...

    # new method to resolve JSON schema base on defined type
    @classmethod
    def _resolve_config_jsonschema(
        cls: type[PluginBase],
        config: Mapping[str, Any] = None,
    ) -> dict[str, Any]:
        config_jsonschema = cls.config_jsonschema

        if not isinstance(config_jsonschema, dict):
            config_jsonschema = config_jsonschema(cls, config or {})

        return config_jsonschema

    # ...

    @classproperty
    def _env_var_config(cls) -> dict[str, Any]:
        """Return any config specified in environment variables.

        Variables must match the convention "<PLUGIN_NAME>_<SETTING_NAME>",
        all uppercase with dashes converted to underscores.

        Returns:
            Dictionary of configuration parsed from the environment.
        """
        plugin_env_prefix = f"{cls.name.upper().replace('-', '_')}_"
        config_jsonschema = cls._resolve_config_jsonschema()  # used here
        cls.append_builtin_config(config_jsonschema)

        return parse_environment_config(config_jsonschema, plugin_env_prefix)

    # ...

    def _validate_config(
        self,
        raise_errors: bool = True,
        warnings_as_errors: bool = False,
    ) -> tuple[list[str], list[str]]:
        """Validate configuration input against the plugin configuration JSON schema.

        Args:
            raise_errors: Flag to throw an exception if any validation errors are found.
            warnings_as_errors: Flag to throw an exception if any warnings were emitted.

        Returns:
            A tuple of configuration validation warnings and errors.

        Raises:
            ConfigValidationError: If raise_errors is True and validation fails.
        """
        warnings: list[str] = []
        errors: list[str] = []
        log_fn = self.logger.info
        config_jsonschema = self._resolve_config_jsonschema(self.config)  # used here

        if config_jsonschema:
            self.append_builtin_config(config_jsonschema)
            self.logger.debug(
                f"Validating config using jsonschema: {config_jsonschema}",
            )
            validator = JSONSchemaValidator(config_jsonschema)
            errors = [e.message for e in validator.iter_errors(self._config)]

        if errors:
            summary = (
                f"Config validation failed: {'; '.join(errors)}\n"
                f"JSONSchema was: {config_jsonschema}"
            )
            if raise_errors:
                raise ConfigValidationError(summary)

            log_fn = self.logger.warning
        else:
            summary = f"Config validation passed with {len(warnings)} warnings."
            for warning in warnings:
                summary += f"\n{warning}"

        if warnings_as_errors and raise_errors and warnings:
            raise ConfigValidationError(
                f"One or more warnings ocurred during validation: {warnings}",
            )
        log_fn(summary)
        return warnings, errors

# ...

@edgarrmondragon
Copy link
Collaborator

We could add a OneOf class as well as a Discriminator that wraps a required JSON Schema const:

OneOf(
  ObjectType(
    Discriminator(name="flow", value="oauth"),
    Property("client_id", StringType, secret=True, required=True),
    Property("client_secret", StringType, secret=True, required=True),
  ),
  ObjectType(
    Discriminator(name="flow", value="apikey"),
    Property("api_key", StringType, secret=True, required=True),
  ),
)

Roughly equivalent to

JSON Schema
{
  "oneOf": [
    {
      "type": "object",
      "properties": {
        "flow": {
          "const": "oauth"
        },
        "client_id": {
          "type": "string"
        },
        "client_secret": {
          "type": "string"
        }
      },
      "required": [
        "flow",
        "client_id",
        "client_secret"
      ]
    },
    {
      "type": "object",
      "properties": {
        "flow": {
          "const": "apikey"
        },
        "key": {
          "type": "string"
        }
      },
      "required": [
        "flow",
        "key"
      ]
    }
  ]
}

I think this covers all the use cases described above. Wdyt?

@aaronsteers
Copy link
Contributor Author

aaronsteers commented Mar 29, 2023

@edgarrmondragon - Yeah, this solution looks really good. Where does this statement need to exist in relationship to the PropertiesList object, which builds the "properties: {...}" declaration. Would it be inline with the arrayArgs we pass to the PropertiesList constructor, something like this

PropertiesList(
    Property("flow", StringType()),
    OneOf(
      ObjectType(
        Discriminator(name="flow", value="oauth"),
        Property("client_id", StringType, secret=True, required=True),
        Property("client_secret", StringType, secret=True, required=True),
      ),
      ObjectType(
        Discriminator(name="flow", value="apikey"),
        Property("api_key", StringType, secret=True, required=True),
      ),
    )
)

If I understand correctly, I think the above should work. I wonder if it's also possible to declare these below the properties list.

PropertiesList(
    Property("flow", StringType),
    Property("client_id", StringType, secret=True),
    Property("client_secret", StringType, secret=True),
    Property("api_key", StringType, secret=True),
    constraints=[
      OneOf(
        ObjectType(
          Discriminator(name="flow", value="oauth"),
          PropertyRef("client_id"),
          PropertyRef("client_secret"),
        ),
        ObjectType(
          Discriminator(name="flow", value="apikey"),
          PropertyRef("api_key"),
        ),
      )
    ]
)

This type of syntax is probably worse for the above use case, since there aren't any redundancies across declarations. But in a scenario where we wouldn't want to redeclare overlapped settings in multiple scenarios, it could be handy to declare the conditions after the property declarations. Wdyt?

We can discuss again in office hours today too. 😄

@aaronsteers aaronsteers moved this from To Discuss to Up Next in Office Hours Mar 29, 2023
@aaronsteers aaronsteers moved this from Up Next to Discussed in Office Hours Mar 29, 2023
@edgarrmondragon
Copy link
Collaborator

edgarrmondragon commented Mar 30, 2023

@aaronsteers I couldn't join OH yesterday but I've just watched the recording and here's some notes.

  • No singer_sdk.typing helpers will influence the markdown rendering of settings since we're doing that from a processed JSON schema, not the original PropertiesList object.

    In the case of oneOf, we could render multiple tables:

    To configure the tap, use any of the following sets of properties:
    
    ### `oauth` (we could use the `const` value or the `title` from the `oneOf` element
    
    | Setting       | Required | Default | Description           |
    |:--------------|:--------:|:-------:|:----------------------|
    | flow          | True     | None    | Always set to `oauth` |
    | client_id     | True     | None    |                       |
    | client_secret | True     | None    |                       |
    
    ### `apikey`
    
    | Setting | Required | Default | Description            |
    |:--------|:--------:|:-------:|:-----------------------|
    | flow    | True     | None    | Always set to `apikey` |
    | api_key | True     | None    |                        |

    The advantage of this is that we could recurse if the oneOf setting is nested and not at the root level, replacing the first line with

    To configure `some-property`, use any of the following sets of properties:
  • Redundant properties are not too big of a deal in my opinion since the developer could just declare those once:

    REDUNDANT_PROP = StringType("is_required", required=True)
    
    schema = DiscriminatedUnion(
        "flow",
        oauth=ObjectType(
            StringType("client_id", secret=True, required=True),
            StringType("client_secret", secret=True, required=True),
            REDUNDANT_PROP,
        ),
        apikey=ObjectType(
            StringType("key", secret=True, required=True),
            REDUNDANT_PROP,
        ),
    )

    (latest syntax from feat: Support union schemas #1525)

  • If I understand correctly, I think the above should work. I wonder if it's also possible to declare these below the properties list.

    That would probably need treating the elements of PropertiesList as definitions rather that direct properties, and would require a refactor.

    It's also probably risky, since it doesn't make sense to have top-level properties when there's a oneOf in the schema, so the developer would need to make sure no property is left out of the constraints.

@stale
Copy link

stale bot commented Jul 28, 2023

This has been marked as stale because it is unassigned, and has not had recent activity. It will be closed after 21 days if no further activity occurs. If this should never go stale, please add the evergreen label, or request that it be added.

@stale stale bot added the stale label Jul 28, 2023
@edgarrmondragon
Copy link
Collaborator

I think the only missing piece here is better support in --about --format=markdown. Currently complex/object settings are not fully rendered: #1884

@edgarrmondragon
Copy link
Collaborator

https://json-schema.org/understanding-json-schema/reference/conditionals would make for an elegant solution to this.

@stale stale bot removed the stale label Nov 30, 2024
@edgarrmondragon edgarrmondragon changed the title Advanced config validation requirements for SDK-based taps and targets feat: Conditionally required settings Nov 30, 2024
@edgarrmondragon edgarrmondragon linked a pull request Dec 1, 2024 that will close this issue
@edgarrmondragon edgarrmondragon self-assigned this Dec 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

3 participants