Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

set instance's default olap connector as model's default output connector #6302

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

k-anshul
Copy link
Member

Setting it to default OLAP connector allows us to skip setting output in models that ingests data from external sources. So the following model definition is sufficient to ingest data from postgres to duckdb.

connector: postgres
sql: SELECT * FROM students 
dsn: "postgresql://postgres:postgres@localhost:5432/postgres"

@k-anshul k-anshul self-assigned this Dec 18, 2024
Copy link
Contributor

@begelundmuller begelundmuller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned if we will end up regretting this because:

  1. It basically breaks all models in projects that don't have olap_connector: set. I'm not sure how wide the impact of this will be, but worried it could cause some unexpected issues.
  2. If we add support for doing modeling in another connector (like in BigQuery) and then only transferring the final model to the OLAP, it will be unintuitive that any model without an explicit output: gets transferred to the OLAP.

Any thoughts on this? Maybe the alternatives are worse, just a little concerned here.

Some alternatives I can think of are:

  • If there is type: source, emit a model that has output.connector = <default olap>.
  • If the input connector is not a supported output connector, output to the default OLAP.

@k-anshul
Copy link
Member Author

Yeah I also had mixed opinions on this. Optimising for one use case breaks another use case.

It basically breaks all models in projects that don't have olap_connector set.

For duckDB projects this is not a concern since that is the default olap_connector. For clickhouse, modeling is not common and when used setting olap_connector is mostly necessary.

If we add support for doing modeling in another connector (like in BigQuery) and then only transferring the final model to the OLAP, it will be unintuitive that any model without an explicit output: gets transferred to the OLAP.

Fair point. Also as I mentioned it is going to be problematic for multiple olap connectors.

If there is type: source, emit a model that has output.connector = .

Yeah that can be done. This is mostly for cases when building model.yaml manually.

If the input connector is not a supported output connector, output to the default OLAP.

I am not sure if this can be checked easily without hardcoding supported output connectors in parser.


The reason I wanted to put this change was that a lot of time internal users have complained that why do we need to add output connector.
Given the use case of ingestion is going to be far more common than other use cases like modeling in another connector and multiple connectors may be we optimise for that.
But I will let you make a call here.

@begelundmuller
Copy link
Contributor

Given the use case of ingestion is going to be far more common than other use cases like modeling in another connector and multiple connectors may be we optimise for that. But I will let you make a call here.

I'm not yet comfortable with outputting to the default OLAP connector by default. I'm worried that a) it becomes too easy to accidentally do an expensive data ingestion if you forgot to set olap_connector in rill.yaml, b) we would regret this change if we add support for modeling outside the OLAP (e.g. in BigQuery) in the future.

Instead, what do you think about:

  1. If we observe type: source, treat it as a model with output.connector=<default OLAP>, and
  2. Support writing output: <connector name> as a shorthand for output: connector: <connector name>

This means you can define a source for e.g. DuckDB as:

type: source
connector: bigquery
sql: SELECT * FROM project.dataset.table

or:

type: model
connector: bigquery
sql: SELECT * FROM project.dataset.table
output: duckdb

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants