-
-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add SQLAlchemy & BigQuery sources #1062
base: main
Are you sure you want to change the base?
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1062 +/- ##
==========================================
- Coverage 57.51% 57.20% -0.32%
==========================================
Files 109 111 +2
Lines 14291 14369 +78
==========================================
Hits 8220 8220
- Misses 6071 6149 +78 ☔ View full report in Codecov by Sentry. |
|
||
|
||
class SQLAlchemySource(BaseSQLSource): | ||
driver = param.String(default=None, doc="SQL driver.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'd be wise to have a way to input URL, perhaps as a classmethod from_url
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR, looks like a great start. I do have a few initial comments:
At the highest level I'm not quite seeing how this correctly implements the BaseSQLSource
APIs, i.e. at minimum I would have expected an implementation of
execute
: This is meant for executing a SQL query and returning the result as a DataFrame, yourrun_query
method seems to come close but you have to wrap it inpd.DataFrame.from_records
I'm guessing.get_tables
: This should return a list of valid tables.
Additionally I would expect a somewhat similar API to other sources, where you can define a tables
parameter that accepts a list of tables (as to limit which tables you can access), or a dictionary mapping from table name alias to a SQL expression.
I appreciate the documentation is a little sparse, so please don't hesitate to reach out to myself or Andrew to clarify things about the API.
I think there's also a misunderstanding of the role of get_schema
. The additional schema information you are getting from BigQuery is quite helpful but I'd consider that part metadata. The schema in Lumen specifically refers to a dictionary that contains the type of the column and it's min-max and unique values.
This is very helpful information. From the discussion I need to implement the following.
For documentation purposes I think I will also update the
Any other items y'all can think of will be added to the above to-do lists. |
One other thing are tests (terribly lacking in the ai/ directory, but should be maintained for source/). Not sure how hard it is to set up a mysql server, but this seems useful for testing locally (probably not CI?) https://docs.getwren.ai/oss/getting_started/sample_data/hr |
resolves #1051
Adds the following source objects