Add SQLAlchemy & BigQuery sources #1062

amaloney · 2025-02-18T17:30:11Z

resolves #1051

Adds the following source objects

SQLAlchemy
BigQuery

codecov · 2025-02-18T17:32:46Z

Codecov Report

Attention: Patch coverage is 0% with 78 lines in your changes missing coverage. Please review.

Project coverage is 57.20%. Comparing base (1dd6cee) to head (a3b1062).

Files with missing lines	Patch %	Lines
lumen/sources/bigquery.py	0.00%	53 Missing ⚠️
lumen/sources/sqlalchemy.py	0.00%	25 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1062      +/-   ##
==========================================
- Coverage   57.51%   57.20%   -0.32%     
==========================================
  Files         109      111       +2     
  Lines       14291    14369      +78     
==========================================
  Hits         8220     8220              
- Misses       6071     6149      +78

ahuang11 · 2025-02-18T17:38:16Z

lumen/sources/sqlalchemy.py

+
+
+class SQLAlchemySource(BaseSQLSource):
+    driver = param.String(default=None, doc="SQL driver.")


I think it'd be wise to have a way to pass in a connection URL, perhaps as a classmethod from_url.
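To make the from_url idea concrete, here is a minimal sketch using only the standard library. ConnectionConfig and its fields are hypothetical names invented for illustration; they are not part of this PR or of Lumen, and the real SQLAlchemySource would declare param parameters instead of dataclass fields.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import unquote, urlsplit


@dataclass
class ConnectionConfig:
    """Hypothetical stand-in for the connection parameters of a SQL source."""

    driver: str
    username: Optional[str] = None
    password: Optional[str] = None
    host: Optional[str] = None
    port: Optional[int] = None
    database: Optional[str] = None

    @classmethod
    def from_url(cls, url: str) -> "ConnectionConfig":
        """Build a config from a URL shaped like driver://user:pass@host:port/db."""
        parts = urlsplit(url)
        if not parts.scheme:
            raise ValueError(f"URL has no driver/scheme: {url!r}")
        return cls(
            driver=parts.scheme,
            # Credentials may be percent-encoded inside a URL, so decode them.
            username=unquote(parts.username) if parts.username else None,
            password=unquote(parts.password) if parts.password else None,
            host=parts.hostname,
            port=parts.port,
            # Drop the leading slash; an empty path means no database was given.
            database=parts.path.lstrip("/") or None,
        )


# Example URL is made up for illustration.
config = ConnectionConfig.from_url("postgresql://reader:s3cret@db.example.com:5432/sales")
```

A classmethod keeps the regular constructor unchanged, so users who already pass the individual parameters would not be affected.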

philippjfr

Thanks for the PR, looks like a great start. I do have a few initial comments:

At the highest level I'm not quite seeing how this correctly implements the BaseSQLSource APIs, i.e. at minimum I would have expected an implementation of:

execute: This is meant for executing a SQL query and returning the result as a DataFrame. Your run_query method seems to come close, but I'm guessing you have to wrap its result in pd.DataFrame.from_records.
get_tables: This should return a list of valid tables.

Additionally, I would expect a somewhat similar API to other sources, where you can define a tables parameter that accepts either a list of tables (so as to limit which tables you can access) or a dictionary mapping from table name alias to a SQL expression.
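The shape described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual BaseSQLSource interface: MiniSQLSource, its _table_sql helper, and its get method are hypothetical, an in-memory SQLite database stands in for the real backend, and execute returns a list of records where a real source would return a DataFrame (for example via pd.DataFrame.from_records).

```python
import sqlite3
from typing import Dict, List, Union


class MiniSQLSource:
    """Hypothetical, simplified stand-in for a BaseSQLSource subclass."""

    def __init__(self, connection: sqlite3.Connection,
                 tables: Union[List[str], Dict[str, str], None] = None):
        self._conn = connection
        self._conn.row_factory = sqlite3.Row  # rows become addressable by column name
        self.tables = tables

    def _table_sql(self) -> Dict[str, str]:
        """Normalize `tables` into a mapping of table name -> SQL expression."""
        if isinstance(self.tables, dict):
            # Aliases already map to SQL expressions.
            return dict(self.tables)
        if self.tables is None:
            # No restriction given: expose every table in the database.
            rows = self._conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            ).fetchall()
            names = [row["name"] for row in rows]
        else:
            # A list restricts access to the named tables only.
            names = list(self.tables)
        return {name: f'SELECT * FROM "{name}"' for name in names}

    def get_tables(self) -> List[str]:
        """Return the list of valid table names."""
        return list(self._table_sql())

    def execute(self, sql_query: str) -> List[dict]:
        """Run a query and return its rows as records (a real source would build a DataFrame)."""
        cursor = self._conn.execute(sql_query)
        return [dict(row) for row in cursor.fetchall()]

    def get(self, table: str) -> List[dict]:
        """Fetch one of the declared tables, rejecting anything not declared."""
        sql = self._table_sql()
        if table not in sql:
            raise KeyError(f"Unknown table {table!r}; valid tables: {list(sql)}")
        return self.execute(sql[table])


# Made-up sample data for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary REAL)")
conn.execute("CREATE TABLE audit_log (event TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?)", [("Ada", 120.0), ("Grace", 95.0)])

aliased = MiniSQLSource(conn, tables={"well_paid": "SELECT name FROM employees WHERE salary > 100"})
restricted = MiniSQLSource(conn, tables=["employees"])
```

Normalizing both forms into one name-to-SQL mapping means get_tables and get only need to handle a single case.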

I appreciate the documentation is a little sparse, so please don't hesitate to reach out to me or Andrew to clarify things about the API.

I think there's also a misunderstanding of the role of get_schema. The additional schema information you are getting from BigQuery is quite helpful, but I'd consider that part metadata. The schema in Lumen specifically refers to a dictionary that contains the type of each column along with its min-max range and unique values.
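As a rough illustration of that distinction, the sketch below derives a per-column dictionary holding a type plus either a min-max range or the unique values. The function name sketch_schema and the key names ("type", "min", "max", "enum") are assumptions chosen for readability; they are not claimed to match the exact format Lumen produces.

```python
from typing import Any, Dict, List


def sketch_schema(records: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Summarize each column as a type plus min/max (numeric) or unique values (other)."""
    schema: Dict[str, Dict[str, Any]] = {}
    columns = list(records[0].keys()) if records else []
    for column in columns:
        # Ignore missing values when summarizing a column.
        values = [row[column] for row in records if row[column] is not None]
        is_numeric = bool(values) and all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )
        if is_numeric:
            schema[column] = {"type": "number", "min": min(values), "max": max(values)}
        else:
            schema[column] = {"type": "string", "enum": sorted({str(v) for v in values})}
    return schema


# Made-up sample records for illustration.
records = [
    {"name": "Ada", "salary": 120.0},
    {"name": "Grace", "salary": 95.0},
    {"name": "Ada", "salary": 101.5},
]
schema = sketch_schema(records)
```

Backend-specific details, such as the extra column information BigQuery reports, would live alongside this summary as metadata rather than inside it.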

amaloney · 2025-02-18T18:39:05Z

ahuang11 · 2025-02-20T20:11:14Z

One other thing is tests (terribly lacking in the ai/ directory, but they should be maintained for sources/).

Not sure how hard it is to set up a MySQL server, but this seems useful for testing locally (probably not in CI?): https://docs.getwren.ai/oss/getting_started/sample_data/hr

Add SQLAlchemy & BigQuery sources

a3b1062

amaloney self-assigned this Feb 18, 2025

ahuang11 reviewed Feb 18, 2025

philippjfr reviewed Feb 18, 2025
