-
Notifications
You must be signed in to change notification settings - Fork 175
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
load_as_spark using config as variable instead of file #483
Comments
@stevenayers-bge: Would you be satisfied with building the json object while bringing the secrets from key vault and any other configuration from application resource configuration or even from the keyvault? |
@aimtsou initializing the rest client works fine, for example: import delta_sharing
from delta_sharing.protocol import DeltaSharingProfile
# GET SECRET VALUE
delta_sharing_config = dbutils.secrets.get(scope="delta-sharing", key="my-config")
# PASS SECRET CONFIG WITHOUT SAVING TO FILE
profile = DeltaSharingProfile.from_json(delta_sharing_config)
client = delta_sharing.SharingClient(profile)
# List tables available in share
client.list_all_tables() The issue is when you go to read the data into a dataframe. You cannot use import delta_sharing
share_file = "/tmp/my-secret-config-stored-in-a-file" # security problem!
df = delta_sharing.load_as_spark(share_file + '#nep_test.reference.settlement_period_calendar') |
@aimtsou @linzhou-db this old PR would solve the issue #103 |
Now i get your point. Yes the load_as_spark function does not support SharingClient, so you are proposing something different from me, to pass the whole JSON stored in secure storage as a profile to Spark, I proposed to build the JSON object in python which is closer to @zsxwing suggests. I see 2 different implementations to be honest. My question is what is your incentive for storing the whole file is a secret store? Usually we retrieve only the secret value. I mean in your case the person who has access to this databricks workspace has already access to the whole file since it is in the secret store. |
@aimtsou i don't mind how it's done 🙂 Im not sure your PR will help though. load_as_spark doesn't interact with the Python SharingClient. load_as_spark passes the config file path to the JVM, and the file is read and parsed within the delta sharing scala library. |
I don't have an incentive to store the whole file in a secret manager, but it is often easier to store the whole json config in one secret. It would be equally secure if you only stored the bearerToken in a secrets manager, but then you're storing some of the config on a filesystem, and some in a secrets manager, which is a bit messy. All I'm proposing is that we can pass profile configuration via parameters, rather than passing in a filepath. So something like: share_endpoint = ... pulled in from a secretsmanager or whatever
share_token = ... pulled in from a secretsmanager or whatever
delta_sharing.load_as_spark(
share_name='<share-name>',
schema_name='<schema name>',
table_name='my_table',
endpoint=share_endpoint,
bearer_token=share_token
)
# or pass a client object?
client: SharingClient = ...
delta_sharing.load_as_spark(client=client, share_name=.... etc) @aimtsou does that make sense? |
For me yes and as you said yes there is only a |
This is causing me a whole world of pain of Databricks. Having to write the config into a temporary file that sometimes needs a /dbfs prefix and sometimes doesn't. Completely agree that a solution that does not rely on a file base url would be much better. |
@linzhou-db @moderakh @zhu-tom @pranavsuku-db do you have any time to look at this? I can also raise a PR with the changes if that helps? I know it's available in the rust client, and I've seen you're using maturin/pyo3 so it might already be possible, I'm just not very familiar with how rust & python interact. Please let me know if there's anything I can do to help 🙏 |
@stevenayers-bge sure feel free to send out the PR. |
This will be a awesome change! In my opinion this should be possible to do with other readers to, like delta_sharing.load_as_pandas() |
Bump on this please - as a Delta Share provider some of our recipients might object to having to persist credentials to a file, even temporarily. |
Hi all,
I want to use
load_as_spark
, but instead of saving the config credentials as a file, I want to pull them from a secrets manager (Azure Vault, AWS SecretsManager, Databricks Secret) instead of storing the file on the filesystem. I want to avoid saving the json config to the filesystem for security reasons.For example:
Is this possible?
Thanks
The text was updated successfully, but these errors were encountered: