Offer public API to get dataset info? #926
Comments
Hey @SajidAlamQB,
Thank you for your response and for offering to work on this! I think some of the metadata we're interested in is:
@astrojuanlu did you have any thoughts on what else?
paging @BielStela since you opened kedro-org/kedro-viz#1714 :) before proceeding with kedro-plugins/kedro-datasets/kedro_datasets/svmlight/svmlight_dataset.py Lines 168 to 175 in 57f6279
@astrojuanlu This makes sense to me. I can reuse this and add any extra metadata that would be required. Is there anything outside of the list that @SajidAlamQB has given above?
Would love to know @merelcht's and @rashidakanchwala's opinions before proceeding
(Specifically, whether we should add a dedicated public
I missed this, but I love it! It would really help Kedro-Viz, though we need to decide what it means for each dataset. For instance, for SQLDatasets, could get_info provide the SQL query, etc.?
We discussed this in backlog grooming. We acknowledge that the original problem (Viz needs private data from datasets, see kedro-org/kedro-viz#1893 (comment)) exists, and we want to address it.
There was past discussion about
We didn't feel like adding more Viz-specific public methods, although an exception was made for said
This problem was already anticipated when implementing the preview functionality (kedro-org/kedro-viz#662 (comment)) and was rediscovered by a user some months later (kedro-org/kedro-viz#1714).
Renaming this issue so it's clearer that this is an open problem.
get_info Method to Datasets for Metadata Retrieval
Description
Related to: kedro-org/kedro-viz#1893 (comment)
When trying to retrieve metadata (like file size) from datasets in Kedro Viz we are running into issues, specifically within our DatasetStatsHook. Currently, to obtain this metadata, we need to access private attributes (e.g., _filepath, _fs, _protocol), which is causing inconsistency and compatibility problems with datasets that do not expose these attributes.
Context
By having a standardised, public way to retrieve dataset metadata we can improve best practices. This benefits users who want to collect dataset metadata or build plugins, as it provides a consistent and maintainable approach.
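To make the problem concrete, here is a minimal sketch of the fragile pattern the description refers to. It is not Kedro or Kedro-Viz code: the classes FileBackedDataset and InMemoryDataset and the function file_size_via_private_attrs are hypothetical stand-ins, and only the attribute name _filepath is taken from the issue text.

```python
import os
import tempfile


class FileBackedDataset:
    """Hypothetical dataset that happens to store its path in `_filepath`."""

    def __init__(self, filepath):
        self._filepath = filepath


class InMemoryDataset:
    """Hypothetical dataset with no file behind it, so no `_filepath`."""

    def __init__(self, data):
        self._data = data


def file_size_via_private_attrs(dataset):
    """Return the file size in bytes, or None if `_filepath` is absent.

    This only works for datasets that happen to define `_filepath`,
    which is the compatibility problem described in the issue.
    """
    filepath = getattr(dataset, "_filepath", None)
    if filepath is None:
        return None  # the caller cannot tell why no size is available
    return os.path.getsize(filepath)


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
        handle.write("hello")
    try:
        print(file_size_via_private_attrs(FileBackedDataset(handle.name)))
        print(file_size_via_private_attrs(InMemoryDataset([1, 2, 3])))
    finally:
        os.remove(handle.name)
```

The caller has to guess at attribute names that are not part of any contract, so a dataset that renames or omits _filepath silently yields no metadata.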
Possible Implementation
Perhaps adding an optional, public method (e.g., get_info) to datasets that can provide metadata like file size, etc.
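One way the suggestion above could look is sketched below. This is an assumption about a possible shape, not an agreed design or an existing Kedro API: BaseDataset, LocalFileDataset, InMemoryDataset, collect_dataset_info and the returned keys ("filepath", "file_size") are all hypothetical names chosen for illustration. The sketch makes the method optional by giving the base class a default that returns an empty dict.

```python
import os
import tempfile


class BaseDataset:
    """Hypothetical base class; by default a dataset exposes no metadata."""

    def get_info(self):
        # Optional hook: subclasses override this to expose public metadata.
        return {}


class LocalFileDataset(BaseDataset):
    """Hypothetical file-backed dataset that opts in to `get_info`."""

    def __init__(self, filepath):
        self._filepath = filepath

    def get_info(self):
        # Expose selected details publicly so callers never read `_filepath`.
        return {
            "filepath": self._filepath,
            "file_size": os.path.getsize(self._filepath),
        }


class InMemoryDataset(BaseDataset):
    """Hypothetical dataset with nothing on disk; it keeps the empty default."""

    def __init__(self, data):
        self._data = data


def collect_dataset_info(datasets):
    """Map each dataset name to whatever metadata it chooses to expose."""
    return {name: dataset.get_info() for name, dataset in datasets.items()}


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
        handle.write("hello world")
    try:
        catalog = {
            "on_disk": LocalFileDataset(handle.name),
            "in_memory": InMemoryDataset([1, 2, 3]),
        }
        print(collect_dataset_info(catalog))
    finally:
        os.remove(handle.name)
```

With this shape a consumer such as a stats hook depends only on the public method, and each dataset type decides what is meaningful to report; a SQL-backed dataset could, for example, return its query instead of a file size.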