Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(datasets): add wrapped dataset to store the response from a POST/PUT request #905

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

npfp
Copy link

@npfp npfp commented Oct 23, 2024

Proof of concept to start discussions about design on how to store and use the response object.

Description

To pursue discussions on how we could store the response object from a POST or PUT api request.

Relates to #748

Development notes

  • Only poc code
  • Tests to be implemented

Checklist

  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the relevant RELEASE.md file
  • Added tests to cover my changes
  • Received approvals from at least half of the TSC (required for adding a new, non-experimental dataset)

@property
def _nested_dataset_type(
self,
) -> Type[JSONDataset | PickleDataset | MemoryDataset]:
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There might be a better way to do this.

Comment on lines +243 to +250
if self._extension == "json":
self.wrapped_dataset.save(response.json()) #TODO(npfp): expose json loads arguments
elif self._extension == "text":
self.wrapped_dataset.save(response.text)
elif self._extension:
self.wrapped_dataset.save(response)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be better to check on the self.wrapped_dataset type:

Suggested change
if self._extension == "json":
self.wrapped_dataset.save(response.json()) #TODO(npfp): expose json loads arguments
elif self._extension == "text":
self.wrapped_dataset.save(response.text)
elif self._extension:
self.wrapped_dataset.save(response)
elif isinstance(self.wrapped_dataset, JSONDataset):
self.wrapped_dataset.save(response.json()) #TODO(npfp): expose json loads arguments
elif isinstance(self.wrapped_dataset, TextDataset)
self.wrapped_dataset.save(response.text)
elif self._extension:
self.wrapped_dataset.save(response)

@npfp npfp force-pushed the feature/save-response-in-api-dataset branch from 2978b3c to dc433d1 Compare October 23, 2024 12:54
@npfp npfp changed the title feat(api_dataset): add wrapped dataset to store the response from a POST/PUT request feat(dataset): add wrapped dataset to store the response from a POST/PUT request Oct 23, 2024
@npfp npfp changed the title feat(dataset): add wrapped dataset to store the response from a POST/PUT request feat(datasets): add wrapped dataset to store the response from a POST/PUT request Oct 23, 2024
…/PUT request

Proof of concept to start discussions about design on how to store and use response.

Signed-off-by: Nicolas Perrin <[email protected]>
@npfp npfp force-pushed the feature/save-response-in-api-dataset branch from dc433d1 to e9980af Compare October 23, 2024 13:14
Copy link
Member

@astrojuanlu astrojuanlu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @npfp , sorry for the very long delay here. Thanks a lot for sending this pull request.

Is your idea to make APIDataset wrap another dataset? (Similar to what you mentioned in #748 (comment))

I'm assuming that this is a breaking change on the API dataset right?

@npfp
Copy link
Author

npfp commented Jan 14, 2025

No worries. Thx for getting back to me!

Yes, I chose the wrapped dataset mainly because it would avoid maintenance burden by relying on the implementation of the wrapped dataset rather than duplicating the logic there.

Of course, it's a subjective opinion, so feel free to challenge this, if it doesn't follow kedro's guidelines.

@npfp
Copy link
Author

npfp commented Jan 14, 2025

I don't think it's a breaking change of the API dataset: the wrapped dataset args are optionals.

If not provided, the response is not saved as it's already the case.

@merelcht
Copy link
Member

It's been a while since I looked at the issue this PR is for, but if I understand correctly the problem with APIDataset is that it only returns the response, but doesn't actually save it? And this PR aims to add a wrapper so a user can indicate what format to save the response in?

On the original issue I said I didn't like the idea of importing other datasets, which I still agree with, but at the same time it does make sense now I see it in code 😜 I'll have to think about this a bit more to review it properly.

@npfp
Copy link
Author

npfp commented Jan 17, 2025

It's been a while since I looked at the issue this PR is for, but if I understand correctly the problem with APIDataset is that it only returns the response, but doesn't actually save it? And this PR aims to add a wrapper so a user can indicate what format to save the response in?

On the original issue I said I didn't like the idea of importing other datasets, which I still agree with, but at the same time it does make sense now I see it in code 😜 I'll have to think about this a bit more to review it properly.

Yes that's right:

  • some APIs would return a response that might be important to store somewhere,
  • yes exactly.

Good to know, thx! (I'm a bit biased because we almost always use wrapped dataset for our custom datasets: we're lazy and like relying on a maintained classe :-) )

Copy link
Member

@merelcht merelcht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second review I think this solution is a pretty good solution. I have left some comments and asked for a second opinion. If we all agree on approach you can polish this and we can do the proper review process 🙂

Comment on lines +165 to +167
self._extension = extension
self._wrapped_dataset_args = wrapped_dataset
self._wrapped_dataset = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused on the need and difference between all these arguments. Can you explain them? Would it make sense to rename the wrapped_dataset argument in __init__ to wrapped_dataset_args?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, you're right the naming is confusing. I'm changing this.

"""The wrapped dataset where response data is stored."""
if self._wrapped_dataset is None and self._extension is not None:
self._wrapped_dataset = self._nested_dataset_type(
**self._wrapped_dataset_args
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is **self._wrapped_dataset_args actually used in _nested_dataset_type?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if I understood your question, but as _nested_dataset_type returns a dataset class,

self._nested_dataset_type(
                 **self._wrapped_dataset_args
                 )

will create the instance of the wrapped dataset:

Also I should change the name of the property _nested_dataset_type to _wrapped_dataset_type to be consistent...

@ElenaKhaustova
Copy link
Contributor

The idea of having such a dataset makes sense to me. Implementation-wise, I would expect to see something similar to https://docs.kedro.org/projects/kedro-datasets/en/kedro-datasets-6.0.0/_modules/kedro_datasets/partitions/partitioned_dataset.html#PartitionedDataset, so logic of requesting data is separated from saving it via dataset and the extension is derived from dataset set rather than vice a versa.

@ElenaKhaustova ElenaKhaustova self-requested a review February 4, 2025 19:03
else:
response: requests.Response = self._execute_save_request(json_data=data)

if self._wrapped_dataset is None:
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if self._wrapped_dataset is None:
if self.wrapped_dataset is None:

@npfp npfp requested review from merelcht and astrojuanlu February 13, 2025 08:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants