[ENH] : Implementation of read_archive function #1438

Open. Wants to merge 2 commits into base branch dev.


@Sabrina-Hassaim Sabrina-Hassaim commented Jan 21, 2025

PR Description

Please describe the changes proposed in the pull request:

1. Implementation of the read_archive Function:

  • Added a new method to read archive files (.zip, .tar, .tar.gz) and extract their contents as a DataFrame or a list of compatible files.
  • Supports CSV and Excel file formats within the archives.
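The PR's actual implementation is not shown in this conversation. As a minimal sketch of the behavior described above, the following standard-library code lists the CSV and Excel members of a .zip, .tar, or .tar.gz archive. The name `list_compatible_files` and the extension set are assumptions made for illustration, not the PR's API.

```python
import tarfile
import zipfile

# Extensions treated as "compatible"; this set is an assumption for illustration.
COMPATIBLE_EXTENSIONS = (".csv", ".xls", ".xlsx")


def list_compatible_files(archive_path):
    """Return the names of CSV/Excel members in a .zip, .tar, or .tar.gz archive."""
    if zipfile.is_zipfile(archive_path):
        with zipfile.ZipFile(archive_path) as archive:
            names = archive.namelist()
    elif tarfile.is_tarfile(archive_path):
        # The default read mode detects gzip compression transparently.
        with tarfile.open(archive_path) as archive:
            names = [m.name for m in archive.getmembers() if m.isfile()]
    else:
        raise ValueError(f"Not a supported archive: {archive_path}")

    compatible = [n for n in names if n.lower().endswith(COMPATIBLE_EXTENSIONS)]
    if not compatible:
        raise ValueError("No compatible files found in the archive.")
    return compatible
```

Each returned name could then be opened from the archive and handed to a reader such as `pandas.read_csv` or `pandas.read_excel` to produce a DataFrame.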

2. Unit Tests

  • Added tests to validate the behavior of the read_archive method:
    • Ensures correct reading of files from .zip and .tar.gz formats.
    • Handles cases where the file is not a valid archive or does not contain compatible files.
    • Tests include interactive behavior for file selection.
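The tests themselves are not shown here. One common way to test interactive file selection without a terminal is to patch `input()`; the sketch below illustrates that approach. The helper `choose_file` is hypothetical and stands in for whatever prompt the PR uses.

```python
from unittest.mock import patch


def choose_file(names, prompt="Select a file number: "):
    """Hypothetical helper: ask the user to pick one name from a list."""
    for index, name in enumerate(names, start=1):
        print(f"{index}. {name}")
    choice = int(input(prompt))
    if not 1 <= choice <= len(names):
        raise ValueError(f"Choice must be between 1 and {len(names)}.")
    return names[choice - 1]


def test_choose_file_returns_selected_name():
    # Replace input() so the test runs non-interactively.
    with patch("builtins.input", return_value="2"):
        assert choose_file(["a.csv", "b.xlsx"]) == "b.xlsx"


def test_choose_file_rejects_out_of_range():
    with patch("builtins.input", return_value="5"):
        try:
            choose_file(["a.csv"])
        except ValueError:
            pass
        else:
            raise AssertionError("expected ValueError")
```

Under pytest, the second test would more idiomatically use `pytest.raises(ValueError)`; the try/except form is used here only to keep the sketch dependency-free.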

This PR resolves #(put issue number here, and remove parentheses).

PR Checklist

Please ensure that you have done the following:

  1. Open the PR from a feature branch on your fork. Do not PR from <your_username>:dev, but rather from <your_username>:<feature-branch_name>.
  2. If you're not on the contributors list, add yourself to AUTHORS.md.
  3. Add a line to CHANGELOG.md under the latest version header (i.e. the one that is "on deck") describing the contribution.
    • Do use some discretion here; if there are multiple PRs that are related, keep them in a single line.

Automatic checks

There will be automatic checks run on the PR. These include:

  • Building a preview of the docs on Netlify
  • Automatically linting the code
  • Making sure the code is documented
  • Making sure that all tests pass
  • Making sure that code coverage doesn't go down.

Relevant Reviewers

Please tag maintainers to review.

Member

@ericmjl ericmjl left a comment

@Sabrina-Hassaim thank you for kickstarting this PR! It was a relatively easy one to review, and I just happened to have a small chunk of time to review it. I'm going to request changes here, as it seems to me that anything related to I/O should live in io.py and follow the patterns there. Once we're done with these changes and @samukweku has reviewed, I think we can merge and cut a new release.

Member

@Sabrina-Hassaim it feels like the changes in here weren't necessary for testing read_archive functionality, is that right? Could you revert these please?

Member

I think the contents of this file should be moved under io, and they do not need the @pf.register_dataframe_method decorator either.

Member

This file feels superfluous, could you delete it please?

@@ -25,7 +25,7 @@ def test_docs_general_functions_present():
     # I put in a subsample of general functions.
     # This can be made much more robust.
     rendered_correctly = False
-    with open("./site/api/functions/index.html", "r+") as f:
+    with open("./site/api/functions/index.html", "r+", encoding="utf-8") as f:

Member

Is the encoding argument necessary here?
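
For context on this question: when `encoding` is omitted, Python's text-mode `open()` falls back to a platform-dependent default, so a UTF-8 file can decode differently on systems whose locale is not UTF-8. The sketch below only illustrates that general behavior; it uses a temporary file and makes no claim about whether this particular test needs the argument.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "index.html")

    # Write a file containing a non-ASCII character as UTF-8.
    with open(path, "w", encoding="utf-8") as f:
        f.write("<p>café</p>")

    # Without encoding=, the default depends on the platform and locale.
    with open(path, "r") as f:
        default_encoding = f.encoding

    # With encoding="utf-8", decoding is the same everywhere.
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()

print("default encoding on this machine:", default_encoding)
```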

Member

Similar to the contents of read_archive.py moving into io.py, I think these tests can be moved to the appropriate test file as well.

Member

This file shouldn't be changed, IMO, based on what I see in this PR.
