refactor: Added a generic FileStream (still in active development!) #2654

Merged
edgarrmondragon merged 45 commits into main from 2648-feat-add-a-generic-filestream-interface on Oct 2, 2024

edgarrmondragon (collaborator) commented on Sep 6, 2024.

edgarrmondragon linked an issue on Sep 6, 2024 that may be closed by this pull request.

codspeed-hq bot commented Sep 6, 2024

CodSpeed Performance Report

Merging #2654 will not alter performance

Comparing 2648-feat-add-a-generic-filestream-interface (e527cee) with main (56b496f)

Summary

✅ 6 untouched benchmarks

codecov bot commented Sep 6, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 90.50%. Comparing base (5c68b09) to head (e527cee).
Report is 99 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2654      +/-   ##
==========================================
+ Coverage   90.21%   90.50%   +0.29%     
==========================================
  Files          58       62       +4     
  Lines        4895     4994      +99     
  Branches      964      974      +10     
==========================================
+ Hits         4416     4520     +104     
+ Misses        331      328       -3     
+ Partials      148      146       -2     

edgarrmondragon force-pushed the 2648-feat-add-a-generic-filestream-interface branch 4 times, most recently from d3d86fe to 4dfdd17, on September 10, 2024 03:40
edgarrmondragon force-pushed the 2648-feat-add-a-generic-filestream-interface branch from 4dfdd17 to bd5c138 on September 10, 2024 03:56
edgarrmondragon self-assigned this on Sep 10, 2024
edgarrmondragon added this to the v0.41.0 milestone on Sep 16, 2024
edgarrmondragon marked this pull request as ready for review on September 23, 2024 15:01
edgarrmondragon requested a review from a team as a code owner on September 23, 2024 15:01
edgarrmondragon changed the title from "feat: Added a generic FileStream" to "refactor: Added a generic FileStream" on Sep 23, 2024
edgarrmondragon changed the title from "refactor: Added a generic FileStream" to "refactor: Added a generic FileStream (still in active development!)" on Sep 23, 2024
edgarrmondragon (author) left a comment:

This is ready for review. Everything here is subject to change (naming conventions, implementation, abstractions), so feel free to comment on any of those.

Review thread on pyproject.toml (outdated, resolved).
Review comment from edgarrmondragon (author):

I suggest reviewing this module in split view. All that's left after the refactor are the get_schema and read_file implementations.
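The refactor moves the shared file-handling logic into a generic base, so a format module only has to supply get_schema and read_file. A minimal sketch of that split, assuming an in-memory dict of file contents in place of a real filesystem; FileStreamSketch, CSVStreamSketch, get_records, and the all-strings schema are hypothetical illustrations, not the SDK's actual classes:

```python
import abc
import csv
import io
import typing as t


class FileStreamSketch(abc.ABC):
    """Hypothetical generic base: shared plumbing here, format hooks in subclasses."""

    def __init__(self, files: dict[str, str]) -> None:
        # Maps each file path to its text; an in-memory stand-in for a filesystem.
        self.files = files

    @abc.abstractmethod
    def get_schema(self, path: str) -> dict[str, t.Any]:
        """Return a JSON Schema describing the records in one file."""

    @abc.abstractmethod
    def read_file(self, path: str) -> t.Iterator[dict[str, t.Any]]:
        """Yield the records parsed from one file."""

    def get_records(self) -> t.Iterator[dict[str, t.Any]]:
        # Format-agnostic loop: walk the files in order, delegate parsing to read_file.
        for path in sorted(self.files):
            yield from self.read_file(path)


class CSVStreamSketch(FileStreamSketch):
    """Hypothetical CSV subclass: only get_schema and read_file are format-specific."""

    def __init__(self, files: dict[str, str], delimiter: str = ",") -> None:
        super().__init__(files)
        self.delimiter = delimiter

    def _reader(self, path: str) -> csv.DictReader:
        return csv.DictReader(io.StringIO(self.files[path]), delimiter=self.delimiter)

    def get_schema(self, path: str) -> dict[str, t.Any]:
        # No type inference here: every column in the header row is typed as a string.
        columns = self._reader(path).fieldnames or []
        return {
            "type": "object",
            "properties": {name: {"type": "string"} for name in columns},
        }

    def read_file(self, path: str) -> t.Iterator[dict[str, t.Any]]:
        yield from self._reader(path)
```

Because the base class owns iteration over files, supporting another format in this sketch means implementing only the two hooks.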

Review comment from edgarrmondragon (author):

I suggest reviewing this module in split view.

CSV-specific settings were added, and custom discovery was removed in favor of the default implementation.
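With custom discovery removed, the stream layout can follow from the read_mode setting used in the configurations in this thread. A rough sketch of such default discovery, assuming one stream per file (named after the file stem) for one_stream_per_file and a single stream for a merge mode; the "merge" value, the "merged" stream name, and discover_streams itself are hypothetical:

```python
from pathlib import PurePosixPath


def discover_streams(paths: list[str], read_mode: str) -> dict[str, list[str]]:
    """Hypothetical default discovery: map each stream name to the files it reads."""
    if read_mode == "one_stream_per_file":
        # One stream per file, named after the file's stem ("customers.csv" -> "customers").
        return {PurePosixPath(p).stem: [p] for p in sorted(paths)}
    if read_mode == "merge":
        # A single stream reads every file; the stream name is an assumption.
        return {"merged": sorted(paths)}
    raise ValueError(f"Unsupported read_mode: {read_mode!r}")
```

In this sketch, two files with the same stem in different directories would map to the same stream name, so a real implementation would need a naming rule for that case.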

Comment on lines +68 to +71
    @property
    def partitions(self) -> list[dict[str, t.Any]]:
        """Return the list of partitions for this stream."""
        return self._partitions
Review comment from edgarrmondragon (author):

Using partitions allows us to track state for each individual file in merge mode.
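In merge mode many files feed a single stream, so one stream-level bookmark cannot record which files have already been read. A minimal sketch of per-file bookmarks, with state laid out loosely after the Singer SDK's partitioned state (a list of entries holding a "context" and a "replication_key_value"); the PartitionedFileState class, the "path" context key, and the use of file modification times as bookmarks are assumptions:

```python
import typing as t

Context = dict[str, t.Any]


class PartitionedFileState:
    """Hypothetical per-file state: each partition (one file) keeps its own bookmark."""

    def __init__(self, paths: list[str]) -> None:
        # One partition context per file; the "path" key name is an assumption.
        self._partitions: list[Context] = [{"path": p} for p in sorted(paths)]
        self.state: dict[str, list[dict[str, t.Any]]] = {"partitions": []}

    @property
    def partitions(self) -> list[Context]:
        """Return the list of partitions for this stream."""
        return self._partitions

    def _find(self, context: Context) -> t.Optional[dict[str, t.Any]]:
        # Look up the state entry whose context matches this file, if any.
        return next((e for e in self.state["partitions"] if e["context"] == context), None)

    def is_stale(self, context: Context, mtime: float) -> bool:
        # A file needs syncing if it has no bookmark or was modified after it.
        entry = self._find(context)
        bookmark = entry.get("replication_key_value") if entry else None
        return bookmark is None or mtime > bookmark

    def advance(self, context: Context, mtime: float) -> None:
        # Record the file's modification time as its bookmark after syncing it.
        entry = self._find(context)
        if entry is None:
            entry = {"context": context}
            self.state["partitions"].append(entry)
        entry["replication_key_value"] = mtime
```

Keeping bookmarks per partition means a change to one file re-syncs only that file, even though all files emit records into the same merged stream.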

edgarrmondragon (author) commented:

The FTP and SFTP filesystems are working now, for example with these configurations:

FTP

{
    "filesystem": "ftp",
    "path": "fixtures/csv",
    "read_mode": "one_stream_per_file",
    "delimiter": "\t",
    "ftp": {
        "host": "127.0.0.1",
        "port": 21,
        "username": "my_ftp_user",
        "password": "my_ftp_password"
    }
}

SFTP

{
    "filesystem": "sftp",
    "path": "fixtures/csv",
    "read_mode": "one_stream_per_file",
    "delimiter": "\t",
    "sftp": {
        "host": "127.0.0.1",
        "port": 2022,
        "username": "my_ftp_user",
        "password": "my_ftp_password"
    }
}
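In both configurations the connection settings sit under a key that matches the "filesystem" value. A small sketch of extracting that block, assuming host is required and that the standard ports (21 for FTP, 22 for SFTP) apply when port is omitted; connection_options is a hypothetical helper, and how the tap actually passes these options to its filesystem backend is not shown in this thread:

```python
import typing as t

# Standard ports, applied only when the config omits "port" (an assumption of this sketch).
DEFAULT_PORTS = {"ftp": 21, "sftp": 22}


def connection_options(config: dict[str, t.Any]) -> tuple[str, dict[str, t.Any]]:
    """Hypothetical helper: return the filesystem name and its nested settings."""
    protocol = config["filesystem"]
    # Settings live under a key named after the filesystem, e.g. config["sftp"].
    options = dict(config.get(protocol, {}))
    if protocol in DEFAULT_PORTS:
        if "host" not in options:
            raise ValueError(f"{protocol!r} settings require a 'host'")
        options.setdefault("port", DEFAULT_PORTS[protocol])
    return protocol, options
```

The helper copies the nested block, so filling in a default port never mutates the original config.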

edgarrmondragon force-pushed the 2648-feat-add-a-generic-filestream-interface branch from 886444a to e527cee on October 2, 2024 20:08
edgarrmondragon merged commit 1442536 into main on Oct 2, 2024
35 checks passed
edgarrmondragon deleted the 2648-feat-add-a-generic-filestream-interface branch on October 2, 2024 20:13
Project status: Discussed

Linked issue that merging this pull request may close: feat: Add a generic FileStream interface

1 participant