Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] conditional_join can return ragged_arrays where possible - PR no. 2 #1397

Merged
merged 17 commits into from
Sep 17, 2024

Conversation

samukweku
Copy link
Collaborator

@samukweku samukweku commented Sep 7, 2024

This is the second part of a series of PRs that ultimately adds support for aggregations within conditional_join. where possible, ragged_arrays can be returned to the user, either as slices or arrays of indices, which can be used in akimbo, or awkward or pyarrow to aggregate the data. this should be faster than materializing the entire dataframe within pandas before aggregating.

PR Description

Please describe the changes proposed in the pull request:

  • return ragged_arrays where possible with the get_join_indices function
  • limited to single join conditions or a range join, where the columns on the right dataframe are both monotonically increasing

This PR relates to #1269 and #1396 .

Please tag maintainers to review.

@samukweku samukweku self-assigned this Sep 7, 2024
@ericmjl
Copy link
Member

ericmjl commented Sep 7, 2024

Copy link

codecov bot commented Sep 7, 2024

Codecov Report

Attention: Patch coverage is 82.78689% with 21 lines in your changes missing coverage. Please review.

Project coverage is 89.23%. Comparing base (6e77fbc) to head (ef4f9f3).
Report is 11 commits behind head on dev.

Additional details and impacted files
@@            Coverage Diff             @@
##              dev    #1397      +/-   ##
==========================================
+ Coverage   89.07%   89.23%   +0.15%     
==========================================
  Files          87       87              
  Lines        5374     5534     +160     
==========================================
+ Hits         4787     4938     +151     
- Misses        587      596       +9     

@samukweku samukweku marked this pull request as ready for review September 14, 2024 22:15
Copy link
Member

@ericmjl ericmjl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pre-approving, @samukweku! Just one question re: documentation.

@ericmjl
Copy link
Member

ericmjl commented Sep 15, 2024

I also noticed that the project test coverage has significantly gone down to <90%. I might need to dig further, but this is definitely a sign that we may need to invest more in covering edge cases throughout the repo.

@samukweku samukweku merged commit 2fa5145 into dev Sep 17, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants