ENH #61033: Add coalesce_keys option to DataFrame.join for preserving join keys #61678

Open
wants to merge 1 commit into
base: main

rit4rosa

Add coalesce_keys option to DataFrame.join for preserving join keys

This adds a coalesce_keys keyword to DataFrame.join so that both join key
columns (for example, id and id_right) can be preserved instead of being
automatically coalesced into a single column.

This is especially useful in full outer joins, where retaining information
about unmatched keys from both sides is important.

Example:
df1.join(df2, on="id", coalesce_keys=False)

With coalesce_keys=False, the result keeps both the id and id_right columns
instead of merging them into a single id column.
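
For readers who want to see the difference today, the sketch below emulates both behaviors with the existing merge API. coalesce_keys is only proposed in this PR and is not used here. The sample frames are made up, and the id_right name is produced by an explicit rename that simply follows the naming in the description above.

```python
import pandas as pd

# Made-up sample data: key 1 exists only on the left, key 3 only on the right.
left = pd.DataFrame({"id": [1, 2], "a": ["x", "y"]})
right = pd.DataFrame({"id": [2, 3], "b": ["p", "q"]})

# Current default: a full outer join on a shared key name yields one "id" column.
coalesced = left.merge(right, on="id", how="outer")

# Emulation of the proposed coalesce_keys=False: rename the right-hand key so
# that merge treats the two keys as distinct columns and keeps both of them.
preserved = left.merge(
    right.rename(columns={"id": "id_right"}),
    left_on="id",
    right_on="id_right",
    how="outer",
)

print(coalesced)
print(preserved)
```

In the emulated result, rows that exist on only one side carry a missing value in the other side's key column, so the key columns may be upcast to a float dtype.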

Includes:

  • Modifications to join internals (core/reshape/merge.py)
  • A dedicated test file (test_merge_coalesce.py) covering:
    • Preservation of join keys when coalesce_keys=False
    • Comparison with default behavior (coalesce_keys=True)
    • Full outer joins with asymmetric key presence
  • All code checks passed.
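
As a rough idea of what the three test cases listed above might check, here is a pytest-style sketch. It is not the content of test_merge_coalesce.py. The helper join_keep_keys is hypothetical and stands in for join(..., coalesce_keys=False) by using the existing merge API, and the data is invented.

```python
import pandas as pd


def join_keep_keys(left, right, key, how="outer", right_suffix="_right"):
    """Hypothetical stand-in for the proposed coalesce_keys=False behavior."""
    right_key = f"{key}{right_suffix}"
    # Renaming the right-hand key makes merge keep both key columns.
    return left.merge(
        right.rename(columns={key: right_key}),
        left_on=key,
        right_on=right_key,
        how=how,
    )


def make_frames():
    # Asymmetric key presence: 1 and 2 are left-only, 4 is right-only.
    left = pd.DataFrame({"id": [1, 2, 3], "a": [10, 20, 30]})
    right = pd.DataFrame({"id": [3, 4], "b": [0.3, 0.4]})
    return left, right


def test_keys_preserved_in_outer_join():
    left, right = make_frames()
    out = join_keep_keys(left, right, "id")
    assert {"id", "id_right"} <= set(out.columns)
    # Left-only rows keep their id and have no right-hand key.
    assert sorted(out.loc[out["id_right"].isna(), "id"]) == [1, 2]
    # Right-only rows keep id_right and have no left-hand key.
    assert sorted(out.loc[out["id"].isna(), "id_right"]) == [4]


def test_default_coalesces_keys():
    left, right = make_frames()
    out = left.merge(right, on="id", how="outer")
    assert "id_right" not in out.columns
    assert sorted(out["id"]) == [1, 2, 3, 4]
```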

Co-authored-by: Maria Pereira <[email protected]>