-
Notifications
You must be signed in to change notification settings - Fork 86
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add cols_label_with()
method
#626
base: main
Are you sure you want to change the base?
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #626 +/- ##
==========================================
+ Coverage 90.85% 91.24% +0.39%
==========================================
Files 46 47 +1
Lines 5423 5484 +61
==========================================
+ Hits 4927 5004 +77
+ Misses 496 480 -16 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Whoops! I'm unsure whether we should call |
Alternatively, we could extract this code block into a separate function, allowing both great-tables/great_tables/_boxhead.py Lines 134 to 152 in dc82197
|
Thanks for considering the units syntax codepath (I missed that too)! For incorporating that functionality, I do think your last comment (create a units handler as a shared function) is the way right to go. |
As for the docs-building CI error, we solved that in a recent commit by removing the pip install of the pointblank package in the CI workflow file. Just merge with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this PR -- this function seems really helpful! WDYT of making columns the first argument? I detail a bit of the rationale in a comment.
Here's a detail that's not critical to this PR
One thing I learned just now going through the R library, is that there cols_label
has a .fn=
parameter for doing these kinds of transformations. However, it sounds like cols_label_with()
is a helpful convenience function, because this is something people often want to do.
Essentially, the breakdown seems like:
cols_label()
often involves directly relabeling columns (e.g. a = "b").cols_label_with()
only allows transformation via a function.
However, because we haven't added cols_label(fn=...)
, there's an opportunity in the future to define it. One wild idea, would be supporting Polars selectors somehow. E.g.
import polars as pl
import polars.selectors as cs
# imagine that selectors are run over a special DataFrame w/ 1 row
df_names = pl.DataFrame({"my_col": ["my_col"], "my_col2": ["my_col2"]})
# the selected column names are the columns to relabel, the value in the single
# row is the new name. There might be a more natural way to do this with
# polars, but wanted to float it as a broad idea!
df_names.select(cs.starts_with("my").str.to_uppercase())
great_tables/_boxhead.py
Outdated
return self._replace(_boxhead=boxhead) | ||
|
||
|
||
def cols_label_with(self: GTSelf, fn: Callable[[str], str], columns: SelectExpr = None) -> GTSelf: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From pairing w/ Rich, @jrycw WDYT of switching the order of the parameters, so columns
is before fn
?
Looking through the cols_*()
methods in the R library, column selection usually comes first. In the case of cols_align()
where it doesn't, it sounds like it's a historical artifact. Keeping column first might help cement a pattern for cols_*()
methods.
It makes sense you put it first in the PR, since it doesn't have a default. WDYT of us setting it to fn = None
, and then erroring if fn is None?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@machow, nice suggestion! I definitely like your idea of introducing fn=
to cols_label()
using Polars syntax.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks to @machow for the comment! I think this is a good opportunity to introduce fn=
to accept a callable in cols_label()
as well.
By the way, I noticed that we have two test files for boxhead, test_boxhead.py
and test__boxhead.py
. I picked one to add the new test, but perhaps we should consider consolidating them in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a side note, we might consider using the pl.Expr.name attribute to handle Polars syntax. For example:
df_names.select(cs.starts_with("my").name.to_uppercase())
shape: (1, 2)
┌────────┬─────────┐
│ MY_COL ┆ MY_COL2 │
│ --- ┆ --- │
│ str ┆ str │
╞════════╪═════════╡
│ my_col ┆ my_col2 │
└────────┴─────────┘
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I've attempted to implement this in the latest commit. I admit it's a bit of a bold move, so feel free to set it aside for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When passing a list of Polars expressions, if a column is referenced multiple times, we can either superimpose the transformations or follow a "last one wins" approach (current implementation).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for experimenting with this. If it's okay with you, I think we need a little bit more time to chew on this approach / run it past some folks, since it's a bit cutting edge in terms of external libraries using Polars selectors. I definitely think it'll be useful to figure out in the long run though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much for working my feedback in. I left comments asking if we could remove fn
from .cols_label()
and remove unit string processing from .cols_label_with()
.
I'm still getting a feel for some dynamics in these functions, so thanks for being so patient on this PR 😅
great_tables/_boxhead.py
Outdated
self: GTSelf, cases: dict[str, str | BaseText] | None = None, **kwargs: str | BaseText | ||
self: GTSelf, | ||
cases: dict[str, str | BaseText] | None = None, | ||
fn: Callable[[str | BaseText], str | BaseText] | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking this and running with it. I'm so sorry to send you down a rough path, but maybe we take fn
out of here for now?
We're realizing that the combination of fn=
and the ability to wrap inputs with BaseText containers is a bit tricky. Since we'd have to explain and figure out how to apply fn
to things like md("**some text**")
.
This seems resolvable, but I would punt on for the sake of getting cols_label_with()
in. There's some cleanup across Great Tables we can likely do with BaseText and UnitStr/definition list, which I'm guessing will make situations like this easier to handle.
great_tables/_boxhead.py
Outdated
new_cases = {col: fn(col) for col in sel_cols} | ||
|
||
# Handle units syntax in labels (e.g., "Density ({{ppl / mi^2}})") | ||
new_kwargs = _handle_units_syntax(new_cases) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the R package doesn't automatically apply unit syntax here. My understanding is that this makes .cols_label_with()
a very plain custom function applier. WDYT of not handling unit syntax here?
We could tell people they can make it happen with define_units()
etc.. so hopefully they'd be able to make it happen if they need it
I'm still getting a feel for some of these dynamics, so could be totally off
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shoot, I can see how this is confusing. I'll take a look at the linked PR, and think about what might put units more front and center. I think it's okay for now to just omit any mention of units in this docstring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jrycw there are a lot of really interesting changes here, but I think this PR has a bit more in it than I can review.
When I first reviewed, I'd have been onboard with merging the PR after just the columns and fn arguments were switched (so columns was first). If you could revert to that, and make that change (plus having .cols_label_with()
not do any unit syntax handling, I'm happy to merge.
(The extra expression stuff is exciting, and seems really useful down the road, but it's a lot to review at once. I do think there's a lot of interesting Polars logic in the PR right now for handling expressions. If you want to stash it somewhere, it could also go into an issue?)
@machow, thanks for the feedback. I'm not sure which commit you're referring to, but feel free to pick any relevant one for the revert. Could you help proceed with the revert process? Meanwhile, I've created issue #648 to track the Polars expression implementation. |
Hello team,
This PR ports the cols_label_with() function from
{gt}
to Great Tables.