consider: option to include NULLs in Array.collect()? #9703
Comments
Hey @NickCrews -- were you able to work around the null exclusion issues?
@gforsyth only by dropping down to
We standardized on ignoring nulls in
That will not break in the future -- we'll probably go ahead and add an
Adding it for
My slight preference is for
+1 on I think a pattern where default
We do have some non-pluralization happening, like
I was, but I feel differently about plurals in keyword arguments. I am trying to square that linguistically
For consistency with
What happened?
In #9313 we unified behavior across backends and made it so `Array.collect()` excluded NULLs.
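To make the behavior change concrete, here is a pure-Python model of a collect aggregate over a single group, with `None` standing in for SQL NULL. The `exclude_nulls` flag is the option proposed in this issue and is hypothetical here, not an existing ibis parameter. The model ignores ordering guarantees and what a backend returns for a group whose values are all NULL.

```python
from typing import Any, Iterable, List, Optional


def collect(values: Iterable[Optional[Any]], exclude_nulls: bool = True) -> List[Optional[Any]]:
    """Model of an array-collect aggregate over one group (None = SQL NULL).

    exclude_nulls=True mirrors the unified behavior from #9313 (NULLs dropped);
    exclude_nulls=False is the hypothetical opt-out discussed in this issue.
    """
    if exclude_nulls:
        return [v for v in values if v is not None]
    return list(values)


group = [1, None, 3]
print(collect(group))                       # [1, 3]
print(collect(group, exclude_nulls=False))  # [1, None, 3]
```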
This behavior change broke a util function of mine that relies on this property.

This was due to my reliance on the previously-true-on-duckdb property that `unnest()` and `collect()` were inverses (minus the fact that `unnest([]) -> no row` and `unnest(NULL) -> no row`, so in order for `x.unnest().collect()` to get back to `x` I needed a little post-process massaging). I hadn't realized that my implementation there wasn't fully portable to other backends; this issue is informative, thank you! Do you see another way to accomplish what I'm trying to do there that can sidestep this behavior change? Or could we consider adding an `exclude_nulls=True` option?

Regardless, would you like a PR that documents the intended behavior in the `.collect()` docstring? It was difficult for me to find this issue; I thought this behavior might have just been accidental.
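The round-trip property and the "post-process massaging" described above can be sketched in plain Python. This is a hypothetical model, not the linked util function: `None` is SQL NULL, each input row carries an id, order within an array is assumed to be preserved, and rows that produce no unnested rows are restored to `[]` or NULL by consulting the input (roughly what a join back to the original table would do). The `exclude_nulls` flag is again the hypothetical option from this issue.

```python
from typing import Any, Dict, List, Optional, Tuple

# Each row: (row id, array value); None means a NULL array or a NULL element.
Rows = List[Tuple[int, Optional[List[Optional[Any]]]]]


def unnest(rows: Rows) -> List[Tuple[int, Optional[Any]]]:
    """One output row per element; [] and NULL arrays produce no rows."""
    out = []
    for row_id, arr in rows:
        for elem in arr or []:
            out.append((row_id, elem))
    return out


def collect_by_id(pairs, exclude_nulls: bool = True) -> Dict[int, List[Optional[Any]]]:
    """Group unnested rows back into arrays, optionally dropping NULL elements."""
    groups: Dict[int, List[Optional[Any]]] = {}
    for row_id, elem in pairs:
        if exclude_nulls and elem is None:
            continue
        groups.setdefault(row_id, []).append(elem)
    return groups


def round_trip(rows: Rows, exclude_nulls: bool = True) -> Rows:
    """unnest() then collect(), then restore rows that vanished as [] or NULL."""
    groups = collect_by_id(unnest(rows), exclude_nulls)
    result: Rows = []
    for row_id, arr in rows:
        if row_id in groups:
            result.append((row_id, groups[row_id]))
        else:
            # No surviving elements: fall back to the input to pick [] vs NULL.
            result.append((row_id, None if arr is None else []))
    return result


data = [(0, [1, None, 3]), (1, []), (2, None), (3, [None])]
print(round_trip(data, exclude_nulls=False) == data)  # True
print(round_trip(data, exclude_nulls=True) == data)   # False
```

With NULLs excluded, `[1, None, 3]` comes back as `[1, 3]` and `[None]` collapses to `[]`, so no amount of post-processing recovers the input.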
What version of ibis are you using?
main
What backend(s) are you using, if any?
duckdb, though I ideally want my code as linked above to be portable to pyspark, and pyspark always excludes NULLs from list_agg(), so an entirely new implementation might be the only way to really solve this...
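One common workaround on backends whose list aggregate always drops NULLs is to wrap each element in a container that is itself never NULL (for example a one-field struct), aggregate the wrappers, and unwrap afterwards. The sketch below models that idea in plain Python with 1-tuples as the wrapper; whether and how to express it with a given backend's struct or array functions is an assumption to verify, not something established in this thread.

```python
from typing import Any, Iterable, List, Optional


def collect_dropping_nulls(values: Iterable[Any]) -> List[Any]:
    """Model of an aggregate that always drops NULL inputs (None = SQL NULL)."""
    return [v for v in values if v is not None]


def collect_keeping_nulls(values: Iterable[Optional[Any]]) -> List[Optional[Any]]:
    # Wrap each element in a 1-tuple: the wrapper is never None, so the
    # NULL-dropping aggregate keeps it even when the payload is None.
    wrapped = [(v,) for v in values]
    collected = collect_dropping_nulls(wrapped)
    # Unwrap to recover the original elements, NULLs included.
    return [w[0] for w in collected]


print(collect_keeping_nulls([1, None, 3]))  # [1, None, 3]
```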
Relevant log output
No response