Add support for polars dataframes and series #7463

Open: wants to merge 5 commits into base: main


@fonnesbeck (Member) commented Aug 15, 2024

Description

Mostly superficial changes so that PyMC recognizes polars data structures (DataFrame and Series).

Related Issue

Checklist

Type of change

  • New feature / enhancement
  • Bug fix
  • Documentation
  • Maintenance
  • Other (please specify):

📚 Documentation preview 📚: https://pymc--7463.org.readthedocs.build/en/7463/

pymc/data.py (outdated, resolved)
pymc/pytensorf.py (outdated, resolved)
@ricardoV94 (Member) commented Aug 15, 2024

polars should be an optional dependency. For the dispatch, this can be done with a try/except import.
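A minimal sketch of the try/except approach, runnable without PyMC or PyTensor. The dispatcher to_array is hypothetical: it is a plain functools.singledispatch function standing in for the _as_tensor_variable dispatcher used in the diff, and the polars handlers are registered only when the import succeeds, so polars stays optional.

```python
from functools import singledispatch

import numpy as np

# polars is an optional dependency: if it cannot be imported, pl is None
# and no polars-specific handlers are registered below.
try:
    import polars as pl
except ImportError:
    pl = None


@singledispatch
def to_array(data):
    """Hypothetical stand-in for a dispatcher such as _as_tensor_variable.

    The default handler converts anything array-like with NumPy.
    """
    return np.asarray(data)


if pl is not None:
    # Registered only when the polars import above succeeded.
    @to_array.register(pl.DataFrame)
    @to_array.register(pl.Series)
    def _polars_to_array(data):
        return data.to_numpy()
```

For example, to_array([1, 2, 3]) returns a NumPy array in either environment, while to_array(pl.Series([1, 2])) goes through the polars handler only where polars is installed. Environments without polars simply never register those handlers.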

@fonnesbeck changed the title from "Added support for polars dataframes and series" to "Add support for polars dataframes and series" on Aug 16, 2024
Comment on lines +160 to +171
if pl is not None:
    @_as_tensor_variable.register(pd.Series)
    @_as_tensor_variable.register(pd.DataFrame)
    @_as_tensor_variable.register(pl.DataFrame)
    @_as_tensor_variable.register(pl.Series)
    def dataframe_to_tensor_variable(df: pd.DataFrame | pl.DataFrame, *args, **kwargs) -> TensorVariable:
        return pt.as_tensor_variable(df.to_numpy(), *args, **kwargs)
else:
    @_as_tensor_variable.register(pd.Series)
    @_as_tensor_variable.register(pd.DataFrame)
    def dataframe_to_tensor_variable(df: pd.DataFrame, *args, **kwargs) -> TensorVariable:
        return pt.as_tensor_variable(df.to_numpy(), *args, **kwargs)
Member


This is more succinct. Also, the type hint on df was wrong (the function is also registered for Series types), so I just removed it.

Suggested change
-if pl is not None:
-    @_as_tensor_variable.register(pd.Series)
-    @_as_tensor_variable.register(pd.DataFrame)
-    @_as_tensor_variable.register(pl.DataFrame)
-    @_as_tensor_variable.register(pl.Series)
-    def dataframe_to_tensor_variable(df: pd.DataFrame | pl.DataFrame, *args, **kwargs) -> TensorVariable:
-        return pt.as_tensor_variable(df.to_numpy(), *args, **kwargs)
-else:
-    @_as_tensor_variable.register(pd.Series)
-    @_as_tensor_variable.register(pd.DataFrame)
-    def dataframe_to_tensor_variable(df: pd.DataFrame, *args, **kwargs) -> TensorVariable:
-        return pt.as_tensor_variable(df.to_numpy(), *args, **kwargs)
+@_as_tensor_variable.register(pd.Series)
+@_as_tensor_variable.register(pd.DataFrame)
+def dataframe_to_tensor_variable(df, *args, **kwargs) -> TensorVariable:
+    return pt.as_tensor_variable(df.to_numpy(), *args, **kwargs)
+if pl is not None:
+    @_as_tensor_variable.register(pl.DataFrame)
+    @_as_tensor_variable.register(pl.Series)
+    def polars_dataframe_to_tensor_variable(df, *args, **kwargs) -> TensorVariable:
+        return pt.as_tensor_variable(df.to_numpy(), *args, **kwargs)

@@ -111,6 +115,18 @@ def convert_data(data) -> np.ndarray | Variable:
            ret = np.ma.MaskedArray(vals, mask)
        else:
            ret = vals
    elif hasattr(data, "to_numpy") and hasattr(data, "is_null"):
Member


Suggested change
-    elif hasattr(data, "to_numpy") and hasattr(data, "is_null"):
+    elif hasattr(data, "to_numpy") and hasattr(data, "is_null"):
+        # Probably polars object


Why not make it a bit more explicit:

Suggested change
-    elif hasattr(data, "to_numpy") and hasattr(data, "is_null"):
+    elif pl is not None and isinstance(data, (pl.DataFrame, pl.Series)):

The polars namespace is used anyway (in the except clause).
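A minimal sketch of the explicit check applied to null handling, treating polars as an optional import. convert_polars_series is a hypothetical helper, not PyMC's convert_data: it handles only pl.Series (a DataFrame would need a per-column null mask) and mirrors the np.ma.MaskedArray(vals, mask) pattern the hunk uses for pandas objects.

```python
import numpy as np

# polars treated as optional, as in the dispatch code under review.
try:
    import polars as pl
except ImportError:
    pl = None


def convert_polars_series(data):
    """Hypothetical helper: convert a polars Series to NumPy, masking nulls.

    Uses an explicit isinstance check rather than duck typing on
    to_numpy/is_null, so non-polars inputs are rejected outright.
    """
    if pl is None or not isinstance(data, pl.Series):
        raise TypeError("expected a polars Series")
    vals = data.to_numpy()
    mask = data.is_null().to_numpy()  # True where the Series holds null
    if mask.any():
        # Same pattern as the pandas branch: mask out the missing entries.
        return np.ma.MaskedArray(vals, mask)
    return vals
```

With the isinstance check, an object that merely happens to have to_numpy and is_null attributes is not silently treated as polars data.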


codecov bot commented Aug 16, 2024

Codecov Report

Attention: Patch coverage is 42.30769% with 15 lines in your changes missing coverage. Please review.

Project coverage is 92.10%. Comparing base (8cdc9ee) to head (f304035).
Report is 91 commits behind head on main.

Files with missing lines   Patch %   Lines
pymc/pytensorf.py          40.00%    15 Missing ⚠️
Additional details and impacted files


@@            Coverage Diff             @@
##             main    #7463      +/-   ##
==========================================
- Coverage   92.17%   92.10%   -0.08%     
==========================================
  Files         103      103              
  Lines       17258    17279      +21     
==========================================
+ Hits        15908    15914       +6     
- Misses       1350     1365      +15     
Files with missing lines   Coverage Δ
pymc/data.py               89.44% <100.00%> (ø)
pymc/pytensorf.py          87.50% <40.00%> (-3.02%) ⬇️

Development

Successfully merging this pull request may close these issues.

ENH: Replace pandas dependence/use with narwhals
3 participants