Swap out patsy for formulae #463
Cool. Thanks @ksolarski, just a quick reply from my phone... Don't do this for the synthetic control, because I have an in-progress PR that will change it: it won't have a formula input. But can I just get some clarification: does this change the API? Can we get the exact same functionality? If not, let's think again. Will try to look at the code properly when I can 👍🏻
I can't find where I saw it in the [...] I'm not 100% sure that this is a problem, and apologies, I can't find the relevant part in the docs. But does my concern make sense?
You're right, Patsy has the power of preserving the transformation / encoding of variables through [...]. However, the Patsy repo suggests migrating to https://github.com/matthewwardrop/formulaic instead, which is capable of "reusing the encoding choices made during conversion of one data-set on other datasets" (see https://matthewwardrop.github.io/formulaic/latest/). There's also a migration guide from Patsy to Formulaic, so switching would be easy. It also supports many operators: https://matthewwardrop.github.io/formulaic/latest/guides/grammar/ Have you checked out this library before? What do you think about using it instead of formulae?
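The concern in this exchange, reusing the encoding learned on one dataset when transforming another, can be illustrated without any formula library. The sketch below is a toy stand-in and not how Patsy, formulae, or Formulaic work internally: `fit_levels` and `encode` are hypothetical helpers, and giving unseen levels an all-zero row is an assumption made for this illustration only.

```python
# Toy illustration of "stateful" categorical encoding.
# fit_levels and encode are hypothetical helpers, not part of any library.


def fit_levels(values):
    """Record the sorted unique category levels seen in the training data."""
    return sorted(set(values))


def encode(values, levels):
    """One-hot encode values against a fixed list of levels.

    A value that is not in `levels` gets an all-zero row (an assumption of
    this sketch), so the column layout always matches the training matrix.
    """
    return [[1 if value == level else 0 for level in levels] for value in values]


train = ["A", "B", "C", "D"]
test = ["A", "D", "E"]  # "E" never appeared in training

levels = fit_levels(train)            # encoding choices are learned once
train_matrix = encode(train, levels)
test_matrix = encode(test, levels)    # same columns, in the same order

# Re-deriving the levels from the test data alone gives a different layout,
# which is the mismatch the thread is worried about.
naive_levels = fit_levels(test)
```

The point of the comparison is that `levels` and `naive_levels` differ, so a design matrix rebuilt from the formula alone on new data may not line up with the one the model was fitted on.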
@drbenvincent any strong opinions about using [...]?
Sorry for the delayed response @ksolarski. So as far as I understand, [...] Right now there are no use-cases for hierarchical modelling. That might change in the future, though I don't have any specific use cases in mind. So I guess the only choice at the moment is [...]
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #463      +/-   ##
==========================================
- Coverage   94.66%   94.66%   -0.01%
==========================================
  Files          32       32
  Lines        2195     2194       -1
==========================================
- Hits         2078     2077       -1
  Misses        117      117
@drbenvincent Yes, from the docs it seems that no hierarchical models are allowed in the [...]

import pandas as pd
from formulaic import model_matrix

# Create a training dataset
train_data = pd.DataFrame(
    {
        "feature1": ["A", "B", "C", "D"],
        "target": [0, 1, 0, 1],
    }
)

# Create a test dataset
test_data = pd.DataFrame(
    {
        "feature1": [
            "A",  # In training
            "D",  # In training
            "E",  # Not in training
        ],
        "target": [0, 1, 0],
    }
)

# Generate the model matrix for the training data
train_matrix = model_matrix("target ~ 0 + feature1", train_data)

# Print the training matrix
print("Training Matrix:")
print(train_matrix)

# Reuse the spec learned from the training data to transform the test data
test_matrix = model_matrix(spec=train_matrix.model_spec, data=test_data)

# Print the test matrix: its columns are aligned with the training data transformation
print("\nTest Matrix:")
print(test_matrix)

Is that the problem you had in mind, or something else?
Solving issue #386
Starting with DiD, will continue with other methods if you agree with the general design @drbenvincent
Seems like the key practical difference between formulae and patsy is the lack of a build_design_matrices method in formulae. The user then has to provide the formula again.

📚 Documentation preview 📚: https://causalpy--463.org.readthedocs.build/en/463/
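For reference, the patsy behaviour mentioned in the description can be sketched as follows. This is a minimal example, assuming patsy and pandas are installed; the column names and data are made up for illustration and are not taken from the CausalPy code base.

```python
import pandas as pd
from patsy import build_design_matrices, dmatrices

# Illustrative data only (not from CausalPy)
train_data = pd.DataFrame(
    {
        "feature1": ["A", "B", "C", "D"],
        "target": [0, 1, 0, 1],
    }
)
# New data only needs the predictor columns, and only levels seen in training
new_data = pd.DataFrame({"feature1": ["A", "D"]})

# Build the outcome and design matrices from the formula once
y_train, X_train = dmatrices(
    "target ~ 0 + feature1", train_data, return_type="dataframe"
)

# design_info remembers how the predictors were encoded, so the new data can
# be transformed without passing the formula again
(X_new,) = build_design_matrices(
    [X_train.design_info], new_data, return_type="dataframe"
)

print(X_new)
```

Here `X_new` has the same columns as `X_train` even though `new_data` contains only two of the four levels. A library without an equivalent of `build_design_matrices` needs some other way to carry that encoding over to new data.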