Improvements to do before migrating to PyWhy
The purpose of this page is to compile a definitive list of changes we want to make to this repo before moving to host it in the PyWhy organization. So far this is Egor's attempt at summarizing the discussions with Amit, Emre, and Peter over email and Discord.
Amend as per the discussion with Anastasia
"Auto-causality" may overclaim It may give the impression that we can go directly from data to a causal estimate, just like auto-ml does
I like the acronym CATS, but whyCATS sounds as if we are questioning why CATS should be done. Are you open to an alternative prefix? E.g, how about "doCATS" ? In my mind, it asks people to do CATS, has a nice overlap with do-calculus, and also fits nicely with two other libraries in pywhy(dowhy and dodiscover)
Eleanor also suggested CausalTune as another option for you to consider as a library name
Each method comes with its own assumptions, so our suggestion is to make the assumptions and limitations of each method explicit.
- This can be done by adding an assumptions/limitations section in the docstring for each method (see the docstring sketch after the list of methods below).
- In addition, it may be good to specify "when this method is applicable", either in the docstring, or perhaps more usefully as part of a user guide.
METHODS
It will be good to add docstrings for all user-facing classes/functions.
You can refer to DoWhy docs for examples, explaining the method's details (summary on how it works) and arguments expected for each method.
Within the summary, here are some assumptions to add.
- ERUPT
  - Assumes that an evaluation dataset/policy is available where either the assignment is random or the propensity to treat is known.
  - If the propensity to treat is unknown, the quality of the evaluation depends on the accuracy of the propensity score model.
  - If the propensity score model outputs very low values, the metric may have high variance.
  - [Not for docs, a general comment here] If propensity weights are very high, we may need to consider clipping the propensities to admit a biased but low-variance estimate (see the sketch below). Maybe for now it is best to say that "ERUPT works best for datasets where propensity scores are not extreme", or something like that, with an appropriate definition of "extreme".
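To make the clipping comment concrete, here is a minimal sketch (assuming a binary treatment; the function and argument names are illustrative, not the library's implementation) of an inverse-propensity-weighted ERUPT estimate with clipped propensities:

```python
import numpy as np

def erupt_estimate(y, treatment, policy, propensity, clip=0.05):
    """Estimated mean outcome if units were treated according to `policy`.

    Illustrative sketch only: propensities are clipped to [clip, 1 - clip],
    trading a small bias for lower variance when the scores are extreme.
    """
    p = np.clip(propensity, clip, 1.0 - clip)
    # probability of the assignment each unit actually received
    p_assigned = np.where(treatment == 1, p, 1.0 - p)
    # keep only units whose observed treatment matches the proposed policy
    match = (treatment == policy).astype(float)
    weights = match / p_assigned
    return float(np.sum(weights * y) / np.sum(weights))
```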
- AUUC and Qini score (also energy score?)
  - This line from your paper is perfect: "it should be noted that they return a biased estimate unless the treatment assignment in the dataset was fully-randomised".
- R-learner (we probably want to exclude it from this release anyway, as it hasn't been maintained)
  - The comments in your paper are spot on, and will be good to include in the docstring. Specifically:
    - depends on the models for treatment and outcome
- IV
  - We need to assume that the instrumental variable is randomized.
- I also saw a shap module. Not sure how that is used.
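To make the docstring suggestion concrete, here is one possible skeleton with explicit assumptions/limitations sections, using ERUPT as an example (the signature and section layout are a suggestion, not the repo's current convention):

```python
def erupt_score(df, policy, propensity_model=None):
    """Hypothetical signature, shown only to illustrate the docstring layout.

    Summary
    -------
    One or two sentences on how the method works.

    Assumptions
    -----------
    - The evaluation dataset was randomized, or the propensity to treat is known.
    - If propensities are estimated, results depend on the propensity model's accuracy.

    Limitations
    -----------
    - High variance when estimated propensity scores are extreme.

    Parameters
    ----------
    df : pandas.DataFrame
        Data containing treatment, outcome, and feature columns.
    policy : array-like
        Proposed treatment assignment to evaluate.
    propensity_model : object, optional
        Fitted model exposing ``predict_proba``; ignored for randomized data.
    """
```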
With the motivation of avoiding over-promising and clearly communicating the capabilities of the library, the readme and other docs may also specify how to interpret the results of an auto-causality analysis. E.g., we can presumably draw stronger conclusions from randomized experiment data than from observational data. For observational data, what are the assumptions that still need to be tested?
- Points 1 and 2 are great. For point 3 (observational causal inference: the advanced causal inference models allow impact estimation as a function of customer features, rather than just averages), consider adding "under the assumption that all relevant confounders are observed (no unobserved confounding)."
- For point 4, it is good to mention that feature availability is assumed to be randomized, e.g. "Here feature availability is assumed to be randomized, which lets us use it as an instrumental variable."
- Somewhere in the readme, it will be good to mention something like this: "Just like DoWhy and EconML, we assume that the causal graph provided by the user accurately describes the data-generating process (e.g., for CATE estimation, the list of backdoor variables under the graph / confounding variables provided by the user does reflect all sources of confounding between the treatment and the outcome). The validation methods in auto-causality cannot catch such violations, and therefore this is an important assumption." (Illustrated with a short DoWhy snippet below.)
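For reference, this is the kind of user-provided confounder specification the suggested wording refers to, using DoWhy's existing CausalModel API (the data and column names are synthetic, for illustration only):

```python
import numpy as np
import pandas as pd
from dowhy import CausalModel

# synthetic toy data; in practice this is the user's own dataset
rng = np.random.default_rng(0)
n = 500
age = rng.normal(40, 10, n)
treatment = rng.binomial(1, 0.5, n)
spend = 2.0 * treatment + 0.1 * age + rng.normal(0, 1, n)
df = pd.DataFrame({"discount_offered": treatment, "spend": spend, "age": age})

# The user declares the confounders; neither DoWhy nor auto-causality can
# detect confounders that are missing from this list.
model = CausalModel(
    data=df,
    treatment="discount_offered",
    outcome="spend",
    common_causes=["age"],
)
identified_estimand = model.identify_effect(proceed_when_unidentifiable=True)
```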
- Open governance structure, allowing PRs from others, open to new methods.
  - Question to Emre, Amit, Peter: could you possibly spell out what "open governance structure" means and implies?
- Commitment to interoperability with the py-why/dowhy API, which is WIP and is being changed. This is to ensure that users get a seamless experience using any of the PyWhy tools.
As next steps, I propose that we start working on these points together. If you agree with these comments, you may start updating the current auto-causality repo, especially the docs. After we've reached a consensus on these points, I suggest we do another py-why public meeting where the updated library is presented, during which we can also discuss the specifics of the actual move to py-why (if everyone agrees). Of course, this is contingent on consensus on the above points.
Some of these notebooks already exist but could use a refresh; others need to be added/adapted from internal versions:
- Randomized assignment CATE notebook, include Shapley values
- Notebook on ERUPT to estimate avg outcome given random assignment
- Notebook with confidence intervals generation, fit limited set of estimators
- Notebook on propensity function choice/observational CATE
- Notebook on IV models
- (Notebook on analyzing A/B tests specifically, with wise-pizza)
- It will be nice to have a consistent story across refutations in dowhy and auto-causality.
- For users, dowhy could point to auto-causality if the goal is hyperparameter tuning.
- Similarly, auto-causality could point to refutations in case a user wants to verify a graphical assumption.
- Ideally, over time, there is a common API so a user can call either an auto-causality validation or a dowhy refutation seamlessly (the existing DoWhy side of this is sketched below).
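For reference, the DoWhy refutation workflow that auto-causality could point users to currently looks roughly like this (synthetic data and method choices are for illustration only):

```python
import numpy as np
import pandas as pd
from dowhy import CausalModel

# minimal synthetic example
rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
t = rng.binomial(1, 0.5, n)
y = 1.5 * t + x + rng.normal(size=n)
df = pd.DataFrame({"t": t, "y": y, "x": x})

model = CausalModel(data=df, treatment="t", outcome="y", common_causes=["x"])
estimand = model.identify_effect(proceed_when_unidentifiable=True)
estimate = model.estimate_effect(estimand, method_name="backdoor.linear_regression")

# a DoWhy refutation of the kind auto-causality could direct users to
refutation = model.refute_estimate(
    estimand, estimate, method_name="placebo_treatment_refuter"
)
print(refutation)
```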