p-value is always less than 0.5 #73
Hi @DirceuVio , I'm not quite sure how you performed the t-test, but either way the result you observed is indeed correct. This question about the behavior of the p-value has been asked before and is something that troubles end users, but as far as the correctness of the system goes, everything is running as expected. After reading the explanation in the issue just mentioned, I realized the explanation I gave was quite superficial; I probably should improve on that.

The reason the upper limit for the p-value is 0.5 lies in how it's computed. To decide whether there's impact or not, we simulate lots of time series from the fitted model. We then take the summation of each simulated series over the time dimension, which leaves one total value per simulation, and compare those totals with the summation of the observed response. The procedure is to take the minimum of the two counts (which express either a positive or a negative effect):

```python
signal = min(
    np.sum(sim_sum > post_data_sum),
    np.sum(sim_sum < post_data_sum)
)
```

If all simulated totals fall below the observed summation, `signal` is 0; in general the two counts add up to at most the number of simulations, so their minimum can never exceed half of it. That's why 0.5 is the highest possible value. It's really how this computation is performed in this library that sets this behavior. It was taken from the original package and resembles a Markovian sampling approach for finding the p-value, which also means the value is capped at 0.5 by definition.

As for deciding whether there's impact or not, this shouldn't be a source of concern. The important values tend to be around the 0.05 area, where there's an indication that the observed effect is unlikely to have happened by chance.

Let me know if this helps with your question,

Best,
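To make the cap concrete, here is a minimal, self-contained sketch of that counting procedure (the function name and the synthetic simulation totals are illustrative, not the library's actual internals):

```python
import numpy as np

rng = np.random.default_rng(42)

def tail_area_p_value(sim_sums, observed_sum):
    # Count simulations whose total lands above vs. below the observed
    # total, take the smaller count (the two-sided "signal"), and divide
    # by the number of simulations. Since the two counts together are at
    # most the number of simulations, the minimum is at most half of it,
    # so the resulting p-value can never exceed 0.5.
    signal = min(
        np.sum(sim_sums > observed_sum),
        np.sum(sim_sums < observed_sum),
    )
    return signal / len(sim_sums)

# Simulated post-period totals under a "no effect" model.
sim_sums = rng.normal(loc=1000.0, scale=50.0, size=10_000)

# Observed total right at the center of the simulations:
# p lands near its maximum of 0.5.
print(tail_area_p_value(sim_sums, 1000.0))

# Observed total far in the tail: p lands near 0.
print(tail_area_p_value(sim_sums, 1300.0))
```

The closer the observed total is to the center of the simulated distribution, the closer p gets to 0.5; it only approaches 0 when the observed total sits far out in one tail.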
Hey, Willian. Thanks so much for publishing the package; I've been using it a lot.
Something that really bothers me is that the model always gives us p-values less than 0.5, no matter what kind of time series we use.
See the code below, where I use synthetic data. Using a t-test, we get a p-value close to 1.
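The original script isn't included in this excerpt, but the t-test comparison can be sketched along these lines (the synthetic series, sample sizes, and seed here are purely illustrative, and a normal approximation stands in for the exact t distribution):

```python
import math
import numpy as np

rng = np.random.default_rng(0)

# Synthetic pre- and post-period data with no real intervention effect
# (values loosely modeled on the summary below; purely illustrative).
pre = rng.normal(loc=3200.0, scale=30.0, size=200)
post = rng.normal(loc=3200.0, scale=30.0, size=30)

# Welch's two-sample t statistic, with a normal approximation for the
# two-sided p-value (reasonable at these sample sizes).
se = math.sqrt(pre.var(ddof=1) / len(pre) + post.var(ddof=1) / len(post))
t_stat = (post.mean() - pre.mean()) / se
p_value = math.erfc(abs(t_stat) / math.sqrt(2))

# Unlike the tail-area probability from the package, this p-value can
# take any value in (0, 1]; with no true effect it is often large.
print(round(p_value, 3))
```

This illustrates the mismatch being reported: a classical two-sided test spans the full (0, 1] range, while the package's tail-area probability tops out at 0.5.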
```
Posterior Inference {Causal Impact}
                          Average             Cumulative
Actual                    3222.5              96675.0
Prediction (s.d.)         3206.98 (29.77)     96209.42 (893.14)
95% CI                    [3148.09, 3264.79]  [94442.71, 97943.76]

Absolute effect (s.d.)    15.52 (29.77)       465.58 (893.14)
95% CI                    [-42.29, 74.41]     [-1268.76, 2232.29]

Relative effect (s.d.)    0.48% (0.93%)       0.48% (0.93%)
95% CI                    [-1.32%, 2.32%]     [-1.32%, 2.32%]

Posterior tail-area probability p: 0.37
Posterior prob. of a causal effect: 63.34%

For more details run the command: print(impact.summary('report'))
```
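A side note on how the last two summary lines relate, assuming the usual convention that the causal-effect probability is the complement of the tail-area p-value (the unrounded p ≈ 0.3666 below is inferred from the printed 63.34%, not taken from the actual run):

```python
# Complement of the tail-area probability, formatted as a percentage.
p = 0.3666  # assumed unrounded value; the summary rounds it to 0.37
print(f"{(1 - p) * 100:.2f}%")  # → 63.34%
```

So a p-value near its 0.5 cap corresponds to a causal-effect probability near its 50% floor, which is why neither line signals a real effect here.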