Accessing the p-value of each segment #184

Open
newtonharry opened this issue Sep 26, 2024 · 1 comment

Comments

newtonharry commented Sep 26, 2024

Hi,

I've noticed you can get the p-value for the result, but not for each segment, which is only seen when printing the result. Have I missed something in the code?

                                          best_split max_gain p_value
(0, 2000]                                        475    36.18   0.005
 ¦--(0, 475]                                     252  103.484   0.005
 ¦   ¦--(0, 252]                                  21    4.228    0.05
 ¦   °--(252, 475]                               427   11.796   0.005
 ¦       ¦--(252, 427]                           320   46.795   0.005
 ¦       ¦   ¦--(252, 320]                       272  -17.767    0.82
 ¦       ¦   °--(320, 427]                       364  -18.147       1
 ¦       °--(427, 475]                           454   -29.02    0.99
 °--(475, 2000]                                 1228    17.11   0.005
     ¦--(475, 1228]                              785   23.767   0.005
     ¦   ¦--(475, 785]                           739    5.792   0.005
     ¦   ¦   ¦--(475, 739]                       527   45.532   0.005
     ¦   ¦   ¦   ¦--(475, 527]                   502  -17.065    0.27
     ¦   ¦   ¦   °--(527, 739]                   711  -10.619   0.025
     ¦   ¦   °--(739, 785]                       759   -12.84    0.05
     ¦   °--(785, 1228]                         1199    2.848    0.02
     °--(1228, 2000]                            1340  142.196   0.005
         ¦--(1228, 1340]                        1279    2.301   0.005
         ¦   ¦--(1228, 1279]                    1253   -1.059   0.005
         ¦   ¦   ¦--(1228, 1253]                                     
         ¦   ¦   °--(1253, 1279]                                     
         ¦   °--(1279, 1340]                    1317  -26.685    0.31
         °--(1340, 2000]                        1688   37.242   0.005
             ¦--(1340, 1688]                    1429   10.582   0.005
             ¦   ¦--(1340, 1429]                1384   -2.852   0.005
             ¦   ¦   ¦--(1340, 1384]            1360   -6.443    0.02
             ¦   ¦   °--(1384, 1429]            1404   -1.174   0.005
             ¦   ¦       ¦--(1384, 1404]                             
             ¦   ¦       °--(1404, 1429]                             
             ¦   °--(1429, 1688]                1494    8.805   0.005
             ¦       ¦--(1429, 1494]            1455  -17.805    0.26
             ¦       °--(1494, 1688]            1632    2.709    0.04
             °--(1688, 2000]                    1822   45.679   0.005
                 ¦--(1688, 1822]                1735   12.374   0.005
                 ¦   ¦--(1688, 1735]            1708  -13.261   0.975
                 ¦   °--(1735, 1822]            1771   13.194   0.005
                 ¦       ¦--(1735, 1771]                             
                 ¦       °--(1771, 1822]        1801  -18.571    0.62
                 °--(1822, 2000]                1872   12.889   0.005
                     ¦--(1822, 1872]            1844    3.874   0.005
                     ¦   ¦--(1822, 1844]                             
                     ¦   °--(1844, 1872]                             
                     °--(1872, 2000]            1898   -7.626     0.1

Thanks,

Harry

@mlondschien
Owner

Hi Harry

Is this in Python or R?

In Python, every segment in the result tree exposes its p-value via the p_value attribute, and the child segments are reached through left and right:

In [1]: import numpy as np
   ...: 
   ...: Sigma = np.full((5, 5), 0.7)
   ...: np.fill_diagonal(Sigma, 1)
   ...: 
   ...: rng = np.random.default_rng(12)
   ...: X = np.concatenate(
   ...:     (
   ...:         rng.normal(0, 1, (200, 5)),
   ...:         rng.multivariate_normal(np.zeros(5), Sigma, 200, method="cholesky"),
   ...:         rng.normal(0, 1, (200, 5)),
   ...:     ),
   ...:     axis=0,
   ...: )

In [2]: from changeforest import changeforest
   ...: 
   ...: result = changeforest(X, "random_forest", "bs")
   ...: result
Out[2]: 
                    best_split max_gain p_value
(0, 600]                   400   14.814   0.005
 ¦--(0, 400]               200   59.314   0.005
 ¦   ¦--(0, 200]             6    -1.95    0.67
 ¦   °--(200, 400]         393   -8.668    0.81
 °--(400, 600]             412   -9.047    0.66

In [3]: result.p_value
Out[3]: 0.005

In [4]: result.left.p_value
Out[4]: 0.005

In [5]: result.right.p_value
Out[5]: 0.66
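
To get the p-value of every segment at once rather than one attribute at a time, the tree can be walked recursively. The following is a minimal sketch, not part of changeforest itself: `collect_p_values` and `seg` are hypothetical helpers, and the attribute names `start` and `stop` for the segment bounds are an assumption (only `p_value`, `left` and `right` appear above). The stand-in tree mirrors the printed Python result so the snippet runs without changeforest installed.

```python
from types import SimpleNamespace


def collect_p_values(node, out=None):
    """Walk the segmentation tree depth-first and record each segment's p-value."""
    if out is None:
        out = []
    if node is None:
        return out
    # ASSUMPTION: `start` and `stop` hold the segment bounds on each node.
    out.append(((node.start, node.stop), node.p_value))
    # Leaf segments have no children; treat a missing attribute like None.
    collect_p_values(getattr(node, "left", None), out)
    collect_p_values(getattr(node, "right", None), out)
    return out


def seg(start, stop, p_value, left=None, right=None):
    """Hypothetical stand-in for a result node, used for illustration only."""
    return SimpleNamespace(
        start=start, stop=stop, p_value=p_value, left=left, right=right
    )


# Stand-in tree with the values from the printed Python result above.
tree = seg(
    0, 600, 0.005,
    left=seg(0, 400, 0.005, left=seg(0, 200, 0.67), right=seg(200, 400, 0.81)),
    right=seg(400, 600, 0.66),
)

for (start, stop), p in collect_p_values(tree):
    print(f"({start}, {stop}]: {p}")
```

If the assumed attribute names hold, passing the actual `result` object instead of `tree` should give one entry per segment. Segments for which no p-value is printed are recorded with whatever value the attribute holds.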

In R, it's similar, but using $ to access attributes:

> library(MASS)

set.seed(0)
Sigma = matrix(0.7, nrow=5, ncol=5)
diag(Sigma) = 1
mu = rep(0, 5)
X = rbind(
    mvrnorm(n=200, mu=mu, Sigma=diag(5)),
    mvrnorm(n=200, mu=mu, Sigma=Sigma),
    mvrnorm(n=200, mu=mu, Sigma=diag(5))
)
> library(changeforest)
Warning message:
package ‘changeforest’ was built under R version 4.2.3
> result = changeforest(X, "random_forest", "bs")
> result
                 name best_split  max_gain p_value is_significant
1 (0, 600]                   410  13.49775   0.005           TRUE
2  ¦--(0, 410]               199  61.47201   0.005           TRUE
3  ¦    ¦--(0, 199]          192 -22.47364   0.955          FALSE
4  ¦    °--(199, 410]        396  11.50559   0.190          FALSE
5  °--(410, 600]
> result$p_value
[1] 0.005
> result$left$p_value
[1] 0.005
> result$right$p_value
[1] 0.965
