Accessing the p-value of each segment #184

newtonharry · 2024-09-26T07:58:44Z

Hi,

I've noticed you can get the p-value for the overall result, but not for each segment. The per-segment p-values only show up when printing the result. Have I missed something in the code?

                                          best_split max_gain p_value
(0, 2000]                                        475    36.18   0.005
 ¦--(0, 475]                                     252  103.484   0.005
 ¦   ¦--(0, 252]                                  21    4.228    0.05
 ¦   °--(252, 475]                               427   11.796   0.005
 ¦       ¦--(252, 427]                           320   46.795   0.005
 ¦       ¦   ¦--(252, 320]                       272  -17.767    0.82
 ¦       ¦   °--(320, 427]                       364  -18.147       1
 ¦       °--(427, 475]                           454   -29.02    0.99
 °--(475, 2000]                                 1228    17.11   0.005
     ¦--(475, 1228]                              785   23.767   0.005
     ¦   ¦--(475, 785]                           739    5.792   0.005
     ¦   ¦   ¦--(475, 739]                       527   45.532   0.005
     ¦   ¦   ¦   ¦--(475, 527]                   502  -17.065    0.27
     ¦   ¦   ¦   °--(527, 739]                   711  -10.619   0.025
     ¦   ¦   °--(739, 785]                       759   -12.84    0.05
     ¦   °--(785, 1228]                         1199    2.848    0.02
     °--(1228, 2000]                            1340  142.196   0.005
         ¦--(1228, 1340]                        1279    2.301   0.005
         ¦   ¦--(1228, 1279]                    1253   -1.059   0.005
         ¦   ¦   ¦--(1228, 1253]                                     
         ¦   ¦   °--(1253, 1279]                                     
         ¦   °--(1279, 1340]                    1317  -26.685    0.31
         °--(1340, 2000]                        1688   37.242   0.005
             ¦--(1340, 1688]                    1429   10.582   0.005
             ¦   ¦--(1340, 1429]                1384   -2.852   0.005
             ¦   ¦   ¦--(1340, 1384]            1360   -6.443    0.02
             ¦   ¦   °--(1384, 1429]            1404   -1.174   0.005
             ¦   ¦       ¦--(1384, 1404]                             
             ¦   ¦       °--(1404, 1429]                             
             ¦   °--(1429, 1688]                1494    8.805   0.005
             ¦       ¦--(1429, 1494]            1455  -17.805    0.26
             ¦       °--(1494, 1688]            1632    2.709    0.04
             °--(1688, 2000]                    1822   45.679   0.005
                 ¦--(1688, 1822]                1735   12.374   0.005
                 ¦   ¦--(1688, 1735]            1708  -13.261   0.975
                 ¦   °--(1735, 1822]            1771   13.194   0.005
                 ¦       ¦--(1735, 1771]                             
                 ¦       °--(1771, 1822]        1801  -18.571    0.62
                 °--(1822, 2000]                1872   12.889   0.005
                     ¦--(1822, 1872]            1844    3.874   0.005
                     ¦   ¦--(1822, 1844]                             
                     ¦   °--(1844, 1872]                             
                     °--(1872, 2000]            1898   -7.626     0.1

Thanks,

Harry

mlondschien · 2024-09-26T19:20:42Z

Hi Harry

Is this in Python or R?

In Python, every node of the result has a p_value attribute, and the child segments are reachable via left and right:

In [1]: import numpy as np
   ...: 
   ...: Sigma = np.full((5, 5), 0.7)
   ...: np.fill_diagonal(Sigma, 1)
   ...: 
   ...: rng = np.random.default_rng(12)
   ...: X = np.concatenate(
   ...:     (
   ...:         rng.normal(0, 1, (200, 5)),
   ...:         rng.multivariate_normal(np.zeros(5), Sigma, 200, method="cholesky"),
   ...:         rng.normal(0, 1, (200, 5)),
   ...:     ),
   ...:     axis=0,
   ...: )

In [2]: from changeforest import changeforest
   ...: 
   ...: result = changeforest(X, "random_forest", "bs")
   ...: result
Out[2]: 
                    best_split max_gain p_value
(0, 600]                   400   14.814   0.005
 ¦--(0, 400]               200   59.314   0.005
 ¦   ¦--(0, 200]             6    -1.95    0.67
 ¦   °--(200, 400]         393   -8.668    0.81
 °--(400, 600]             412   -9.047    0.66

In [3]: result.p_value
Out[3]: 0.005

In [4]: result.left.p_value
Out[4]: 0.005

In [5]: result.right.p_value
Out[5]: 0.66
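
To get the p-value of every segment at once rather than chaining .left / .right by hand, one option is a small recursive walk over the tree. This is only a sketch: `iter_p_values` and the `Node` stand-in are hypothetical helpers, not part of changeforest, and the `start` / `stop` attribute names are an assumption (only `p_value`, `left` and `right` are shown above). Missing attributes simply come back as None.

```python
from dataclasses import dataclass
from typing import Iterator, Optional, Tuple


@dataclass
class Node:
    """Hypothetical stand-in for a result node, used only to make this runnable."""
    start: int
    stop: int
    p_value: Optional[float] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def iter_p_values(node) -> Iterator[Tuple[Optional[int], Optional[int], Optional[float]]]:
    """Yield (start, stop, p_value) for a node and all its descendants, depth-first."""
    if node is None:
        return
    # getattr with a default keeps this working if an attribute is absent
    # (start/stop are assumed names, see the note above).
    yield (
        getattr(node, "start", None),
        getattr(node, "stop", None),
        getattr(node, "p_value", None),
    )
    yield from iter_p_values(getattr(node, "left", None))
    yield from iter_p_values(getattr(node, "right", None))


# Stand-in tree mirroring the printed Python result above.
tree = Node(
    0, 600, 0.005,
    left=Node(0, 400, 0.005, left=Node(0, 200, 0.67), right=Node(200, 400, 0.81)),
    right=Node(400, 600, 0.66),
)

rows = list(iter_p_values(tree))
for start, stop, p_value in rows:
    print(f"({start}, {stop}]: p_value={p_value}")
```

With an actual result object, `iter_p_values(result)` should work the same way, provided the attribute names match.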

In R, it's similar, but using $ to access attributes:

> library(MASS)

set.seed(0)
Sigma = matrix(0.7, nrow=5, ncol=5)
diag(Sigma) = 1
mu = rep(0, 5)
X = rbind(
    mvrnorm(n=200, mu=mu, Sigma=diag(5)),
    mvrnorm(n=200, mu=mu, Sigma=Sigma),
    mvrnorm(n=200, mu=mu, Sigma=diag(5))
)
> library(changeforest)
Warning message:
package ‘changeforest’ was built under R version 4.2.3
> result = changeforest(X, "random_forest", "bs")
> result
                 name best_split  max_gain p_value is_significant
1 (0, 600]                   410  13.49775   0.005           TRUE
2  ¦--(0, 410]               199  61.47201   0.005           TRUE
3  ¦    ¦--(0, 199]          192 -22.47364   0.955          FALSE
4  ¦    °--(199, 410]        396  11.50559   0.190          FALSE
5  °--(410, 600]
> result$p_value
[1] 0.005
> result$left$p_value
[1] 0.005
> result$right$p_value
[1] 0.965
