Support f_ratio? #162
Comments
Please read my conversation with Rie about this from last year.
|
Thanks! Depending on the problem, |
I think you should talk with Rie about it. P.S. I'm in the process of adding her to |
Bringing @riejohnson here. |
Hi. f_ratio didn't make it into the official interface of rgf because I thought it wasn't useful -- not consistently, at least. And so basically, it's untested. It's possible that it works and it's useful in some cases, but looking at the code, there is a potential problem when compiled with Visual C++ and when the number of features is very large. The thing is, "rand()" is used for picking the features, and so there will be a problem if the number of features is larger than RAND_MAX of the compiler -- only the first (RAND_MAX-1) features would be picked in that case. No problem with gnu g++ as its RAND_MAX is very large, but it's small (RAND_MAX=32767) with Visual C++. I'm not sure how likely it is to have more than 32767 features, though. |
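To make the limitation concrete, here is a minimal sketch, not the actual RGF code. It assumes features are picked as `r % num_features`, which is only a guess at the selection scheme. `small_rand_max` and `SmallRand` are hypothetical stand-ins that mimic the range of a `rand()` whose RAND_MAX is 32767 (the value Visual C++ defines), so the effect is reproducible on any compiler.

```cpp
#include <random>

// Range of the simulated generator: 32767 is RAND_MAX under Visual C++.
constexpr int small_rand_max = 32767;

// Hypothetical stand-in for rand() with a small RAND_MAX.
// It mimics only the output range, not the real generator's algorithm.
class SmallRand {
public:
    explicit SmallRand(unsigned seed) : gen_(seed), dist_(0, small_rand_max) {}
    int operator()() { return dist_(gen_); }

private:
    std::mt19937 gen_;
    std::uniform_int_distribution<int> dist_;
};

// Assumed selection scheme: feature index = r % num_features.
// Returns the largest feature index seen over `draws` picks.
int largest_index_drawn(int num_features, int draws, unsigned seed) {
    SmallRand small_rand(seed);
    int largest = -1;
    for (int i = 0; i < draws; ++i) {
        int idx = small_rand() % num_features;
        if (idx > largest) {
            largest = idx;
        }
    }
    return largest;
}
```

With `num_features = 100000`, no call can ever return an index above 32767, so under this assumed scheme the remaining features would never be candidates for a split.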
@riejohnson Thank you for joining our team! Since it is a parameter adopted by major decision forest libraries, And we had better to use |
Hi. I guess if f_ratio works as it is, it wouldn't hurt to promote it into the official interface with a clear note of the limitation that the number of features must be no greater than RAND_MAX. If f_ratio becomes official, random_seed should become official too. std::rand() does exactly the same as rand(), at least on Visual Studio and gnu, and so it doesn't seem to me worth changing, does it? |
Yes. Properly, we need |
std::mt19937 in C++11 instead of std::rand()? I see. If you go for it, please don't forget to change srand in AzRgForest.cpp too. |
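For reference, a rough sketch of what feature subsampling with `std::mt19937` could look like. `sample_features` is a hypothetical helper, not a function in RGF, and the rule "keep floor(f_ratio * num_features) features, at least one" is an assumption about how f_ratio would be interpreted. It shuffles all indices with a seeded engine and keeps a prefix, so the result does not depend on RAND_MAX.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper (not part of RGF): pick floor(f_ratio * num_features)
// distinct feature indices, at least one, using a seeded std::mt19937.
std::vector<int> sample_features(int num_features, double f_ratio, unsigned seed) {
    if (num_features <= 0) {
        return {};
    }

    // Start from every feature index 0..num_features-1.
    std::vector<int> indices(static_cast<std::size_t>(num_features));
    std::iota(indices.begin(), indices.end(), 0);

    // Shuffle with an explicitly seeded engine instead of rand()/srand().
    std::mt19937 gen(seed);
    std::shuffle(indices.begin(), indices.end(), gen);

    // Keep the first `keep` shuffled indices, clamped to [1, num_features].
    std::size_t keep = static_cast<std::size_t>(f_ratio * num_features);
    keep = std::max<std::size_t>(1, std::min(keep, indices.size()));
    indices.resize(keep);

    // Sorted output is convenient for scanning features in order.
    std::sort(indices.begin(), indices.end());
    return indices;
}
```

Calling `sample_features(100000, 0.9, seed)` can return indices above 32767 on any compiler, and the same seed gives the same subset for a given standard library, which is where an official random_seed parameter would plug in.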
@fukatani Any news? |
I found an undocumented parameter, f_ratio, in RGF. It corresponds to LightGBM's feature_fraction and XGBoost's colsample_bytree.
I tried this parameter with the Boston regression example. With a small max_leaf (300), f_ratio=0.9 improved the score from 11.8 to 11.0, but with a large max_leaf (5000), f_ratio=0.95 degraded the score from 10.19810 to 10.34.
After all, is there no value in using f_ratio < 1.0?