fix: replace json with pickle for storing lgbm params #190
base: main
This pull request has not seen any recent activity.
Thank you for your PR! Let me leave some comments:
For example, what about the following:

serializable_lgbm_params = {}
for k, v in lgbm_params.items():
    try:
        json.dumps([v])
        serializable_lgbm_params[k] = v
    except TypeError:
        # We store only the name of an unserializable object.
        serializable_lgbm_params[k] = v.__name__
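To make the suggestion concrete, here is a minimal, self-contained sketch of the same fallback. The helper name `to_json_safe` and the sample dictionary are hypothetical, `math.sqrt` is only a stand-in for a user-defined objective, and the `repr()` fallback for objects without `__name__` is an addition that is not part of the snippet above. The last line checks what survives a JSON round trip: only the name of the callable, not the callable itself.

```python
import json
import math


def to_json_safe(params: dict) -> dict:
    """Return a copy of params where non-JSON values are replaced by a name."""
    safe = {}
    for key, value in params.items():
        try:
            json.dumps([value])
            safe[key] = value
        except TypeError:
            # Assumption: fall back to repr() when the object has no __name__.
            safe[key] = getattr(value, "__name__", repr(value))
    return safe


# math.sqrt is only a stand-in for a custom objective callable.
lgbm_params = {"objective": math.sqrt, "metric": "custom_accuracy", "verbosity": -1}

serializable_lgbm_params = to_json_safe(lgbm_params)
encoded = json.dumps(serializable_lgbm_params)

# After decoding, the objective is a plain string and can no longer be called.
objective_is_callable = callable(json.loads(encoded)["objective"])
```

Any code that later reloads the stored dictionary and expects a callable objective would receive the string instead.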
Thanks @nabenabe0928 for reviewing the PR. My first fix attempt was exactly what you suggested, but it didn't work. When optimizing, I think the LightGBMTuner restores the parameters of the best trial after finishing all the trials for a specific step. So when running the tuner, the first 7 trials that search for
Here is the error trace:
@pmandiola
@nabenabe0928 Could you review this PR?
Sure, the code I tested is:
One alternative solution could be to store only the optimized parameters from the current Trial instead of the full lgbm_params. I tried it by changing only line 271, and it seems to work (the tuning runs correctly), but I'm not sure whether something else could be broken:
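As a rough illustration of this idea (not the actual change to line 271), the sketch below stores only the values tuned in the current step and overlays them on the in-memory parameters when restoring. The names `tuned_keys`, `stored`, and `restored` are hypothetical, and `math.sqrt` stands in for a custom objective.

```python
import json
import math

# Hypothetical full parameter dict; math.sqrt stands in for a custom objective.
lgbm_params = {
    "objective": math.sqrt,
    "metric": "custom_accuracy",
    "verbosity": -1,
    "feature_fraction": 0.7,
}

# Hypothetical: the keys tuned in the current step.
tuned_keys = ["feature_fraction"]

# Store only the tuned values, which are plain numbers and JSON-friendly.
stored = json.dumps({k: lgbm_params[k] for k in tuned_keys})

# On restore, overlay the tuned values on the in-memory parameters,
# so the callable objective is never serialized at all.
restored = {**lgbm_params, **json.loads(stored)}
```

The trade-off is that the stored record no longer describes the full parameter set, which is one way this approach could affect code that reads it back.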
@pmandiola
This is what I did (skipping some previous details):
Verification Code

from __future__ import annotations

import optuna.integration.lightgbm as lgb

from lightgbm import early_stopping
from lightgbm import log_evaluation
import numpy as np
import sklearn.datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold


def custom_binary_objective(
    y_true: np.ndarray, y_pred: lgb.Dataset
) -> tuple[np.ndarray, np.ndarray]:
    preds = y_pred.get_label()
    ps = 1.0 / (1.0 + np.exp(-preds))
    res = y_true - ps
    grad = -res / (ps * (1 - ps))
    hess = -ps * (1 - ps) * (1 - 2 * y_true) / ((ps * (1 - ps)) ** 2)
    return grad, hess


def custom_accuracy(
    y_true: np.ndarray, y_pred: lgb.Dataset
) -> tuple[str, float, bool]:
    preds = y_pred.get_label()
    ps = np.round(1.0 / (1.0 + np.exp(-preds)))
    return "custom_accuracy", accuracy_score(y_true, ps), True


if __name__ == "__main__":
    data, target = sklearn.datasets.load_breast_cancer(return_X_y=True)
    dtrain = lgb.Dataset(data, label=target)

    params = {
        "objective": custom_binary_objective,
        "metric": "custom_accuracy",
        "verbosity": -1,
        "boosting_type": "gbdt",
    }

    tuner = lgb.LightGBMTunerCV(
        params,
        dtrain,
        callbacks=[early_stopping(10), log_evaluation(10)],
        feval=custom_accuracy,
    )
    tuner.run()
Another approach for the bug fix: We need to check whether this change becomes a breaking change or not.
@pmandiola
Sure, happy to help!
Motivation
Fixes #188, allowing the use of custom objective functions
Description of the changes
Replaces json.dumps and json.loads with pickle to store and retrieve the trials' lightgbm_params dictionary
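For context, the following minimal sketch shows the behavior this change relies on: `json.dumps` rejects a dictionary containing a callable, while a `pickle` round trip returns the callable intact. It does not reproduce the tuner's storage code; the dictionary is a made-up example and `math.sqrt` stands in for a custom objective function.

```python
import json
import math
import pickle

# math.sqrt stands in for a custom objective function.
lgbm_params = {"objective": math.sqrt, "metric": "custom_accuracy", "verbosity": -1}

# JSON cannot encode a callable and raises TypeError.
try:
    json.dumps(lgbm_params)
    json_ok = True
except TypeError:
    json_ok = False

# pickle serializes the function by reference (module and qualified name),
# so the round trip yields the same function object.
restored = pickle.loads(pickle.dumps(lgbm_params))
```

Because functions are pickled by reference, lambdas and functions defined inside other functions cannot be pickled, and the objective must be importable wherever the parameters are loaded. Unpickling data from an untrusted source is also unsafe, which matters if the stored parameters can be written by other parties.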