nan score for StackingClassifier due to 'scoring' argument in cross_val_score #1059
Comments
Thanks for the note! I can confirm that I see this issue with sklearn 1.3.0 as well (but not with 1.2.2). I just submitted a PR, #1060, to fix it.
I came across this lecture by @rasbt. Based on his explanation, StackingClassifier was included in sklearn. I adjusted the code to use the sklearn version of StackingClassifier:

from sklearn import datasets
iris = datasets.load_iris()
X, y = iris.data[:, 1:3], iris.target
from sklearn import model_selection
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
# from mlxtend.classifier import StackingClassifier
import numpy as np
import warnings
warnings.simplefilter('ignore')
clf1 = KNeighborsClassifier(n_neighbors=1)
clf2 = RandomForestClassifier(random_state=1)
clf3 = GaussianNB()
estimators = [("clf1", clf1),
              ("clf2", clf2),
              ("clf3", clf3)]
lr = LogisticRegression()
sclf = StackingClassifier(estimators=estimators,
                          final_estimator=lr)
print('3-fold cross validation:\n')
for clf, label in zip([clf1, clf2, clf3, sclf],
                      ['KNN',
                       'Random Forest',
                       'Naive Bayes',
                       'StackingClassifier']):
    scores = model_selection.cross_val_score(clf, X, y, cv=3, scoring="accuracy")
    print("Accuracy: %0.2f (+/- %0.2f) [%s]"
          % (scores.mean(), scores.std(), label))

Now I do get output more in line with what I expect, though not exactly the same as in the mlxtend StackingClassifier documentation (Example 1):

3-fold cross validation:
Accuracy: 0.91 (+/- 0.01) [KNN]
Accuracy: 0.95 (+/- 0.01) [Random Forest]
Accuracy: 0.91 (+/- 0.02) [Naive Bayes]
Accuracy: 0.93 (+/- 0.02) [StackingClassifier]

Perhaps sklearn's StackingClassifier implementation is different from mlxtend's. I am wondering whether we should still use mlxtend's StackingClassifier, or whether it is deprecated and we should use sklearn's implementation instead.
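One possible source of the difference, offered as a sketch rather than a definitive answer: as far as I know, sklearn's StackingClassifier fits the final estimator on cross-validated predictions from the base estimators (5-fold when cv is left at its default) and, with stack_method='auto', prefers predict_proba outputs where available. The snippet below only spells those two settings out explicitly. The values are illustrative assumptions, not the settings used in the mlxtend documentation example.

```python
import numpy as np
from sklearn import datasets, model_selection
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
X, y = iris.data[:, 1:3], iris.target

estimators = [("clf1", KNeighborsClassifier(n_neighbors=1)),
              ("clf2", RandomForestClassifier(random_state=1)),
              ("clf3", GaussianNB())]

# cv and stack_method are written out only to make the stacking setup visible.
# Assumption: cv=5 and "predict_proba" correspond to what the defaults
# (cv=None, stack_method="auto") resolve to for these base estimators.
sclf = StackingClassifier(estimators=estimators,
                          final_estimator=LogisticRegression(),
                          cv=5,
                          stack_method="predict_proba")

scores = model_selection.cross_val_score(sclf, X, y, cv=3, scoring="accuracy")
print("Accuracy: %0.2f (+/- %0.2f) [StackingClassifier]"
      % (scores.mean(), scores.std()))
```

Changing cv or stack_method here is one way to check how much of the gap to the mlxtend numbers comes from the stacking procedure itself.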
Thanks for the reply. I posted my second comment before I read your reply, apologies.
Hi, I am trying to run the code below (Example 1 from the StackingClassifier documentation):
I get the following output:
The expected output is that the score for StackingClassifier should be a number like:
When I print the warning by commenting out warnings.simplefilter('ignore'), I get the output below (I truncated it, as the warning is repeated several times):
The problem seems to be related to the 'scoring' argument in scores = model_selection.cross_val_score(clf, X, y, cv=3, scoring='accuracy'). If I remove that argument, the default scoring is used (accuracy, I think), and I get the expected output, which is the same as in the example in the documentation:
However, I would like to be able to use other scoring metrics as well (e.g. roc_auc), but then I have to provide the argument explicitly, and I get the nan score again for StackingClassifier.
I already checked issues #423 and #426, which mention a similar warning/error (AttributeError: 'StackingClassifier' object has no attribute 'classes_'), but I couldn't figure it out based on those issues.
I am using:
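A possible way to see the underlying error instead of a silent nan, sketched under the assumption that a reasonably recent sklearn is installed: cross_val_score accepts error_score="raise", which surfaces the exception rather than replacing the fold score with nan. The example below uses sklearn's own StackingClassifier (not mlxtend's) and the roc_auc_ovr scorer, since plain roc_auc targets binary problems and iris has three classes. Both choices are mine for illustration and do not reproduce the mlxtend bug itself.

```python
import numpy as np
from sklearn import datasets, model_selection
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
X, y = iris.data[:, 1:3], iris.target

sclf = StackingClassifier(
    estimators=[("clf1", KNeighborsClassifier(n_neighbors=1)),
                ("clf2", RandomForestClassifier(random_state=1)),
                ("clf3", GaussianNB())],
    final_estimator=LogisticRegression())

scores = None
try:
    # error_score="raise" makes a failing fold raise instead of yielding nan,
    # so the real cause (e.g. a missing classes_ attribute) becomes visible.
    scores = model_selection.cross_val_score(
        sclf, X, y, cv=3, scoring="roc_auc_ovr", error_score="raise")
    print("ROC AUC (OvR): %0.2f (+/- %0.2f) [StackingClassifier]"
          % (scores.mean(), scores.std()))
except Exception as exc:
    print("cross_val_score failed:", repr(exc))
```

Swapping in the mlxtend classifier with the same error_score="raise" call should show the full traceback behind the nan, which is usually easier to act on than the repeated warning.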