Sebastian Raschka
2016-04-19 22:25:57 UTC
Hi, Saddy,
the initial implementation did something like that; however, as far as I can remember, the “majority vote” was in favor of the “tuples” (we discussed it somewhere in the pull request, I think: https://github.com/scikit-learn/scikit-learn/pull/4161). The before-scikit-learn implementation still uses the regular list of classifiers with a naming scheme similar to your suggestion (http://rasbt.github.io/mlxtend/user_guide/classifier/EnsembleVoteClassifier/).
Best,
Sebastian
* If mistakenly not given estimators tuples, it throws some unintuitive
What did you provide as input? So, you are suggesting adding an additional “if isinstance” check to throw a more meaningful message? I think that the input doesn’t necessarily have to be a list of tuples but just some sort of nested array, e.g.,
estimators=[('lr', clf1), ('rf', clf2), …]
estimators=[['lr', clf1], ['rf', clf2], …]
etc.
I simply put the estimators into the list: estimators=[clf0, clf1, ...].
estimators=[('clf%i' % i, clf) for i, clf in
enumerate(estimators)] # [('clf0', clf0), ('clf1', clf1)]
Kind regards,
Saddy
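The two ideas in this exchange (a clearer error for malformed input, and automatic names for a plain list of classifiers) could be combined roughly as follows. This is only a sketch: `name_estimators` and its `prefix` parameter are hypothetical names made up for illustration, not part of scikit-learn or mlxtend, and the exact checks are an assumption about what a "more meaningful message" might look like.

```python
def name_estimators(estimators, prefix="clf"):
    """Return a list of (name, estimator) tuples.

    Hypothetical helper: accepts a plain list of estimators or a list
    of 2-item (name, estimator) tuples/lists, and raises a descriptive
    error otherwise.
    """
    if not isinstance(estimators, (list, tuple)):
        raise TypeError(
            "estimators must be a list, got %s" % type(estimators).__name__
        )
    if len(estimators) == 0:
        raise ValueError("estimators must not be empty")

    named = []
    for i, item in enumerate(estimators):
        if isinstance(item, (list, tuple)):
            # Nested entry: expect exactly (name, estimator).
            if len(item) != 2 or not isinstance(item[0], str):
                raise ValueError(
                    "entry %d must be a (name, estimator) pair, got %r"
                    % (i, item)
                )
            named.append((item[0], item[1]))
        else:
            # Bare estimator: generate a name such as 'clf0', 'clf1', ...
            named.append(("%s%d" % (prefix, i), item))
    return named
```

With this, `name_estimators([clf0, clf1])` would give `[('clf0', clf0), ('clf1', clf1)]`, while already-named pairs pass through unchanged.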
Hi, Saddy,
thanks for sharing your ideas, I appreciate it. Let’s use the scikit-learn mailing list for scikit-learn-related discussions in the future, though.
Why? Can't we derive probabilities also when hard-voting? I would give
True a chance of 0.8 if 80% of the voters predicted True for this label.
Hm, yeah, I think that could work; however, it would have to be described carefully to avoid confusion, i.e., as normalized label frequency or so.
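To make the "normalized label frequency" idea concrete, here is a small sketch in plain Python. The function name `normalized_label_frequencies` and its input layout (one list of predicted labels per voter) are assumptions for illustration only; this is not how scikit-learn's voting classifier is implemented, and the values are vote shares rather than calibrated probabilities.

```python
from collections import Counter


def normalized_label_frequencies(predictions, classes):
    """Share of voters that predicted each class, per sample.

    predictions: list with one entry per voter, each a list of
                 predicted labels (one label per sample).
    classes:     labels, in the column order of the output.
    """
    n_voters = len(predictions)
    n_samples = len(predictions[0])
    frequencies = []
    for j in range(n_samples):
        # Count how many voters chose each label for sample j.
        counts = Counter(votes[j] for votes in predictions)
        frequencies.append([counts[c] / n_voters for c in classes])
    return frequencies


# Five voters, one sample: four predict True, one predicts False.
votes = [[True], [True], [True], [True], [False]]
print(normalized_label_frequencies(votes, classes=[False, True]))
# [[0.2, 0.8]]
```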
* n_jobs would be nice
I agree, there’s already a pull request for that; hopefully, it gets polished up and merged soon :).
https://github.com/scikit-learn/scikit-learn/issues/5820
Best,
Sebastian