[Scikit-learn-general] VotingClassifier

Andreas Mueller

2016-04-21 15:18:43 UTC

We could add a "make_voting_classifier" function, which would at least
be consistent.
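
A minimal sketch of what such a helper might look like, assuming "consistent" refers to the make_pipeline-style convenience constructors. Neither `make_voting_classifier` nor `name_estimators` exists in scikit-learn; both names and the naming scheme (lowercased class name, with a numeric suffix for duplicates) are hypothetical.

```python
from collections import Counter


def name_estimators(estimators):
    """Return [(name, estimator), ...] with names derived from class names.

    Hypothetical helper: estimators that share a class name get a numeric
    suffix, e.g. 'svc-1', 'svc-2'.
    """
    base = [type(est).__name__.lower() for est in estimators]
    totals = Counter(base)   # how often each class name occurs overall
    seen = Counter()         # how often each class name has occurred so far
    named = []
    for name, est in zip(base, estimators):
        if totals[name] > 1:
            seen[name] += 1
            name = "%s-%d" % (name, seen[name])
        named.append((name, est))
    return named


def make_voting_classifier(*estimators, **kwargs):
    """Hypothetical convenience constructor; not part of scikit-learn."""
    # Imported lazily so the naming helper can be used without scikit-learn.
    from sklearn.ensemble import VotingClassifier
    return VotingClassifier(estimators=name_estimators(estimators), **kwargs)
```

With something like this, `make_voting_classifier(clf1, clf2, voting='soft')` would build the list of tuples for the user, while the existing tuple-based constructor stays unchanged.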

Post by Sebastian Raschka
Hi, Saddy,
the initial implementation did something like that; however, as far as I can remember, the “majority vote” was in favor of the “tuples” (we discussed it somewhere in the pull request, I think: https://github.com/scikit-learn/scikit-learn/pull/4161). The pre-scikit-learn implementation still uses a regular list of classifiers with a naming scheme similar to your suggestion (http://rasbt.github.io/mlxtend/user_guide/classifier/EnsembleVoteClassifier/).
Best,
Sebastian

* If mistakenly not given estimators tuples, it throws some unintuitive
What did you provide as input? So, you are suggesting adding an additional “isinstance” check to throw a more meaningful message? I think that the input doesn’t necessarily have to be a list of tuples but just some sort of nested array, e.g.,
estimators=[('lr', clf1), ('rf', clf2), …]
estimators=[['lr', clf1], ['rf', clf2], …]
etc.
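
For illustration, a check along these lines could produce a more meaningful message. This is only a sketch under the assumption that both tuples and two-element lists should be accepted, as in the examples above; `check_named_estimators` is a hypothetical name, not an existing scikit-learn function.

```python
def check_named_estimators(estimators):
    """Validate that `estimators` is a sequence of (name, estimator) pairs.

    Hypothetical helper: returns the pairs as a list of tuples, or raises
    a ValueError that points at the offending entry.
    """
    if not isinstance(estimators, (list, tuple)) or len(estimators) == 0:
        raise ValueError(
            "`estimators` should be a non-empty list of (name, estimator) "
            "pairs."
        )
    for i, item in enumerate(estimators):
        is_pair = isinstance(item, (list, tuple)) and len(item) == 2
        if not is_pair or not isinstance(item[0], str):
            raise ValueError(
                "estimators[%d] should be a (name, estimator) pair with a "
                "string name, e.g. ('lr', clf1); got %r instead." % (i, item)
            )
    # Normalize nested lists such as ['lr', clf1] to tuples.
    return [(name, est) for name, est in estimators]
```

Passing a bare list such as `[clf0, clf1]` would then fail right away with a message naming `estimators[0]`, rather than with an error raised somewhere deeper in the code.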

I simply put the estimators into the list: estimators=[clf0, clf1, ...].
estimators=[('clf%i' % i, clf) for i, clf in
enumerate(estimators)] # [('clf0', clf0), ('clf1', clf1)]
Kind regards,
Saddy
Hi, Saddy,
thanks for sharing your ideas, I appreciate it. Let’s use the scikit-learn mailing list for scikit-learn-related discussions in the future, though.

Why? Can't we derive probabilities also when hard-voting? I would give
True a chance of 0.8 if 80% of the voters predicted True for this label.

Hm, yeah, I think that could work; however, it would have to be described carefully to avoid confusion, e.g., as a normalized label frequency or so.
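
To make the "normalized label frequency" idea concrete, here is a small sketch in plain Python. The function name `vote_frequencies` is made up for this example, and the result is the share of voters per class for one sample, not a calibrated probability.

```python
from collections import Counter


def vote_frequencies(votes, classes):
    """Return the share of voters that predicted each class for one sample.

    `votes` holds one predicted label per voter; `classes` fixes the order
    of the returned frequencies.
    """
    if len(votes) == 0:
        raise ValueError("need at least one vote")
    counts = Counter(votes)
    n_voters = len(votes)
    return [counts[c] / n_voters for c in classes]


# Five voters, four of which predict True for this sample.
freqs = vote_frequencies([True, True, False, True, True],
                         classes=[False, True])
print(freqs)  # [0.2, 0.8]
```

So 80% of the voters predicting True gives 0.8 for True, as suggested above; the naming question is whether to expose that through something called "probability" at all.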

* n_jobs would be nice

I agree, there’s already a pull request for that; hopefully, it gets polished up and merged soon :).
https://github.com/scikit-learn/scikit-learn/issues/5820

Best,
Sebastian
