Discussion:
error when using linear SVM with AdaBoost
Pagliari, Roberto
2014-09-27 01:06:56 UTC
I'm trying to run AdaBoost with a linear SVM and got this error:

TypeError: fit() got an unexpected keyword argument 'sample_weight'

The code looks like this:
from sklearn import svm
from sklearn.ensemble import AdaBoostClassifier

clf = AdaBoostClassifier(svm.LinearSVC(),
                         n_estimators=args.ada_estimators,
                         algorithm='SAMME')
Mathieu Blondel
2014-09-27 02:51:43 UTC
This is because LinearSVC's fit() doesn't accept sample_weight, which
AdaBoost passes in to reweight the training samples at each boosting round.

I added a new issue for raising a more explicit error message:
https://github.com/scikit-learn/scikit-learn/issues/3711

BTW, a linear combination of linear models is a linear model itself. So you
can't learn a better model than a LinearSVC() with
AdaBoostClassifier(svm.LinearSVC())
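If you really want to boost a linear SVM anyway, one workaround (an
untested sketch; SVC with a linear kernel does accept sample_weight in
fit(), though it scales much worse than LinearSVC on large datasets):

from sklearn import svm
from sklearn.ensemble import AdaBoostClassifier

# SVC's fit() accepts sample_weight, unlike LinearSVC's, so AdaBoost can
# reweight the training set between rounds.
clf = AdaBoostClassifier(svm.SVC(kernel="linear"),
                         n_estimators=10,  # placeholder value
                         algorithm="SAMME")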

M.
Post by Pagliari, Roberto
TypeError: fit() got an unexpected keyword argument 'sample_weight'
clf = AdaBoostClassifier(svm.LinearSVC(),
n_estimators=args.ada_estimators, algorithm='SAMME')
Andy
2014-09-27 06:22:49 UTC
Post by Mathieu Blondel
This is because LinearSVC doesn't support sample_weight.
https://github.com/scikit-learn/scikit-learn/issues/3711
BTW, a linear combination of linear models is a linear model itself.
So you can't learn a better model than a LinearSVC() with
AdaBoostClassifier(svm.LinearSVC())
It is a linear combination of the "predict_probas" not the
"decision_functions", right?
So it is not a linear model any more (more like a neural network ;)
Mathieu Blondel
2014-09-27 09:33:59 UTC
Since LinearSVC doesn't have predict_proba, one must use algorithm="SAMME",
the original AdaBoost, which uses the output of "predict".
This is not exactly a linear combination because of the sign function, but
still, a linear SVM isn't really what I would use with AdaBoost.
And it doesn't seem to improve upon a single linear SVM; see the linked
image below. I used SVC(kernel="linear") since it supports sample_weight.

[linked image: comparison of boosted vs. single linear SVC; URL lost in the archive]
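
For reference, a minimal sketch of that kind of comparison (an illustrative
setup of my own, not the script behind the plot; the dataset and parameters
are made up):

from sklearn.datasets import make_classification
from sklearn.cross_validation import train_test_split  # 2014-era location
from sklearn.svm import SVC
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=500, flip_y=0.05, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

single = SVC(kernel="linear").fit(X_tr, y_tr)
boosted = AdaBoostClassifier(SVC(kernel="linear"), n_estimators=20,
                             algorithm="SAMME").fit(X_tr, y_tr)

print("single linear SVC:", single.score(X_te, y_te))
# Accuracy after each boosting round, to see whether boosting helps at all.
for i, acc in enumerate(boosted.staged_score(X_te, y_te), 1):
    print("boosted, %2d rounds: %.3f" % (i, acc))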

M.
Post by Andy
Post by Mathieu Blondel
This is because LinearSVC doesn't support sample_weight.
https://github.com/scikit-learn/scikit-learn/issues/3711
BTW, a linear combination of linear models is a linear model itself. So
you can't learn a better model than a LinearSVC() with
AdaBoostClassifier(svm.LinearSVC())
It is a linear combination of the "predict_probas" not the
"decision_functions", right?
So it is not a linear model any more (more like a neural network ;)
Olivier Grisel
2014-10-03 09:10:35 UTC
Post by Mathieu Blondel
This is because LinearSVC doesn't support sample_weight.
https://github.com/scikit-learn/scikit-learn/issues/3711
BTW, a linear combination of linear models is a linear model itself. So you
can't learn a better model than a LinearSVC() with
AdaBoostClassifier(svm.LinearSVC())
While adaboosted linear SVM and vanilla linear SVM are both linear
models, they don't optimize the same loss: the loss of the boosted
model automatically puts more weight on samples that are harder to
classify (closer to the decision hyperplane, or on the wrong side of
the optimal hyperplane).

Therefore, adaboosted linear models might or might not be better than
non-boosted linear models. I think it depends on the amount of label
noise, which might cause the boosted models to overfit noisy samples
or outliers.
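
To make the reweighting concrete, here is a rough sketch of one SAMME
boosting round (my own illustration of the update rule, not scikit-learn's
actual code; it doesn't guard against a zero error rate):

import numpy as np

def samme_update(w, y, y_pred, n_classes):
    # Weighted error rate of the weak learner under the current weights.
    miss = (y_pred != y)
    err = np.average(miss, weights=w)
    # Weak learner's weight in the ensemble (SAMME adds log(K - 1)).
    alpha = np.log((1.0 - err) / err) + np.log(n_classes - 1.0)
    # Misclassified samples get heavier before the next round.
    w = w * np.exp(alpha * miss)
    return w / w.sum(), alpha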
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Mathieu Blondel
2014-10-03 11:55:16 UTC
If you want to use the exponential loss (the loss used by AdaBoost), you
can train a (single) linear model which minimizes it directly. The main
point I want to make is that a LinearSVC is not a good choice of weak
learner.
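
For instance, a rough sketch of fitting a single linear model on the
exponential loss directly with scipy (illustrative only; no intercept, no
regularization, and it assumes labels y in {-1, +1}):

import numpy as np
from scipy.optimize import minimize

def fit_exp_loss(X, y):
    # Minimize sum_i exp(-y_i * <w, x_i>) over the weight vector w.
    def loss(w):
        return np.exp(-y * np.dot(X, w)).sum()
    def grad(w):
        return -np.dot(X.T, y * np.exp(-y * np.dot(X, w)))
    w0 = np.zeros(X.shape[1])
    return minimize(loss, w0, jac=grad, method="L-BFGS-B").x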

M.
Post by Olivier Grisel
Post by Mathieu Blondel
This is because LinearSVC doesn't support sample_weight.
https://github.com/scikit-learn/scikit-learn/issues/3711
BTW, a linear combination of linear models is a linear model itself. So you
can't learn a better model than a LinearSVC() with
AdaBoostClassifier(svm.LinearSVC())
While adaboosted linear SVM and vanilla linear SVM are both linear
models, they don't optimize the same loss: the loss of the boosted
model automatically puts more weight on samples that are harder to
classify (closer to the decision hyperplane, or on the wrong side of
the optimal hyperplane).
Therefore, adaboosted linear models might or might not be better than
non-boosted linear models. I think it depends on the amount of label
noise, which might cause the boosted models to overfit noisy samples
or outliers.
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Olivier Grisel
2014-10-03 12:28:50 UTC
Post by Mathieu Blondel
If you want to use the exponential loss (the loss used by AdaBoost), you can
train a (single) linear model which minimizes it directly. The main point I
want to make is that a LinearSVC is not a good choice of weak learner.
Alright.
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Andy
2014-10-03 16:09:01 UTC
Post by Olivier Grisel
Post by Mathieu Blondel
This is because LinearSVC doesn't support sample_weight.
https://github.com/scikit-learn/scikit-learn/issues/3711
BTW, a linear combination of linear models is a linear model itself. So you
can't learn a better model than a LinearSVC() with
AdaBoostClassifier(svm.LinearSVC())
While adaboosted linear SVM and vanilla linear SVM are both linear
models
I'm pretty sure that is wrong, unless you use the "decision_function"
and not "predict_proba" or "predict".
Mathieu said "predict" is used. Then it is still like a (very old
school) neural network with a thresholding layer,
and not like a linear model at all.
Mathieu Blondel
2014-10-04 14:12:46 UTC
Post by Andy
I'm pretty sure that is wrong, unless you use the "decision_function"
and not "predict_proba" or "predict".
Mathieu said "predict" is used. Then it is still like a (very old
school) neural network with a thresholding layer,
and not like a linear model at all.
I don't think this is exactly like a neural network. In a neural network,
the non-linear activation functions are part of the objective function, so
they affect parameter estimation directly. Here, a linear SVC is first
fitted, *then* its weight in the ensemble is estimated, with the
predictions held fixed. Since np.sign (or predict_proba, when available) is
applied post hoc, it should affect neither the linear SVC model nor its
weight in the ensemble.

The main idea of AdaBoost is to increasingly focus on the difficult
examples. This suggests that weak learners should be diverse enough, i.e.,
they should disagree in their predictions on most examples. My intuition is
that a linear SVC doesn't fulfill this requirement. I would rather use a
weak learner (oracle) with high variance, low bias.
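
For instance, decision stumps (depth-1 trees, which are also
AdaBoostClassifier's default base estimator) react strongly to sample
reweighting, so successive rounds produce diverse hypotheses; a minimal
sketch:

from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier

# Each round can pick a different feature/threshold split, so the stumps
# disagree across rounds far more than repeatedly fitted linear SVCs would.
stump = DecisionTreeClassifier(max_depth=1)
clf = AdaBoostClassifier(stump, n_estimators=200)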

I would be curious to see how AdaBoost + LinearSVC fares on MNIST. Since
non-linear models outperform linear ones on this dataset, the results would
be a good indicator.
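
A quick way to approximate that experiment (a sketch using the small
built-in digits dataset as a stand-in for MNIST; the estimator count is
arbitrary):

from sklearn.datasets import load_digits
from sklearn.cross_validation import train_test_split  # 2014-era location
from sklearn.svm import SVC
from sklearn.ensemble import AdaBoostClassifier

digits = load_digits()
X_tr, X_te, y_tr, y_te = train_test_split(digits.data, digits.target,
                                          random_state=0)

single = SVC(kernel="linear").fit(X_tr, y_tr)
boosted = AdaBoostClassifier(SVC(kernel="linear"), n_estimators=10,
                             algorithm="SAMME").fit(X_tr, y_tr)
print("single linear SVC :", single.score(X_te, y_te))
print("boosted linear SVC:", boosted.score(X_te, y_te))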

Mathieu
Pagliari, Roberto
2014-10-06 18:27:47 UTC
Hi Mathieu,
Which dataset are you referring to?

Thanks


Post by Mathieu Blondel
I would be curious to see how AdaBoost + LinearSVC fares on MNIST. Since
non-linear models outperform linear ones on this dataset, the results would
be a good indicator.
