Discussion:
OneVsRestClassifier in git master
David Warde-Farley
2012-02-15 00:54:45 UTC
Further to my last message, there seems to be a rather serious regression in
OneVsRestClassifier in git master. I didn't find an issue about it.

The following reproduces the behaviour in question:

import numpy as np
import sklearn
from sklearn.multiclass import OneVsRestClassifier
import sklearn.svm as svm
rng = np.random.seed(0)
train = np.random.randn(1000, 400)
train_l = np.random.random_integers(1, 10, size=(1000,))
test = np.random.randn(1000, 400)
test_l = np.random.random_integers(1, 10, size=(1000,))
svm = OneVsRestClassifier(svm.SVC(C=0.01, scale_C=False)).fit(train, train_l)
print "Training accuracy:", (train_l ≡ svm.predict(train)).mean()
print "Test accuracy:", (test_l ≡ svm.predict(test)).mean()
print "scikit-learn version:", sklearn.__version__

***@atchoum:~$ python test.py
/u/wardefar/.local/lib/python2.7/site-packages/scikit_learn-0.10-py2.7-linux-x86_64.egg/sklearn/svm/classes.py:184:
FutureWarning: SVM: scale_C will be True by default in scikit-learn 0.11
cache_size, scale_C)
Training accuracy: 1.0
Test accuracy: 0.096
scikit-learn version: 0.10

***@atchoum:~$ python test.py
/u/wardefar/.local/lib/python2.7/site-packages/scikit_learn-0.11_git-py2.7-linux-x86_64.egg/sklearn/svm/classes.py:228:
FutureWarning: SVM: scale_C will disappear and be assumed to be True in
scikit-learn 0.12
cache_size, scale_C, sparse="auto")
Training accuracy: 0.0
Test accuracy: 0.107
scikit-learn version: 0.11-git
David Warde-Farley
2012-02-15 01:03:41 UTC
Post by David Warde-Farley
print "Training accuracy:", (train_l ≡ svm.predict(train)).mean()
print "Test accuracy:", (test_l ≡ svm.predict(test)).mean()
print "scikit-learn version:", sklearn.__version__
Sorry, editor fudge. That should be:

print "Training accuracy:", (train_l == svm.predict(train)).mean()
print "Test accuracy:", (test_l == svm.predict(test)).mean()
print "scikit-learn version:", sklearn.__version__
Olivier Grisel
2012-02-15 06:31:57 UTC
C is now scaled by the number of examples. You should re-tune it with a
grid search, or multiply its old value by `n_samples`.
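As a worked example of that conversion (C_new = C_old * n_samples), here is a minimal sketch using the numbers from the report (C=0.01, 1000 training samples). It assumes the scaled convention divides the user-supplied C by the number of training samples before handing it to libsvm; `rescale_C` is a hypothetical helper, not a scikit-learn function.

```python
def rescale_C(C_old, n_samples):
    # Hypothetical helper. Assumption: with scaling enabled, the C passed to
    # libsvm is C / n_samples, so a C tuned without scaling (C_old) maps to
    # C_old * n_samples under the new convention.
    return C_old * n_samples

# Values from the report: C=0.01 and 1000 training samples.
print(rescale_C(0.01, 1000))  # prints 10.0
```

The conversion only matters for code that relies on the new default; an estimator built with scale_C=False keeps the libsvm meaning of C.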
--
Olivier
Mathieu Blondel
2012-02-15 07:20:59 UTC
On Wed, Feb 15, 2012 at 7:31 AM, Olivier Grisel
Post by Olivier Grisel
C is now scaled by the number of examples. You should re-tune it with a
grid search, or multiply its old value by `n_samples`.
David did use scale_C=False...

I can reproduce the error by adding the following test to test_multiclass.py:

def test_ovr_fit_predict_svc():
    ovr = OneVsRestClassifier(SVC())
    pred = ovr.fit(iris.data, iris.target).predict(iris.data)
    assert_equal(len(ovr.estimators_), n_classes)
    print ovr.score(iris.data, iris.target)

git bisect tells me that the regression was introduced in:
https://github.com/scikit-learn/scikit-learn/commit/658897497399147a78fad5f7001fc62dd1e487ed
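The script actually used for this bisect is not included in the thread. A hypothetical script for `git bisect run` could look like the sketch below. Git treats exit status 0 as "good", 125 as "skip", and any other status from 1 to 127 as "bad". The file name `check_ovr.py`, the helper `exit_code_for` and the 0.9 accuracy threshold are assumptions, and each checked-out revision must be compiled (scikit-learn has C extensions) before the script runs.

```python
import sys


def exit_code_for(accuracy, threshold=0.9):
    # Map training accuracy to a git-bisect exit status:
    # 0 = good revision, 1 = bad revision. The threshold is an assumption.
    return 0 if accuracy >= threshold else 1


def main():
    # Imports live here so that a revision which cannot be imported is
    # reported to git as "skip" (125) instead of "bad".
    try:
        from sklearn.datasets import load_iris
        from sklearn.multiclass import OneVsRestClassifier
        from sklearn.svm import SVC
    except ImportError:
        return 125
    iris = load_iris()
    ovr = OneVsRestClassifier(SVC()).fit(iris.data, iris.target)
    # Training accuracy; the regression drove this close to zero.
    return exit_code_for(ovr.score(iris.data, iris.target))


if __name__ == "__main__":
    sys.exit(main())
```

Usage: `git bisect start <bad-commit> <good-commit>`, then `git bisect run python check_ovr.py`.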

decision_function in SVC is lacking tests!
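In the one-vs-rest scheme, one binary estimator is fitted per class and each sample is assigned the class whose estimator returns the largest decision value, so the result depends directly on the sign convention of SVC.decision_function. The NumPy sketch below illustrates that argmax rule; it is not scikit-learn's code. Purely as a hypothesis about the failure mode, it shows that flipping the sign of the binary scores is enough to turn a perfect training fit into 0% training accuracy, which matches the report. The labels and scores are synthetic.

```python
import numpy as np


def ovr_predict(scores, classes):
    # scores has shape (n_samples, n_classes): column k holds the decision
    # value of the binary "class k vs. rest" estimator.
    return classes[np.argmax(scores, axis=1)]


classes = np.arange(1, 11)          # labels 1..10, as in the report
rng = np.random.RandomState(0)
y = rng.randint(1, 11, size=1000)   # synthetic training labels

# Synthetic scores from a model that fits the training set perfectly:
# +1 for the true class, -1 for every other class.
scores = np.where(classes[None, :] == y[:, None], 1.0, -1.0)

acc_ok = (ovr_predict(scores, classes) == y).mean()
# Hypothetical sign flip: the true class now has the lowest score, so the
# argmax always lands on a wrong class.
acc_flipped = (ovr_predict(-scores, classes) == y).mean()
print(acc_ok, acc_flipped)  # prints 1.0 0.0
```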

Mathieu
Olivier Grisel
2012-02-15 07:30:04 UTC
Indeed, good catch! Can you open an issue along with the script you
used for the bisect?
Olivier Grisel
2012-02-15 07:30:46 UTC
Post by Olivier Grisel
Indeed, good catch! Can you open an issue along with the script you
used for the bisect?
Actually I will do it myself.
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Olivier Grisel
2012-02-15 07:34:28 UTC
Post by Olivier Grisel
Post by Olivier Grisel
Indeed, good catch! Can you open an issue along with the script you
used for the bisect?
Actually I will do it myself.
Done: https://github.com/scikit-learn/scikit-learn/issues/630
David Warde-Farley
2012-02-15 10:26:16 UTC
Post by Mathieu Blondel
https://github.com/scikit-learn/scikit-learn/commit/658897497399147a78fad5f7001fc62dd1e487ed
Wow, that was quick. Thanks, Mathieu!