Discussion:
[Scikit-learn-general] Ovr Classifier predict error
Ark
2013-01-11 00:19:28 UTC
Hello,
I see an error from predict when predicting a text document. [I load
an already trained classifier (OneVsRest(SGDClassifier(loss=log))) using
joblib.load.]
Thanks.


In [1]: import sklearn

In [2]: from sklearn.externals import joblib

In [4]: clf = joblib.load("classifier.joblib")

In [6]: with open("topredict.txt") as f:
   ...:     em = f.read()

In [7]: clf.predict
Out[7]:
<bound method OneVsRestClassifier.predict of
OneVsRestClassifier(estimator=SGDClassifier(alpha=1e-05, class_weight=None,
epsilon=0.1, eta0=0.0,
fit_intercept=True, learning_rate='optimal', loss='log', n_iter=35,
n_jobs=-1, penalty='l2', power_t=0.5, rho=0.85, seed=0,
shuffle=True, verbose=0, warm_start=False))>

In [8]: clf.predict(em)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-20-3c92945e466e> in <module>()
----> 1 clf.predict(em)

/home/n/env/lib/python2.6/site-packages/sklearn/multiclass.pyc in predict(self, X)
    180         self._check_is_fitted()
    181
--> 182         return predict_ovr(self.estimators_, self.label_binarizer_, X)
    183
    184     @property

/home/n/env/lib/python2.6/site-packages/sklearn/multiclass.pyc in predict_ovr(estimators, label_binarizer, X)
     79 def predict_ovr(estimators, label_binarizer, X):
     80     """Make predictions using the one-vs-the-rest strategy."""
---> 81     Y = np.array([_predict_binary(e, X) for e in estimators])
     82     e = estimators[0]
     83     thresh = 0 if hasattr(e, "decision_function") and is_classifier(e) else .5

/home/n7/env/lib/python2.6/site-packages/sklearn/multiclass.pyc in _predict_binary(estimator, X)
     54     else:
     55         # probabilities of the positive class
---> 56         return estimator.predict_proba(X)[:, 1]
     57
     58

AttributeError: 'list' object has no attribute 'predict_proba'

In [9]:
Andreas Mueller
2013-01-11 09:20:34 UTC
Hi Ark
Thanks for reporting the issue.
Could you please provide a minimum code sample to reproduce and open an
issue on github.
That would be great.
Which version of sklearn are you using?

Also, are you aware that you don't need the OneVsRestClassifier for
multi-class support in SGDClassifier?
SGDClassifier has multi-class support on its own.
If you didn't know this, it would be good if you could point us to the
docs that gave you the impression
you needed OneVsRestClassifier - this seems to be a common misconception.
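
To illustrate the point about built-in multi-class support, here is a minimal sketch. The toy documents and labels are invented for illustration, and it assumes a recent scikit-learn release. It uses the default loss rather than the loss=log setting from the original post, because the spelling of that option has changed between releases.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier

# Toy corpus with three categories (hypothetical data, for illustration only).
docs = [
    "the striker scored a late goal",
    "the goalkeeper saved the penalty kick",
    "the senate passed the budget bill",
    "the minister announced a new election",
    "the new phone has a faster processor",
    "the laptop ships with more memory",
]
labels = ["sports", "sports", "politics", "politics", "tech", "tech"]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)

# No OneVsRestClassifier wrapper: SGDClassifier handles the three classes
# itself, using a one-versus-all scheme internally.
clf = SGDClassifier(random_state=0)
clf.fit(X, labels)

# One row of weights per class, one column per vocabulary term.
print(clf.classes_)
print(clf.coef_.shape)
```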

Best,
Andy
_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
Ark
2013-01-15 01:22:41 UTC
Post by Andreas Mueller
Could you please provide a minimum code sample to reproduce and open an
issue on github.
Below is minimal code to reproduce the issue (assuming the
classifier is already trained and saved). I will open an issue on GitHub for
it as well.

-------------------------------------------------------------
from sklearn.externals import joblib


# Train the classifier on a dataset of text
# documents with
# OneVsRestClassifier(SGDClassifier(loss=log, n_iter=35))

# classifier object dumped using joblib.dump,
# without compression, for later use.


classifier = joblib.load("classifier.joblib")

with open("file") as f:
    document = f.read()

predict = classifier.predict(document)

--------------------------------------------------------------
Post by Andreas Mueller
That would be great.
Which version of sklearn are you using?
I am on scikit-learn 0.12.1.
Post by Andreas Mueller
Also, are you aware that you don't need the OneVsRestClassifier for
multi-class support in SGDClassifier?
SGDClassifier has multi-class support on its own.
If you didn't know this, it would be good if you could point us to the
docs that gave you the impression
you needed OneVsRestClassifier - this seems to be a common misconception.
I am using OneVsRest so as to analyze the binary classifiers for each category.
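
A rough sketch of that kind of per-category inspection, assuming a recent scikit-learn release. The toy documents and labels are invented for illustration, and the default SGDClassifier loss is used instead of loss=log:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.multiclass import OneVsRestClassifier

# Toy corpus with three categories (hypothetical data, for illustration only).
docs = [
    "the striker scored a late goal",
    "the goalkeeper saved the penalty kick",
    "the senate passed the budget bill",
    "the minister announced a new election",
    "the new phone has a faster processor",
    "the laptop ships with more memory",
]
labels = ["sports", "sports", "politics", "politics", "tech", "tech"]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)

ovr = OneVsRestClassifier(SGDClassifier(random_state=0))
ovr.fit(X, labels)

# estimators_ holds one fitted binary SGDClassifier per category,
# in the same order as classes_.
per_class = dict(zip(ovr.classes_, ovr.estimators_))
for name, estimator in per_class.items():
    # Each binary classifier has a single weight vector over the vocabulary.
    print(name, estimator.coef_.shape)
```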
I also tried downloading the 0.13 version from source and installing it. This
time I see a different error. The steps to reproduce for version 0.13 in ipython
are as follows:

$ ipython
Python 2.6.6 (r266:84292, Jun 18 2012, 14:18:47)
Type "copyright", "credits" or "license" for more information.

IPython 0.13 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object', use 'object??' for extra details.

In [1]: import sklearn

In [2]: from sklearn.externals import joblib

In [3]: clf = joblib.load("/home/n7/classifier.joblib")
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-3-999ca461c6f0> in <module>()
----> 1 clf = joblib.load("/home/n7/classifier.joblib")

/home/n7/env/lib/python2.6/site-packages/sklearn/externals/joblib/numpy_pickle.pyc in load(filename, mmap_mode)
    416
    417     try:
--> 418         obj = unpickler.load()
    419     finally:
    420         if hasattr(unpickler, 'file_handle'):

/usr/lib64/python2.6/pickle.pyc in load(self)
    856             while 1:
    857                 key = read(1)
--> 858                 dispatch[key](self)
    859         except _Stop, stopinst:
    860             return stopinst.value

/usr/lib64/python2.6/pickle.pyc in load_global(self)
   1088         module = self.readline()[:-1]
   1089         name = self.readline()[:-1]
-> 1090         klass = self.find_class(module, name)
   1091         self.append(klass)
   1092     dispatch[GLOBAL] = load_global

/usr/lib64/python2.6/pickle.pyc in find_class(self, module, name)
   1122     def find_class(self, module, name):
   1123         # Subclasses may override this
-> 1124         __import__(module)
   1125         mod = sys.modules[module]
   1126         klass = getattr(mod, name)

/home/n7/env/lib/python2.6/site-packages/sklearn/linear_model/__init__.py in <module>()
     22 from .ridge import Ridge, RidgeCV, RidgeClassifier, RidgeClassifierCV, \
     23     ridge_regression
---> 24 from .logistic import LogisticRegression
     25 from .omp import orthogonal_mp, orthogonal_mp_gram, OrthogonalMatchingPursuit
     26 from .perceptron import Perceptron

/home/n7/env/lib/python2.6/site-packages/sklearn/linear_model/logistic.py in <module>()
      4 from ..feature_selection.selector_mixin import SelectorMixin
      5 from ..svm.base import BaseLibLinear
----> 6 from ..svm.liblinear import csr_predict_prob_wrap, predict_prob_wrap
      7
      8

ImportError: cannot import name csr_predict_prob_wrap
Kenneth C. Arnold
2013-01-15 02:27:52 UTC
In your code, 'document' is just a string, not a feature vector. You should
use the same Vectorizer that you used to train the classifier to begin with.
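
A minimal sketch of that workflow, assuming a recent scikit-learn release. The toy documents and labels are invented for illustration, and the default SGDClassifier loss stands in for loss=log:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.multiclass import OneVsRestClassifier

# Toy training data (hypothetical, for illustration only).
train_docs = [
    "the striker scored a late goal",
    "the goalkeeper saved the penalty kick",
    "the senate passed the budget bill",
    "the minister announced a new election",
    "the new phone has a faster processor",
    "the laptop ships with more memory",
]
train_labels = ["sports", "sports", "politics", "politics", "tech", "tech"]

# Fit the vectorizer once, on the training documents.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_docs)

clf = OneVsRestClassifier(SGDClassifier(random_state=0))
clf.fit(X_train, train_labels)

# At prediction time, reuse the *same fitted* vectorizer. transform() takes
# a list of documents, so a single string is wrapped in a list.
document = "the goalkeeper scored a goal in the match"
X_new = vectorizer.transform([document])
prediction = clf.predict(X_new)
print(prediction)
```

The point is that predict receives a 2-D feature matrix with the same number of columns as the training matrix, not the raw string.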

Trained classifier objects are generally not compatible across versions.
You should retrain the classifier using the new version (and who knows, you
might even get a better classifier!). That said, your particular error looks
like a problem with your installation. Try reinstalling?
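
One way to catch a version mismatch early is to record the library version next to the saved model and compare it before unpickling. The sketch below is only an illustration of that idea using the standard library; save_model, load_model, and the ".meta.json" sidecar file are hypothetical names, not part of scikit-learn or joblib.

```python
import json
import pickle


def save_model(model, path, library_version):
    """Pickle `model` and record the library version in a JSON file next to it."""
    with open(path, "wb") as f:
        pickle.dump(model, f)
    with open(path + ".meta.json", "w") as f:
        json.dump({"library_version": library_version}, f)


def load_model(path, current_version):
    """Load the model only if it was saved under `current_version`."""
    # Read the sidecar first, so a mismatch is reported before unpickling
    # gets a chance to fail with a less helpful error.
    with open(path + ".meta.json") as f:
        saved_version = json.load(f)["library_version"]
    if saved_version != current_version:
        raise RuntimeError(
            "model was saved with version %s but %s is installed; retrain it"
            % (saved_version, current_version)
        )
    with open(path, "rb") as f:
        return pickle.load(f)
```

With scikit-learn, sklearn.__version__ would be the natural value to pass for both arguments.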



-Ken
Andreas Mueller
2013-01-15 08:39:27 UTC
Post by Ark
I also tried downloading the 0.13 version from source and installing it. This
time I see a different error. The steps to reproduce for version 0.13 in ipython
You cannot necessarily load a classifier that was trained with one
version in another version.
Could you try retraining with 0.13-git and see if the error persists?
Ark
2013-01-16 20:07:46 UTC
Post by Andreas Mueller
Post by Ark
I also tried downloading the 0.13 version from source and installing it. This
time I see a different error. The steps to reproduce for version 0.13 in ipython
You can not necessarily load a classifier that was trained with one
version in another version.
Could you try retraining with 0.13-git and see if the error persists?
I will try again after retraining on 0.13. Regarding the previous comment, I
apologize, that was my mistake: we do use a TfidfVectorizer in the original
source. The issue is still the same, though, and I will update the code in the
GitHub issue too.

--------------------------------------------------------------
# Train the classifier on a dataset of text
# documents with
# OneVsRestClassifier(SGDClassifier(loss=log, n_iter=35))

# Classifier object dumped using joblib.dump,
# without compression, for later use.

# Load vectorizer to be used for getting the document vector.
# TfidfVectorizer(stop_words='english', smooth_idf=True,
# sublinear_tf=True, token_pattern=ur'\b(?!\d)\w\w+\b',
# ngram_range=(1, 2), use_idf=False)


from sklearn.externals import joblib

print('Loading vectorizer...')
vectorizer = joblib.load("/home/n7/classifier/vectorizer.joblib")

print('Loading classifier...')
classifier = joblib.load("/home/n7/classifier/classifier.joblib")

with open("topredict.txt") as f:
    document = f.read()

# transform() expects a list of documents and returns a 2-D feature matrix
document_vector = vectorizer.transform([document])
predict = classifier.predict(document_vector)

--------------------------------------------------------------
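
As an aside, the vectorizer and classifier can be bundled into a single Pipeline so that only one object has to be saved and loaded, and raw text can be passed straight to predict. The following is a hedged sketch, not the setup from this thread: it assumes a recent scikit-learn release with the standalone joblib package (instead of sklearn.externals.joblib), uses invented toy data, and keeps the default SGDClassifier loss.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline

# Toy training data (hypothetical, for illustration only).
docs = [
    "the striker scored a late goal",
    "the goalkeeper saved the penalty kick",
    "the senate passed the budget bill",
    "the minister announced a new election",
    "the new phone has a faster processor",
    "the laptop ships with more memory",
]
labels = ["sports", "sports", "politics", "politics", "tech", "tech"]

# The pipeline vectorizes raw text and then classifies it.
model = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", OneVsRestClassifier(SGDClassifier(random_state=0))),
])
model.fit(docs, labels)

# Save and reload the whole pipeline as one file (temporary path for the demo).
path = os.path.join(tempfile.mkdtemp(), "model.joblib")
joblib.dump(model, path)
restored = joblib.load(path)

# predict() still expects a list of documents, but they can be raw strings.
document = "the goalkeeper scored a goal"
prediction = restored.predict([document])
print(prediction)
```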
