Discussion:
[Scikit-learn-general] OvR, Logistic Regression and SGD
Abhi
2012-11-06 00:33:06 UTC
Hello,
I have been reading and testing examples around the sklearn documentation and
am not too clear on a few things and would appreciate any help regarding the
following questions:
1) What would be the advantage of training LogisticRegression vs
OneVsRestClassifier(LogisticRegression()) for multiclass. (I understand
the latter would basically train n_classes classifiers).
2) Isn't SGDClassifier(loss='log') better than LogisticRegression for large
sparse datasets? If so, why?
3) If I need predict_proba for just the best class match from the multiclass
classifier, can I use OneVsRestClassifier(SGDClassifier())?

I tested on empirical data and got approximately similar results with
LogisticRegression, SGDClassifier and LinearSVC. (For now, data.shape =
(10000, 400000).) However, in the future the data might scale to a much larger
number of training examples and features, so I wanted to get a clearer idea of
which approach to choose.
Thanks,
A
Gael Varoquaux
2012-11-06 06:52:46 UTC
Post by Abhi
1) What would be the advantage of training LogisticRegression vs
OneVsRestClassifier(LogisticRegression()) for multiclass. (I understand
the latter would basically train n_classes classifiers).
Different decision boundaries. Depends on your dataset.
Post by Abhi
2) Isn't SGDClassifier(loss='log') better than LogisticRegression for large
sparse datasets? If so, why?
Faster, probably.

HTH,

Gaël
Mathieu Blondel
2012-11-06 07:18:25 UTC
Post by Abhi
Hello,
I have been reading and testing examples around the sklearn documentation and
am not too clear on a few things and would appreciate any help regarding the
following questions:
1) What would be the advantage of training LogisticRegression vs
OneVsRestClassifier(LogisticRegression()) for multiclass. (I understand
the latter would basically train n_classes classifiers).
They actually do the same thing: liblinear uses one-vs-rest everywhere except
for the Crammer-Singer SVM formulation.
I wonder why we keep getting this question.
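For concreteness, a quick check on synthetic data (a hedged sketch; with the
liblinear-based LogisticRegression the two should agree up to minor numerical
differences):

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.multiclass import OneVsRestClassifier

    X, y = make_classification(n_samples=500, n_features=20,
                               n_informative=10, n_classes=3,
                               random_state=0)

    # LogisticRegression already fits one binary problem per class
    # internally via liblinear, so wrapping it in OneVsRestClassifier
    # just performs the same decomposition one level higher.
    clf_a = LogisticRegression().fit(X, y)
    clf_b = OneVsRestClassifier(LogisticRegression()).fit(X, y)

    print((clf_a.predict(X) == clf_b.predict(X)).mean())  # expect ~1.0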
Post by Abhi
2) Isn't SGDClassifier(loss='log') better than LogisticRegression for large
sparse datasets? If so, why?
It's faster to train *once* you have chosen the learning rate, which is usually a
pain. You can also try LogisticRegression(tol=1e-2) or
LogisticRegression(tol=1e-1).
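That is, a looser convergence tolerance stops the solver earlier. A minimal
sketch (X_train / y_train stand in for your data):

    from sklearn.linear_model import LogisticRegression

    # The default tol is 1e-4; loosening it trades a little convergence
    # precision for a potentially large speed-up on big sparse data.
    fast_lr = LogisticRegression(tol=1e-2)
    fast_lr.fit(X_train, y_train)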
Post by Abhi
3) If I need predict_proba for just the best class match from the multiclass
classifier, can I use OneVsRestClassifier(SGDClassifier())?
In that case you can just use predict().
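A sketch of that point (hedged; loss='log' is what gives SGDClassifier a
predict_proba, and was renamed 'log_loss' in recent scikit-learn releases;
X, y reuse the data from the sketch above):

    import numpy as np
    from sklearn.linear_model import SGDClassifier
    from sklearn.multiclass import OneVsRestClassifier

    ovr = OneVsRestClassifier(SGDClassifier(loss='log')).fit(X, y)

    proba = ovr.predict_proba(X)
    best = ovr.classes_[np.argmax(proba, axis=1)]

    # predict() already returns the argmax class directly (up to ties),
    # so the probabilities are only needed if you want the score itself.
    print((best == ovr.predict(X)).mean())  # expect ~1.0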

Mathieu
a***@ais.uni-bonn.de
2012-11-06 08:14:35 UTC
We should probably improve the docs on the OvR. IIRC the user guide was already very explicit; maybe add something to the docstring?
Abhi: did you read the user guide on the one-vs-rest classifier? How could we improve it to make things clearer?
Post by Mathieu Blondel
Post by Abhi
Hello,
I have been reading and testing examples around the sklearn documentation and
am not too clear on a few things and would appreciate any help regarding the
following questions:
1) What would be the advantage of training LogisticRegression vs
OneVsRestClassifier(LogisticRegression()) for multiclass. (I understand
the latter would basically train n_classes classifiers).
They actually do the same thing: liblinear uses one-vs-rest everywhere except
for the Crammer-Singer SVM formulation.
I wonder why we keep getting this question.
Post by Abhi
2) Isn't SGDClassifier(loss='log') better than LogisticRegression for large
sparse datasets? If so, why?
It's faster to train *once* you have chosen the learning rate, which is
usually a pain. You can also try LogisticRegression(tol=1e-2) or
LogisticRegression(tol=1e-1).
Post by Abhi
3) If I need predict_proba for just the best class match from the multiclass
classifier, can I use OneVsRestClassifier(SGDClassifier())?
In that case you can just use predict().
Mathieu
--
This message was sent from my Android phone with K-9 Mail.
Gael Varoquaux
2012-11-06 08:10:13 UTC
Post by Mathieu Blondel
Post by Abhi
1) What would be the advantage of training LogisticRegression vs
OneVsRestClassifier(LogisticRegression()) for multiclass. (I understand
the latter would basically train n_classes classifiers).
They actually do the same thing: liblinear uses one-vs-rest everywhere
except for the Crammer-Singer SVM formulation.
Oops, sorry, my answer was wrong. I got confused.

G
Olivier Grisel
2012-11-06 09:26:43 UTC
Post by Mathieu Blondel
Post by Abhi
Hello,
I have been reading and testing examples around the sklearn documentation and
am not too clear on a few things and would appreciate any help regarding the
following questions:
1) What would be the advantage of training LogisticRegression vs
OneVsRestClassifier(LogisticRegression()) for multiclass. (I understand
the latter would basically train n_classes classifiers).
They actually do the same thing: liblinear uses one-vs-rest everywhere
except for the Crammer-Singer SVM formulation.
I wonder why we keep getting this question.
Indeed Abhi, which specific section of the documentation (or
docstring) led you to ask this question?

The note on this page is pretty explicit:

http://scikit-learn.org/dev/modules/multiclass.html

Along with the docstring:

http://scikit-learn.org/dev/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression

Maybe the docstring could be made more consistent and use the
one-vs-rest notation instead of one-vs-all (which is a synonym).
Post by Mathieu Blondel
Post by Abhi
2) Isn't SGDClassifier(loss='log') better than LogisticRegression for large
sparse datasets? If so, why?
It's faster to train *once* you have chosen the learning rate, which is
usually a pain. You can also try LogisticRegression(tol=1e-2) or
LogisticRegression(tol=1e-1).
Actually the default learning rate schedule of scikit-learn almost
always works, but you have to adjust `n_iter`, which is an additional
parameter w.r.t. LogisticRegression.
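For example (a sketch; `n_iter` was the parameter name at the time and was
later renamed `max_iter`; X_train / y_train stand in for your data):

    from sklearn.linear_model import SGDClassifier

    # More passes over the data than the default often help SGD match
    # LogisticRegression's accuracy; the schedule itself can stay default.
    sgd = SGDClassifier(loss='log', n_iter=20)
    sgd.fit(X_train, y_train)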

Also, SGDClassifier can spare a dataset memory copy if your data can
be natively loaded as a scipy Compressed Sparse Row (CSR) matrix. And if
the data does not fit in memory, you can load it as CSR chunks (e.g.
from a set of svmlight files on the filesystem or a database, or
vectorized on the fly from text content using a pre-fitted text
vectorizer) and the model can be learned incrementally through
sequential calls to the partial_fit method, as in the sketch below.
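A minimal out-of-core sketch (hedged: `iter_csr_chunks` is a hypothetical
loader yielding (X_chunk, y_chunk) pairs with X_chunk a scipy.sparse CSR
matrix, e.g. read via sklearn.datasets.load_svmlight_file; `n_classes` is
assumed known up front):

    import numpy as np
    from sklearn.linear_model import SGDClassifier

    clf = SGDClassifier(loss='log')
    all_classes = np.arange(n_classes)

    for X_chunk, y_chunk in iter_csr_chunks():
        # The full set of labels must be passed to partial_fit, since
        # any individual chunk may not contain every class.
        clf.partial_fit(X_chunk, y_chunk, classes=all_classes)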

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Abhi
2012-11-07 20:07:21 UTC
Post by Olivier Grisel
Indeed Abhi, which specific section of the documentation (or
docstring) led you to ask this question?
http://scikit-learn.org/dev/modules/multiclass.html
http://scikit-learn.org/dev/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression
Post by Olivier Grisel
Maybe the docstring could be made more consistent and use the
one-vs-rest notation instead of one-vs-all (which is a synonym).
Ah, sorry, I should have been clearer: the docstring and the multiclass page
are indeed clear on what each classifier does, but I was just a bit confused
about the practical application, i.e. when, specifically, I would choose one
algorithm over the other (when the data/number of classes can scale steeply
and the number of prediction requests per hour is very large, making
predict/vectorization time critical).

For example,
http://scikit-learn.org/dev/modules/multiclass.html#one-vs-the-rest
mentions:
"In addition to its computational efficiency (only n_classes classifiers are
needed), one advantage of this approach is its interpretability."
and
http://scikit-learn.org/dev/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression
says:
"In the multiclass case, the training algorithm uses a one-vs.-all (OvA) scheme,
rather than the “true” multinomial LR."
So if OneVsRestClassifier is more computationally efficient and the underlying
method/algorithm can achieve the same rate of success, why not directly use
that scheme?
Post by Olivier Grisel
Post by Mathieu Blondel
Post by Abhi
2) Isn't SGDClassifier(loss='log') better than LogisticRegression for large
sparse datasets? If so, why?
It's faster to train *once* you have chosen the learning rate, which is
usually a pain. You can also try LogisticRegression(tol=1e-2) or
LogisticRegression(tol=1e-1).
Actually the default learning rate schedule of scikit-learn almost
always works, but you have to adjust `n_iter`, which is an additional
parameter w.r.t. LogisticRegression.
Also, SGDClassifier can spare a dataset memory copy if your data can
be natively loaded as a scipy Compressed Sparse Row (CSR) matrix. And if
the data does not fit in memory, you can load it as CSR chunks (e.g.
from a set of svmlight files on the filesystem or a database, or
vectorized on the fly from text content using a pre-fitted text
vectorizer) and the model can be learned incrementally through
sequential calls to the partial_fit method.
Later on I might need to split the large dataset, so partial_fit might be a
good option in case of memory issues. Presently I am fine on memory: since I
use generators to get the data, the max memory usage for the entire training
run turns out to be around 10-15 GB. Thanks for clearing that up, though.

Er, since I am rather new to scikit-learn and machine learning in general, I
apologize for the simple questions.
