Discussion:
Multiclass Logistic Regression.
Luca Cerone
2013-09-24 16:42:12 UTC
Dear all,

I am practising with scikit-learn to solve multiclass classification
problems.

As an exercise I am trying to build a model to predict the digits dataset
available with scikit-learn.

Ideally I would like to solve this using logistic regression, building a
predictor for each digit (one vs all approach).

When a new "digit" comes in, I compute the output of each trained
classifier and choose the prediction with the maximum value
(as you can see I am not doing anything special; I think it is the
naivest approach you can follow).

So far I performed most of these steps manually, but I guess there
might be a faster/smarter approach.

For example here is my approach that classifies a digit as 0, 1 or Other.

from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
import numpy as np

digits = load_digits()
data = digits.data
target = digits.target

# shuffle the sample indices
idx = np.random.permutation(data.shape[0])

#split the dataset
n_train_sample = 1000
idx_train = idx[:n_train_sample]
idx_test = idx[n_train_sample:]
data_train = data[idx_train, :]
target_train = target[idx_train]  # target is 1-D
data_test = data[idx_test, :]
target_test = target[idx_test]

#build the classifier that recognizes 0:
tar_tr_0 = (target_train == 0).astype(int)
cfr_0 = LogisticRegression()
cfr_0.fit(data_train, tar_tr_0)

#build the classifier that recognizes 1:
tar_tr_1 = (target_train == 1).astype(int)
cfr_1 = LogisticRegression()
cfr_1.fit(data_train, tar_tr_1)

#build the classifier that recognizes "other":
tar_tr_other = (target_train > 1).astype(int)
cfr_other = LogisticRegression()
cfr_other.fit(data_train, tar_tr_other)

Next, of course, there is some code that takes the various trained
classifiers as input, makes predictions on the test set, etc.

I did this partly for educational purposes (although I know in theory how
multiclass classification can be performed, I had never carried out the
steps above, which are useful to learn), and partly because I got a bit
lost reading the documentation
(http://scikit-learn.org/stable/modules/multiclass.html).

For the One versus Rest I think I can
use sklearn.multiclass.OneVsRestClassifier (and now I am trying to do
this).
What I couldn't understand, however, is how to access the internal
classifiers, to check their scores, etc.
I also couldn't understand how to set up a criterion to choose the output.
What if for example the classifier is very good at discriminating all the
digits but 4 and 1?

Also I wanted to build a classifier using some form of cross validation,
but again I got a bit lost.
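What I have in mind is something like the sketch below (written against the current scikit-learn API; in older releases cross_val_score lived in sklearn.cross_validation rather than sklearn.model_selection, and max_iter=1000 is only my addition to avoid convergence warnings):

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

digits = load_digits()

# 5-fold cross-validation: each fold is held out once and scored
clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, digits.data, digits.target, cv=5)
print(scores.mean())
```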

Sorry if my questions are quite silly!

Thanks a lot in advance for the help!

Cheers,
Luca

P.s. what if I want to "expand" the list of features to perform logistic
regression with quadratic terms? Is there an easy way to do this?
Luca Cerone
2013-09-24 17:09:05 UTC
OK, training a OneVsAll classifier was actually easy.
To inspect the individual classifiers, can I use the .estimators_ attribute?
Do the estimators in it correspond to .classes_, i.e. is estimators_[0]
trained to recognize .classes_[0] vs. the others, and so on?
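For concreteness, this is the kind of inspection I mean (a sketch against the current scikit-learn API; max_iter=1000 is just my addition to avoid convergence warnings):

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

digits = load_digits()
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000))
ovr.fit(digits.data, digits.target)

# One fitted binary estimator per class, aligned with classes_
print(len(ovr.estimators_), ovr.classes_)
```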

Is there a way to check how the normalization of the data is performed?

Thanks again!
Cheers,
Luca
--
*Luca Cerone*

Tel: +447585611951
Skype: luca.cerone
Olivier Grisel
2013-09-25 09:39:02 UTC
LogisticRegression is already a multiclass classifier by default, using
the One vs Rest / All strategy (as implemented internally by liblinear,
which LogisticRegression wraps). So you don't need OneVsRestClassifier
in this case.
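A minimal sketch (current API; note that recent scikit-learn releases default to a multinomial formulation rather than one-vs-rest, but either way you get one row of coefficients per class, and max_iter is raised only to ensure convergence):

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

digits = load_digits()
clf = LogisticRegression(max_iter=1000)
clf.fit(digits.data, digits.target)

# 10 classes x 64 pixel features, no OneVsRestClassifier needed
print(clf.coef_.shape)
```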

If you want more info on multiclass reductions here is the doc:
http://scikit-learn.org/stable/modules/multiclass.html
Post by Luca Cerone
Is there a way to check how the normalization of the data is performed?
What normalization? There is no normalization unless you do it
yourself with one of those tools and a pipeline:

http://scikit-learn.org/stable/modules/preprocessing.html
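For example, with a scaler chained in front of the classifier (a sketch; the max_iter setting is only my addition for convergence):

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

digits = load_digits()

# The pipeline standardizes the features, then fits the classifier
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(digits.data, digits.target)
print(model.score(digits.data, digits.target))
```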
--
Olivier
Luca Cerone
2013-09-25 10:54:33 UTC
Dear Olivier,
Post by Olivier Grisel
LogisticRegression is a already multiclass classifier by default using
the One vs Rest / All strategy by default (as implemented internally
by liblinear which LogisticRegression is a wrapper of). So you don't
need to use OneVsRest in this case.
http://scikit-learn.org/stable/modules/multiclass.html
This morning I checked the source for LogisticRegression in
sklearn/linear_model/logistic.py and realized that by default it performs
multiclass classification
(this is not explained in the user guide
http://scikit-learn.org/stable/modules/linear_model.html#logistic-regression,
though).
Post by Olivier Grisel
Is there a way to check how the normalization of the data is performed?
What normalization? There is no normalization unless you do it
http://scikit-learn.org/stable/modules/preprocessing.html
You are right, I got confused with LinearRegression, which displays
something like *normalize=None* when performing the fit.
I had scribbled a note on a piece of paper to check for it in the
documentation, then got confused when looking at the documentation for
LogisticRegression, and wrote the email.

There are still a few things that are not clear to me from the
documentation. Can you customize the classifier to perform a different
decision function?
Or can I "hook" a preprocessing step to be applied to the data? (I am
thinking, for example, of polynomial logistic regression, where from the
original dataset I want to "build" all the features of order 2. I am just
asking for educational purposes; I guess there are more appropriate
methods.)

Other questions that I have:
1. can I use a norm different from l1 or l2?
2. similarly, can I define my own cost function?
3. can I try alternative optimization algorithms?

I am sure these answers are in the documentation, but I couldn't find them
in the TOC of the user guide and haven't encountered them yet.

Thanks again for the help!

Cheers,
Luca
Lars Buitinck
2013-09-25 11:39:31 UTC
Post by Luca Cerone
This morning I checked the source for LogisticRegression in
sklearn/linear_model/logistic.py and realized that by default it performs
multiclass classification
(this is not explained in the user guide
http://scikit-learn.org/stable/modules/linear_model.html#logistic-regression,
though).
All our classifiers support multiclass classification and this is
documented in various places.
Post by Luca Cerone
There are still a few things that are not clear to me from the
documentation. Can you customize the classifier to perform a different
decision function?
You can subclass it and override the decision_function method.
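A minimal sketch of what that could look like (the per-class rescaling here is purely hypothetical, only to show the mechanics; keep in mind this changes prediction, not learning):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

class RescaledLogisticRegression(LogisticRegression):
    """Hypothetical subclass: rescale decision scores by the norm of
    each class's weight vector before predict() takes the argmax."""
    def decision_function(self, X):
        scores = super().decision_function(X)
        return scores / np.linalg.norm(self.coef_, axis=1)

digits = load_digits()
clf = RescaledLogisticRegression(max_iter=1000)
clf.fit(digits.data, digits.target)
pred = clf.predict(digits.data[:5])  # predict() goes through decision_function
```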
Post by Luca Cerone
Or can I "hook" a preprocessing step to be applied to the data (I am
thinking for example for polynomial logistic regression, where from the
original dataset
You can implement a polynomial expansion as a transformer object, then
tie it to logistic regression using a sklearn.pipeline.Pipeline. See
the developer's docs, esp. the "Rolling your own estimator" guide [1],
or our recent paper [2] for the conventions.
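For instance (a sketch with a recent scikit-learn, which ships sklearn.preprocessing.PolynomialFeatures, so for the order-2 case you don't even need to write the transformer yourself; the scaler and max_iter are my additions to help convergence):

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

digits = load_digits()
poly_logreg = Pipeline([
    ("poly", PolynomialFeatures(degree=2)),  # all monomials up to order 2
    ("scale", StandardScaler()),
    ("logreg", LogisticRegression(max_iter=1000)),
])
# fit on a subset to keep the expanded problem small
poly_logreg.fit(digits.data[:300], digits.target[:300])
```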
Post by Luca Cerone
1. can I use a norm different from l1 or l2?
For what?
Post by Luca Cerone
2. similarly, can I define my own cost function?
No, unless you hack the source code.
Post by Luca Cerone
3. can I try alternative optimization algorithms?
You can try SGDClassifier(loss="log") which also implements
one-vs.-all logistic regression, but trained with stochastic gradient
descent.

[1] http://scikit-learn.org/stable/developers/index.html#rolling-your-own-estimator
[2] http://staff.science.uva.nl/~buitinck/papers/scikit-learn-api.pdf
Luca Cerone
2013-09-25 12:22:31 UTC
Post by Lars Buitinck
Post by Luca Cerone
(this is not explained in the user guide
http://scikit-learn.org/stable/modules/linear_model.html#logistic-regression,
though).
All our classifiers support multiclass classification and this is
documented in various places.
I am sorry, but I went into the user documentation for logistic regression
and multiclass classification and didn't find any information about it
Post by Lars Buitinck
Post by Luca Cerone
There are still a few things that are not clear to me from the
documentation. Can you customize the classifier to perform a different
decision function?
You can subclass it and override the decision_function method.
Post by Luca Cerone
Or can I "hook" a preprocessing step to be applied to the data (I am
thinking for example for polynomial logistic regression, where from the
original dataset
You can implement a polynomial expansion as a transformer object, then
tie it to logistic regression using a sklearn.pipeline.Pipeline. See
the developer's docs, esp. the "Rolling your own estimator" guide [1],
or our recent paper [2] for the conventions.
Thanks, I'll look into it
Post by Lars Buitinck
Post by Luca Cerone
1. can I use a norm different from l1 or l2?
For what?
for the penalty in LogisticRegression, but looking at the code it seems it
is not possible.
Post by Lars Buitinck
Post by Luca Cerone
2. similarly, can I define my own cost function?
No, unless you hack the source code.
Post by Luca Cerone
3. can I try alternative optimization algorithms?
You can try SGDClassifier(loss="log") which also implements
one-vs.-all logistic regression, but trained with stochastic gradient
descent.
Isn't there an interface for plugging in my own optimizer and checking
its performance?
Post by Lars Buitinck
[1]
http://scikit-learn.org/stable/developers/index.html#rolling-your-own-estimator
[2] http://staff.science.uva.nl/~buitinck/papers/scikit-learn-api.pdf
Thanks for the links, I'll go through them!

Cheers,
Luca
Lars Buitinck
2013-09-25 12:31:22 UTC
Post by Luca Cerone
I am sorry, but I went into the user documentation for logistic regression
and multiclass classification and didn't find any information about it
Hm, maybe we should put this in a more prominent place like the
tutorial. I'll check the docs if I have time.
Post by Luca Cerone
for the penalty in LogisticRegression, but looking at the code it seems it
is not possible.
No, because there are no other options for that in Liblinear.
SGDClassifier supports a linear combination of L1 and L2, though.
Post by Luca Cerone
Isn't there an interface to implement my own optimizer and see the
performances?
Nope. We offer quite a few do-it-yourself hooks, but for the sake of
efficiency and maintainability, we have to hardcode some things.
Andreas Mueller
2013-10-19 04:45:07 UTC
Post by Lars Buitinck
Post by Luca Cerone
I am sorry, but I went into the user documentation for logistic regression
and multiclass classification and didn't find any information about it
Hm, maybe we should put this in a more prominent place like the
tutorial. I'll check the docs if I have time.
The multi-class documentation says
"You don’t need to use these estimators unless you want to experiment
with different multiclass strategies:
all classifiers in scikit-learn support multiclass classification
out-of-the-box. Below is a summary of the classifiers supported by
scikit-learn grouped by strategy:"
Maybe that should be in bold or something?
Lars Buitinck
2013-10-19 09:57:47 UTC
Post by Andreas Mueller
The multi-class documentation says
"You don’t need to use these estimators unless you want to experiment
all classifiers in scikit-learn support multiclass classification
out-of-the-box. Below is a summary of the classifiers supported by
scikit-learn grouped by strategy:"
Maybe that should be in bold or something?
Hm, I guess fixing the docs on this point is useless. All the info is

Olivier Grisel
2013-09-25 12:55:54 UTC
Post by Luca Cerone
Post by Lars Buitinck
Post by Luca Cerone
(this is not explained in the user guide
http://scikit-learn.org/stable/modules/linear_model.html#logistic-regression,
though).
All our classifiers support multiclass classification and this is
documented in various places.
I am sorry, but I went into the user documentation for logistic regression
and multiclass classification and didn't find any information about it
Click on the LogisticRegression in the section you mentioned and you
will end up on the reference doc for this class where it is mentioned:

http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression

The multiclass doc also tells explicitly that all linear models (such
as LogisticRegression) are one-vs-all by default:

http://scikit-learn.org/stable/modules/multiclass.html
--
Olivier
Luca Cerone
2013-09-25 13:14:34 UTC
Post by Olivier Grisel
Post by Luca Cerone
Post by Lars Buitinck
Post by Luca Cerone
(this is not explained in the user guide
http://scikit-learn.org/stable/modules/linear_model.html#logistic-regression,
though).
All our classifiers support multiclass classification and this is
documented in various places.
I am sorry, but I went into the user documentation for logistic regression
and multiclass classification and didn't find any information about it
Click on the LogisticRegression in the section you mentioned and you
will end up on the reference doc for this class where it is mentioned:
http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression
I feel quite stupid, but I didn't realize it was a link, I thought it was
only "bolded"....
2013-09-25 12:49:20 UTC
Post by Lars Buitinck
Post by Luca Cerone
There are still a few things that are not clear to me from the
documentation. Can you customize the classifier to perform a different
decision function?
You can subclass it and override the decision_function method.
While true, this can be misleading: you're only changing the final
step used when making predictions; it will not change learning.
Depending on the nature of the change you want to make, this could be
wrong.