Discussion: More on broken tests
Olivier Grisel
2011-02-01 13:58:38 UTC
Hi all,

Git bisect is accusing Gael's commit on the transform API for
univariate selection:

$ git pull origin master
$ git bisect start HEAD HEAD~10 --
$ git bisect run make
[lots and lots of git, build, and test output]
a36fa7b653afb05e73acad79fe600d2ef8fe28a3 is the first bad commit
commit a36fa7b653afb05e73acad79fe600d2ef8fe28a3
Author: Gael varoquaux <***@normalesup.org>
Date: Tue Feb 1 00:08:48 2011 +0100

ENH: Add inverse transform to univariate_selection

:040000 040000 42259955686b8646a4b9784a184d369f8fa4b556
2ed2cd17c8d824c6e80888aa368bbe42a2ded8f9 M scikits
bisect run success
$ git bisect reset

The change in the use of the mixins is likely to cause the pipeline
tests to break.

More on git-bisect:
http://www.kernel.org/pub/software/scm/git/docs/git-bisect.html (read
the examples at the bottom).
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Gael Varoquaux
2011-02-01 14:01:08 UTC
Post by Olivier Grisel
Git bisect is accusing Gael's commit on the transform API for
$ git pull origin master
$ git bisect start HEAD HEAD~10 --
$ git bisect run make
[lots and lots of git, build, and test output]
a36fa7b653afb05e73acad79fe600d2ef8fe28a3 is the first bad commit
commit a36fa7b653afb05e73acad79fe600d2ef8fe28a3
Date: Tue Feb 1 00:08:48 2011 +0100
ENH: Add inverse transform to univariate_selection
:040000 040000 42259955686b8646a4b9784a184d369f8fa4b556
2ed2cd17c8d824c6e80888aa368bbe42a2ded8f9 M scikits
bisect run success
$ git bisect reset
Wow! OK, my bad. I simply thought that my change couldn't have broken
that, but it did. I'll fix it. Sorry for the noise, and hurray for
testing!

G
Mathieu Blondel
2011-02-01 14:29:53 UTC
On Tue, Feb 1, 2011 at 11:01 PM, Gael Varoquaux
Post by Gael Varoquaux
Wow! OK, my bad. I simply thought that my change couldn't have broken
that, but it did. I'll fix it. Sorry for the noise, and hurray for
testing!
I have the fix and the problem was in the TransformerMixin. I will
commit unless you have already done it locally, Gael.

Mathieu
Gael Varoquaux
2011-02-01 14:33:05 UTC
Post by Mathieu Blondel
On Tue, Feb 1, 2011 at 11:01 PM, Gael Varoquaux
Post by Gael Varoquaux
Wow! OK, my bad. I simply thought that my change couldn't have broken
that, but it did. I'll fix it. Sorry for the noise, and hurray for
testing!
I have the fix and the problem was in the TransformerMixin.
Thanks!
Post by Mathieu Blondel
I will commit unless you have already done it locally, Gael.
No, I haven't, I have been fixing problems elsewhere (not in the
scikit)... Go ahead.

G
Mathieu Blondel
2011-02-01 14:46:26 UTC
On Tue, Feb 1, 2011 at 11:33 PM, Gael Varoquaux
Post by Gael Varoquaux
No, I haven't, I have been fixing problems elsewhere (not in the
scikit)... Go ahead.
So, the problem was that transform(X, y) was called when a non-None y
was passed to fit_transform. I was misguided by some objects like
Scaler in preprocessing/__init__.py which have transform(X,y) methods.
However, these methods don't even use y... Furthermore, to me, passing
y to transform is a semantic mistake. Just like predict, transform
shouldn't depend on y. Do you agree? If it's ok, I will remove all
references to y in the transform methods in preprocessing/__init__.py.
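A minimal sketch (hypothetical names, not the actual scikit code) of the fix described above: fit_transform forwards y to fit, but calls transform with X alone, so transform methods need no y parameter.

```python
# Sketch only: fit_transform consumes y during fitting, never at transform.
import numpy as np

class TransformerMixin:
    def fit_transform(self, X, y=None):
        # y is only consumed by fit; transform never sees it.
        return self.fit(X, y).transform(X)

class SelectFirstK(TransformerMixin):
    """Hypothetical selector keeping the first k columns."""
    def __init__(self, k=2):
        self.k = k

    def fit(self, X, y=None):
        # A real univariate selector would score columns against y here.
        return self

    def transform(self, X):
        return X[:, :self.k]

X = np.arange(12).reshape(3, 4)
y = np.array([0, 1, 0])
Xt = SelectFirstK(k=2).fit_transform(X, y)  # works with a non-None y
```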

Mathieu
Alexandre Gramfort
2011-02-01 14:53:22 UTC
Post by Mathieu Blondel
So, the problem was that transform(X, y) was called when a non-None y
was passed to fit_transform. I was misguided by some objects like
Scaler in preprocessing/__init__.py which have transform(X,y) methods.
However, these methods don't even use y... Furthermore, to me, passing
y to transform is a semantic mistake. Just like predict, transform
shouldn't depend on y. Do you agree? If it's ok, I will remove all
references to y in the transform methods in preprocessing/__init__.py.
the reason for passing y to transform is that you have supervised
transformations.
The univariate feature selection is the best example and as you mentioned
in one of your first emails on the ML, LDA can be used to transform data
(still to be done btw).

so keep the y unless you find a better way to do this without breaking
anything.

Alex
Mathieu Blondel
2011-02-01 15:10:09 UTC
On Tue, Feb 1, 2011 at 11:53 PM, Alexandre Gramfort
Post by Alexandre Gramfort
the reason for passing y to transform is that you have supervised
transformations.
The univariate feature selection is the best example and as you mentioned
in one of your first emails on the ML, LDA can be used to transform data
(still to be done btw).
But how do you transform new (unlabeled) data then? It seems to me
that, for supervised dimensionality reduction, y is used by fit, not
by transform. I've just checked the univariate feature selection
module and y is never referenced in transform. I'd think that this
will be the same for LDA when we implement it.

Mathieu
Olivier Grisel
2011-02-01 15:15:59 UTC
Post by Mathieu Blondel
On Tue, Feb 1, 2011 at 11:53 PM, Alexandre Gramfort
Post by Alexandre Gramfort
the reason for passing y to transform is that you have supervised
transformations.
The univariate feature selection is the best example and as you mentioned
in one of your first emails on the ML, LDA can be used to transform data
(still to be done btw).
But how do you transform new (unlabeled) data then? It seems to me
that, for supervised dimensionality reduction, y is used by fit, not
by transform. I've just checked the univariate feature selection
module and y is never referenced in transform. I'd think that this
will be the same for LDA when we implement it.
I thought the same as Mathieu.
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Alexandre Gramfort
2011-02-01 15:20:35 UTC
Post by Mathieu Blondel
But how do you transform new (unlabeled) data then? It seems to me
that, for supervised dimensionality reduction, y is used by fit, not
by transform. I've just checked the univariate feature selection
module and y is never referenced in transform. I'd think that this
will be the same for LDA when we implement it.
hum ... correct ... bad examples....

what if you're in a regression setting and you want to center y ?

should we allow this? or should the estimator be smart and handle
intercepts correctly?

Alex
Mathieu Blondel
2011-02-01 15:36:02 UTC
On Wed, Feb 2, 2011 at 12:20 AM, Alexandre Gramfort
Post by Alexandre Gramfort
hum ... correct ... bad examples....
what if you're in a regression setting and you want to center y ?
should we allow this? or should the estimator be smart and handle
intercepts correctly?
You mean a Scaler object that can scale both X and y? Even in
regression, y is not available at transform time, only at fit time. So
we cannot use transform to return the scaled y. One way is to save the
scaled y in an attribute (member variable), so that the user can
access it after fit. However, the point of the Scaler object is to
retain a state so that you can apply the exact same scaling to new
data. For y, we don't need to retain any state so a function would be
better than an object.
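To illustrate Mathieu's distinction (with hypothetical names, not the scikit API): scaling X needs an object that remembers the training statistics so new data gets the same scaling, while centering y needs no state at all, so a plain function is enough.

```python
# Sketch only: stateful object for X, stateless function for y.
import numpy as np

def center(y):
    """Stateless: subtract the mean; nothing to reuse on new data."""
    return y - y.mean()

class Scaler:
    """Stateful: remembers training mean/std to apply to unseen X."""
    def fit(self, X):
        self.mean_ = X.mean(axis=0)
        self.std_ = X.std(axis=0)
        return self

    def transform(self, X):
        return (X - self.mean_) / self.std_

X_train = np.array([[0.0, 10.0], [2.0, 30.0]])
X_new = np.array([[1.0, 20.0]])
scaler = Scaler().fit(X_train)
X_new_scaled = scaler.transform(X_new)  # same scaling as on training data
y_centered = center(np.array([1.0, 2.0, 3.0]))
```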

Mathieu
Alexandre Gramfort
2011-02-01 15:43:14 UTC
Post by Mathieu Blondel
You mean a Scaler object that can scale both X and y?
yes but as you explain clearly below it does not make much sense.

+1 for removing y from transform.

Alex
Post by Mathieu Blondel
Even in
regression, y is not available at transform time, only at fit time. So
we cannot use transform to return the scaled y. One way is to save the
scaled y in an attribute (member variable), so that the user can
access it after fit. However, the point of the Scaler object is to
retain a state so that you can apply the exact same scaling to new
data. For y, we don't need to retain any state so a function would be
better than an object.
Mathieu
Gael Varoquaux
2011-02-01 16:41:41 UTC
Post by Alexandre Gramfort
Post by Mathieu Blondel
So, the problem was that transform(X, y) was called when a non-None y
was passed to fit_transform. I was misguided by some objects like
Scaler in preprocessing/__init__.py which have transform(X,y) methods.
However, these methods don't even use y... Furthermore, to me, passing
y to transform is a semantic mistake. Just like predict, transform
shouldn't depend on y. Do you agree? If it's ok, I will remove all
references to y in the transform methods in preprocessing/__init__.py.
the reason for passing y to transform is that you have supervised
transformations.
The univariate feature selection is the best example and as you mentioned
in one of your first emails on the ML, LDA can be used to transform data
(still to be done btw).
so keep the y unless you find a better way to do this without breaking
anything.
I think everything should be 'y=None' so that transforms can take 'y', or
not take 'y' without complaining, and that we can chain supervised and
unsupervised transforms.

Do you think that would solve the problem?
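As a sketch of what a uniform 'y=None' signature buys (hypothetical stand-in classes, not the scikit implementations): the pipeline gets a single code path whether a step is supervised or not.

```python
# Sketch only: every fit accepts y=None, so chaining needs no special cases.
import numpy as np

class PCAish:
    """Unsupervised: ignores y, but accepts it for a uniform interface."""
    def fit(self, X, y=None):
        self.mean_ = X.mean(axis=0)
        return self

    def transform(self, X):
        return X - self.mean_

class Supervisedish:
    """Supervised: actually uses y in fit."""
    def fit(self, X, y=None):
        self.classes_ = np.unique(y) if y is not None else None
        return self

    def transform(self, X):
        return X

def pipeline_fit(steps, X, y=None):
    # One code path: always pass y; unsupervised steps simply ignore it.
    for step in steps:
        X = step.fit(X, y).transform(X)
    return X

X = np.ones((4, 3))
Xt = pipeline_fit([PCAish(), Supervisedish()], X, y=np.array([0, 1, 0, 1]))
```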

G
Alexandre Gramfort
2011-02-01 17:02:50 UTC
the thing is that I don't see any use case for a supervised transform now.
If a y is needed it appears in the fit, not in the transform, as pointed
out by Mathieu.

does anyone see a good reason to keep the y in transform?

Alex

On Tue, Feb 1, 2011 at 11:41 AM, Gael Varoquaux
Post by Gael Varoquaux
Post by Alexandre Gramfort
Post by Mathieu Blondel
So, the problem was that transform(X, y) was called when a non-None y
was passed to fit_transform. I was misguided by some objects like
Scaler in preprocessing/__init__.py which have transform(X,y) methods.
However, these methods don't even use y... Furthermore, to me, passing
y to transform is a semantic mistake. Just like predict, transform
shouldn't depend on y. Do you agree? If it's ok, I will remove all
references to y in the transform methods in preprocessing/__init__.py.
the reason for passing y to transform is that you have supervised
transformations.
The univariate feature selection is the best example and as you mentioned
in one of your first emails on the ML, LDA can be used to transform data
(still to be done btw).
so keep the y unless you find a better way to do this without breaking
anything.
I think everything should be 'y=None' so that transforms can take 'y', or
not take 'y' without complaining, and that we can chain supervised and
unsupervised transforms.
Do you think that would solve the problem?
G
Gael Varoquaux
2011-02-01 17:04:45 UTC
Post by Alexandre Gramfort
the thing is that I don't see any use case for a supervised transform now.
If a y is needed it appears in the fit, not in the transform, as pointed
out by Mathieu.
does anyone see a good reason to keep the y in transform?
Good point. So far I don't see any, so at least until we see a good
reason to change let's go for your suggested solution.

G
Mathieu Blondel
2011-02-01 19:00:13 UTC
On Wed, Feb 2, 2011 at 2:04 AM, Gael Varoquaux
Post by Gael Varoquaux
Good point. So far I don't see any, so at least until we see a good
reason to change let's go for your suggested solution.
To me, passing y to transform is like passing it to predict: doesn't
make much sense. Note that, in that matter, LabelBinarizer is a bit
special.

On a related topic, after looking at the pipeline object, I see that
there are calls to fit(X, y) even when y is None. We can either add
y=None to all fit methods in the scikit or only pass y if it is not
None. I tend to prefer the second, to keep the method signatures
better-looking :)

Also, it would be nice if all transform methods in the scikit had a
copy=True|False option. That would make it possible to write a more
efficient pipeline by setting copy=False throughout the pipeline.
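A sketch of such a copy flag (hypothetical MeanCenterer class, not a scikit one): copy=False mutates the caller's array, which is cheaper inside a pipeline where intermediate results are throwaway.

```python
# Sketch only: transform(X, copy=False) works in place to avoid allocations.
import numpy as np

class MeanCenterer:
    def fit(self, X, y=None):
        self.mean_ = X.mean(axis=0)
        return self

    def transform(self, X, copy=True):
        # copy=True preserves the input; copy=False reuses its buffer.
        X = X.copy() if copy else X
        X -= self.mean_
        return X

X = np.arange(6, dtype=float).reshape(3, 2)
t = MeanCenterer().fit(X)
Xt = t.transform(X, copy=False)  # X itself is now centered
```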

Mathieu
Olivier Grisel
2011-02-01 19:09:31 UTC
Post by Mathieu Blondel
On Wed, Feb 2, 2011 at 2:04 AM, Gael Varoquaux
Post by Gael Varoquaux
Good point. So far I don't see any, so at least until we see a good
reason to change let's go for your suggested solution.
To me, passing y to transform is like passing it to predict: doesn't
make much sense. Note that, in that matter, LabelBinarizer is a bit
special.
On a related topic, after looking at the pipeline object, I see that
there are calls to fit(X, y) even when y is None. We can either add
y=None to all fit methods in the scikit or only pass y if it is not
None. I tend to prefer the second, to keep the method signatures
better-looking :)
Also, it would be nice if all transform methods in the scikit had a
copy=True|False option. That would make it possible to write a more
efficient pipeline by setting copy=False throughout the pipeline.
I would also like to have an option on the pipeline to wrap transform
calls with an instance of joblib's Memory().cache, to avoid recomputing
partial results over and over again when doing a grid search. Or maybe
we could improve the grid search / pipeline combo to avoid computing
useless intermediary results when the upstream parameters don't change.
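A stdlib-only sketch of the idea (the real proposal would use joblib's Memory().cache; the names here are hypothetical): keying the cache on (parameters, input data) means repeated grid-search points skip the recomputation.

```python
# Sketch only: memoize transform calls so unchanged upstream params hit cache.
import hashlib
import pickle

class CachedTransform:
    def __init__(self, func):
        self.func = func
        self.calls = 0        # counts actual computations, not cache hits
        self._cache = {}

    def __call__(self, params, X):
        # Hash the (params, data) pair to get a cache key.
        key = hashlib.sha1(pickle.dumps((params, X))).hexdigest()
        if key not in self._cache:
            self.calls += 1
            self._cache[key] = self.func(params, X)
        return self._cache[key]

expensive = CachedTransform(lambda params, X: [x * params["scale"] for x in X])
grid = [{"scale": 1}, {"scale": 2}, {"scale": 1}]  # third repeats the first
results = [expensive(p, (1, 2, 3)) for p in grid]
```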
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Alexandre Gramfort
2011-02-01 19:13:02 UTC
Post by Olivier Grisel
I would also like to have an option on the pipeline to wrap transform
calls with an instance of joblib's Memory().cache, to avoid recomputing
partial results over and over again when doing a grid search. Or maybe
we could improve the grid search / pipeline combo to avoid computing
useless intermediary results when the upstream parameters don't change.
hierarchical feature agglomeration (under review) is actually using an
instance of Memory for caching in grid-search settings.
It is currently transformer-specific.

Alex
Gael Varoquaux
2011-02-01 19:55:54 UTC
Post by Mathieu Blondel
To me, passing y to transform is like passing it to predict: doesn't
make much sense. Note that, in that matter, LabelBinarizer is a bit
special.
There are many cases in which it makes sense to pass y to predict. For
instance when you are doing supervised feature selection, or in general
when you are doing supervised learning in a deep-learning way.
Post by Mathieu Blondel
On a related topic, after looking at the pipeline object, I see that
there are calls to fit(X, y) even when y is None. We can either add
y=None to all fit methods in the scikit or only pass y if it is not
None. I tend to prefer the second to keep the method signatures more
good looking :)
I tend to prefer the first, because if I want to compare a pipeline with
a supervised transform and a pipeline with an unsupervised transform, I
shouldn't have to have custom code taking care of it. It does make the
method calls a bit uglier, but I would like to try and avoid having
different code paths for supervised and unsupervised methods.
Post by Mathieu Blondel
Also, it would be nice if all transform methods in the scikit had a
copy=True|False option. That would enable to write a more efficient
pipeline by setting copy=False throughout the pipeline.
+1.

By the way, ideally the conclusions of all these very interesting
discussions should be added to the developer documentation, which is
very much lacking love.

Gaël
Alexandre Passos
2011-02-01 20:11:25 UTC
On Tue, Feb 1, 2011 at 17:55, Gael Varoquaux
Post by Gael Varoquaux
Post by Mathieu Blondel
To me, passing y to transform is like passing it to predict: doesn't
make much sense. Note that, in that matter, LabelBinarizer is a bit
special.
There are many cases in which it makes sense to pass y to predict. For
instance when you are doing supervised feature selection, or in general
when you are doing supervised learning in a deep-learning way.
Why exactly does it make sense in deep learning? I can see a method
score(x, y) being useful in deep-learning settings (as well as CRFs,
structured perceptrons, MIRA, passive-aggressive, etc.), but which value
of y would you pass to predict?
--
 - Alexandre
Gael Varoquaux
2011-02-01 21:02:23 UTC
Post by Alexandre Passos
Post by Gael Varoquaux
There are many cases in which it makes sense to pass y to predict. For
instance when you are doing supervised feature selection, or in general
when you are doing supervised learning in a deep-learning way.
Why exactly does it make sense in deep learning? I can see a method
score(x, y) being useful in deep-learning settings (as well as CRFs,
structured perceptrons, MIRA, passive-aggressive, etc.), but which value
of y would you pass to predict?
Sorry, this is a typo, I meant 'fit'.

G

Olivier Grisel
2011-02-01 17:03:15 UTC
Post by Gael Varoquaux
Post by Alexandre Gramfort
Post by Mathieu Blondel
So, the problem was that transform(X, y) was called when a non-None y
was passed to fit_transform. I was misguided by some objects like
Scaler in preprocessing/__init__.py which have transform(X,y) methods.
However, these methods don't even use y... Furthermore, to me, passing
y to transform is a semantic mistake. Just like predict, transform
shouldn't depend on y. Do you agree? If it's ok, I will remove all
references to y in the transform methods in preprocessing/__init__.py.
the reason for passing y to transform is that you have supervised
transformations.
The univariate feature selection is the best example and as you mentioned
in one of your first emails on the ML, LDA can be used to transform data
(still to be done btw).
so keep the y unless you find a better way to do this without breaking
anything.
I think everything should be 'y=None' so that transforms can take 'y', or
not take 'y' without complaining, and that we can chain supervised and
unsupervised transforms.
That sounds a bit weird to me to have a y in the prototype of purely
unsupervised transformers such as PCA, ICA, .... I am -0 for this.
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel