Discussion:
GridSearchCV, Pipeline and fit_params problem
Adrien
2011-08-08 12:54:00 UTC
Hello everyone,

This is my first post to this list. So before diving in, I would like to
thank all of the scikits.learn contributors for successfully marrying
machine learning and python into such an awesome library. I hope that
one day, I will have the occasion/time/courage to contribute.

Back to business. I have a set-up similar to the example "Sample
pipeline for text feature extraction and evaluation" [1], i.e. I want to
optimize by cross-validation some parameters of a pipeline (feature
extractor, classifier). I also want to feed some fixed parameters to the
"fit" method of the classifier by using the "fit_params" key-word
argument of the GridSearchCV constructor. For instance, just replace the
grid_search definition line in [1] by:

    grid_search = GridSearchCV(pipeline, parameters, n_jobs=1,
                               fit_params={'clf__class_weight': 'auto'})

Running [1] with this modification yields:

AssertionError: Invalid parameter class_weight for estimator SGDClassifier

(I advise setting 'n_jobs=1' to avoid being flooded with error
messages.) Note that removing the 'clf__' prefix yields the same error,
with "estimator Pipeline" instead.

After some digging, it seems that the problem comes from pipeline.py
where the "fit_params" arguments are not passed to the "fit" methods of
their respective pipeline steps, but are used as initialization
parameters of the steps in "_pre_transform", which calls "self._set_params".
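
To make the distinction concrete, here is a minimal sketch in plain Python. ToySGD is a made-up stand-in, not the real SGDClassifier, and the sketch only assumes what the traceback suggests: that 'class_weight' is accepted by "fit" but is not a constructor parameter. The toy raises ValueError where the real traceback shows AssertionError.

```python
class ToySGD:
    """Toy estimator (hypothetical): 'alpha' is a constructor parameter,
    'class_weight' is only accepted by fit()."""

    def __init__(self, alpha=0.0001):
        self.alpha = alpha

    def set_params(self, **params):
        # Only constructor parameters may be set this way.
        for name, value in params.items():
            if name not in ('alpha',):
                # The toy uses ValueError; the real traceback above
                # shows an AssertionError.
                raise ValueError('Invalid parameter %s for estimator %s'
                                 % (name, type(self).__name__))
            setattr(self, name, value)
        return self

    def fit(self, X, y, class_weight=None):
        # Record the fit-time argument so it can be inspected later.
        self.class_weight_ = class_weight
        return self


clf = ToySGD()

# What seems to happen: the fit parameter is treated as a constructor
# parameter, which fails.
try:
    clf.set_params(class_weight='auto')
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# What I expected: the value is forwarded to fit() instead.
clf.fit([[0.0], [1.0]], [0, 1], class_weight='auto')

print(error_message)
print(clf.class_weight_)
```

The first call fails in the same way as the error above, while the second call is the behaviour I was hoping "fit_params" would give me.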

As I am a fairly recent user of scikits.learn I was wondering if this
was a bug or if I am doing it wrong. My scikits.learn version is 0.8.1.
I checked the latest source on github and it looks like it didn't
change. I also googled a bit to see if anyone came across this. No luck
so far.

Any insight?

Sorry for the lengthy description. Thanks a lot and keep up the
excellent work!

Adrien

[1]
http://scikit-learn.sourceforge.net/stable/auto_examples/grid_search_text_feature_extraction.html
Vlad Niculae
2011-08-08 13:05:19 UTC
Hi Adrien

I ran into this too. I think fit parameters do not work with the
pipeline; I'm not sure whether that is for architectural reasons or by
mistake.

I thought my use case was too isolated, so I didn't report it at the
time. Now that you have encountered it too, I think it should be looked
into, and I will try to do so.

Meanwhile: what I did then, because I was in a hurry, was to simply
break apart the pipeline and store the intermediate result. Then I ran
grid search on the classifier alone.
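
Roughly, the workaround looks like the sketch below. ToyExtractor and ToyClassifier are made-up stand-ins so the example runs without scikits.learn, and the loop scores on the training data instead of cross-validating, purely to keep it short.

```python
class ToyExtractor:
    """Stand-in feature extractor (hypothetical, for illustration)."""

    def fit_transform(self, docs):
        # One feature per document: its length in characters.
        return [[len(doc)] for doc in docs]


class ToyClassifier:
    """Stand-in classifier whose fit method accepts an extra argument."""

    def __init__(self, threshold=5):
        self.threshold = threshold

    def fit(self, X, y, class_weight=None):
        # A real classifier would learn here; the toy only records
        # the fit-time argument.
        self.class_weight_ = class_weight
        return self

    def predict(self, X):
        return [int(row[0] > self.threshold) for row in X]

    def score(self, X, y):
        predictions = self.predict(X)
        return sum(p == t for p, t in zip(predictions, y)) / len(y)


docs = ['a', 'bb', 'cccccc', 'ddddddd']
labels = [0, 0, 1, 1]

# Step 1: run the feature extraction once and keep the result.
features = ToyExtractor().fit_transform(docs)

# Step 2: search over the classifier parameters only, passing the
# fixed fit argument directly to fit().
best_score, best_threshold = -1.0, None
for threshold in [0, 3, 10]:
    clf = ToyClassifier(threshold=threshold)
    clf.fit(features, labels, class_weight='auto')
    score = clf.score(features, labels)
    if score > best_score:
        best_score, best_threshold = score, threshold

print(best_threshold, best_score)
```

The obvious drawback is that the parameters of the feature extractor can no longer be tuned in the same search.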

p.s. Your gmail username is cool!

Best,
Vlad
Adrien
2011-08-08 13:37:13 UTC
Hello Vlad,

Thanks for the quick reply!

I needed to make this work asap so I wrote and quickly tested a small
patch to pipeline.py that I think does the job. I don't know if I should
post it here or do something else, so, for now, I attached it to this
e-mail.

Tell me if it is of any interest.

Hope this helps,

Adrien

PS: thanks for the remark on my silly user name :-)
_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
Vlad Niculae
2011-08-08 13:55:39 UTC
Thanks for the patch!

For everybody:
With this patch, the user will not be able to do Pipeline().fit(X,
steps=steps). I'm not sure anybody would want to be able to do
this, but I guess we should be consistent.

The main things I think should be checked are:
- whether setting fit parameters works
- whether setting class parameters by passing them to fit works (as in
  the doctest in pipeline.py)
- whether malformed parameter names, or ones that reference nonexistent
  steps, do not break it
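
As a rough sketch of what some of these checks could look like, here is a toy version in plain Python. ToyPipeline, RecordingStep and raises_value_error are made-up names for illustration, not the code in pipeline.py or in the patch, and rejecting bad names with ValueError is just one possible choice of behaviour.

```python
class RecordingStep:
    """Toy final step that records the keyword arguments given to fit."""

    def fit(self, X, y=None, **fit_kwargs):
        self.fit_kwargs_ = fit_kwargs
        return self


class ToyPipeline:
    """Hypothetical stand-in for Pipeline, reduced to fit-parameter routing."""

    def __init__(self, steps):
        self.steps = dict(steps)

    def fit(self, X, y=None, **fit_params):
        # Group 'step__param' keyword arguments by step name.
        routed = {name: {} for name in self.steps}
        for key, value in fit_params.items():
            name, sep, param = key.partition('__')
            if not sep or not param or name not in self.steps:
                raise ValueError('Invalid fit parameter: %r' % key)
            routed[name][param] = value
        # Forward each group to the fit method of its step.
        for name, step in self.steps.items():
            step.fit(X, y, **routed[name])
        return self


def raises_value_error(func, *args, **kwargs):
    """Return True if calling func raises ValueError, False otherwise."""
    try:
        func(*args, **kwargs)
    except ValueError:
        return True
    return False


pipe = ToyPipeline([('clf', RecordingStep())])

# Check 1: a fit parameter reaches the fit method of its step.
pipe.fit([[0], [1]], [0, 1], clf__class_weight='auto')
fit_param_reached_step = (
    pipe.steps['clf'].fit_kwargs_ == {'class_weight': 'auto'})

# Check 2: a name without the 'step__' prefix is rejected cleanly.
malformed_rejected = raises_value_error(
    pipe.fit, [[0]], [0], class_weight='auto')

# Check 3: a name that references a nonexistent step is rejected cleanly.
unknown_step_rejected = raises_value_error(
    pipe.fit, [[0]], [0], svm__C=1.0)
```

The check for class parameters passed to fit is left out of the sketch, since it depends on how we decide to tell the two kinds of parameters apart.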

Should I set up a branch with this patch and add a couple of tests for
such behaviour? Did I miss anything?

Best,
Vlad
Alexandre Gramfort
2011-08-08 14:28:33 UTC
Post by Vlad Niculae
Should I set up a branch with this patch and add a couple of tests for
such behaviour? Did I miss anything?

Go for it, so everybody can see the diff properly and participate in
the discussion.

Alex
Vlad Niculae
2011-08-08 15:12:28 UTC
https://github.com/scikit-learn/scikit-learn/pull/300

Here it is. The existing tests are not broken by it, but the issue I
pointed out still remains, and I'd also like to add some more tests.
For example, it should be tested that Adrien's use case (when the fit
method of a step expects a parameter) works.

Olivier Grisel
2011-08-08 15:52:05 UTC
+1
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel