2011-08-08 12:54:00 UTC
This is my first post to this list, so before diving in I would like to
thank all of the scikits.learn contributors for successfully marrying
machine learning and Python in such an awesome library. I hope that
one day I will have the occasion/time/courage to contribute.
Back to business. I have a setup similar to the example "Sample
pipeline for text feature extraction and evaluation", i.e. I want to
use cross-validation to optimize some parameters of a pipeline (feature
extractor, classifier). I also want to pass some fixed parameters to the
"fit" method of the classifier, using the "fit_params" keyword
argument of the GridSearchCV constructor. For instance, just replace the
grid_search definition line in the example with:
grid_search = GridSearchCV(pipeline, parameters, n_jobs=1,
Running the example with this modification yields:
AssertionError: Invalid parameter class_weight for estimator SGDClassifier
(I advise setting 'n_jobs=1' so as not to be flooded with error
messages.) Note that removing the 'clf__' prefix gives the same error,
but with "estimator Pipeline" instead.
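The 'clf__' prefix follows the double-underscore convention, in which a key of the form step__param addresses parameter param of the pipeline step named step, while an unprefixed key addresses the Pipeline itself. Below is a minimal sketch of how fit-time keys could be routed under that convention. It assumes the steps are named vect, tfidf and clf as in the example; route_fit_params is a hypothetical helper (not scikits.learn code), and the class_weight value is made up for illustration.

```python
def route_fit_params(fit_params, step_names):
    """Split 'step__param' keys into one kwargs dict per pipeline step.

    Hypothetical helper illustrating the double-underscore convention;
    it is not the scikits.learn implementation.
    """
    routed = {name: {} for name in step_names}
    for key, value in fit_params.items():
        step, sep, param = key.partition("__")
        if not sep:
            # No 'step__' prefix: the key would address the Pipeline itself.
            raise ValueError("%r has no 'step__' prefix; it would be "
                             "treated as a parameter of the Pipeline" % key)
        if step not in routed:
            raise ValueError("unknown pipeline step %r" % step)
        routed[step][param] = value
    return routed


# Hypothetical step names and class_weight value, for illustration only.
routed = route_fit_params({"clf__class_weight": {1: 2.0}},
                          ["vect", "tfidf", "clf"])
print(routed)  # {'vect': {}, 'tfidf': {}, 'clf': {'class_weight': {1: 2.0}}}
```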
After some digging, it seems that the problem comes from pipeline.py:
the "fit_params" arguments are not passed to the "fit" methods of
their respective pipeline steps, but are instead used as initialization
parameters of the steps in "_pre_transform", which calls "self._set_params".
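The difference between the two code paths can be illustrated with a toy estimator. This sketch assumes, as the error message suggests, that class_weight is accepted by the classifier's fit() but not by its constructor. ToyClassifier and its alpha parameter are hypothetical stand-ins, not the real SGDClassifier. Treating class_weight as an initialization parameter fails in the same way as reported above, while forwarding it to fit() works:

```python
class ToyClassifier:
    """Hypothetical stand-in for a classifier whose class_weight option
    is a fit() argument rather than a constructor argument."""

    _init_params = ("alpha",)  # the only constructor parameter

    def __init__(self, alpha=0.0001):
        self.alpha = alpha

    def set_params(self, **params):
        # Only constructor parameters may be set; anything else is
        # rejected, mirroring the AssertionError quoted above.
        for name, value in params.items():
            assert name in self._init_params, (
                "Invalid parameter %s for estimator %s"
                % (name, type(self).__name__))
            setattr(self, name, value)
        return self

    def fit(self, X, y, class_weight=None):
        # Fit-time option: only recorded here, no real training happens.
        self.class_weight_ = class_weight
        return self


clf = ToyClassifier()

# Path 1: fit params used as initialization params (via set_params) -> fails.
try:
    clf.set_params(class_weight={1: 2.0})
except AssertionError as exc:
    print(exc)  # Invalid parameter class_weight for estimator ToyClassifier

# Path 2: fit params forwarded to fit() -> works.
clf.fit([[0.0], [1.0]], [0, 1], class_weight={1: 2.0})
print(clf.class_weight_)  # {1: 2.0}
```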
Since I am a fairly recent user of scikits.learn, I was wondering whether
this is a bug or whether I am doing it wrong. My scikits.learn version is 0.8.1.
I checked the latest source on GitHub and this part does not seem to have
changed. I also googled a bit to see whether anyone had come across this, with no luck.
Sorry for the lengthy description. Thanks a lot and keep up the good work!