michael kneier
2014-02-27 07:33:38 UTC
Hi all,
I would like to add a "combiner" class that would work with Pipeline to let users augment the output of scikit-learn's text feature extraction process (or other feature extraction processes). For example, after applying CountVectorizer, it is sometimes desirable to augment the resulting dataset with additional features. Unless I am missing something, this is not easily done if the count vectorization is being used in a pipeline, especially if CountVectorizer parameters such as min_df are being optimized along with downstream model parameters.
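A minimal sketch of what such a combiner might look like, assuming each sample is a `(text, extra_features)` pair and assuming a recent scikit-learn (it uses `sklearn.model_selection.GridSearchCV`). The class name `TextPlusExtraCombiner` and the toy data are hypothetical, for illustration only. The point it shows is that keeping the vectorizer as an unfitted constructor parameter lets `min_df` be searched jointly with the downstream model's parameters:

```python
import numpy as np
import scipy.sparse as sp
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline


class TextPlusExtraCombiner(TransformerMixin, BaseEstimator):
    """Hypothetical combiner: vectorizes the text part of each sample and
    appends that sample's extra numeric features as additional columns.

    Each sample in X is assumed to be a (text, extra_features) pair.
    """

    def __init__(self, vectorizer=None):
        # Stored unfitted so GridSearchCV can tune its settings through
        # nested parameter names such as "combine__vectorizer__min_df".
        self.vectorizer = vectorizer

    @staticmethod
    def _split(X):
        texts = [text for text, _ in X]
        extra = np.asarray([feats for _, feats in X], dtype=float)
        return texts, extra

    def fit(self, X, y=None):
        texts, _ = self._split(X)
        # Fit a fresh copy so the user-supplied vectorizer is left untouched.
        base = self.vectorizer if self.vectorizer is not None else CountVectorizer()
        self.vectorizer_ = clone(base).fit(texts)
        return self

    def transform(self, X):
        texts, extra = self._split(X)
        counts = self.vectorizer_.transform(texts)
        # Token counts on the left, extra features appended on the right.
        return sp.hstack([counts, sp.csr_matrix(extra)]).tocsr()


# Toy data: (text, [extra numeric feature]) pairs with binary labels.
X = [
    ("good great fun film", [3.0]),
    ("great good film", [2.0]),
    ("bad awful boring film", [0.0]),
    ("awful bad film", [1.0]),
    ("fun good great film", [4.0]),
    ("boring bad awful film", [0.5]),
]
y = [1, 1, 0, 0, 1, 0]

pipe = Pipeline([
    ("combine", TextPlusExtraCombiner(CountVectorizer())),
    ("clf", LogisticRegression()),
])

# CountVectorizer's min_df is searched together with the classifier's C.
grid = GridSearchCV(
    pipe,
    {"combine__vectorizer__min_df": [1, 2], "clf__C": [0.1, 1.0]},
    cv=3,
)
grid.fit(X, y)
print(grid.best_params_)
```

In recent scikit-learn releases, `ColumnTransformer` covers much of the same ground when the text and the extra features live in separate columns of a DataFrame.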
Once I have written the code for this class, what is the easiest way to get it reviewed and incorporated into scikit-learn?
Thanks,
Mike Kneier