[Scikit-learn-general] Supervised principal component analysis in scikit-learn?

Andreas Mueller

9 years ago

Hi Henry.
Please discuss issues like these on the mailing list.
Any one particular developer might not have time to respond.

Blair's SPC is just "make_pipeline(SelectKBest(), PCA(),
LogisticRegression())". So I wouldn't say "it didn't make it through".
I'd rather say "it's already implemented".
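
As a rough sketch of what that pipeline looks like when spelled out (the synthetic dataset, k=20, n_components=5 and max_iter=1000 are arbitrary choices of mine for illustration, not values taken from the paper):

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic data, purely for illustration (assumption, not from the thread).
X, y = make_classification(
    n_samples=300, n_features=100, n_informative=10, random_state=0
)

# Step 1: keep the features most associated with the target (univariate F-test).
# Step 2: run ordinary PCA on the selected features only.
# Step 3: fit a classifier on the resulting components.
spc = make_pipeline(
    SelectKBest(f_classif, k=20),
    PCA(n_components=5),
    LogisticRegression(max_iter=1000),
)

# Cross-validating the whole pipeline keeps the supervised selection step
# inside each training fold, so the held-out fold does not leak into it.
scores = cross_val_score(spc, X, y, cv=5)
print(scores.mean())
```

In practice k and n_components would be tuned, e.g. with GridSearchCV over the pipeline.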

There is indeed no supervised PCA in scikit-learn. The paper does not
seem well-established enough for inclusion in scikit-learn; see
http://scikit-learn.org/dev/faq.html#can-i-add-this-new-algorithm-that-i-or-someone-else-just-published

The paper has 50 citations, which is not a lot. It is basically a
classification or regression algorithm with some nice visualization
properties.
To include it, it would need to out-perform more established approaches
on a variety of datasets.
I only skimmed the paper, but they don't even seem to compare against
linear approaches like ridge or lasso.
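
To make concrete the kind of baseline comparison I mean, here is a minimal sketch on synthetic regression data. Everything in it (the dataset, the hyperparameters, and using select + PCA + linear regression as a stand-in for a new method) is an assumption for illustration; a real evaluation would use several real datasets and tuned hyperparameters.

```python
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic data, purely for illustration.
X, y = make_regression(
    n_samples=200, n_features=50, n_informative=10, noise=10.0, random_state=0
)

# Candidate method next to two established linear baselines.
models = {
    "select+pca+ols": make_pipeline(
        SelectKBest(f_regression, k=20), PCA(n_components=5), LinearRegression()
    ),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=1.0),
}

# Mean cross-validated R^2 (the default score for regressors) per model.
results = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in models.items()
}

for name, r2 in results.items():
    print(f"{name}: mean R^2 = {r2:.3f}")
```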

That doesn't mean it wouldn't be worthwhile to create an open-source
Python implementation that is scikit-learn compatible; again, see
http://scikit-learn.org/dev/faq.html#can-i-add-this-new-algorithm-that-i-or-someone-else-just-published

Cheers,
Andy
