Andreas Mueller
2015-12-08 18:04:50 UTC
Hi Henry.
Please discuss issues like these on the mailing list.
Any one particular developer might not have time to respond.
Bair's SPC is just "make_pipeline(SelectKBest(), PCA(),
LogisticRegression())". So I wouldn't say "it didn't make it through".
I'd rather say "it's already implemented".
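For readers who want to try that pipeline, here is a minimal sketch. The synthetic dataset, the f_classif scoring function, and the values k=10 and n_components=3 are assumptions chosen only for illustration; they come neither from Bair's paper nor from the pull request.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Synthetic data (assumption): 200 samples, 50 features, 5 of them informative.
X, y = make_classification(
    n_samples=200, n_features=50, n_informative=5, random_state=0
)

# 1. Keep the features most associated with the target (supervised step).
# 2. Run ordinary PCA on the selected features.
# 3. Fit a linear classifier on the leading components.
spc_like = make_pipeline(
    SelectKBest(f_classif, k=10),  # k=10 is an illustrative choice
    PCA(n_components=3),           # so is n_components=3
    LogisticRegression(),
)
spc_like.fit(X, y)

# Accuracy on the training data, only to show the pipeline runs end to end.
train_accuracy = spc_like.score(X, y)

# The low-dimensional embedding produced by the first two steps.
components = spc_like[:-1].transform(X)
```

In practice k and n_components would be tuned, for example with GridSearchCV over the pipeline, rather than fixed by hand.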
There is indeed no supervised PCA in scikit-learn. The paper does not seem
well-established enough for inclusion in scikit-learn, see
http://scikit-learn.org/dev/faq.html#can-i-add-this-new-algorithm-that-i-or-someone-else-just-published
The paper has 50 citations, which is not a lot. It is basically a
classification or regression algorithm with some nice visualization
properties.
To include it, it would need to out-perform more established approaches
on a variety of datasets.
I only skimmed the paper, but they don't even seem to compare against
linear approaches like ridge or lasso.
That doesn't mean it's not beneficial to create an open-source Python
implementation that is scikit-learn compatible, again see
http://scikit-learn.org/dev/faq.html#can-i-add-this-new-algorithm-that-i-or-someone-else-just-published
Cheers,
Andy
Hi all,
My name's Henry Lin, and I'm a Master's student at the University of
Illinois at Urbana-Champaign. You might remember me from a few pull
requests to scikit-learn (5431
<https://github.com/scikit-learn/scikit-learn/pull/5431> and 5825
<https://github.com/scikit-learn/scikit-learn/pull/5825>).
I've recently been doing research on embedding methods for
classification, and one algorithm that I've become interested
in is supervised principal component analysis by Barshan et al.,
here
<http://www.sciencedirect.com/science/article/pii/S0031320310005819>.
(It's not the "supervised principal components" by Bair, Hastie et
al. here <https://web.stanford.edu/%7Ehastie/Papers/spca_JASA.pdf>.)
I was wondering whether there would be any interest in adding a
supervised principal component analysis to scikit-learn. This has been
previously proposed with Bair's SPC in this pull request
<https://github.com/scikit-learn/scikit-learn/pull/5196>, but it never
made it through. (The workflow was too similar to a scikit-learn
pipeline.) On the other hand, the work by Barshan is completely
different from Bair's, and I think it would be interesting to have a
supervised version of PCA added to scikit-learn. (To my knowledge,
there is currently no supervised PCA in the library.)
I am currently in contact with Elnaz Barshan, and she has given me the
code from her paper. Using her MATLAB code I've been able to reproduce
some of her results, and with some time I'll be able to rewrite her
work in Python. I'd just like some validation from scikit-learn owners
(such as yourselves ☺) to see whether it's a worthwhile time investment
for me to work on this project. I would need to verify with her
that she is happy to see her code reimplemented publicly,
and then I would obviously have to implement it in Python, following
scikit-learn's standards.
What do you guys think?
-Henry Lin
--
Henry Lin, Research Assistant
M.S. Computer Science
University of Illinois at Urbana-Champaign 2016