Hi and thanks a lot for the interest.
The application I have in mind is feature selection in handwritten digit
recognition (the zipcode data). For sparse PCA, something like H. Zou,
T. Hastie and R. Tibshirani (2006), "Sparse principal component
analysis", should be useful (there exists an interesting penalized SVD
method to solve it). For sparse NMF, P. O. Hoyer (2004), "Non-negative
Matrix Factorization with Sparseness Constraints", is the reference I
plan to follow.
I am looking forward to implementing these methods and running them on
the zip code data. It is my first hands-on application after much
reading up, and I am very enthusiastic and motivated.
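(For readers coming to this thread later: scikit-learn eventually grew exactly these estimators. A minimal sketch with the modern API, using `load_digits` as a stand-in for the USPS zipcode data, which requires a separate download; parameter values here are illustrative, not tuned:)

```python
# Sparse PCA and NMF on digit images. load_digits stands in for the
# USPS zipcode set; pixel values are non-negative, as NMF requires.
from sklearn.datasets import load_digits
from sklearn.decomposition import NMF, SparsePCA

X, _ = load_digits(return_X_y=True)
X = X[:200]  # small subset so SparsePCA stays fast

# NMF: X ~ W @ H with W, H >= 0; 'nndsvd' is a deterministic init.
nmf = NMF(n_components=16, init='nndsvd', max_iter=500)
W = nmf.fit_transform(X)   # per-sample activations
H = nmf.components_        # parts-based digit "strokes"

# Sparse PCA: PCA with an l1 penalty (alpha) on the components.
spca = SparsePCA(n_components=16, alpha=1.0, random_state=0)
codes = spca.fit_transform(X)

print(W.shape, H.shape, codes.shape)
```

The l1 penalty drives many entries of the sparse PCA components to exactly zero, which is what makes the loadings interpretable as pixel (feature) selections.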
Post by Gael Varoquaux
Post by Vlad Niculae
I am working on my undergrad thesis in a NumPy environment and I plan
to use as much of scikits-learn as I can. I will research and compare
implementations of PCA, sparse PCA, NMF and sparse NMF. However apart
from PCA, I did not find any unified libraries with the others, even
though there are plenty of implementations available.
On the learn homepage it says that matrix factorization is a planned
feature. Is there work in progress on this? If not, I could attempt to
gather together and port what I find, and contribute it.
Welcome! It's great to have enthusiastic people joining us.
Matrix factorization is indeed a planned feature, and we are starting to
have a few methods for it, specifically ICA and PCA
(http://scikit-learn.sourceforge.net/modules/decompositions.html). But we
are interested in adding many more (basically any 'standard' method is
more than welcome).
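(The module paths have since moved from the `scikits.learn` of this thread's era to `sklearn.decomposition`; a minimal sketch of the two estimators mentioned, against the modern API:)

```python
# PCA and ICA as they exist today in sklearn.decomposition.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA, FastICA

X, _ = load_iris(return_X_y=True)  # (150, 4)

pca = PCA(n_components=2).fit(X)
X_pca = pca.transform(X)           # projections onto principal axes

ica = FastICA(n_components=2, whiten='unit-variance',
              random_state=0).fit(X)
X_ica = ica.transform(X)           # estimated independent sources

print(X_pca.shape, X_ica.shape)
```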
I know that there are a few NMF implementations in Python. Some of
them have no license attached to them, so the first thing to do is to ask
the authors if they are ready to license their code under a BSD license
and have it included in the scikit (with their name on it, of course).
MILK (by Luis Pedro) has an NMF implementation that is licensed under the
MIT license, so compatible with the scikit. You will also have some work
to do to compare the different implementations speed-wise and
stability-wise. This kind of work is great to gain insight on the methods
and will probably be beneficial for your research. Once you know which
code you want to contribute, simply fork the scikit on github and start
building your contribution in the fork. You will need to pay attention to
respecting the coding style of the scikit and to writing examples and
documentation (another great way of gaining insight). We will review it,
and integrate it in the scikit when it is ripe.
With regard to sparse PCA, what is your definition of it? There
are different ways of imposing a penalty on the PCA problem. We (at the
Parietal INRIA team) have some code that implements a PCA-like problem in
a sparse dictionary learning framework, using the scikit. It's not open
source because we are still working on it, and because we need to shoot
out a publication using it before we open it. However, it will be open in
the near future (the big question is when), and we can share it with
specific people asking for it.
I suggest that you start small: small contributions are easier to
integrate. You could for instance start with NMF, and we could focus on
trying to get NMF in before we try to get any other method in. Then you
could focus on sparse NMF, or maybe we could open up our sparse PCA code,
and if it suits you, you could work on integrating it in the scikit
(shouldn't be a huge amount of work, as we have the same coding style for
our internal code). In the long run, if you want, you could make sure
that the different matrix factorization methods expose an interface as
uniform as possible (trust me, it requires some active work to fight
software entropy :P).
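(The "interface as uniform as possible" Gael mentions is essentially the scikit's fit/transform estimator contract. A toy, entirely hypothetical factorizer, sketched only to show the interface shape; the class name and the random-projection "factorization" are made up:)

```python
import numpy as np

class RandomProjectionFactorizer:
    """Toy estimator illustrating the fit/transform contract.

    Any matrix-factorization method exposing this interface (fit,
    transform, fit_transform, components_) is interchangeable with
    PCA, ICA, NMF, etc. in user code. Illustrative only.
    """

    def __init__(self, n_components=2, random_state=0):
        self.n_components = n_components
        self.random_state = random_state

    def fit(self, X):
        rng = np.random.RandomState(self.random_state)
        # components_ plays the role of H in X ~ W @ H
        self.components_ = rng.randn(self.n_components, X.shape[1])
        return self

    def transform(self, X):
        # least-squares codes W for the fixed components_
        W, *_ = np.linalg.lstsq(self.components_.T, X.T, rcond=None)
        return W.T

    def fit_transform(self, X):
        return self.fit(X).transform(X)
```

Fighting software entropy then mostly means making sure every factorizer agrees on these method names, their argument order, and the `components_` attribute.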
Purely out of curiosity, may I ask if you have a specific application in
mind for matrix factorization?
This is exciting!
Scikit-learn-general mailing list