Hi and thanks a lot for the interest.

selection in handwritten digit recognition (zipcode).

something like H. Zou, T. Hastie and R. Tibshirani (2006). "Sparse principal component analysis" should be useful (there exists an interesting penalized SVD method to solve it).
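The penalized SVD idea can be sketched in a few lines of NumPy. This is a minimal sketch, assuming a rank-one model with an L1 penalty on the loading vector, solved by a simple alternating soft-thresholding scheme swapped in for illustration; it is not the algorithm from the Zou, Hastie and Tibshirani paper, and the function names are hypothetical. It approximately minimizes ||X - u v^T||_F^2 + 2*lam*||v||_1 subject to ||u||_2 = 1: with u fixed, the optimal v is X^T u soft-thresholded at lam, and with v fixed, the optimal u is Xv / ||Xv||.

```python
import numpy as np


def soft_threshold(z, lam):
    """Elementwise soft-thresholding: sign(z) * max(|z| - lam, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def sparse_rank_one(X, lam, n_iter=100, tol=1e-8):
    """Sparse leading loading vector via alternating rank-one updates.

    Hypothetical helper: approximately minimizes
    ||X - u v^T||_F^2 + 2 * lam * ||v||_1 subject to ||u||_2 = 1,
    then returns v rescaled to unit norm (or zeros if lam removes every entry).
    """
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)  # center the columns, as in ordinary PCA
    # Start from the ordinary (non-sparse) leading right singular vector.
    v = np.linalg.svd(X, full_matrices=False)[2][0]
    for _ in range(n_iter):
        u = X @ v
        norm_u = np.linalg.norm(u)
        if norm_u == 0.0:  # every loading was thresholded to zero
            return np.zeros_like(v)
        u /= norm_u                           # optimal u for fixed v
        v_new = soft_threshold(X.T @ u, lam)  # optimal v for fixed u
        converged = np.linalg.norm(v_new - v) < tol
        v = v_new
        if converged:
            break
    norm_v = np.linalg.norm(v)
    return v / norm_v if norm_v > 0 else v


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 0] *= 10.0  # one high-variance feature
print(sparse_rank_one(X, lam=10.0))  # only the first loading should survive
```

With lam = 0 no entry is thresholded and the iteration stays on the ordinary leading principal axis; increasing lam trades explained variance for sparsity.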

PO Hoyer (2004). "Non-negative Matrix Factorization with Sparseness

running them on the zip code data. It is my first hands-on application after much reading up, and I am very enthusiastic and motivated.
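On the sparse NMF side, Hoyer (2004) quantifies sparseness with a measure based on the ratio of the L1 and L2 norms: for a vector x with n entries, sparseness(x) = (sqrt(n) - ||x||_1 / ||x||_2) / (sqrt(n) - 1), which is 1 when exactly one entry is nonzero and 0 when all entries have the same magnitude. A small NumPy sketch (the function name is hypothetical) that could be used to check how sparse the learned factors on the zip code data actually are:

```python
import numpy as np


def hoyer_sparseness(x):
    """Hoyer's (2004) sparseness of a vector, a value in [0, 1]."""
    x = np.asarray(x, dtype=float).ravel()
    n = x.size
    l2 = np.linalg.norm(x)
    if n < 2 or l2 == 0.0:
        raise ValueError("need a nonzero vector with at least two entries")
    l1 = np.abs(x).sum()
    return (np.sqrt(n) - l1 / l2) / (np.sqrt(n) - 1.0)


# Sparseness of each basis vector, e.g. each column of W in V ~ W H.
W = np.array([[1.0, 0.5], [0.0, 0.5], [0.0, 0.5], [0.0, 0.5]])
print([float(hoyer_sparseness(w)) for w in W.T])
```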

Post by Gael Varoquaux
Post by Vlad Niculae
I am working on my undergrad thesis in a NumPy environment and I plan to use as much of scikits-learn as I can. I will research and compare implementations of PCA, sparse PCA, NMF and sparse NMF. However, apart from PCA, I did not find any unified library covering the others, even though there are plenty of individual implementations available.

On the scikits-learn homepage it says that matrix factorization is a planned feature. Is there work in progress on this? If not, I could attempt to gather together and port what I find, and contribute it.

Hey Vlad,

Welcome! It's great to have enthusiastic people joining us.

Matrix factorization is indeed a planned feature, and we are starting to have a few methods doing this, specifically ICA and PCA (http://scikit-learn.sourceforge.net/modules/decompositions.html). But we are interested in adding much more (basically any 'standard' method is more than welcome).

I know that there are a few NMF implementations in Python. Some of them have no license attached to them, so the first thing to do is to ask the authors if they are ready to license their code under a BSD license and have it included in the scikit (with their name on it, of course). MILK (by Luis Pedro) has an NMF implementation licensed under the MIT license, which is compatible with the scikit. You will also have some work to do to compare the different implementations speed-wise and stability-wise. This kind of work is great for gaining insight into the methods and will probably be beneficial for your research. Once you know which code you want to contribute, simply fork the scikit on github and start building your contribution in the fork. You will need to pay attention to following the coding style of the scikit and to writing examples and documentation (another great way of gaining insight). We will review it, and integrate it into the scikit when it is ripe.
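To make the speed and stability comparison concrete, here is a minimal NumPy sketch of NMF using Lee and Seung's multiplicative updates for the squared Frobenius error, together with a small harness that times several random restarts and reports the relative reconstruction error of each (a crude proxy for speed and stability). It is not MILK's or any other existing implementation; the function names and defaults are hypothetical.

```python
import time

import numpy as np


def nmf_multiplicative(V, rank, n_iter=200, seed=0, eps=1e-10):
    """Factor a non-negative V (n x m) as W @ H, W (n x rank) and H (rank x m) >= 0."""
    V = np.asarray(V, dtype=float)
    if (V < 0).any():
        raise ValueError("NMF requires a non-negative input matrix")
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep both factors non-negative;
        # eps guards against division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def compare_restarts(V, rank, seeds=(0, 1, 2), n_iter=200):
    """Time several random restarts and report each relative error."""
    V = np.asarray(V, dtype=float)
    results = []
    for seed in seeds:
        start = time.perf_counter()
        W, H = nmf_multiplicative(V, rank, n_iter=n_iter, seed=seed)
        elapsed = time.perf_counter() - start
        rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
        results.append((seed, elapsed, rel_err))
    return results


rng = np.random.default_rng(42)
V = rng.random((30, 3)) @ rng.random((3, 20))  # exactly rank 3, non-negative
for seed, elapsed, rel_err in compare_restarts(V, rank=3):
    print(f"seed={seed}  time={elapsed:.4f}s  relative error={rel_err:.4f}")
```

Swapping another implementation into `compare_restarts` in place of `nmf_multiplicative` gives a like-for-like comparison on the same data and seeds; a large spread of errors across seeds points to sensitivity to initialization.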

With regard to sparse PCA, what is your definition of sparse PCA? There are different ways of imposing a penalty on the PCA problem. We (at the Parietal INRIA team) have some code that implements a PCA-like problem in a sparse dictionary learning framework, using the scikit. It's not open source yet, because we are still working on it and because we need to get a publication out using it before we open it. However, it will be open in the near future (the big question is when), and we can share it with specific people who ask for it.
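To make the question concrete, two common ways of writing a penalized PCA problem for a centered data matrix X (n samples by p features) are shown below. They are generic formulations given for illustration, not a description of the Parietal code. The first penalizes a single rank-one approximation, whose alternating solution is soft-thresholding of X^T u; the second is a dictionary-learning style problem with k components, where the rows of V are the sparse loadings and the columns of U are norm-constrained codes. With lambda = 0 both reduce to ordinary PCA (the leading singular pair, respectively the best rank-k approximation).

```latex
\[
\min_{u \in \mathbb{R}^{n},\, v \in \mathbb{R}^{p}}
  \lVert X - u v^{\top} \rVert_F^2 + 2\lambda \lVert v \rVert_1
  \quad \text{s.t.} \quad \lVert u \rVert_2 = 1
\]

\[
\min_{U \in \mathbb{R}^{n \times k},\, V \in \mathbb{R}^{k \times p}}
  \tfrac{1}{2} \lVert X - U V \rVert_F^2
  + \lambda \sum_{j=1}^{k} \lVert V_{j\cdot} \rVert_1
  \quad \text{s.t.} \quad \lVert U_{\cdot j} \rVert_2 \le 1, \quad j = 1, \dots, k
\]
```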

I suggest that you start small: small contributions are easier to integrate. You could for instance start with NMF, and we could focus on getting NMF in before we try to get any other method in. Then you could focus on sparse NMF, or maybe we could open up our sparse PCA code, and if it suits you, you could work on integrating it into the scikit (it shouldn't be a huge amount of work, as we use the same coding style for our internal code). In the long run, if you want, you could make sure that the different matrix factorization methods expose an interface that is as uniform as possible (trust me, it requires some active work to fight software entropy :P).
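One way such a uniform interface could look: each method learns `components_` in `fit` and maps data to codes in `transform`, so calling code can swap methods without changes. This follows the fit/transform convention that scikit-learn estimators use, but the class names are hypothetical, the NMF part uses plain multiplicative updates for brevity, and none of this is the scikit's actual implementation.

```python
import numpy as np


class MatrixFactorization:
    """Hypothetical shared interface: X is approximated by codes @ components_."""

    def __init__(self, n_components):
        self.n_components = n_components

    def fit(self, X):
        self._fit(np.asarray(X, dtype=float))
        return self

    def fit_transform(self, X):
        return self.fit(X).transform(X)


class PCAFactorization(MatrixFactorization):
    def _fit(self, X):
        self.mean_ = X.mean(axis=0)
        _, _, Vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.components_ = Vt[: self.n_components]

    def transform(self, X):
        # Rows of components_ are orthonormal, so codes are plain projections.
        return (np.asarray(X, dtype=float) - self.mean_) @ self.components_.T


class NMFFactorization(MatrixFactorization):
    def __init__(self, n_components, n_iter=200, seed=0, eps=1e-10):
        super().__init__(n_components)
        self.n_iter, self.seed, self.eps = n_iter, seed, eps

    def _update_codes(self, X, codes, comps):
        # Multiplicative update of the codes with the components held fixed.
        return codes * (X @ comps.T) / (codes @ comps @ comps.T + self.eps)

    def _fit(self, X):
        rng = np.random.default_rng(self.seed)
        codes = rng.random((X.shape[0], self.n_components)) + self.eps
        comps = rng.random((self.n_components, X.shape[1])) + self.eps
        for _ in range(self.n_iter):
            comps *= (codes.T @ X) / (codes.T @ codes @ comps + self.eps)
            codes = self._update_codes(X, codes, comps)
        self.components_ = comps

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        rng = np.random.default_rng(self.seed)
        codes = rng.random((X.shape[0], self.n_components)) + self.eps
        for _ in range(self.n_iter):
            codes = self._update_codes(X, codes, self.components_)
        return codes


rng = np.random.default_rng(0)
X = rng.random((50, 8))  # non-negative, so both methods apply
for model in (PCAFactorization(3), NMFFactorization(3)):
    codes = model.fit_transform(X)
    print(type(model).__name__, codes.shape, model.components_.shape)
```

Keeping method-specific work in a private `_fit` and sharing `fit` / `fit_transform` in the base class is one simple way to stop the interfaces from drifting apart as methods are added.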

Purely out of curiosity, may I ask if you have a specific application in mind for matrix factorization?

This is exciting!

Gaël


_______________________________________________

Scikit-learn-general mailing list

https://lists.sourceforge.net/lists/listinfo/scikit-learn-general