Discussion:
Is there interest in SPCA, NMF, SNMF in learn?
Vlad Niculae
2010-11-20 15:25:05 UTC
Permalink
Hello,
First of all, allow me to introduce myself: I am an undergrad student
in CS planning to enter the field of machine learning, and a big fan
of your work here.

I am working on my undergrad thesis in a NumPy environment and I plan
to use as much of scikits-learn as I can. I will research and compare
implementations of PCA, sparse PCA, NMF and sparse NMF. However, apart
from PCA, I did not find the others in any unified library, even
though there are plenty of standalone implementations available.

On the learn homepage it says that matrix factorization is a planned
feature. Is there work in progress on this? If not, I could attempt to
gather together and port what I find, and contribute it.

Yours,
Vlad N
Gael Varoquaux
2010-11-20 15:49:43 UTC
Permalink
Hey Vlad,

Welcome! It's great to have enthusiastic people joining us.

Matrix factorization is indeed a planned feature, and we are starting to
have a few methods doing this, specifically ICA and PCA
(http://scikit-learn.sourceforge.net/modules/decompositions.html). But we
are interested in adding many more (basically any 'standard' method is
more than welcome).

I know that there are a few NMF implementations in Python. Some of
them have no license attached to them, so the first thing to do is to ask
the authors if they are ready to license their code under a BSD license
and have it included in the scikit (with their name on it, of course).
MILK (by Luis Pedro) has an NMF implementation that is licensed under the
MIT license, so compatible with the scikit. You will also have some work
to do to compare the different implementations speed-wise and
stability-wise. This kind of work is great to gain insight on the methods
and will probably be beneficial for your research. Once you know which
code you want to contribute, simply fork the scikit on github and start
building your contribution in the fork. You will need to pay attention to
respecting the coding style of the scikit and to writing examples and
documentation (another great way of gaining insight). We will review it,
and integrate it in the scikit when it is ripe.

With regard to sparse PCA: what is your definition of sparse PCA? There
are different ways of imposing a penalty on the PCA problem. We (at the
Parietal INRIA team) have some code that implements a PCA-like problem in
a sparse dictionary learning framework, using the scikit. It's not open
source because we are still working on it, and because we need to shoot
out a publication using it before we open it. However, it will be open in
the near future (the big question is when), and we can share it with
specific people asking for it.

I suggest that you start small: small contributions are easier to
integrate. You could for instance start with NMF, and we could focus on
trying to get NMF in before we try to get any other method in. Then you
could focus on sparse NMF, or maybe we could open up our sparse PCA code,
and if it suits you, you could work on integrating it in the scikit
(shouldn't be a huge amount of work, as we have the same coding style for
our internal code). In the long run, if you want, you could make sure
that the different matrix factorization methods expose an interface as
uniform as possible (trust me, it requires some active work to fight
software entropy :P).
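
To make this concrete, here is a rough sketch of the kind of uniform
interface I have in mind; the class name and the internals are purely
illustrative placeholders, not a settled API:

import numpy as np

class NMF(object):
    """Illustrative skeleton of a decomposition estimator."""

    def __init__(self, n_components=2):
        self.n_components = n_components

    def fit(self, X, **params):
        # a real implementation would run an NMF solver here and
        # store the learned factor as self.components_
        self.components_ = np.eye(self.n_components, X.shape[1])
        return self

    def transform(self, X):
        # project new data on the learned components (placeholder:
        # a real NMF transform solves for non-negative codes)
        return np.dot(X, self.components_.T)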

Purely out of curiosity, may I ask if you have a specific application in
mind for matrix factorization?

This is exciting!

Gaël
Vlad Niculae
2010-11-20 16:23:25 UTC
Permalink
Hi and thanks a lot for the interest.

I am going to assess how useful these techniques are for feature
selection in handwritten digit recognition (zipcode).

I have not looked too much into it yet, but for an improvement on PCA,
something like H. Zou, T. Hastie and R. Tibshirani (2006), "Sparse
principal component analysis", should be useful (there exists an
interesting penalized SVD method for solving it).

However, what seems most useful to me is sparse NMF, which
consistently produces local representations of facial data, as shown in
P.O. Hoyer (2004), "Non-negative Matrix Factorization with Sparseness
Constraints".

My goal is mainly to gain insight into these techniques and I think a
good way to do this is bringing them to my environment of choice and
running them on the zip code data. It is my first hands-on application
after much reading up and I am very enthusiastic and motivated.


Peter Prettenhofer
2010-11-20 16:38:38 UTC
Permalink
Hi Vlad,

that's great news - I'm looking forward to having more matrix
factorization techniques in scikit-learn.

Please consider the Python port of the projected gradient method for
NMF by Chih-Jen Lin [1]. It could be easily integrated into
scikit-learn since it has the same licensing as LIBSVM.

Uwe Schmitt [2] also provides a bunch of NMF methods in Python,
including the sparse NMF of Hoyer.

best,
Peter

[1] http://www.csie.ntu.edu.tw/~cjlin/nmf/index.html
[2] http://public.procoders.net/nnma/
--
Peter Prettenhofer
Alexandre Gramfort
2010-11-23 12:53:06 UTC
Permalink
Hi folks,

I'm also very interested in matrix factorization / dictionary learning stuff.

Attached is a code snippet I wrote on a rainy Sunday afternoon using
the MILK implementation and the digits dataset.

Could we define a roadmap of what should be done? Which algorithms?
Based on existing implementations?

@Vlad : would you lead this effort?

best,
Alex

Vlad Niculae
2010-11-24 10:59:27 UTC
Permalink
I would be happy to! Of course, I would ask you to tolerate my lack of
experience :)

Here is what I think would be interesting to have.

NMF [1]
Sparse NMF [2] or [3]
Sparse coding [4]
Dictionary learning [5]
Sparse PCA [6]

Should I set up a fork for this on github?

[1] http://www.csie.ntu.edu.tw/~cjlin/nmf/index.html
[2] http://public.procoders.net/nnma/
[3] www.cs.unm.edu/~ismav/papers/ssiai-conv-nmf.pdf
[4] http://www.stanford.edu/~hllee/softwares/nips06-sparsecoding.htm
[5] http://www.di.ens.fr/willow/pdfs/icml09.pdf
[6] http://www.princeton.edu/~aspremon/DSPCA.htm




Olivier Grisel
2010-11-24 11:18:23 UTC
Permalink
Post by Vlad Niculae
I would be happy to! Of course I would like you to tolerate my lack of
experience :)
This is the nice thing about GitHub-based pull requests: you can submit
some work for review and get feedback on the timeline as the work
progresses towards respecting the project's code conventions and good
practices, as Vincent demonstrated on the kriging pull request.
Post by Vlad Niculae
Here is what I think it would be interesting to have.
NMF [1]
Sparse NMF [2] or [3]
Sparse coding [4]
Dictionary learning [5]
Sparse PCA [6]
Should I set up a fork for this on github?
+1: fork on GitHub and then make a branch named "nmf", for instance
(start with one algo at a time to make it easier for others to review
your work by looking at a somewhat small and autonomous chunk of code
at a time).

If you need help working with several named git branches please ask.
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Alexandre Gramfort
2010-11-24 12:52:34 UTC
Permalink
Post by Olivier Grisel
+1 fork on github and then make a branch named "nmf" for instance
(start with one algo at a time to make it easier for others to review
your work by looking at a somewhat small and autonomous chunk of code
at a time).
If you need help working with several named git branches please ask.
this means doing something like this:

git clone git@github.com:scikit-learn/scikit-learn.git
git remote add me git@github.com:vlad/scikit-learn.git
git co -b nmf # to create the nmf branch
git commit ... # many times
git push me nmf

then ask for the pull request on github

hope this helps.

Alex
Olivier Grisel
2010-11-24 13:02:34 UTC
Permalink
Post by Alexandre Gramfort
Post by Olivier Grisel
+1 fork on github and then make a branch named "nmf" for instance
(start with one algo at a time to make it easier for others to review
your work by looking at a somewhat small and autonomous chunk of code
at a time).
If you need help working with several named git branches please ask.
git co -b nmf # to create the nmf branch
git commit ... # many times
git push me nmf
git checkout -b nmf

the "co" alias is not defined by default (see
https://git.wiki.kernel.org/index.php/Aliases for configuring your own
aliases).
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Vlad Niculae
2010-11-30 16:30:32 UTC
Permalink
I have pushed my work in progress on the NMF implementation to my repository.
https://github.com/vene/scikit-learn/blob/nmf/scikits/learn/nmf.py

It needs some tweaking because sometimes (apparently at random)
convergence is slow.
Also, I have not yet implemented transform. However, I want to know
whether I am on the right path.

Thanks!


Olivier Grisel
2010-11-30 22:24:34 UTC
Permalink
A couple of points of early feedback (I haven't run the code yet):

- this code is a translation of Matlab code: what is the license of
the original source code? If it is not MIT or BSD, we should ask the
original authors explicitly for their permission to distribute this
translation under the BSD license. If they don't accept, we should drop
this code base and restart from scratch to avoid legal / copyright
issues.
- please follow PEP 8 (esp. for operator spacing) and PEP 257 for
the docstring format
- write some (fast, less than 1s) tests on toy problems + corner
cases (e.g. all-zero vectors) in a dedicated test file
- you have print statements in your doctests that will fail when
executed since you do not display the expected output (you can run the
complete test suite including doctests with "make test test-doc";
install nosetests first, of course)
- would be great to write some documentation in the doc folder
- maybe write a benchmark file in the benchmarks folder to plot the
evolution of the computation time with varying n_samples / n_features
(you can use the 3D plot in the SVD benchmark as an example).
- if max_iter is reached without convergence you should just print a
warning and return the current state, as done in the coordinate descent
implementation of ElasticNet, for instance:

https://github.com/scikit-learn/scikit-learn/blob/master/scikits/learn/linear_model/coordinate_descent.py#L100

- tolerance and max_iter should be model parameters defined with
default values in the constructor (maybe overridable in the fit method
as optional arguments).
- you should make it possible, as a constructor option, to use random
positive matrices instead of the SVD init trick: I would like to know
in the benchmark whether it really brings a perf boost or not (see the
sketch below)
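
To illustrate the last two points, something along these lines would do
(a sketch only, with made-up names, not actual scikit code):

import warnings
import numpy as np

def random_init(n_samples, n_features, n_components, seed=0):
    # random non-negative starting matrices, as an alternative
    # to the SVD-based init
    rng = np.random.RandomState(seed)
    W = np.abs(rng.randn(n_samples, n_components))
    H = np.abs(rng.randn(n_components, n_features))
    return W, H

def warn_if_not_converged(error, tol, n_iter, max_iter):
    # warn (instead of raising) when max_iter is reached first
    if n_iter >= max_iter and error > tol:
        warnings.warn("Maximum number of iterations (%d) reached before"
                      " convergence (error=%g, tol=%g)"
                      % (max_iter, error, tol))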

Ok I stop here, I think this is enough for a first feedback :)

A question:

- how does it compare with the implementation from milk that Alexandre
Gramfort sent to the mailing list a couple of days ago? Have you
compared the convergence speed on non-trivial datasets? You can use a
subsample of the faces array file from the face recognition example,
for instance:

https://github.com/scikit-learn/scikit-learn/blob/master/examples/plot_face_recognition.py

Thanks again for contributing to the project!
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Vlad Niculae
2010-11-30 22:33:04 UTC
Permalink
Regarding the license: when Peter provided the link, I believe he said
that it has the same license as LIBSVM, so there should be no problem.
I will check and notify the authors anyway, but I wanted to at least
begin the work before doing this part.

Regarding the rest, thank you very much for the feedback. It is
exactly the kind of feedback that I was hoping for. I will get to work
tomorrow.

Olivier Grisel
2010-11-30 22:51:50 UTC
Permalink
Post by Vlad Niculae
Regarding the license: When Peter provided the link I believe he said
that it has the same license as libsvm so there should be no problem.
I will check and at least notify the authors anyway, but I wanted to
at least begin the work before doing this part.
Ok, this is great. Just state that the license is BSD at the head of the
file, as is done in other scikit-learn source files (BSD is the license
of both LIBSVM / the original Matlab NNMF code and scikit-learn).
Post by Vlad Niculae
Regarding the rest, thank you very much for the feedback. It is
exactly the kind of feedback that I was hoping for. I will get to work
tomorrow.
Great.
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Peter Prettenhofer
2010-11-30 22:52:47 UTC
Permalink
[..]
 - this code is a translation of matlab code: what is the license of
the original source code? If it is not MIT or BSD we should ask the
original authors explicitly for there permission to distribute this
translation under the BSD license. If they don't accept we should drop
this code base and restart from scratch to avoid legal / copyright
issues.
here's the copyright statement:
http://www.csie.ntu.edu.tw/~cjlin/nmf/COPYRIGHT

It's the same as the one for LIBSVM - it should be fine.
 - please follow the PEP8 (esp. for operator spacing) and PEP-257 for
the docstring format
I recommend the command-line tools pyflakes [1] and pep8 [2] for
static code and format checking. I've integrated them into my Emacs
workflow, which is really handy [3].

[1] http://www.divmod.org/trac/wiki/DivmodPyflakes
[2] http://pypi.python.org/pypi/pep8
[3] http://reinout.vanrees.org/weblog/2010/05/11/pep8-pyflakes-emacs.html

thanks for the contribution, Vlad - and thanks to Olivier for the
thorough feedback!

best,
Peter
--
Peter Prettenhofer
Vlad Niculae
2010-12-13 23:49:39 UTC
Permalink
Thank you for all the style-related feedback; my workflow is much
more sorted out now.

Here is some new progress:
https://github.com/vene/scikit-learn/commit/40ac5441518d5f7db67fbe6d7f5badfe53077cb9

The NMF code seems to work, and I wrote some tests (I left some failing
by default so I know to work on them ASAP).

More interestingly, I made a benchmark. Please note that the second
graph plotted is actually reconstruction error (|| X - WH ||_2), but
it's late at night and I haven't yet bothered to change the axis label.
It took a while to run while tweaking the parameters.

Interestingly, the projected gradient method is really fast with random
initialization (but its error is the largest, ~= 6), and all that speed
is lost when using SVD-based initialization.
For tolerance=0.001 there is little difference in either speed or error
between SVD-initialized projected-gradient NMF and random-initialized
multiplicative-update NMF.

However, tolerance means different things for the two algorithms.
By tweaking this parameter, SVD-initialized projected-gradient NMF
might be made useful.
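
For reference, the multiplicative update I am benchmarking is the
standard Lee-Seung rule for the Frobenius objective; stripped of
initialization and stopping logic, it boils down to this (a simplified
sketch of the idea, not the benchmarked code itself):

import numpy as np

def nmf_multiplicative(X, W, H, n_iter=200, eps=1e-9):
    # Lee-Seung updates minimizing ||X - WH||_F with W, H nonnegative;
    # W, H must be positive float arrays, eps avoids division by zero
    for _ in range(n_iter):
        H *= np.dot(W.T, X) / (np.dot(np.dot(W.T, W), H) + eps)
        W *= np.dot(X, H.T) / (np.dot(W, np.dot(H, H.T)) + eps)
    return W, H

def reconstruction_error(X, W, H):
    # the quantity plotted in the second benchmark graph
    return np.linalg.norm(X - np.dot(W, H))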

Best regards
Vlad


Alexandre Gramfort
2010-12-14 01:47:38 UTC
Permalink
Hi Vlad,

it looks great!

I think it would be good to add examples.
I was looking for something like plot_nmf.py but could not find it.
Maybe an example based on the digits dataset would be interesting.
A better result could perhaps justify the use of the SVD initialization.

More questions:
- Why not consider "Algorithms for Non-negative Matrix Factorization"
by Daniel D. Lee and Sebastian H. Seung?

and "Non-negative Matrix Factorization with Sparseness Constraints"
by Patrik Hoyer,
Journal of Machine Learning Research 5 (2004) 1457-1469.

How would they compare?

Sorry for asking naive questions.

Alex
Olivier Grisel
2010-12-14 10:31:14 UTC
Permalink
Same comments here: it would be nice to check whether your code gives
results that look the same as the ones found by the milk implementation
(that Alex extracted and posted on this thread earlier) and to compare
the runtimes using your benchmark.

I would also really like to have an example to understand whether the
SVD init brings anything or not. If not, we might just want to get rid
of it to make the code simpler.

Also, on the code style side, can you please run pep8
(http://pypi.python.org/pypi/pep8) on your code and fix the errors? To
my eye, the weird-looking style conventions tend to distract from
actually reading the code.

Keep up with the good work and thanks again for contributing!
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Olivier Grisel
2010-12-14 10:38:48 UTC
Permalink
Actually, on a second look at your code, I found that alt_nmf is
indeed the Lee and Seung version. Great. We just need some examples
on the digits dataset to see how it looks, then.
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Olivier Grisel
2010-12-14 10:48:38 UTC
Permalink
Another comment: it would be nice to study the impact of n_comp (to
be renamed n_components :) on the speed and quality of the results of
the various implementations.

It might be the case that the convergence is faster with SVD init for
small n_components and that Lee and Seung is more interesting for
larger n_components.
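
A loop as simple as this would do for a first look (a sketch: it
assumes the work-in-progress NMF estimator from Vlad's branch, and the
constructor parameter name may differ):

from time import time
from scikits.learn import datasets
from scikits.learn.nmf import NMF  # Vlad's work-in-progress module

X = datasets.load_digits().data

for k in [2, 4, 8, 16, 32]:
    model = NMF(n_comp=k)  # n_comp: the current name in the branch
    t0 = time()
    model.fit(X)
    print("n_comp=%d took %.2fs" % (k, time() - t0))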
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Vlad Niculae
2011-03-14 23:32:42 UTC
Permalink
Hello everybody! I am revisiting this old thread, but I'm here with new stuff :)

So: I noticed from the benchmark that the projected gradient method
converges pretty fast, but if used with random initial values, it
misses by a lot, while the multiplicative update does OK. When
initialized with fast_svd, the reconstruction error improves a lot,
but the fast_svd initialization seemed quite slow to me.
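
(For context, by SVD-based initialization I mean seeding W and H from
the leading singular triplets of X. A crude variant of the idea looks
like the sketch below; the actual NNDSVD algorithm I used treats the
positive and negative parts of each singular vector separately.)

import numpy as np
from numpy.linalg import svd

def crude_svd_init(X, n_components):
    # absolute values of the leading singular triplets, scaled so
    # that np.dot(W, H) roughly reconstructs X
    U, S, Vt = svd(X, full_matrices=False)
    sqrt_s = np.sqrt(S[:n_components])
    W = np.abs(U[:, :n_components]) * sqrt_s
    H = sqrt_s[:, np.newaxis] * np.abs(Vt[:n_components])
    return W, H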

So I looked for different initialization methods and found this:
http://www.postech.ac.kr/~seungjin/publications/icassp07_ydkim.pdf
They claim that it improves sparsity. Note that, because the
construction is reversed compared to the SVD initialization, this
might perform better when n_comp is close to n_features. However, for
very large n_features it might not be usable. Of course, the first
thing will be to benchmark it against the other initialization method
we have.

So here is my implementation of CRO-based hierarchical clustering:
https://github.com/vene/scikit-learn/commit/e05b7958bf3644f0ce92fffe8b84ccabeb632fc3
Of course, I added the parameter NMF(initial="cro").

It's still a work in progress but I will use this intensively in the
following days in order to complete my project. I aim for a pull
request before the end of the month.
Olivier Grisel
2011-03-16 11:25:12 UTC
Permalink
Thanks for the update. Just a quick remark: I think CRO clustering
should go under the scikits.learn.cluster package, and that we should
introduce a new package to gather matrix decomposition methods like
nmf, pca, sparse pca, ...
--
Olivier
Alexandre Gramfort
2011-03-16 13:34:47 UTC
Permalink
Post by Olivier Grisel
Thanks for the update. Just a quick remark: I think cro clustering
should go under the scikits.learn.cluster package
+1
Post by Olivier Grisel
and that we should
introduce a new package to gather matrix decomposition methods like
nmf, pca, sparse pca, ...
we discussed this in the past but I don't remember what the conclusion was.
What do you think of:

scikits.learn.factorization
scikits.learn.matrix_factorization
scikits.learn.decomposition
scikits.learn.pca (and we put all in it)
scikits.learn.latent (just kidding)

Alex
Vlad Niculae
2011-03-16 15:45:10 UTC
Permalink
Post by Alexandre Gramfort
Post by Olivier Grisel
Thanks for the update. Just a quick remark: I think cro clustering
should go under the scikits.learn.cluster package
+1
+1, but should I change it to cluster rows instead of columns?
Post by Alexandre Gramfort
Post by Olivier Grisel
and that we should
introduce a new package to gather matrix decomposition methods like
nmf, pca, sparse pca, ...
we discussed this in the past but I don't remember what was the conclusion.
scikits.learn.factorization
scikits.learn.matrix_factorization
scikits.learn.decomposition
scikits.learn.pca (and we put all in it)
scikits.learn.latent (just kidding)
I like factorization and decomposition. Or maybe something related to
dimensionality reduction, feature extraction, to be more in the
context of learning?


The CRO implementation is slow because it basically tries, at every
step, all $d^2$ possible pairs, computing a small 2x2 SVD for each.
Do you think that writing it in Cython would result in a sensible
improvement? Or would the NumPy SVD call have too big an overhead?

I also intend to revisit the NNDSVD initialization method because I
found that the authors have a website where they present an algorithm
with more features than the one in the paper I used as reference.
Olivier Grisel
2011-03-16 16:32:15 UTC
Permalink
Post by Vlad Niculae
Post by Alexandre Gramfort
Post by Olivier Grisel
Thanks for the update. Just a quick remark: I think cro clustering
should go under the scikits.learn.cluster package
+1
+1, but should I change it to cluster rows instead of columns?
The API should be consistent with other cluster implementations of the
scikits, so yes.
Post by Vlad Niculae
Post by Alexandre Gramfort
Post by Olivier Grisel
and that we should
introduce a new package to gather matrix decomposition methods like
nmf, pca, sparse pca, ...
we discussed this in the past but I don't remember what was the conclusion.
scikits.learn.factorization
scikits.learn.matrix_factorization
scikits.learn.decomposition
scikits.learn.pca (and we put all in it)
scikits.learn.latent (just kidding)
I like factorization and decomposition. Or maybe something related to
dimensionality reduction, feature extraction, to be more in the
context of learning?
I prefer decomposition or factorization. Better to be specific.
Furthermore, when there is a sparse prior, this is not always
dimensionality reduction. Also, feature extraction is too vague:
distances to k-means centers can be used for feature extraction, along
with auto-encoders and RBMs.
Post by Vlad Niculae
The CRO implementation is slow because it basically tries at every
step every $d^2$ possible pairs, computing a small 2x2 svd for each.
Do you think that if I try to write it in cython it would result in a
sensible improvement? Or would the numpy svd call have too big
overhead?
You should bench and profile the existing Python code to know which
part is the bottleneck.
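
For instance, with the standard library profiler (cro_fit and X below
are placeholders for your actual entry point and data):

import cProfile
import pstats

# the statement string is executed in the __main__ namespace, so
# cro_fit and X must be defined there
cProfile.run('cro_fit(X)', 'cro.prof')
stats = pstats.Stats('cro.prof')
stats.sort_stats('cumulative').print_stats(10)  # ten biggest hotspots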
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Mathieu Blondel
2011-03-17 07:29:46 UTC
Permalink
I think that we should be careful about what we put in scikit-learn
space versus user space. It seems wise to restrict ourselves to
algorithms/models which are relatively well-known and widely used by
the community. Also, I agree with Gael's vision of scikit-learn as
providing "building-blocks". Is CRO well known enough to warrant its
own module? Would it be alright to move the code to nmf.py instead?

Mathieu
Alexandre Gramfort
2011-03-17 12:49:02 UTC
Permalink
Post by Mathieu Blondel
I think that we should be careful of what we put in scikit-learn space
and user space. It seems wise to restrict to algorithms/models which
are relatively well-known and widely used by the community. Also, I
agree with Gael's vision of scikit-learn as providing
"building-blocks". Is CRO well known enough to warrant its own module?
I have to admit I have the same concerns.
Post by Mathieu Blondel
Would that be alright to move the code to nmf.py instead?
+0.5

Alex
Olivier Grisel
2011-03-17 18:12:44 UTC
Permalink
Post by Alexandre Gramfort
Post by Mathieu Blondel
I think that we should be careful of what we put in scikit-learn space
and user space. It seems wise to restrict to algorithms/models which
are relatively well-known and widely used by the community. Also, I
agree with Gael's vision of scikit-learn as providing
"building-blocks". Is CRO well known enough to warrant its own module?
I have to admit I have the same concerns.
Post by Mathieu Blondel
Would that be alright to move the code to nmf.py instead?
+0.5
Ok then :)

Also, it would be worth investigating whether CRO can be replaced by
the Ward hierarchical clustering from
https://github.com/scikit-learn/scikit-learn/pull/69 (if it achieves
the same goal and happens to be more scalable, though I have no idea
whether that is the case).
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel