Discussion:
[Scikit-learn-general] Sparse PCA
Vlad Niculae
2011-05-10 08:42:27 UTC
Permalink
Hi all,
I wanted to start working on integrating the Sparse PCA gist. I made a
new branch, put the file inside, and ran it, but it doesn't work
correctly.
With method='lars', the update_V step sets V to zero everywhere. With
method='lasso' (I had to change maxit to max_iter in the call) I get
good results, but the assertion that the cost improves still fails
often, so I commented it out. I hope it is just numerical instability.

I have to go to an exam now, but I wanted to send this to keep you
updated. If you want to try running the code to confirm that it's not
something specific to my system/installation, it is in the sparsepca
branch of my repo.

PS: could I get a link to what is implemented there? At first glance
it didn't seem to match what's in Mairal et al. 2010.
Best,
Vlad
Gael Varoquaux
2011-05-10 09:07:26 UTC
Permalink
Post by Vlad Niculae
I wanted to start working on the integration of the Sparse PCA gist. I
made a new branch, put the file inside and ran it but it doesn't work
right.
With method='lars', the update_V step sets V to zero everywhere. With
method='lasso' (I had to change maxit to max_iter in the call) I get
good results but still the assertion that the cost improves fails
often, I commented it out. Hopefully it is due to numeric instability.
It's a bug. We have fixed it, I'll update the Gist ASAP.
Post by Vlad Niculae
PS: could I get a link to what is implemented there? At first glance
it didn't seem like what's in Mairal et al. 2010.
update_U is (almost) what's in Mairal et al. 2010. update_V is not: the
algorithm is not online; it is a conventional batch algorithm.

G
Gael Varoquaux
2011-05-10 09:17:03 UTC
Permalink
Post by Gael Varoquaux
Post by Vlad Niculae
I wanted to start working on the integration of the Sparse PCA gist. I
made a new branch, put the file inside and ran it but it doesn't work
right.
With method='lars', the update_V step sets V to zero everywhere. With
method='lasso' (I had to change maxit to max_iter in the call) I get
good results but still the assertion that the cost improves fails
often, I commented it out. Hopefully it is due to numeric instability.
It's a bug. We have fixed it, I'll update the Gist ASAP.
Should be fixed.

G
Vlad Niculae
2011-05-10 10:07:42 UTC
Permalink
Fixed indeed, works great! However,

assert(dE >= -tol * E[-1])

still fails when using lasso, and tol is very, very small.
I found this because I had forgotten an exaggerated tol value (1e-20)
left over from when I was trying to locate the bug.

Is this due to the hardcoded `tol=1e-8` in the call to the lasso fit?

On Tue, May 10, 2011 at 12:17 PM, Gael Varoquaux
Post by Gael Varoquaux
Post by Gael Varoquaux
Post by Vlad Niculae
I wanted to start working on the integration of the Sparse PCA gist. I
made a new branch, put the file inside and ran it but it doesn't work
right.
With method='lars', the update_V step sets V to zero everywhere. With
method='lasso' (I had to change maxit to max_iter in the call) I get
good results but still the assertion that the cost improves fails
often, I commented it out. Hopefully it is due to numeric instability.
It's a bug. We have fixed it, I'll update the Gist ASAP.
Should be fixed.
G
------------------------------------------------------------------------------
Achieve unprecedented app performance and reliability
What every C/C++ and Fortran developer should know.
Learn how Intel has extended the reach of its next-generation tools
to help boost performance applications - including clusters.
http://p.sf.net/sfu/intel-dev2devmay
_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
Vlad Niculae
2011-05-10 10:09:47 UTC
Permalink
Indeed, I passed the tol from sparse_pca down to _update_V as a parameter,
then forwarded it to the Lasso fit, and the assert didn't fail this time.
Post by Vlad Niculae
Fixed indeed, works great! However
assert(dE >= -tol * E[-1])
still fails when using lasso, and tol is very very small.
I found this because I forgot an exaggerated tol value (1e-20) that I
left in from where I was trying to locate the bug.
Is this due to the hardcoded `tol=1e-8` in the call to lasso fit?
On Tue, May 10, 2011 at 12:17 PM, Gael Varoquaux
Post by Gael Varoquaux
Post by Gael Varoquaux
Post by Vlad Niculae
I wanted to start working on the integration of the Sparse PCA gist. I
made a new branch, put the file inside and ran it but it doesn't work
right.
With method='lars', the update_V step sets V to zero everywhere. With
method='lasso' (I had to change maxit to max_iter in the call) I get
good results but still the assertion that the cost improves fails
often, I commented it out. Hopefully it is due to numeric instability.
It's a bug. We have fixed it, I'll update the Gist ASAP.
Should be fixed.
G
Gael Varoquaux
2011-05-10 10:13:31 UTC
Permalink
Post by Vlad Niculae
Indeed I passed the tol from sparse_pca to _update_V as a parameter,
and then forward to Lasso fit, and the assert didn't fail this time.
Seems like you have things under control :).

Testing this bastard is going to be a bit nasty. I suspect that there is
some work hiding here.

I'm having a pretty bad week, but don't hesitate to ask for help on the
mailing list when you encounter things you don't understand. I probably
won't answer before the weekend.

G
Vlad Niculae
2011-05-10 10:18:43 UTC
Permalink
Sure, and this assertion is not really that urgent; the failures are
small enough to be accounted for by stability issues, I think.
I managed to get it to fail again using tol=1e-20.

That said, I don't think it's bad to pass the tol through to the lasso fit.

I'm going to leave the assert as it is for now and work on the
SparsePCA class. Sorry for spamming the thread; I just want to
document this stuff somewhere as I encounter it.

No need for feedback for now.
Best,
Vlad

On Tue, May 10, 2011 at 1:13 PM, Gael Varoquaux
Post by Gael Varoquaux
Post by Vlad Niculae
Indeed I passed the tol from sparse_pca to _update_V as a parameter,
and then forward to Lasso fit, and the assert didn't fail this time.
Seems like you have things under control :).
Testing this bastard is going to be a bit nasty. I suspect that there is
some work hiding here.
I have a pretty bad week, but don't hesitate to ask for some help on the
mailing list when you encounter things you don't understand. I probably
won't answer before the week end.
G
Alexandre Gramfort
2011-05-10 13:32:16 UTC
Permalink
Hi Vlad,

Glad to see that you've started to dig into this.

In this code, every step should see the cost function decrease, i.e.
each update of the dictionary and each step of the sparse coding (Lasso
or LARS-Lasso). If that's not the case and the errors are small, then
it's definitely a numerical rounding problem. Even running the Lasso for
a few iterations should make the cost function decrease. The
assert(dE >= -tol * E[-1]) was definitely meant as a sanity check on the
optimization.
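For reference, the kind of check being discussed can be sketched as
follows (a minimal illustration with made-up names, not the gist's
actual code):

```python
import numpy as np

def cost_decreases(E, tol=1e-8):
    """Sanity check for alternating minimization: the cost recorded
    after each step should never increase by more than a rounding
    error proportional to the current cost."""
    E = np.asarray(E, dtype=float)
    dE = E[:-1] - E[1:]  # positive entries mean the cost went down
    return bool(np.all(dE >= -tol * E[:-1]))
```

With a loose tol the check tolerates tiny rounding-induced increases;
with something as strict as tol=1e-20 it will trip on ordinary floating
point noise, which matches what was observed.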

Alex
Post by Vlad Niculae
Fixed indeed, works great! However
assert(dE >= -tol * E[-1])
still fails when using lasso, and tol is very very small.
I found this because I forgot an exaggerated tol value (1e-20) that I
left in from where I was trying to locate the bug.
Is this due to the hardcoded `tol=1e-8` in the call to lasso fit?
On Tue, May 10, 2011 at 12:17 PM, Gael Varoquaux
Post by Gael Varoquaux
Post by Gael Varoquaux
Post by Vlad Niculae
I wanted to start working on the integration of the Sparse PCA gist. I
made a new branch, put the file inside and ran it but it doesn't work
right.
With method='lars', the update_V step sets V to zero everywhere. With
method='lasso' (I had to change maxit to max_iter in the call) I get
good results but still the assertion that the cost improves fails
often, I commented it out. Hopefully it is due to numeric instability.
It's a bug. We have fixed it, I'll update the Gist ASAP.
Should be fixed.
G
Vlad Niculae
2011-05-23 13:34:55 UTC
Permalink
Hi everybody,
Because I've been neglecting my duties lately, I wanted to let you
know that starting today I won't neglect them anymore :)
As required by the PSF, I posted about the SparsePCA code that I'm
working on integrating: http://venefrombucharest.wordpress.com/?p=74
I haven't put the latest update in the gist yet.
I implemented transform as a least squares fit followed by
normalization. Unfortunately, I can't test that fit_transform is the
same as fit followed by transform because, well, they differ quite a
bit in some (few) places. Do you have any quick thoughts on this?
IIUC, as long as V is fixed, U is optimized by the least squares
solution, isn't it?
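To make the idea concrete, here is a rough sketch of a least-squares
transform with the component matrix V held fixed (the function name and
the exact normalization are my guesses, not the actual implementation):

```python
import numpy as np

def transform_ls(X, V):
    """With the components V (n_components x n_features) held fixed,
    the loadings U minimizing ||X - U V||_F^2 are the least squares
    solution; each column of U is then scaled to unit norm."""
    U = np.linalg.lstsq(V.T, X.T, rcond=None)[0].T  # (n_samples, n_components)
    norms = np.sqrt((U ** 2).sum(axis=0))
    return U / norms, norms

rng = np.random.RandomState(0)
V = rng.randn(3, 10)       # 3 components, 10 features
X = rng.randn(50, 3) @ V   # data lying exactly in the span of V
U, norms = transform_ls(X, V)
```

Since X lies exactly in the span of V here, undoing the normalization
recovers the data, which is one way to check that the least squares
step really is the optimum for fixed V.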

Best,
Vlad

On Tue, May 10, 2011 at 4:32 PM, Alexandre Gramfort
Post by Alexandre Gramfort
Hi Vlad,
glad to see that you've started to dig into this.
In this code every step should see the cost function decrease ie. each
update of the dictionary and every step of the sparse coding (Lasso or
LARSLasso). If it's not the case and the errors
are small then it's definitely a numerical rounding problem. Even running
Lasso with a few iterations
should make the cost function decrease. The assert(dE >= -tol * E[-1])
was definitely a sanity check in the optim.
Alex
Post by Vlad Niculae
Fixed indeed, works great! However
assert(dE >= -tol * E[-1])
still fails when using lasso, and tol is very very small.
I found this because I forgot an exaggerated tol value (1e-20) that I
left in from where I was trying to locate the bug.
Is this due to the hardcoded `tol=1e-8` in the call to lasso fit?
On Tue, May 10, 2011 at 12:17 PM, Gael Varoquaux
Post by Gael Varoquaux
Post by Gael Varoquaux
Post by Vlad Niculae
I wanted to start working on the integration of the Sparse PCA gist. I
made a new branch, put the file inside and ran it but it doesn't work
right.
With method='lars', the update_V step sets V to zero everywhere. With
method='lasso' (I had to change maxit to max_iter in the call) I get
good results but still the assertion that the cost improves fails
often, I commented it out. Hopefully it is due to numeric instability.
It's a bug. We have fixed it, I'll update the Gist ASAP.
Should be fixed.
G
Alexandre Gramfort
2011-05-23 13:45:40 UTC
Permalink
Hi Vlad,
Post by Vlad Niculae
I implemented transform as a least squares fit followed by
normalization. Unfortunately I can't test that fit transform is the
same to fit followed by transform because, well, they differ pretty
much in some (few) places. Do you have any quick thoughts on this?
IIUC, as long as V is fixed, U is optimized by the least squares
solution, isn't it?
Well, once the dictionary is learnt, the transform should give you the
decompositions (the coefficients) of new samples on this dictionary.
There are many ways to do it:

- do a new step of sparse coding with a Lasso (same lambda or not)
- do a least squares (LS) fit
- do an orthogonal matching pursuit (OMP) to keep the sparsity.

Both Lasso and OMP will give you sparse loadings (coefficients), but LS
might be your best choice depending on your application and probably
the size of the dictionary. LS is likely to be bad with a big dictionary.
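The three options can be sketched side by side; this is an illustrative
comparison using current scikit-learn estimator names, not code from
the gist:

```python
import numpy as np
from sklearn.linear_model import Lasso, OrthogonalMatchingPursuit

rng = np.random.RandomState(0)
V = rng.randn(5, 20)              # dictionary: 5 atoms, 20 features
x = 2.0 * V[0] - 1.5 * V[3]       # new sample built from two atoms

# 1) Lasso: sparse coefficients through an l1 penalty
code_lasso = Lasso(alpha=0.1, fit_intercept=False).fit(V.T, x).coef_

# 2) Least squares: dense coefficients; fine for small dictionaries
code_ls = np.linalg.lstsq(V.T, x, rcond=None)[0]

# 3) OMP: greedy selection with an exact sparsity level
omp = OrthogonalMatchingPursuit(n_nonzero_coefs=2, fit_intercept=False)
code_omp = omp.fit(V.T, x).coef_
```

OMP guarantees at most n_nonzero_coefs nonzero entries, while the least
squares code is dense in general; that is the trade-off described above.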

hope this helps
Alex
Vlad Niculae
2011-05-23 13:52:30 UTC
Permalink
I thought that we were treating the components as sparse here. What you
are saying seems to come from the "dictionary learning" point of view,
where we want sparse data (sparse output from transform), but here we
look for sparse atoms. Am I wrong about this?

This has been discussed in the comments of the gist too,
https://gist.github.com/800069#gistcomment-19767

On Mon, May 23, 2011 at 4:45 PM, Alexandre Gramfort
Post by Alexandre Gramfort
Hi Vlad,
Post by Vlad Niculae
I implemented transform as a least squares fit followed by
normalization. Unfortunately I can't test that fit transform is the
same to fit followed by transform because, well, they differ pretty
much in some (few) places. Do you have any quick thoughts on this?
IIUC, as long as V is fixed, U is optimized by the least squares
solution, isn't it?
well, once the dictionary is learnt the transform should give you the
decompositions (the coefs) of new samples on this dict. There are
many ways to do it.
- do a new step of sparse coding with a Lasso (same lambda or not)
- do a least square (LS)
- do an orthogonal matching pursuit (OMP) to keep the sparsity.
both Lasso and OMP will give you sparse loadings (coefs) but LS
might be your best choice depending on your application and probably
the size of the dictionary. LS is likely to be bad with a big dict.
hope this helps
Alex