Discussion:
[Scikit-learn-general] GSoC2015 Improve GMM
Wei Xue
2015-03-16 20:23:33 UTC
Hi all,

I am a PhD student at Florida International University, US. I am interested
in the topic of improving GMM, and I have drafted a proposal for it:
https://github.com/xuewei4d/scikit-learn/wiki/GSoC-2015-Proposal:-Improve-GMM

Here are some questions I would like to discuss.

1. -1 for coresets. The paper
(http://las.ethz.ch/files/feldman11scalable-long.pdf) is new and has fewer
than 15 citations. Its target settings are clusters and streaming data,
which are (I think) rare for scikit-learn.

2. Currently, I have gone over the Approximate Inference chapter in PRML
(Bishop's machine learning book) and Blei's 2006 paper. But I have not dug
much into the code, so I don't have a detailed reimplementation plan yet. Do
I need to add more details to the 'Theory and Implementation' part of the
proposal?

3. Any feedback is welcome.

Thanks,
Wei Xue
Andreas Mueller
2015-03-16 20:36:09 UTC
Hi Wei Xue.
I am also not very convinced by the core-set approach.
I'd rather focus on improving the API and fixing issues in the VBGMM and
DPGMM.
I was hoping that Murphy's book would have some more details on DPGMM, but I
haven't found any yet. He doesn't seem to talk about variational inference
in Dirichlet processes.
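
For reference, here is a minimal sketch of the mixture API under
discussion, assuming the sklearn.mixture names current at the time (GMM,
VBGMM, DPGMM); the parameters and data are illustrative only:

    import numpy as np
    from sklearn.mixture import GMM, VBGMM, DPGMM

    rng = np.random.RandomState(0)
    # Two well-separated Gaussian blobs in 2-D.
    X = np.vstack([rng.randn(200, 2), rng.randn(200, 2) + [4, 4]])

    # Classic EM, with the fixed covariance floor min_covar.
    gmm = GMM(n_components=2, covariance_type='full', min_covar=1e-3).fit(X)

    # Variational Bayes: surplus components should end up down-weighted.
    vb = VBGMM(n_components=5, alpha=1.0, covariance_type='diag').fit(X)

    # (Truncated, variational) Dirichlet process variant.
    dp = DPGMM(n_components=10, alpha=1.0, covariance_type='diag').fit(X)

    print(gmm.weights_, vb.weights_, dp.weights_)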

So far I think your proposal looks solid.
It would be great if you could work on some pull requests to support
your application.

Best,
Andy
Wei Xue
2015-03-24 00:09:00 UTC
Hi Andreas,

I have submitted my updated proposal as well.


Thanks!
Wei Xue
Andreas Mueller
2015-03-24 00:11:17 UTC
Thanks, I just saw it.
I'll give it a read tomorrow.

Andy
Andy
2015-03-24 19:48:42 UTC
Hi Wei Xue.

I think the proposal looks good and the scope should work well.
I feel like the explanation in "Implementing VBGMM" is a bit fuzzy; maybe
you can rework it a bit.
Also, for the timeline, the documentation shouldn't come as an afterthought.
Ideally, each improvement is its own pull-request, so that we can start
reviewing and merging code quickly.
For something to be merged, you do need to provide benchmarks, testing
and documentation, though.

You could actually start improving the documentation and examples for
the GMM while you work on the math for the rest.

Best,
Andreas
Vlad Niculae
2015-03-25 00:02:03 UTC
Hi Wei Xue, hi everyone,

I think Andy’s comments about testing and documentation are very important.

I have just a few things to add:

1. As confused as I am about the world around me, I still knew that the current year is 2015 :P. I think the form is asking which year of your program you are in.

2. I think the mathematical derivation part could be considered a documentation task as well.

Hope this helps,

Yours,
Vlad
Kyle Kastner
2015-03-25 01:44:53 UTC
I like the fact that this can be broken into nice parts. I also think
documentation should be farther up the list, with the math part lumped in.
GMM cleanup should probably start right out of the gate, since fixing that
will define what API/init changes have to stay consistent in the other two
models.

Is there any particular reason to reimplement *all* of the VBGMM and
DPGMM, or are there parts that seem reusable? A full-on rewrite of two
estimators seems like a lot to take on, especially ones as mathematically
and statistically complicated as these. You might elaborate on why these
two need to be rewritten: specifically, what they do currently, and how
that will change.

Will users be allowed to set/tweak the burn-in and lag for the sampler
in the DPGMM?
Andreas Mueller
2015-03-25 18:57:37 UTC
Post by Kyle Kastner
Will users be allowed to set/tweak the burn-in and lag for the sampler
in the DPGMM?
This is variational!
Wei Xue
2015-03-25 19:00:31 UTC
Ha, I just got confused about the sampling in DPGMM :).


Wei Xue
Kyle Kastner
2015-03-25 19:20:31 UTC
There was mention of TDP (blocked Gibbs, higher up in the paper) vs.
collapsed Gibbs sampling; both mentioned burn-in and lag. I was under the
impression you would have to use one of those two to do the computation;
see page 137 of the paper, just below the pictures, second paragraph:
http://www.cs.berkeley.edu/~jordan/papers/blei-jordan-ba.pdf

I could be mistaken, though; this paper is pretty gnarly!

As far as handling missing values goes, that is an API concern that has
not been fleshed out yet, so I would not concentrate on it at this time;
other algorithms, like low-rank matrix completion, would need to figure
this out before having the potential to be added. AKA what Andy said
(so fast at email!)
Andreas Mueller
2015-03-25 19:43:49 UTC
Post by Kyle Kastner
(so fast at email!)
Aka so slow at actually getting anything done.
Andreas Mueller
2015-03-25 19:53:56 UTC
Even higher up, it compares variational, collapsed, and truncated inference.
So the variational approach does not need any sampling (which makes sense).

Btw, this paper has a couple of references for more detailed equations:
http://www.aaai.org/Papers/IJCAI/2007/IJCAI07-449.pdf
Kyle Kastner
2015-03-25 20:07:57 UTC
OK, the mention of sampling had me worried! That clears it up, thanks.
And thanks for the paper reference!
Wei Xue
2015-03-25 18:59:34 UTC
Thanks Andreas, Kyle, Vlad and Olivier for the detailed review.

1. For the part *Implementing VBGMM*, do you mean it would be better if I
add specific functions to be implemented? @Andreas.

2. For the documentation, I will rework it and reschedule the API
specification and the math part to the very first step. @Andreas, @Kyle,
@Vlad.

3. As for the reason to reimplement VBGMM, I think I did not make it clear,
as Kyle and Andreas pointed out. In this part, I will mainly reimplement the
updating functions, such as `_update_precisions`. @Kyle

4. I will add benchmarking and profiling to the testing part, as @Olivier
suggested.

5. For the burn-in and lag mentioned by @Kyle: I guess that is about MCMC
sampling methods. I took a look at Equation 23 in Blei's paper, and I think
it is not MCMC but an empirical approximation similar to what MCMC does. I
am not sure I understand the predictive function correctly. Any suggestions?

6. I would like to add a variant of EM estimation to the GMM module: MAP
estimation. Currently, the M-step uses maximum likelihood estimation with
min_covar, which prevents singular covariance estimates. I think it would be
better to add MAP estimation for the M-step, because the fixed min_covar in
ML estimation might be too aggressive in some cases. In MAP estimation, the
covariance correction diminishes as the number of data instances increases
(see the sketch after this list).

7. I would also like to add some functionality to deal with missing values
in GMM. Missing values in the training data are not uncommon, and the PRML
book also mentions them.
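
To make point 6 concrete, here is a minimal sketch of the two M-step
covariance updates, assuming the inverse-Wishart parameterization discussed
later in this thread; ml_covariance, map_covariance, and the default nu are
illustrative helpers, not existing scikit-learn API:

    import numpy as np

    def ml_covariance(X, min_covar=1e-3):
        # Current GMM behaviour: ML estimate plus a fixed diagonal floor.
        sigma_hat = np.cov(X, rowvar=False, bias=True)
        return sigma_hat + min_covar * np.eye(X.shape[1])

    def map_covariance(X, alpha=1e-3, nu=None):
        # MAP M-step with an inverse-Wishart prior W^{-1}(alpha * I, nu):
        # the posterior mode is (n * sigma_hat + alpha * I) / (nu + d + 1 + n),
        # so the prior's pull shrinks automatically as n grows.
        n, d = X.shape
        if nu is None:
            nu = d + 2  # weakly informative default (hypothetical choice)
        sigma_hat = np.cov(X, rowvar=False, bias=True)
        return (n * sigma_hat + alpha * np.eye(d)) / (nu + d + 1.0 + n)

    rng = np.random.RandomState(0)
    for n in (20, 200, 20000):
        X = rng.randn(n, 2)
        # Diagonal of the MAP estimate approaches the true variance (1.0).
        print(n, np.round(np.diag(map_covariance(X)), 4))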

BTW, the draft of my proposal has been updated at
https://github.com/scikit-learn/scikit-learn/wiki/GSoC-2015-Proposal:-Improve-GMM-module

Thanks,
Wei Xue
Andreas Mueller
2015-03-25 19:18:07 UTC
Thanks for your feedback.
Post by Wei Xue
1. For the part *Implementing VBGMM*, do you mean it would be better if I
add specific functions to be implemented?
I just felt the paragraph was a bit unclear, and would benefit from
saying what exactly you want to do.
Post by Wei Xue
6. I would like to add a variant of EM estimation to the GMM module: MAP
estimation. Currently, the M-step uses maximum likelihood estimation with
min_covar, which prevents singular covariance estimates. I think it would
be better to add MAP estimation for the M-step, because the fixed
min_covar in ML estimation might be too aggressive in some cases. In MAP
estimation, the covariance correction diminishes as the number of data
instances increases.
How is this different from the VBGMM?
Post by Wei Xue
7. I would also like to add some functionality to deal with missing
values in GMM. Missing values in the training data are not uncommon, and
the PRML book also mentions them.
I think this is outside the scope of this project, as we generally have
avoided dealing with missing values in sklearn estimators directly.
Wei Xue
2015-03-25 19:38:34 UTC
VBGMM is a fully Bayesian estimation in both the 'E-step' and the 'M-step'
(although there are no such concepts in VB). The parameters in VB are random
variables, described by a posterior distribution; the posterior is
proportional to the product of the likelihood and the prior. MAP estimation
uses the posterior distribution as well, but each parameter is still
represented by a single value, as in the M-step of EM. For example, if we
use an inverse Wishart distribution W^{-1}(\Sigma | \Phi, \nu) as the prior
for the covariance matrix and set the scale parameter \Phi = \alpha I, the
MAP update is

    \tilde{\Sigma} = \frac{1}{\nu + d + 1 + n} (n \hat{\Sigma} + \alpha I),

where \hat{\Sigma} is the classic (maximum likelihood) estimate of the
covariance matrix. As you can see, as the number of data instances n grows,
\tilde{\Sigma} approaches \hat{\Sigma} and the effect of \alpha diminishes.
Therefore the effect of min_covar (\alpha) is not fixed in advance; it also
depends on how much training data we have.


Wei
Andreas Mueller
2015-03-25 19:45:56 UTC
Sorry, I'm not following.
I'm not sure what you are arguing for. I know how VBGMM works, but I'm
not sure how MAP EM would work, and why it would be preferable over VBGMM.
Wei Xue
2015-03-25 20:09:31 UTC
Sorry for the confusion.

I am just saying that min_covar, which prevents singular covariances, may
not be flexible enough. Sometimes its value is too large relative to the
estimated covariance. For example, a user first tries GMM on a small subset
of the training data with the default min_covar = 0.001, then moves to a
larger data set but keeps min_covar = 0.001, even though a smaller value
would do for the larger set. In MAP EM, when we have more data instances,
the effect of min_covar is *automatically* diminished.

min_covar is just a regularization technique. We could justify it using MAP
estimation, though there is a slight difference in the scalar coefficient
before \alpha. So MAP EM is better motivated than simply setting min_covar.
I am not saying MAP EM is preferable over VBGMM, only over plain EM for
GMM. Does that make it clear?
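
Here is a small numeric illustration of that point, assuming the
inverse-Wishart MAP update sketched earlier in the thread; the constants
are made up for the example:

    import numpy as np

    rng = np.random.RandomState(0)
    true_var = 1e-4   # variance well below the default min_covar of 1e-3
    d, nu, alpha = 1, 3, 1e-3

    for n in (50, 500, 50000):
        x = rng.randn(n) * np.sqrt(true_var)
        sigma_hat = x.var()
        fixed = sigma_hat + alpha                               # ML + fixed floor
        map_est = (n * sigma_hat + alpha) / (nu + d + 1.0 + n)  # MAP mode
        # The fixed floor inflates the estimate about tenfold at every n,
        # while the prior's contribution to the MAP estimate decays
        # roughly as 1/n.
        print(n, '%.6f' % fixed, '%.6f' % map_est)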

Wei
Post by Andreas Mueller
Sorry, I'm not following.
I'm not sure what you are arguing for. I know how VBGMM works, but I'm not
sure how MAP EM would work, and why it would be preferable over VBGMM.
VBGMM is a full Bayesian estimation in both 'E-step' and 'M-step'
(although there is no such concept in VB) . The parameters in VB are random
variables, and described by a posterior distribution. The posterior
distribution is the product of the likelihood and the prior distribution.
On the other hand, although MAP estimation use the posterior distribution
as well, but it is still represented by a single value like in 'M-step'
like in EM. For example, if we use inverse Wishart distribution W^{-1}(\Sigma|\Phi,
\nu) as the prior distribution for covariance matrix and set the
parameter \Phi to be \alpha*I. We have \tilde{\Sigma} =
\frac{n}{\nu+d+1+n}(\hat{\Sigma} + \alpha*I) where \hat{\Sigma} is the
classic estimation of covariance matrix. As you can see, when the number
of data instances increase, the \tilde{\Sigma} is approximated by \hat{\Sigma}.
The effect \alpha is diminished. Therefore the effect of min_covar ( \alpha
) is not prefixed, it also depends on the number of training data we have.
Wei
Post by Andreas Mueller
Thanks for your feedback.
Thanks Andreas, Kyle, Vlad and Olivier for the detailed review.
1. For the part *Implementing VBGMM*, do you mean it would be better if I
add specific functions to be
I just felt the paragraph was a bit unclear, and would benefit from
saying what exactly you want to do.
6. I would like to add a variant of EM estimation to the GMM module: MAP
estimation. Currently, the M-step uses maximum likelihood estimation with
min_covariance, which prevents singular covariance estimates. I think it
would be better to add MAP estimation for the M-step, because the fixed
min_covariance in ML estimation might be too aggressive in some cases. In
MAP, the effect of correcting the covariance decreases as the number of
data instances increases.
How is this different from the VBGMM?
7. I would also like to add some functionality to deal with missing
values in GMM. Missing values in the training data are not uncommon, and
the PRML book also mentions them.
I think this is outside the scope of this project, as we generally
have avoided dealing with missing values in sklearn estimators directly.
Wei Xue
2015-03-25 21:17:11 UTC
Permalink
@Andreas, on second thought, MAP EM seems less important; it just has more
theoretical support. We might skip it.

Wei
Andreas Mueller
2015-03-25 21:21:36 UTC
Permalink
I don't have a strong opinion.
Maybe it is better than the current regularization, but then I wonder why
not go all the way to VBGMM. That said, I have found min_covar hard to
set, so MAP EM might be a good addition.
Wei Xue
2015-03-26 03:42:52 UTC
Permalink
Dear all,

I just updated the proposal draft on GitHub and Melange:
https://github.com/scikit-learn/scikit-learn/wiki/GSoC-2015-Proposal:-Improve-GMM-module

Thanks,
Wei Xue
Gael Varoquaux
2015-03-26 05:55:41 UTC
Permalink
1. For the part Implementing VBGMM, do you mean it would be better if I add
My question is: why do you think that, by coding it from scratch rather
than trying to understand the existing one and improving it, you'll do a
better job? The guy who did it wasn't incompetent, and it has been
improved since.
3. As for the reason for reimplementing VBGMM, I think I did not make it clear, as
Kyle and Andreas pointed out. In this part, I will mainly re-implement the
OK. You need to be clear in the proposal on what you will change, why,
and how you will judge the progress.

G
Wei Xue
2015-03-26 23:05:18 UTC
Permalink
Hi, Gaël and group

I really appreciate your comments. You are right: I'd better stand on the
shoulders of giants rather than build everything from scratch. I went
through the 120+ comments on the very first PR #116 from 2011,
https://github.com/scikit-learn/scikit-learn/pull/116. It is a very
valuable resource, especially the comments from Olivier, Gaël and Alex.

1. Update rules. Alex also mentioned that he did not know of another
clear reference against which to verify that the implemented algorithm is
correct. I can double-check the current derivation.
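
For reference, such a check would start from the responsibility-weighted
statistics in PRML (Bishop, Eqs. 10.51-10.53; quoted from memory, so
exactly the kind of thing to verify against the book):

    N_k = \sum_{n=1}^{N} r_{nk}, \qquad
    \bar{x}_k = \frac{1}{N_k} \sum_{n=1}^{N} r_{nk} x_n, \qquad
    S_k = \frac{1}{N_k} \sum_{n=1}^{N} r_{nk} (x_n - \bar{x}_k)(x_n - \bar{x}_k)^T,

and then confirm the hyperparameter updates, e.g. \alpha_k = \alpha_0 +
N_k, \beta_k = \beta_0 + N_k, \nu_k = \nu_0 + N_k.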

2. Speed. There was an intense discussion about vectorization. I think we
should do more detailed profiling to find opportunities for vectorization
and speed-ups. I used to write a lot of Matlab code, so I am a big fan of
vectorization; unnecessary for loops are a bad idea in Matlab as well. I
could contribute a lot on this point.
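
As a small illustration of what I mean (a NumPy/SciPy sketch, not the
current sklearn internals; the helper name is mine), the E-step
responsibilities can be computed for all samples at once, leaving only a
short loop over the K components:

    import numpy as np
    from scipy.special import logsumexp
    from scipy.stats import multivariate_normal

    def responsibilities(X, weights, means, covs):
        # r_{nk} for all samples n and components k; the only Python
        # loop left is over the K components.
        log_prob = np.empty((X.shape[0], len(weights)))
        for k in range(len(weights)):
            log_prob[:, k] = np.log(weights[k]) + multivariate_normal.logpdf(
                X, mean=means[k], cov=covs[k])
        # normalize in log space for numerical stability
        return np.exp(log_prob - logsumexp(log_prob, axis=1, keepdims=True))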

3. Testing. It seems there are many obstacles to testing. The most
important issues are correctness, convergence, and numerical instability.
It is never easy to make sure a nondeterministic algorithm works
correctly. I did not find material on testing variational methods, but
there is some on testing MCMC, and I guess we could write similar test
cases. Reference:
https://hips.seas.harvard.edu/blog/2013/05/20/testing-mcmc-code-part-1-unit-tests/
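
One check from that material carries over directly to VB: the evidence
lower bound must never decrease between iterations, so a deterministic
regression test can assert exactly that. A minimal sketch (the
fit_vbgmm_with_trace name is hypothetical; it stands for whatever fitting
routine records the per-iteration bound):

    import numpy as np

    def check_elbo_monotone(fit_vbgmm_with_trace, X, tol=1e-8):
        # fit_vbgmm_with_trace(X) should return the sequence of
        # lower-bound values, one per iteration.
        elbos = np.asarray(fit_vbgmm_with_trace(X))
        drops = np.diff(elbos) < -tol
        assert not drops.any(), \
            "ELBO decreased at iterations %s" % np.where(drops)[0]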

Thanks,
Wei Xue
