Discussion:
[Scikit-learn-general] DPGMM applied to 1-dimensional vector and variance problem
Johan Mazel
2016-05-05 07:48:38 UTC
Hi
I am trying to use the DPGMM technique (
http://scikit-learn.org/stable/modules/generated/sklearn.mixture.DPGMM.html)
to find classes of data in 1-dimensional vectors.

The extracted variance/standard deviation seems a bit off: either
underestimated or overestimated compared to dpcluster (
https://github.com/teodor-moldovan/dpcluster).
I wrote a small script that gives two examples and attached it to this mail.
Please tell me if I am doing anything wrong.

Thank you very much for your time.

Regards,
Johan
Johan Mazel
2016-05-06 03:16:57 UTC
Hello
Sorry for the double post, I think there was a problem with my previous
message.

I am trying to use the DPGMM technique (
http://scikit-learn.org/stable/modules/generated/sklearn.mixture.DPGMM.html)
to find classes of data in 1-dimensional vectors.

The extracted variance/standard deviation seems a bit off: either
underestimated or overestimated compared to dpcluster (
https://github.com/teodor-moldovan/dpcluster).
I wrote a small script that gives two examples and attached it to this mail.
Please tell me if I am doing anything wrong.

Thank you very much for your time.

Regards,
Johan
Andreas Mueller
2016-05-12 16:22:55 UTC
Hi Johan.
Unfortunately, there are known problems with DPGMM:
https://github.com/scikit-learn/scikit-learn/issues/2454
There is a PR to reimplement it:
https://github.com/scikit-learn/scikit-learn/pull/4802
I didn't know about dpcluster, it seems unmaintained. But maybe
something to compare against?

Andy
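
For readers who want to reproduce the kind of check Johan describes, here is a minimal sketch of fitting a Dirichlet-process mixture to a 1-dimensional vector and reading back the per-component standard deviations. It is not the script attached to the original mail. It assumes a scikit-learn release that provides sklearn.mixture.BayesianGaussianMixture rather than the DPGMM class discussed above; the function name fit_dp_mixture_1d, the truncation level and the synthetic data are all made up for illustration.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture


def fit_dp_mixture_1d(values, max_components=5, seed=0):
    """Fit a truncated Dirichlet-process Gaussian mixture to 1-D data.

    Hypothetical helper. Returns (weights, means, stds), one entry
    per component.
    """
    # scikit-learn expects shape (n_samples, n_features), so a 1-D
    # vector has to become a single-column matrix.
    X = np.asarray(values, dtype=float).reshape(-1, 1)

    model = BayesianGaussianMixture(
        n_components=max_components,  # truncation level, an upper bound
        weight_concentration_prior_type="dirichlet_process",
        covariance_type="full",
        max_iter=500,
        random_state=seed,
    )
    model.fit(X)

    # With covariance_type="full", covariances_ has shape
    # (n_components, 1, 1) for 1-D data; the single entry is the variance.
    variances = model.covariances_[:, 0, 0]
    return model.weights_, model.means_[:, 0], np.sqrt(variances)


if __name__ == "__main__":
    rng = np.random.RandomState(0)
    # Synthetic data (illustrative only): two groups with known std.
    data = np.concatenate([rng.normal(0.0, 1.0, 500),
                           rng.normal(10.0, 2.0, 500)])
    weights, means, stds = fit_dp_mixture_1d(data)
    for w, m, s in zip(weights, means, stds):
        print(f"weight={w:.3f}  mean={m:.3f}  std={s:.3f}")
```

Components with a negligible weight can be ignored. The standard deviations of the remaining ones can then be compared with the values used to generate the data, or with the output of another implementation such as dpcluster.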