Discussion:
Multinomial HMM Issue #1158
David Reed
2013-03-05 23:56:53 UTC
Permalink
Hi, I added a comment to issue #1158 but since it is closed, I'm not sure
if anyone would be alerted.

I am not sure if this should be closed or perhaps a second issue should be
opened.

As already stated, the attribute n_symbols only gets created when an
emission probability matrix is defined. This is not documented very well,
is somewhat clumsy, and is also inconsistent.

The HMM currently only requires the number of components (which I believe
would be better named states). It takes the number of components and
generates a uniform transition matrix. The same should be true for the
emission matrix: given the number of states and symbols, a uniform
emission matrix should be generated.

The emission matrix should also be an optional input.
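
To make the proposal concrete, here is a minimal sketch of the behaviour described above. It is not the scikit-learn implementation: the class name `UniformInitHMM`, the helper `uniform_rows`, and the attribute names `transmat` and `emissionprob` are assumptions made up for illustration (only `n_components` and `n_symbols` come from the discussion). Plain Python lists are used to keep it dependency-free.

```python
def uniform_rows(n_rows, n_cols):
    """Return n_rows rows, each a uniform distribution over n_cols outcomes."""
    return [[1.0 / n_cols] * n_cols for _ in range(n_rows)]


class UniformInitHMM:
    """Hypothetical illustration only; not the scikit-learn class."""

    def __init__(self, n_components, n_symbols=None, emissionprob=None):
        if emissionprob is not None:
            # Emission matrix supplied: infer n_symbols from its width.
            rows = [list(row) for row in emissionprob]
            if len(rows) != n_components:
                raise ValueError("emissionprob needs one row per state")
            widths = {len(row) for row in rows}
            if len(widths) != 1:
                raise ValueError("emissionprob rows must have equal length")
            inferred = widths.pop()
            if n_symbols is not None and n_symbols != inferred:
                raise ValueError("n_symbols disagrees with emissionprob")
            n_symbols = inferred
        elif n_symbols is None:
            raise ValueError("provide n_symbols or emissionprob")
        else:
            # Emission matrix omitted: fall back to a uniform one,
            # mirroring what is already done for the transition matrix.
            rows = uniform_rows(n_components, n_symbols)

        self.n_components = n_components
        self.n_symbols = n_symbols  # always defined after construction
        self.transmat = uniform_rows(n_components, n_components)
        self.emissionprob = rows
```

With this sketch, `UniformInitHMM(n_components=2, n_symbols=4)` ends up with `n_symbols` set and every emission row equal to `[0.25, 0.25, 0.25, 0.25]`, while passing `emissionprob` explicitly makes `n_symbols` optional.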

I could take a whack at resolving these if everyone agrees.

Thanks,

Dave
Andreas Mueller
2013-03-06 07:55:13 UTC
Permalink
Hi.
Should we just deprecate / remove the HMM?
We deemed sequence prediction off-topic (Lars' words and I agree) and
there is no core-dev maintaining them.
Is there any project this could move to?
Statsmodels, pandas? There should be a go-to place for time-series modelling.
There was scikit-timeseries but that is supposed to move to pandas:
http://pytseries.sourceforge.net/

Sorry if this is not the answer you hoped for, Dave :-/

Cheers,
Andy
Didier Vila
2013-03-06 08:56:31 UTC
Permalink
Even if it's not perfect, it is good to keep it. I am a user, and requests
on this topic come up all the time.



Didier Vila, PhD | Risk | CapQuest Group Ltd | Fleet 27 | Rye Close |
Fleet | Hampshire | GU51 2QQ | Tel: 0871 574 7989 | Fax: 0871 574 2992 |
Email: ***@capquestco.com <mailto:***@capquestco.com>



Ronnie Ghose
2013-03-06 11:38:30 UTC
Permalink
I like the HMM... it would be nice if it could be included in
a probabilistic package though. Or can we make a probabilistic sub-package
and have it be the only denizen?
Didier Vila
2013-03-06 11:48:53 UTC
Permalink
I like HMM TOO +1 !



Andreas Mueller
2013-03-06 11:49:33 UTC
Permalink
Post by Ronnie Ghose
I like hmm ... would be nice if it could be included in
a probabilistic package though... or can we make a probabilistic sub
package and have it be the only denizen?
Probabilistic in the sense that the algorithms in it sometimes work and
sometimes don't? ;)
Sorry, I don't understand what you mean by a probabilistic sub-package.
Ronnie Ghose
2013-03-06 11:54:09 UTC
Permalink
probabilistic as in something you would find here
http://www.cs.ubc.ca/~murphyk/MLbook/index.html

ones that utilize more probabilistic methods, e.g., bayesian nets.
Ronnie Ghose
2013-03-06 11:55:47 UTC
Permalink
I'm thinking along the lines of MCMC, Bayes nets, etc.?
Andreas Mueller
2013-03-06 23:05:56 UTC
Permalink
Post by Ronnie Ghose
i'm thinking along the lines of MCMC, Bayes nets, etc..... ?
I'm always confused when people talk about "Bayes nets".
As far as I am concerned, this is synonymous with "directed graphical model",
which is quite an abstract concept.
Is there any particular algorithm that you mean?

Andreas Mueller
2013-03-06 11:58:57 UTC
Permalink
Post by Ronnie Ghose
probabilistic as in something you would find here
http://www.cs.ubc.ca/~murphyk/MLbook/index.html
ones that utilize more probabilistic methods, e.g., bayesian nets.
You mean like logistic regression, ARD, GMMs, Gaussian Processes,
Bayesian Regression, SGD with log-loss, Naive Bayes, LDA, QDA, PPCA...
(I haven't read Murphy's book yet, but he says he is more frequentist than
Bishop, and those models are basically all in Bishop.)

Putting all these in one sub-module seems a bit weird to me.
David Reed
2013-03-06 12:04:08 UTC
Permalink
I can't say that I'm an expert, but this module appears to be nearly
complete. I think all it needs is slightly better documentation, maybe a
canonical test set, and a bit more consistency, all of which are relatively
easy tasks.
Lars Buitinck
2013-03-06 12:36:54 UTC
Permalink
Post by David Reed
I can't say that I'm an expert, but this module appears to be nearly
complete. I think all we need is a little better documentation, maybe a
canonical test set, and made a bit more consistent, all are relatively easy
tasks.
What do you use MultinomialHMM for?
Ronnie Ghose
2013-03-06 12:41:54 UTC
Permalink
... file issues? it's possible to fix that
David Reed
2013-03-06 13:13:20 UTC
Permalink
I am a Matlab convert, and was using BNT Toolbox from Kevin Murphy for some
speech stuff, and the HMM component here appears to be comparable to that.

This also appears to be comparable to Matlab's own HMM modules, but maybe
this isn't saying a lot.
Lars Buitinck
2013-03-06 12:00:00 UTC
Permalink
Post by Ronnie Ghose
I like hmm ... would be nice if it could be included in a probabilistic
package though... or can we make a probabilistic sub package and have it be
the only denizen?
That doesn't make sense to me. We have lots of probabilistic packages,
and besides, an organization by application makes more sense than one by
theory. I.e., I'd group HMMs with structured perceptrons etc.

(I don't like the HMM because of its unsupervised training, lack of
built-in sequence start/stop encoding, first-orderness and crooked API
with fixed-length sequences. The only time I had a use for it is when
I needed a Viterbi algorithm for a different model, but then I lifted
the code out to modify it.)
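
For readers who have not met the Viterbi algorithm mentioned above, a minimal sketch for a first-order HMM with discrete symbols might look like the following. This is not the code from the scikit-learn module, nor the modified version described in this post; the function name `viterbi` and its argument names are assumptions chosen for illustration. It works in log space and, unlike the fixed-length API criticized above, accepts a sequence of any positive length.

```python
import math


def viterbi(obs, startprob, transmat, emissionprob):
    """Return (best_log_prob, best_state_path) for a discrete first-order HMM.

    obs          : list of symbol indices
    startprob    : startprob[i] = P(first state is i)
    transmat     : transmat[i][j] = P(next state is j | current state is i)
    emissionprob : emissionprob[i][k] = P(symbol k | state i)
    """
    if not obs:
        raise ValueError("obs must contain at least one symbol")

    def log(p):
        # log(0) is treated as -inf so impossible paths are never selected.
        return math.log(p) if p > 0 else float("-inf")

    n_states = len(startprob)
    # delta[i]: best log-probability of any path ending in state i so far.
    delta = [log(startprob[i]) + log(emissionprob[i][obs[0]])
             for i in range(n_states)]
    backptr = []  # backptr[t][j]: best predecessor of state j at step t + 1

    for symbol in obs[1:]:
        prev = delta
        delta, pointers = [], []
        for j in range(n_states):
            best_i = max(range(n_states),
                         key=lambda i: prev[i] + log(transmat[i][j]))
            pointers.append(best_i)
            delta.append(prev[best_i] + log(transmat[best_i][j])
                         + log(emissionprob[j][symbol]))
        backptr.append(pointers)

    # Follow the back-pointers from the best final state.
    last = max(range(n_states), key=lambda i: delta[i])
    path = [last]
    for pointers in reversed(backptr):
        path.append(pointers[path[-1]])
    path.reverse()
    return delta[last], path
```

As a usage example with made-up numbers: for two states with `startprob = [0.6, 0.4]`, `transmat = [[0.7, 0.3], [0.4, 0.6]]`, `emissionprob = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]` and `obs = [0, 1, 2]`, the function returns the state path `[0, 0, 1]`.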
--
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam