Discussion:
Contributing to scikit-learn
Vandana Bachani
2012-06-04 23:31:26 UTC
Hi,
My friend Shreyas and I want to contribute to the scikit-learn codebase.
I want to add code for neural networks (multi-layer perceptrons), and
Shreyas has some ideas for the Expectation-Maximization algorithm and
Gaussian mixture models. Please let us know how we can contribute to the
code, and whether we can discuss our ideas with someone on the
scikit-learn team so that we are not reinventing something that is
already there.
About me: I am a Computer Science Master's student at Texas A&M
University, currently interning at Google. I have a basic version of
neural networks for classification implemented in Python as part of my
machine learning class project (it works well on UCI datasets). I am
planning to extend it for regression and optimize it to make it public.
Shreyas is a PhD student at the University of Texas at El Paso and is
currently interning at Google.
We are planning to pair-program on these ideas to make them scikit-worthy.

Thanks,
--
Vandana Bachani
Graduate Student, MSCE
Computer Science & Engineering Department
Texas A&M University, College Station
Gael Varoquaux
2012-06-05 05:27:56 UTC
Hi Vandana and Shreyas,

Welcome and thanks for the interest,

With regards to MLPs (multi-layer perceptrons), David Marek is right now
working on such a feature:
https://github.com/davidmarek/scikit-learn/tree/gsoc_mlp
You can probably pitch in with him: four eyes are always better than two.

With regard to EM for GMMs, scikit-learn has an implementation of this
class of algorithms in sklearn/mixture/gmm.py. This code is a little
outdated and can probably be improved in terms of readability, speed,
and feature set.
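
For orientation, here is a minimal sketch of exercising that code path
(a hedged example assuming the GMM estimator in sklearn.mixture of that
era; EM runs inside fit):

    import numpy as np
    from sklearn.mixture import GMM  # the EM-based class in sklearn/mixture/gmm.py

    # Two well-separated blobs, so EM has an easy job.
    rng = np.random.RandomState(0)
    X = np.vstack([rng.randn(100, 2) + 5, rng.randn(100, 2) - 5])

    g = GMM(n_components=2)
    g.fit(X)                   # EM: alternate E- and M-steps until convergence
    labels = g.predict(X)      # hard component assignments
    resp = g.predict_proba(X)  # per-sample responsibilities from the E-step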

Cheers,

Gaël
Shreyas Karkhedkar
2012-06-05 06:07:07 UTC
Hi Gael,

Thanks for the response. Vandana and I are really excited about
contributing to scikit-learn.

I will go through the GMM code and put in suggestions for refactoring,
and, if possible, implement some new features.

Once again, on behalf of Vandana and me, thanks for the reply.

Looking forward to working with you.

Cheers,
Shreyas

--
Shreyas Ashok Karkhedkar
PhD Candidate
Computer Science
University of Texas at El Paso

email:
***@miners.utep.edu
***@gmail.com

Phone:
+1-240-494-6362
Andreas Mueller
2012-06-05 08:27:46 UTC
Hi Shreyas.
In particular, the VBGMM and DPGMM might need some attention.
Once you are a bit familiar with the GMM code, you could have a look
at issue 393 <https://github.com/scikit-learn/scikit-learn/issues/393>.
Any help would be much appreciated :)
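
For anyone who wants to poke at those, a quick hedged sketch of driving
the two variational classes (assuming their constructors of that era
mirrored GMM's; the exact arguments may differ):

    import numpy as np
    from sklearn.mixture import VBGMM, DPGMM

    rng = np.random.RandomState(0)
    X = np.vstack([rng.randn(100, 2) + 5, rng.randn(100, 2) - 5])

    # Variational Bayes with a fixed number of components.
    vb = VBGMM(n_components=5)
    vb.fit(X)

    # Dirichlet-process prior: n_components is only an upper bound;
    # the weights of unneeded components are driven toward zero.
    dp = DPGMM(n_components=5)
    dp.fit(X)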

Cheers,
Andy
Olivier Grisel
2012-06-05 09:17:24 UTC
You can also follow David's progress through his blog:

http://www.davidmarek.cz

Also don't forget to read the following first:

http://scikit-learn.org/dev/developers/index.html
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
David Marek
2012-06-05 15:09:15 UTC
Hi,

As Gael and Olivier said, I am working on the MLP this summer; it's my
GSoC project. There is some existing code (in Cython), so you won't be
able to just drop in your class project, but you should definitely look
at it. I will be grateful for every bit of help and every suggestion. I
have basic classification working; the next steps on my to-do list are a
hinge loss and a momentum term, and I always need more tests, so the
code will be modified heavily.
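
(For readers who haven't met the momentum term: an illustrative NumPy
fragment of that update rule. The names are mine; this is not the actual
GSoC Cython code.)

    import numpy as np

    def momentum_step(W, grad, velocity, lr=0.01, momentum=0.9):
        # One gradient step with momentum: velocity keeps an
        # exponentially decaying sum of past gradients.
        velocity = momentum * velocity - lr * grad
        return W + velocity, velocity

    W = np.zeros((3, 2))          # a weight matrix of the network
    v = np.zeros_like(W)
    for _ in range(5):
        grad = np.ones_like(W)    # stand-in for a backprop gradient
        W, v = momentum_step(W, grad, v)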

Good luck. I started working on scikit-learn only recently, but I must
say it's a really great project.

David
David Marek
2012-06-05 17:51:50 UTC
I think you sent this mail only to me; please send all mails to the
mailing list. Btw, Andreas is my mentor, so he is the one in charge
here :-)

Ad 1) Afaik all you need is one hidden layer. It's certainly possible to
add the option, but I think we decided that it's not a priority.

Ad 2) Good idea

David

---------- Forwarded message ----------
From: Vandana Bachani <***@gmail.com>
Date: Tue, Jun 5, 2012 at 6:59 PM
Subject: Re: [Scikit-learn-general] Contributing to scikit-learn
To: ***@gmail.com


Hi David,
I think we can also add the following to the to-do list:
1. Any number of hidden layers and hidden units should be supported.
2. Missing data should be handled (several UCI datasets have missing data).

I will look at the code and then send you a mail with my thoughts on it.

If you would like to have a look at my project report, I am attaching it.

Thanks,
Vandana
David Warde-Farley
2012-06-07 03:12:40 UTC
Post by David Marek
1) Afaik all you need is one hidden layer,
The universal approximation theorem says that any continuous function
can be approximated arbitrarily well by one hidden layer with enough
hidden units, but it says nothing about the ease of finding that
solution, nor about its efficiency (one can prove that certain functions
which are compactly represented by a deep network require exponentially
many more hidden units if you are restricted to one layer).

However, with purely supervised training, deeper networks are harder to
fit (you can get to about two hidden layers if you're careful, but
beyond that it gets quite hard), so I wouldn't worry about it. In a
"black box" implementation for scikit-learn, where the user isn't
expected to be an expert in training neural nets, a single hidden layer
is probably plenty.

David
xinfan meng
2012-06-07 03:34:10 UTC
The deep learning literature says that the more layers you have, the
fewer hidden nodes per layer you need. But I agree that one hidden layer
would be sufficient for now.

--
Best Wishes
--------------------------------------------
Meng Xinfan蒙新泛
Institute of Computational Linguistics
Department of Computer Science & Technology
School of Electronic Engineering & Computer Science
Peking University
Beijing, 100871
China
Andreas Mueller
2012-06-07 12:25:55 UTC
Hi everybody!
David, it's your project, I'm just trying to help along ;)
About 2): Afaik there is nothing in sklearn at the moment that can deal
with missing values, and I feel the MLP is one of the estimators where
dealing with them is hardest.
@David: I wouldn't keep you from trying, but it seems a bit out of the
scope of the MLP. I think the idea for missing data was to provide an
additional mask as input that says which values are missing. Dealing
with this is much more natural in naive Bayes or tree-based methods than
in the MLP, I think.

@Vandana: For dealing with missing data, one easy way is to set the
missing variables to their mean over the dataset. Usually for MLPs the
input should be zero mean and unit variance, so a missing variable would
just be set to 0. Do you know of any better way of dealing with missing
values in MLPs?
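
(A minimal NumPy sketch of that recipe, for concreteness: standardize
with statistics computed on the observed entries, and zeros then stand
in for the missing ones.)

    import numpy as np

    X = np.array([[1.0, 2.0],
                  [np.nan, 0.0],
                  [3.0, 4.0]])
    missing = np.isnan(X)

    # Per-feature mean and std over the observed entries only.
    mu = np.nanmean(X, axis=0)
    sigma = np.nanstd(X, axis=0)

    X_std = (X - mu) / sigma   # zero mean, unit variance per feature
    X_std[missing] = 0.0       # a missing value becomes the (now zero) mean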

Cheers,
Andy
LI Wei
2012-06-07 15:09:11 UTC
Intuitively, maybe we can set the missing values to the average over the
nearest neighbors, calculated using the existing features? Not sure
whether that is the correct way to do it :-)

Cheers,
LI, Wei

David Warde-Farley
2012-06-07 16:09:46 UTC
That's known as "imputation" (or in a particular variant, "k-NN impute").
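
(A hedged sketch of the idea with a hypothetical helper, not an existing
scikit-learn function: fill each missing entry with the mean of the k
closest complete rows, measuring distance on the features the row
actually has.)

    import numpy as np

    def knn_impute(X, k=3):
        # Fill NaNs in each row with the mean of the k nearest
        # complete rows, compared on the observed features only.
        X = X.copy()
        complete = X[~np.isnan(X).any(axis=1)]
        for row in X:
            miss = np.isnan(row)
            if not miss.any():
                continue
            # Euclidean distance on this row's observed features.
            d = np.sqrt(((complete[:, ~miss] - row[~miss]) ** 2).sum(axis=1))
            donors = complete[np.argsort(d)[:k]]
            row[miss] = donors[:, miss].mean(axis=0)
        return X

    X = np.array([[1.0, 2.0, 3.0],
                  [1.1, np.nan, 3.2],
                  [0.9, 2.1, 2.9],
                  [5.0, 6.0, 7.0]])
    print(knn_impute(X, k=2))  # the NaN becomes mean(2.0, 2.1) = 2.05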

In general, how you treat missing values will depend a lot on your
statistical assumptions, so it would be very unwise to have a
one-size-fits-all approach to handling missing data, at least without
qualifying it as being based on one assumption or another.

Like the independent-and-identically-distributed assumption, the
relevant assumptions here are "missing at random" (MAR: the probability
of observing a feature is independent of that feature's value) and
"missing completely at random" (MCAR: the probability of observing a
given feature is independent of ALL the features observed for that
training case).

In the case of neural networks, for MAR or MCAR data, simply setting the
feature to zero is not completely crazy, especially when doing
stochastic gradient descent, as the weight update gets multiplied by
that zero for that specific training case. In fact, artificially
introducing zeros ("masking noise") is a neat way to encourage
robustness for some problems even when you don't have missing data. For
not-missing-at-random data you'd need to modify the cost function to
incorporate your model of how frequently and when things drop out, and
probably estimate the parameters of that model simultaneously with the
MLP parameters -- not something you can really prepackage.
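
(An illustrative fragment of that masking-noise trick, as used in
denoising autoencoders; the helper name is mine.)

    import numpy as np

    rng = np.random.RandomState(0)

    def mask_inputs(X, p=0.2):
        # Zero each entry independently with probability p. With
        # zero-mean inputs and SGD, a zeroed feature contributes
        # nothing to the update of its incoming weights.
        return X * (rng.uniform(size=X.shape) >= p)

    X = rng.randn(4, 5)
    X_noisy = mask_inputs(X)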

David
eat
2012-06-07 17:16:19 UTC
Hi,
I think the key question is how reliably one can estimate the mean (and
variance) here.

With data sets containing both missing values and outliers, I doubt that
there exists any simple, generally accepted way to both detect outliers
(so that their impact on the mean and variance is accounted for) and at
the same time impute missing values.

However, it might be possible to incorporate some domain-specific
knowledge in order to move on. So, in summary, what kinds of schemes
exist for adding (ad hoc) domain-specific knowledge to the modeling
process in a systematic manner?


My 2 cents,
-eat
Vandana Bachani
2012-06-07 17:40:32 UTC
Hi Andreas,

I agree missing data is not specific to the MLP.
We dealt with it pretty simply, as you mentioned, by taking the mean
over the dataset for continuous-valued attributes.
Another thing that I feel is not adequately explored in the scikit-learn
implementations is discrete attributes. Classification problems with
discrete input features, or a mix of discrete and continuous features,
cannot be handled well, and many UCI datasets have such a mix. For
discrete attributes we treat a missing value as another kind of discrete
value, namely 'UNKNOWN'.

And I mentioned allowing multiple hidden layers because it is just a
flexibility we would like to give more advanced users of the MLP, who
might like to experiment with different numbers of hidden units on
difficult problems.

Thanks,
Vandana
--
Vandana Bachani
Graduate Student, MSCE
Computer Science & Engineering Department
Texas A&M University, College Station
David Warde-Farley
2012-06-07 18:12:36 UTC
How are you encoding the discrete features? As one-hot vectors?

In that case, a natural encoding for "unknown" is the zero vector, since
the stochastic gradient step is then a no-op with respect to the weights
for every possible value of that feature. Whether it's sensible to do
*only* this depends, again, on whether the data is assumed
missing-at-random or not.
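
(Concretely, a small sketch with a hypothetical helper: a categorical
feature with values {red, green, blue} becomes three columns, and
"unknown" becomes the all-zeros row, so none of the three weights move
for that example.)

    import numpy as np

    def one_hot(values, categories):
        # Map each value to a one-hot row; a missing or unknown
        # value maps to the all-zeros vector.
        index = {c: i for i, c in enumerate(categories)}
        out = np.zeros((len(values), len(categories)))
        for r, v in enumerate(values):
            if v in index:
                out[r, index[v]] = 1.0
        return out

    print(one_hot(['red', 'blue', None], ['red', 'green', 'blue']))
    # [[1. 0. 0.]
    #  [0. 0. 1.]
    #  [0. 0. 0.]]  <- the "unknown" row contributes nothing to the update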

David
Vandana Bachani
2012-06-07 19:42:18 UTC
Hi David,
Yes, I use one-hot encoding, but my understanding of one-hot encoding is
that each discrete attribute is represented as a bit pattern, so the
node corresponding to that input attribute is actually a set of nodes
representing the pattern. An unknown just means that the bit for the
unknown value is set to 1 and the rest are set to 0, so at any instance
the nodes corresponding to an input attribute have at least one node
with a value of 1. The downside of one-hot encoding is that it bloats
the weight space and the number of input units, but I guess that's OK,
as it is one of the best ways of handling discrete attributes if we are
to use MLPs.

Thanks,
Vandana

--
Vandana Bachani
Graduate Student, MSCE
Computer Science & Engineering Department
Texas A&M University, College Station