Discussion:
Algorithms for Manifold Learning
Daniel McNeela
2016-04-25 05:59:42 UTC
Hi All,

My name is Daniel McNeela, and I am a student at UC Berkeley participating
in Google Summer of Code 2016. I am working on the Fovea project under the
umbrella of the International Neuroinformatics Coordinating Facility. The
abstract for my project can be found here:
https://summerofcode.withgoogle.com/projects/#5940697098092544

To be brief, Fovea is a Python tool for visualizing dynamical systems and
associated data, and an integral part of the back end for the software
involves performing both linear and nonlinear dimensionality reduction on
data sets. My project mentor would like to add scikit-learn as a dependency
since it already has a number of manifold learning algorithms implemented.
However, I am planning on using two additional algorithms that are not
currently implemented in scikit-learn, namely Sammon Mapping and Principal
Curve Analysis, and I was wondering whether the developer team would be
interested in incorporating these two algorithms into scikit-learn's
existing Manifold Learning package.

Please let me know your thoughts. Information regarding these two
algorithms can be found at the following links:

http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/AV0910/henderson.pdf

https://web.stanford.edu/~hastie/Papers/Principal_Curves.pdf

Thanks for your time, and looking forward to hearing from you!


- Daniel
Olivier Grisel
2016-04-25 15:17:40 UTC
I would advise you to first implement those two new estimators outside
of the scikit-learn code base, so that you do not suffer from delays imposed
by the scikit-learn review process (which lacks manpower). But do follow
the scikit-learn code conventions strictly, in particular the conventions
for making estimator classes scikit-learn compatible:

http://scikit-learn.org/dev/developers/contributing.html#rolling-your-own-estimator
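
To make the conventions mentioned above concrete, here is a minimal sketch of a transformer-style estimator. The class name LinearEmbeddingSketch is hypothetical, and its body is plain PCA via SVD used only as a placeholder; a Sammon or TCIE estimator would put its own optimization inside fit(). Treat it as an illustration of the structure, not as code from scikit-learn or Fovea.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.utils.validation import check_array, check_is_fitted


class LinearEmbeddingSketch(TransformerMixin, BaseEstimator):
    """Hypothetical placeholder estimator: plain PCA computed via SVD."""

    def __init__(self, n_components=2):
        # Convention: __init__ only stores hyperparameters, with no logic.
        self.n_components = n_components

    def fit(self, X, y=None):
        X = check_array(X)  # validate input and convert it to a 2-D array
        # Convention: attributes learned from data end with an underscore.
        self.mean_ = X.mean(axis=0)
        _, _, Vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.components_ = Vt[: self.n_components]
        return self  # Convention: fit returns the estimator itself.

    def transform(self, X):
        check_is_fitted(self, "components_")
        X = check_array(X)
        return (X - self.mean_) @ self.components_.T


X = np.random.RandomState(0).rand(20, 5)
embedding = LinearEmbeddingSketch(n_components=2).fit_transform(X)
print(embedding.shape)  # (20, 2)
```

BaseEstimator supplies get_params and set_params from the constructor signature, and TransformerMixin supplies fit_transform. Methods without a natural out-of-sample mapping may need to expose only fit and fit_transform, as scikit-learn's own MDS does.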

You might find this template project handy to automatically test that
your estimators are scikit-learn compatible:

https://github.com/scikit-learn-contrib/project-template

Once your new estimators pass the test_common compliance suite, we can
re-open a discussion for inclusion in scikit-learn proper, based
on the criteria in:

http://scikit-learn.org/dev/faq.html#what-are-the-inclusion-criteria-for-new-algorithms

If the scikit-learn developers decide that those estimators do not
match those criteria, you would still be welcome to contribute the
project under the http://contrib.scikit-learn.org/ umbrella.
--
Olivier Grisel
Matthieu Brucher
2016-04-25 16:11:15 UTC
Hi Daniel,

I think the original scikit pull request based on my PhD thesis, almost 10
years ago, may have contained some Sammon mapping code. IIRC, the mapping
is really crude and not robust. I think there are other cost functions for
dimensionality reduction that are far more efficient and do not have the
same drawbacks as Sammon mapping.
I don't remember my position on Principal Curve Analysis; I know that I had
a look at it but never implemented it. What is the purpose of implementing
this one in particular?
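
For reference, the Sammon cost function discussed above can be written in a few lines. The sketch below uses only the standard library. It computes Sammon's stress: each squared difference between an original pairwise distance and the embedded distance is divided by the original distance, and the total is normalized by the sum of the original distances. The function name sammon_stress is chosen for illustration, and Euclidean distances in both spaces are assumed.

```python
from itertools import combinations
from math import dist


def sammon_stress(X_high, X_low):
    """Sammon's stress between points X_high and their embedding X_low."""
    pairs = list(combinations(range(len(X_high)), 2))
    d_high = [dist(X_high[i], X_high[j]) for i, j in pairs]
    d_low = [dist(X_low[i], X_low[j]) for i, j in pairs]
    if any(d == 0.0 for d in d_high):
        # The 1/d weighting is undefined for coincident input points.
        raise ValueError("duplicate input points are not supported")
    total = sum((dh - dl) ** 2 / dh for dh, dl in zip(d_high, d_low))
    return total / sum(d_high)


# Three collinear points embedded exactly on a line give zero stress.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
print(sammon_stress(points, [(0.0,), (1.0,), (3.0,)]))  # 0.0
```

The division by the original distance emphasizes small distances, which is also why the cost is undefined when two input points coincide.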

Cheers,

Matthieu
_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
--
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
Daniel McNeela
2016-04-26 04:52:59 UTC
Thank you Matthieu and Olivier for your help.

It sounds like, based on what Olivier said, that a good approach would be
for me to implement the algorithms in a way that ensures compatibility with
scikit-learn and then submit them for consideration for inclusion once they
are fully completed.

Matthieu, my original reasoning behind implementing Sammon mapping was that
it seemed to be relatively intuitive to understand to the point where
people using the software could modify it and devise their own metrics to
suit the needs of their research. Having gone back today and given it a
second look, it does seem rather crude as you mentioned, so maybe it would
be best if I hold off on that.

As far as Principal Curve Analysis goes, I liked that it was well established
(I believe it was one of the earlier manifold learning algorithms introduced)
and that it intuitively generalizes Principal Component Analysis to account
for nonlinearity. Since Principal Component Analysis is widely used and
implemented, Principal Curve Analysis seemed like a natural algorithm to
include for nonlinear cases.

I was looking at some of the other Manifold Learning algorithms currently
in use, and it appears that Topologically Constrained Isometric Embedding
offers improvements over many of the algorithms currently in scikit-learn,
such as Isomap, LLE, and Eigenmapping. In particular, it seems to perform
more robustly in response to noisy and non-convex data. This paper offers a
nice comparison between TCIE and the existing algorithms.

http://people.csail.mit.edu/rosman/tcie_ijcv.pdf

I would certainly be interested in implementing TCIE if there's any
interest in having it included in scikit-learn.

Cheers,

Daniel

Matthieu Brucher
2016-04-28 17:12:22 UTC
Hi,

TCIE is interesting because, IMHO, it is the small additional step that is
relevant. With this additional step, you can build Sammon mapping on top of it
(basically just switch step 4 for Sammon optimization). I would cite here
my paper on different cost functions:
https://www.researchgate.net/publication/220058500_A_Metric_Multidimensional_Scaling-Based_Nonlinear_Manifold_Learning_Approach_for_Unsupervised_Data_Reduction?ev=prf_pub
All cost-function-based dimensionality reduction algorithms could be
enhanced with the TCIE step, so it is definitely something to try out.
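
One way to read the suggestion above is as a weighted stress in which the TCIE step decides which pairs of points count and the cost function decides how much each pair counts. The sketch below rests on that assumption and is not code from the TCIE paper. The names weighted_stress, binary_weight and sammon_weight are hypothetical, and the set of consistent pairs is taken as given, since the boundary detection that produces it is not shown.

```python
from itertools import combinations
from math import dist


def weighted_stress(D_target, X_low, weight):
    """Sum of weight * (target distance - embedded distance)^2 over pairs."""
    total = 0.0
    for i, j in combinations(range(len(X_low)), 2):
        error = D_target[i][j] - dist(X_low[i], X_low[j])
        total += weight(i, j, D_target[i][j]) * error ** 2
    return total


def binary_weight(consistent):
    # Plain stress restricted to the pairs kept by the extra step.
    return lambda i, j, d: 1.0 if (i, j) in consistent else 0.0


def sammon_weight(consistent):
    # Same pair selection, but with Sammon-style 1/d weighting
    # (the global normalization constant is omitted).
    return lambda i, j, d: 1.0 / d if (i, j) in consistent else 0.0


# Toy example: the pair (0, 2) is assumed to have been rejected.
D = [[0, 1, 5], [1, 0, 2], [5, 2, 0]]
Y = [(0.0,), (1.0,), (4.0,)]
kept = {(0, 1), (1, 2)}
print(weighted_stress(D, Y, binary_weight(kept)))  # 1.0
print(weighted_stress(D, Y, sammon_weight(kept)))  # 0.5
```

Keeping the pair selection separate from the weighting lets the same mask be reused with any cost function of this form.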

Cheers,

Matthieu
Daniel McNeela
2016-04-30 18:05:09 UTC
Great, I will look to implement TCIE then and get back in touch when I have
some finalized code. Ideally I will try and structure it so that it
integrates the small additional step as an improvement to the existing
algorithms, rather than creating it as a standalone function.

Best,

Daniel
