Discussion:
Light manifold branch status
(too old to reply)
Matthieu Brucher
2010-09-20 18:59:13 UTC
Permalink
Hi,

As Alexandre and Gaël suggested, I've created a branch with just a few
manifold algorithms:
- the Barycenter Mapping
- for the embeddings, we have:
- LLE
- Laplacian Eigenmaps
- Diffusion Maps
- Hessian Eigenmaps

What is still missing:
- the complete documentation in manifold.rst
- cross validating LLE with the Matlab function
- add diffusion walk in Diffusion Maps

Two tests fail because of the missing parts. Coverage is almost 100%,
although path coverage could still be enhanced.

If you have comments, I'm all ears.

Matthieu
--
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
Alexandre Gramfort
2010-09-21 21:30:04 UTC
Permalink
Hi Matthieu,

here is quick review.

I'm sure Gael will also have some comments sent from China :)

Alex

----

- I would use 2D embeddings/plots for examples (don't see much in 3D with
matplotlib)
- Could you a make a classic swissroll example?
- dist2hd() can be optimized see euclidian_distances() in mean_shift_.py (this
function should probably be taken out of mean_shift_.py)
- I would vote for a flat organisation of files and for filenames that match
with the method implemented:
manifold / lle.py (contains LLE class)
laplacian_eigenmaps.py (contains LaplacianEigenmaps class)
diffusion_maps.py (contains DiffusionMaps class)
base.py (would contain Embedding class)
barycenters.py
graph.py (all the routines related to graph construction)
With the current file organization and with my little background on manifold
learning, I have to admit that the file names are not really helping
me to figure
out where the algorithms are implemented.
- I vote for the optional support of pyamg to speed up things.
- I think a benchmark should be added to see how the different algorithms scale
with the number of samples. You can bench against NiPy. The algorithms should
scale to problems with thousands of points assuming the neighborhood used are
small.
- Could you explain a bit more what you mean by builder and mapper and
list a few
possibilities for these. The underlying question, is do we need such generic
programming if there is only one real option?
- You don't use : self._set_params(**params) at the beginning of the fit()
functions.


cosmits :
- we use : "import numpy as np" every else in the code
- make sure to follow PEP8 (lines < 80 char., 2 lines between functions, 1 line
between class methods, hessianMap() -> hessian_map())

PyFlakes report:

barycenters.py:6: 'math' imported but unused
barycenters.py:12: 'create_neighborer' imported but unused
barycenters.py:42: local variable 'bary' is assigned to but never used
embedding.py:3: 'numpy' imported but unused
embedding.py:4: 'math' imported but unused
setup.py:2: 'numpy' imported but unused
similarities_mds.py:10: 'math' imported but unused
tests/test_barycenters.py:7: 'assert_raises' imported but unused
tests/test_barycenters.py:7: 'assert_array_equal' imported but unused
tests/test_similarities.py:4: 'unittest' imported but unused
tests/test_similarities.py:6: 'assert_array_equal' imported but unused
tests/test_similarities.py:6: 'assert_raises' imported but unused
tests/test_similarities.py:14: 'dist2hd' imported but unused
tests/test_similarities_mds.py:4: 'unittest' imported but unused
tests/test_similarities_mds.py:6: 'assert_array_equal' imported but unused
tests/test_similarities_mds.py:6: 'assert_raises' imported but unused
tests/test_similarities_mds.py:14: 'dist2hd' imported but unused
tests/test_tools.py:5: 'assert_array_equal' imported but unused
tests/test_tools.py:5: 'assert_raises' imported but unused
Matthieu Brucher
2010-09-22 06:16:45 UTC
Permalink
Thanks for the comments !
Post by Alexandre Gramfort
Hi Matthieu,
here is quick review.
I'm sure Gael will also have some comments sent from China :)
Alex
----
- I would use 2D embeddings/plots for examples (don't see much in 3D with
matplotlib)
I can do that, but be aware that you may not see much on the IRIS
dataset in 2D :|
Post by Alexandre Gramfort
- Could you a make a classic swissroll example?
Of course, I'll try to add it in the datasets
Post by Alexandre Gramfort
- dist2hd() can be optimized see euclidian_distances() in mean_shift_.py (this
function should probably be taken out of mean_shift_.py)
Will do.
Post by Alexandre Gramfort
- I would vote for a flat organisation of files and for filenames that match
   manifold / lle.py (contains LLE class)
              laplacian_eigenmaps.py (contains LaplacianEigenmaps class)
              diffusion_maps.py (contains DiffusionMaps class)
              base.py (would contain Embedding class)
              barycenters.py
              graph.py (all the routines related to graph construction)
With the current file organization and with my little background on manifold
learning, I have to admit that the file names are not really helping
me to figure
out where the algorithms are implemented.
Yes but no :D Laplacian Eigenmaps and Diffusion Maps share 99% of
their code, they are just a different approach for exactly the same
problem. I can put HessianMap and LLE in seperate files, but not those
two :|
Flat means it can be messy in the end between embeddign algorithms and
mapping ones. Perhaps the question should be asled to Fabian?
Post by Alexandre Gramfort
- I vote for the optional support of pyamg to speed up things.
I didn't retrieve Gaël's code, it will be done.
Post by Alexandre Gramfort
- I think a benchmark should be added to see how the different algorithms scale
with the number of samples. You can bench against NiPy. The algorithms should
scale to problems with thousands of points assuming the neighborhood used are
small.
Indeed, those algorithms should scale well, it won't be the case of
the other algorithms once we integrate them.
Post by Alexandre Gramfort
- Could you explain a bit more what you mean by builder and mapper and
list a few
possibilities for these. The underlying question, is do we need such generic
programming if there is only one real option?
The builder will build a mapping between the embedded space and the
original space. By default, it uses a barycenter approach, but one
should be able to use other mappings:
- PLM
- Kernel mapping (this is what is use by NiPy albeit not in a generic way)
Post by Alexandre Gramfort
- You don't use : self._set_params(**params) at the beginning of the fit()
functions.
What is this function purpose? What is params? I followed the
BaseEstimator API at the beginning, but I wasn't noticed about the
last changes :|
Post by Alexandre Gramfort
- we use : "import numpy as np" every else in the code
- make sure to follow PEP8 (lines < 80 char., 2 lines between functions, 1 line
between class methods, hessianMap() -> hessian_map())
I still have much cosmit to do :(
Post by Alexandre Gramfort
barycenters.py:6: 'math' imported but unused
barycenters.py:12: 'create_neighborer' imported but unused
barycenters.py:42: local variable 'bary' is assigned to but never used
embedding.py:3: 'numpy' imported but unused
embedding.py:4: 'math' imported but unused
setup.py:2: 'numpy' imported but unused
similarities_mds.py:10: 'math' imported but unused
tests/test_barycenters.py:7: 'assert_raises' imported but unused
tests/test_barycenters.py:7: 'assert_array_equal' imported but unused
tests/test_similarities.py:4: 'unittest' imported but unused
tests/test_similarities.py:6: 'assert_array_equal' imported but unused
tests/test_similarities.py:6: 'assert_raises' imported but unused
tests/test_similarities.py:14: 'dist2hd' imported but unused
tests/test_similarities_mds.py:4: 'unittest' imported but unused
tests/test_similarities_mds.py:6: 'assert_array_equal' imported but unused
tests/test_similarities_mds.py:6: 'assert_raises' imported but unused
tests/test_similarities_mds.py:14: 'dist2hd' imported but unused
tests/test_tools.py:5: 'assert_array_equal' imported but unused
tests/test_tools.py:5: 'assert_raises' imported but unused
Didn't know of this tool. I'll try to clean up things a little bit.

Once more, thanks for your comments, I'll try to find time in the next
few days to fix what can be fixed.

Matthieu
--
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
Gael Varoquaux
2010-09-22 13:50:12 UTC
Permalink
Just a quick comment on the comment :). I have more comming, but I am
still busy reviewing the code (during the boring talks).
Post by Matthieu Brucher
Post by Alexandre Gramfort
- I would use 2D embeddings/plots for examples (don't see much in 3D with
matplotlib)
I can do that, but be aware that you may not see much on the IRIS
dataset in 2D :|
I actually like the 3D the way you do it :).
Post by Matthieu Brucher
Post by Alexandre Gramfort
- Could you a make a classic swissroll example?
Of course, I'll try to add it in the datasets
I would prefer if you add a function to
scikits.learn.datasets.sample_generator that generates that data set.
That way we don't have to carry around a static data file.
Post by Matthieu Brucher
Flat means it can be messy in the end between embeddign algorithms and
mapping ones.
About 90% of the users will not understand what you mean by
embedding/mapping. I am not sure that the nested approach will actually
serve the users.
Post by Matthieu Brucher
Post by Alexandre Gramfort
- I vote for the optional support of pyamg to speed up things.
I didn't retrieve Gaël's code, it will be done.
There is much more than this to optimise. I haven't finished reviewing
this part of the code, but the element-wise operations on the sparse
matrices do not seem optimized, you should take the optimal sparse matrix
format to feed to arpack (check out the code in my implmentation)
Post by Matthieu Brucher
Indeed, those algorithms should scale well, it won't be the case of
the other algorithms once we integrate them.
We can't afford to integrate algorithms that don't scale fairly well.
People will download the scikit, and try it on their data. If it is 'slow
as a dog', they'll move on.
Post by Matthieu Brucher
Post by Alexandre Gramfort
- Could you explain a bit more what you mean by builder and mapper
and list a few possibilities for these. The underlying question, is
do we need such generic programming if there is only one real option?
The builder will build a mapping between the embedded space and the
original space. By default, it uses a barycenter approach, but one
- PLM
- Kernel mapping (this is what is use by NiPy albeit not in a generic way)
This bring in a huge amount of code that it hard to read, makes for
confusing API, and I suspect will be hard to maintain. The complex case
should never make the simple case harder. I am not sure what the solution
here is (I haven't really spend much time on this problem), but I know we
cannot go down this line. If we do, ever single prart of the scikit will
become 10 time more complex (we will need to start dealing with generic
kernels all over the code, regularized covariance estimations, robust
loss versions of the estimators...).

OK, will go on reviewing tomorrow.

Gaël
Matthieu Brucher
2010-09-24 12:01:40 UTC
Permalink
Post by Gael Varoquaux
Just a quick comment on the comment :). I have more comming, but I am
still busy reviewing the code (during the boring talks).
Post by Matthieu Brucher
Post by Alexandre Gramfort
- I would use 2D embeddings/plots for examples (don't see much in 3D with
matplotlib)
I can do that, but be aware that you may not see much on the IRIS
dataset in 2D :|
I actually like the 3D the way you do it :).
Post by Matthieu Brucher
Post by Alexandre Gramfort
- Could you a make a classic swissroll example?
Of course, I'll try to add it in the datasets
I would prefer if you add a function to
scikits.learn.datasets.sample_generator that generates that data set.
That way we don't have to carry around a static data file.
Indeed, it makes more sense.
Post by Gael Varoquaux
Post by Matthieu Brucher
Flat means it can be messy in the end between embeddign algorithms and
mapping ones.
About 90% of the users will not understand what you mean by
embedding/mapping. I am not sure that the nested approach will actually
serve the users.
About 90% won't need the mapping, but that doesn't mean that you
shouldn't make it available, does it?
9% of the remaining people will use the default mapping, abnd 1% will
actually want to change the mapping.
Post by Gael Varoquaux
Post by Matthieu Brucher
Post by Alexandre Gramfort
- I vote for the optional support of pyamg to speed up things.
I didn't retrieve Gaël's code, it will be done.
There is much more than this to optimise. I haven't finished reviewing
this part of the code, but the element-wise operations on the sparse
matrices do not seem optimized, you should take the optimal sparse matrix
format to feed to arpack (check out the code in my implmentation)
I don't know what I shoudl atcually do here. If I take this code
inside the manifold module, won't it be harder on you to keep both
versions synchrionized?
Post by Gael Varoquaux
Post by Matthieu Brucher
Indeed, those algorithms should scale well, it won't be the case of
the other algorithms once we integrate them.
We can't afford to integrate algorithms that don't scale fairly well.
People will download the scikit, and try it on their data. If it is 'slow
as a dog', they'll move on.
If you take the complexity of these algorithms, they can't scale.
Isomap has a distance matric to build and then an SVD, LLE has to
create for each sample barycenter weights that are then SVD'd, and I'm
not talking about the remaining algorithms that are more robust and
thus take even more time due to added complexity.

They should be fast, but there are acses when you can't be faster than
the music.
Post by Gael Varoquaux
Post by Matthieu Brucher
Post by Alexandre Gramfort
- Could you explain a bit more what you mean by builder and mapper
and list a few possibilities for these. The underlying question, is
do we need such generic programming if there is only one real option?
The builder will build a mapping between the embedded space and the
original space. By default, it uses a barycenter approach, but one
- PLM
- Kernel mapping (this is what is use by NiPy albeit not in a generic way)
This bring in a huge amount of code that it hard to read, makes for
confusing API, and I suspect will be hard to maintain. The complex case
should never make the simple case harder. I am not sure what the solution
here is (I haven't really spend much time on this problem), but I know we
cannot go down this line. If we do, ever single prart of the scikit will
become 10 time more complex (we will need to start dealing with generic
kernels all over the code, regularized covariance estimations, robust
loss versions of the estimators...).
I don't agree at all. The simple case is not hard at all. You create
your embedding and you have a mapping with it. The complex case is not
much harder: you create your embedding with an additional mapping and
voilà.

Continuing on your other mail ;)
--
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
Gael Varoquaux
2010-09-24 11:34:27 UTC
Permalink
So, I am losing Internet connection in a few hours. I couldn't finish my
review. Sorry, it takes a while to dig in the code, and amidst a busy
conference it is a challenge.

Here are my notes on the code. To recap my general feeling: the code is
really going in the right direction. The example look right, although a
bit slow. I am still worried that there is a fairly high level of
indirection in the code: it is nested in terms of API, and has functions
calling functions, ... Finally, another general remark is that there are
many many place where the code looks fairly sub optimal (sparse matrices
with no optimisation on the sparse matrix type, no in-place operations in
core loops), and the code still is much, much slower than nipy's manifold
learning code. The combination of both the remaining complexity of the
code due to its abstraction and the slowness makes me think we shouldn't
integrate it right now, but rather iterate a bit more on it.

I am offline for 3 weeks starting now!

Gaël
Post by Matthieu Brucher
As Alexandre and Gaël suggested, I've created a branch with just a few
- the Barycenter Mapping
- LLE
- Laplacian Eigenmaps
- Diffusion Maps
- Hessian Eigenmaps
Cool, that is indeed the core functionnality that we want to get right:
easy to use, and fast.
Post by Matthieu Brucher
Two tests fail because of the missing parts.
I am not sure what that means? I didn't find time to run the tests.

The code is clearly making huge progress in terms of being accessible to
(semi) naive coders. I am starting to get the big picture of the code.

General remarks
===================

PEP8 say that keyword arguments in function calls should have no spaces
before and after '='.

I would prefer to avoid using numpy.linalg, but rather to favor
scipy.linalg: numpy.linalg is often compiled without proper blas support,
and is thus very slow.

numpy should be imported as 'np', and I would prefer to avoid the 'from
numpy import ...'. This is just a convention, but when it is used by
everybody, it makes life much easier, because anyone can dig in the
codebase without finding out how symbols are imported and what symbols
mean.

A lot of the docstrings are not properly formatted as in the numpy
docstring standard.

It's a pitty that a lot of the functionnality cannot work on a similarity
graph, but rather on samples. I guess this is a general problem that
appears in other places in the scikit, and that the best option is to set
it appart, but we will need to address this in the long term.

mapping/barycenter.py
=========================

Any reason why the Barycenter is not in the neighbors code? IMHO, it
should be a regression version of the Neighbors classifiers, and could be
named 'NeighborsBarycenter'. The best way to do this would probably be
the split Neighbors in a base class that has no predict, and derive from
this base class the current version of the KNN and your Barycenter. That
would do away with the need to pass in neigh_alternate_arguments.

mapping/builder.py
=====================

Can we just get rid of this. This seems to me as an extra indirection and
I am not sure why it is there.

Besides, you are explicitely testing for isinstance(BaseEstimator). We
want to avoid this, because it means that it is impossible to use objects
in the scikit that have not been explicitely built for that purpose.

If we do this, and move the barycenter to where the nearest neighbor is,
we can completetly do away with the mapping subdirectory, and thus reduce
the depth of the tree by one, which will make it easier to apprehend.

embedding/tools.py
=====================

embedding.tools.create_neighborer should not exist. It is an almost empty
function that is doing some hacking around the neighbors class.

embedding.tools.centered_normalized should be merged with the code that
is in scaler.

embedding.tools.dist2hd should be merged with
mean_shift.euclidian_distance, and both should be moved somewhere in
utils.

embedding/barycenters.py
==========================

'barycenter' function should be renamed to 'barycenter_weights', because
right now, it not clear what the function does from its name. The word
'barycenter' is very much overloaded in the codebase. Besides, it tales a
'kwargs' that is unused. Finally, this function should be moved to the
same file as the Barycenter object.

embedding/embedding.py
==========================

I am not happy with the name of this file. It confuses me: I am not sure
anymore what is in the embedding directory anymore, if it is not in the
embedding file. Actually, after reading the code bit more, I have
realised that this is just a base class, and actually an abstract base
class. The name should reflect this, because right now it takes a lot of
code reading to realize this. Maybe embedding.py -> base_embedding.py and
`class Embedding` -> `class BaseEmbedding`

A small remark: default values for keyword arguments:

def __init__(self, n_coords, n_neighbors = None, neigh = None,
self.n_neighbors = n_neighbors if n_neighbors is not None else 9

embedding/similarities.py
==========================

In my opinion, a function like 'laplacian_maps' that takes a 'method'
function as an input argument is tedious to use. Any chance that you can
give is a default entry? This is in 'similarities.py', but it is not used
there, only in similarities_mds. I am not sure whether it is supposed to
be an internal use function -in which case, why is it not where it is
used?- or a function to be used by the user -in which case, why is it not
in __all__?.

kwargs in most functions are impossible to use without dwelling in the
source code.
Matthieu Brucher
2010-09-24 12:32:15 UTC
Permalink
Post by Gael Varoquaux
So, I am losing Internet connection in a few hours. I couldn't finish my
review. Sorry, it takes a while to dig in the code, and amidst a busy
conference it is a challenge.
Here are my notes on the code. To recap my general feeling: the code is
really going in the right direction. The example look right, although a
bit slow. I am still worried that there is a fairly high level of
indirection in the code: it is nested in terms of API, and has functions
calling functions, ... Finally, another general remark is that there are
many many place where the code looks fairly sub optimal (sparse matrices
with no optimisation on the sparse matrix type, no in-place operations in
core loops), and the code still is much, much slower than nipy's manifold
learning code. The combination of both the remaining complexity of the
code due to its abstraction and the slowness makes me think we shouldn't
integrate it right now, but rather iterate a bit more on it.
There is perhaps one algorithm in both of them, Laplacian Eigenmaps,
which we know will not be optimal until I branch your code ;)
Post by Gael Varoquaux
I am offline for 3 weeks starting now!
Gaël
Post by Matthieu Brucher
As Alexandre and Gaël suggested, I've created a branch with just a few
- the Barycenter Mapping
  - LLE
  - Laplacian Eigenmaps
  - Diffusion Maps
  - Hessian Eigenmaps
easy to use, and fast.
No, the most used algorithm is not available: Isomap.
Post by Gael Varoquaux
Post by Matthieu Brucher
Two tests fail because of the missing parts.
I am not sure what that means? I didn't find time to run the tests.
It was not very clear to me when I reread it as well :)
I have still twio failing tests, one because I need to check the
Matlab version of LLE, and one because I don't exactly know how to
test similarity constraints.
Post by Gael Varoquaux
The code is clearly making huge progress in terms of being accessible to
(semi) naive coders. I am starting to get the big picture of the code.
General remarks
===================
PEP8 say that keyword arguments in function calls should have no spaces
before and after '='.
I would prefer to avoid using numpy.linalg, but rather to favor
scipy.linalg: numpy.linalg is often compiled without proper blas support,
and is thus very slow.
numpy should be imported as 'np', and I would prefer to avoid the 'from
numpy import ...'. This is just a convention, but when it is used by
everybody, it makes life much easier, because anyone can dig in the
codebase without finding out how symbols are imported and what symbols
mean.
All this will be done.
Post by Gael Varoquaux
A lot of the docstrings are not properly formatted as in the numpy
docstring standard.
?? I tried to be compliant, did I miss something? I can't run Sphinx
on it because of the incompatibility between Numpy and the latest
Sphinx.
Post by Gael Varoquaux
It's a pitty that a lot of the functionnality cannot work on a similarity
graph, but rather on samples. I guess this is a general problem that
appears in other places in the scikit, and that the best option is to set
it appart, but we will need to address this in the long term.
It can be extracted, as the inner core algorithm for Laplacian
Eigenmaps and Diffusion Maps is shared. The same function is called
each time.
LLE and Hessian cannot work on given similarity graphs.
Post by Gael Varoquaux
mapping/barycenter.py
=========================
Any reason why the Barycenter is not in the neighbors code? IMHO, it
should be a regression version of the Neighbors classifiers, and could be
named 'NeighborsBarycenter'. The best way to do this would probably be
the split Neighbors in a base class that has no predict, and derive from
this base class the current version of the KNN and your Barycenter. That
would do away with the need to pass in neigh_alternate_arguments.
I've already answered to this concerne, and I will write again my arguments:
- from a design point of view, it's not an inheritance, so I strongly
advise against it. I will never personnaly do this. You can do it, but
I will never lay down an eye on this code ever again ;)
- changing the neighbor algorithm is mandatory if the module is
expected to live.
Post by Gael Varoquaux
mapping/builder.py
=====================
Can we just get rid of this. This seems to me as an extra indirection and
I am not sure why it is there.
No, it cannot. I've also given the arguments why it must stay, and
noone countered my arguments.
Post by Gael Varoquaux
Besides, you are explicitely testing for isinstance(BaseEstimator). We
want to avoid this, because it means that it is impossible to use objects
in the scikit that have not been explicitely built for that purpose.
In that case, you have to find a way of getting the same
functionality. It creates a new instance or uses the one provided. the
only way I could write this was to use the BaseEstimator class.
Besides, I don't know why we would give it something that isn't a
BaseEstimator as it must have the correct interface. Unless you want
to get rid of the class in the near future?
Post by Gael Varoquaux
If we do this, and move the barycenter to where the nearest neighbor is,
we can completetly do away with the mapping subdirectory, and thus reduce
the depth of the tree by one, which will make it easier to apprehend.
I'm sorry, I'm a bit tired at the moment (my child keot me awake this
night), but I really would like that anyone wanting to remove parts of
the module at least reads my answer to their arguments.
I have said countless times that I have other mappings.
Post by Gael Varoquaux
embedding/tools.py
=====================
embedding.tools.create_neighborer should not exist. It is an almost empty
function that is doing some hacking around the neighbors class.
+1
Post by Gael Varoquaux
embedding.tools.centered_normalized should be merged with the code that
is in scaler.
+1
Post by Gael Varoquaux
embedding.tools.dist2hd should be merged with
mean_shift.euclidian_distance, and both should be moved somewhere in
utils.
+1
Post by Gael Varoquaux
embedding/barycenters.py
==========================
'barycenter' function should be renamed to 'barycenter_weights', because
right now, it not clear what the function does from its name. The word
'barycenter' is very much overloaded in the codebase. Besides, it tales a
'kwargs' that is unused. Finally, this function should be moved to the
same file as the Barycenter object.
It can make more sense indeed.
Post by Gael Varoquaux
embedding/embedding.py
==========================
I am not happy with the name of this file. It confuses me: I am not sure
anymore what is in the embedding directory anymore, if it is not in the
embedding file. Actually, after reading the code bit more, I have
realised that this is just a base class, and actually an abstract base
class. The name should reflect this, because right now it takes a lot of
code reading to realize this. Maybe embedding.py -> base_embedding.py and
`class Embedding` -> `class BaseEmbedding`
OK
Post by Gael Varoquaux
   def __init__(self, n_coords, n_neighbors = None, neigh = None,
       self.n_neighbors = n_neighbors if n_neighbors is not None else 9
OK.
Post by Gael Varoquaux
embedding/similarities.py
==========================
In my opinion, a function like 'laplacian_maps' that takes a 'method'
function as an input argument is tedious to use. Any chance that you can
give is a default entry? This is in 'similarities.py', but it is not used
there, only in similarities_mds. I am not sure whether it is supposed to
be an internal use function -in which case, why is it not where it is
used?- or a function to be used by the user -in which case, why is it not
in __all__?.
The similarities/similarities_mds split is not well thoguht, I think.
This function is what creates the similarity graph and then does the
computation. It could be refactored to get clsoer to what you expect
(a function taking a graph as parameter).
Post by Gael Varoquaux
kwargs in most functions are impossible to use without dwelling in the
source code.
If a function has a kwargs, it's that it shouldn't be used without
know the structure of the code. Such functions are mainly builders
that must use additional parameter functions. They are mainly here to
avoid code duplication.

Matthieu (also tired of his very very very slow computer)
--
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
Matthieu Brucher
2010-10-15 21:39:37 UTC
Permalink
Hi everyone,

I got a bit stopped in the refactoring, but I got back at it with some
cleanings:
- PEP 8
- some cleant modules
- using Scaler instead of center_normalized (the function is still
there but uses the scaler)
- renamed the base class for embedding

Other stuff currently in the pipe...
Post by Matthieu Brucher
Post by Gael Varoquaux
So, I am losing Internet connection in a few hours. I couldn't finish my
review. Sorry, it takes a while to dig in the code, and amidst a busy
conference it is a challenge.
Here are my notes on the code. To recap my general feeling: the code is
really going in the right direction. The example look right, although a
bit slow. I am still worried that there is a fairly high level of
indirection in the code: it is nested in terms of API, and has functions
calling functions, ... Finally, another general remark is that there are
many many place where the code looks fairly sub optimal (sparse matrices
with no optimisation on the sparse matrix type, no in-place operations in
core loops), and the code still is much, much slower than nipy's manifold
learning code. The combination of both the remaining complexity of the
code due to its abstraction and the slowness makes me think we shouldn't
integrate it right now, but rather iterate a bit more on it.
There is perhaps one algorithm in both of them, Laplacian Eigenmaps,
which we know will not be optimal until I branch your code ;)
Post by Gael Varoquaux
I am offline for 3 weeks starting now!
Gaël
Post by Matthieu Brucher
As Alexandre and Gaël suggested, I've created a branch with just a few
- the Barycenter Mapping
  - LLE
  - Laplacian Eigenmaps
  - Diffusion Maps
  - Hessian Eigenmaps
easy to use, and fast.
No, the most used algorithm is not available: Isomap.
Post by Gael Varoquaux
Post by Matthieu Brucher
Two tests fail because of the missing parts.
I am not sure what that means? I didn't find time to run the tests.
It was not very clear to me when I reread it as well :)
I have still twio failing tests, one because I need to check the
Matlab version of LLE, and one because I don't exactly know how to
test similarity constraints.
Post by Gael Varoquaux
The code is clearly making huge progress in terms of being accessible to
(semi) naive coders. I am starting to get the big picture of the code.
General remarks
===================
PEP8 say that keyword arguments in function calls should have no spaces
before and after '='.
I would prefer to avoid using numpy.linalg, but rather to favor
scipy.linalg: numpy.linalg is often compiled without proper blas support,
and is thus very slow.
numpy should be imported as 'np', and I would prefer to avoid the 'from
numpy import ...'. This is just a convention, but when it is used by
everybody, it makes life much easier, because anyone can dig in the
codebase without finding out how symbols are imported and what symbols
mean.
All this will be done.
Post by Gael Varoquaux
A lot of the docstrings are not properly formatted as in the numpy
docstring standard.
?? I tried to be compliant, did I miss something? I can't run Sphinx
on it because of the incompatibility between Numpy and the latest
Sphinx.
Post by Gael Varoquaux
It's a pitty that a lot of the functionnality cannot work on a similarity
graph, but rather on samples. I guess this is a general problem that
appears in other places in the scikit, and that the best option is to set
it appart, but we will need to address this in the long term.
It can be extracted, as the inner core algorithm for Laplacian
Eigenmaps and Diffusion Maps is shared. The same function is called
each time.
LLE and Hessian cannot work on given similarity graphs.
Post by Gael Varoquaux
mapping/barycenter.py
=========================
Any reason why the Barycenter is not in the neighbors code? IMHO, it
should be a regression version of the Neighbors classifiers, and could be
named 'NeighborsBarycenter'. The best way to do this would probably be
the split Neighbors in a base class that has no predict, and derive from
this base class the current version of the KNN and your Barycenter. That
would do away with the need to pass in neigh_alternate_arguments.
- from a design point of view, it's not an inheritance, so I strongly
advise against it. I will never personnaly do this. You can do it, but
I will never lay down an eye on this code ever again ;)
- changing the neighbor algorithm is mandatory if the module is
expected to live.
Post by Gael Varoquaux
mapping/builder.py
=====================
Can we just get rid of this. This seems to me as an extra indirection and
I am not sure why it is there.
No, it cannot. I've also given the arguments why it must stay, and
noone countered my arguments.
Post by Gael Varoquaux
Besides, you are explicitely testing for isinstance(BaseEstimator). We
want to avoid this, because it means that it is impossible to use objects
in the scikit that have not been explicitely built for that purpose.
In that case, you have to find a way of getting the same
functionality. It creates a new instance or uses the one provided. the
only way I could write this was to use the BaseEstimator class.
Besides, I don't know why we would give it something that isn't a
BaseEstimator as it must have the correct interface. Unless you want
to get rid of the class in the near future?
Post by Gael Varoquaux
If we do this, and move the barycenter to where the nearest neighbor is,
we can completetly do away with the mapping subdirectory, and thus reduce
the depth of the tree by one, which will make it easier to apprehend.
I'm sorry, I'm a bit tired at the moment (my child keot me awake this
night), but I really would like that anyone wanting to remove parts of
the module at least reads my answer to their arguments.
I have said countless times that I have other mappings.
Post by Gael Varoquaux
embedding/tools.py
=====================
embedding.tools.create_neighborer should not exist. It is an almost empty
function that is doing some hacking around the neighbors class.
+1
Post by Gael Varoquaux
embedding.tools.centered_normalized should be merged with the code that
is in scaler.
+1
Post by Gael Varoquaux
embedding.tools.dist2hd should be merged with
mean_shift.euclidian_distance, and both should be moved somewhere in
utils.
+1
Post by Gael Varoquaux
embedding/barycenters.py
==========================
'barycenter' function should be renamed to 'barycenter_weights', because
right now, it not clear what the function does from its name. The word
'barycenter' is very much overloaded in the codebase. Besides, it tales a
'kwargs' that is unused. Finally, this function should be moved to the
same file as the Barycenter object.
It can make more sense indeed.
Post by Gael Varoquaux
embedding/embedding.py
==========================
I am not happy with the name of this file. It confuses me: I am not sure
anymore what is in the embedding directory anymore, if it is not in the
embedding file. Actually, after reading the code bit more, I have
realised that this is just a base class, and actually an abstract base
class. The name should reflect this, because right now it takes a lot of
code reading to realize this. Maybe embedding.py -> base_embedding.py and
`class Embedding` -> `class BaseEmbedding`
OK
Post by Gael Varoquaux
   def __init__(self, n_coords, n_neighbors = None, neigh = None,
       self.n_neighbors = n_neighbors if n_neighbors is not None else 9
OK.
Post by Gael Varoquaux
embedding/similarities.py
==========================
In my opinion, a function like 'laplacian_maps' that takes a 'method'
function as an input argument is tedious to use. Any chance that you can
give is a default entry? This is in 'similarities.py', but it is not used
there, only in similarities_mds. I am not sure whether it is supposed to
be an internal use function -in which case, why is it not where it is
used?- or a function to be used by the user -in which case, why is it not
in __all__?.
The similarities/similarities_mds split is not well thoguht, I think.
This function is what creates the similarity graph and then does the
computation. It could be refactored to get clsoer to what you expect
(a function taking a graph as parameter).
Post by Gael Varoquaux
kwargs in most functions are impossible to use without dwelling in the
source code.
If a function has a kwargs, it's that it shouldn't be used without
know the structure of the code. Such functions are mainly builders
that must use additional parameter functions. They are mainly here to
avoid code duplication.
Matthieu (also tired of his very very very slow computer)
--
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
--
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
Alexandre Gramfort
2010-10-16 12:35:36 UTC
Permalink
Hi Matthieu,

I have not much to say so far except that I'm glad to see you
moving forward on the manifold module :)

Maybe it would be good if you could list the different things
you plan to do and also the ones you don't among the ones
we've mentioned with Gael in our review

Alex

On Fri, Oct 15, 2010 at 11:40 PM, Matthieu Brucher
Post by Matthieu Brucher
Hi everyone,
I got a bit stopped in the refactoring, but I got back at it with some
- PEP 8
- some cleant modules
- using Scaler instead of center_normalized (the function is still
there but uses the scaler)
- renamed the base class for embedding
Other stuff currently in the pipe...
Post by Matthieu Brucher
Post by Gael Varoquaux
So, I am losing Internet connection in a few hours. I couldn't finish my
review. Sorry, it takes a while to dig in the code, and amidst a busy
conference it is a challenge.
Here are my notes on the code. To recap my general feeling: the code is
really going in the right direction. The example look right, although a
bit slow. I am still worried that there is a fairly high level of
indirection in the code: it is nested in terms of API, and has functions
calling functions, ... Finally, another general remark is that there are
many many place where the code looks fairly sub optimal (sparse matrices
with no optimisation on the sparse matrix type, no in-place operations in
core loops), and the code still is much, much slower than nipy's manifold
learning code. The combination of both the remaining complexity of the
code due to its abstraction and the slowness makes me think we shouldn't
integrate it right now, but rather iterate a bit more on it.
There is perhaps one algorithm in both of them, Laplacian Eigenmaps,
which we know will not be optimal until I branch your code ;)
Post by Gael Varoquaux
I am offline for 3 weeks starting now!
Gaël
Post by Matthieu Brucher
As Alexandre and Gaël suggested, I've created a branch with just a few
- the Barycenter Mapping
  - LLE
  - Laplacian Eigenmaps
  - Diffusion Maps
  - Hessian Eigenmaps
easy to use, and fast.
No, the most used algorithm is not available: Isomap.
Post by Gael Varoquaux
Post by Matthieu Brucher
Two tests fail because of the missing parts.
I am not sure what that means? I didn't find time to run the tests.
It was not very clear to me when I reread it as well :)
I have still twio failing tests, one because I need to check the
Matlab version of LLE, and one because I don't exactly know how to
test similarity constraints.
Post by Gael Varoquaux
The code is clearly making huge progress in terms of being accessible to
(semi) naive coders. I am starting to get the big picture of the code.
General remarks
===================
PEP8 say that keyword arguments in function calls should have no spaces
before and after '='.
I would prefer to avoid using numpy.linalg, but rather to favor
scipy.linalg: numpy.linalg is often compiled without proper blas support,
and is thus very slow.
numpy should be imported as 'np', and I would prefer to avoid the 'from
numpy import ...'. This is just a convention, but when it is used by
everybody, it makes life much easier, because anyone can dig in the
codebase without finding out how symbols are imported and what symbols
mean.
All this will be done.
Post by Gael Varoquaux
A lot of the docstrings are not properly formatted as in the numpy
docstring standard.
?? I tried to be compliant, did I miss something? I can't run Sphinx
on it because of the incompatibility between Numpy and the latest
Sphinx.
Post by Gael Varoquaux
It's a pitty that a lot of the functionnality cannot work on a similarity
graph, but rather on samples. I guess this is a general problem that
appears in other places in the scikit, and that the best option is to set
it appart, but we will need to address this in the long term.
It can be extracted, as the inner core algorithm for Laplacian
Eigenmaps and Diffusion Maps is shared. The same function is called
each time.
LLE and Hessian cannot work on given similarity graphs.
Post by Gael Varoquaux
mapping/barycenter.py
=========================
Any reason why the Barycenter is not in the neighbors code? IMHO, it
should be a regression version of the Neighbors classifiers, and could be
named 'NeighborsBarycenter'. The best way to do this would probably be
the split Neighbors in a base class that has no predict, and derive from
this base class the current version of the KNN and your Barycenter. That
would do away with the need to pass in neigh_alternate_arguments.
- from a design point of view, it's not an inheritance, so I strongly
advise against it. I will never personnaly do this. You can do it, but
I will never lay down an eye on this code ever again ;)
- changing the neighbor algorithm is mandatory if the module is
expected to live.
Post by Gael Varoquaux
mapping/builder.py
=====================
Can we just get rid of this. This seems to me as an extra indirection and
I am not sure why it is there.
No, it cannot. I've also given the arguments why it must stay, and
noone countered my arguments.
Post by Gael Varoquaux
Besides, you are explicitely testing for isinstance(BaseEstimator). We
want to avoid this, because it means that it is impossible to use objects
in the scikit that have not been explicitely built for that purpose.
In that case, you have to find a way of getting the same
functionality. It creates a new instance or uses the one provided. the
only way I could write this was to use the BaseEstimator class.
Besides, I don't know why we would give it something that isn't a
BaseEstimator as it must have the correct interface. Unless you want
to get rid of the class in the near future?
Post by Gael Varoquaux
If we do this, and move the barycenter to where the nearest neighbor is,
we can completetly do away with the mapping subdirectory, and thus reduce
the depth of the tree by one, which will make it easier to apprehend.
I'm sorry, I'm a bit tired at the moment (my child keot me awake this
night), but I really would like that anyone wanting to remove parts of
the module at least reads my answer to their arguments.
I have said countless times that I have other mappings.
Post by Gael Varoquaux
embedding/tools.py
=====================
embedding.tools.create_neighborer should not exist. It is an almost empty
function that is doing some hacking around the neighbors class.
+1
Post by Gael Varoquaux
embedding.tools.centered_normalized should be merged with the code that
is in scaler.
+1
Post by Gael Varoquaux
embedding.tools.dist2hd should be merged with
mean_shift.euclidian_distance, and both should be moved somewhere in
utils.
+1
Post by Gael Varoquaux
embedding/barycenters.py
==========================
'barycenter' function should be renamed to 'barycenter_weights', because
right now, it not clear what the function does from its name. The word
'barycenter' is very much overloaded in the codebase. Besides, it tales a
'kwargs' that is unused. Finally, this function should be moved to the
same file as the Barycenter object.
It can make more sense indeed.
Post by Gael Varoquaux
embedding/embedding.py
==========================
I am not happy with the name of this file. It confuses me: I am not sure
anymore what is in the embedding directory anymore, if it is not in the
embedding file. Actually, after reading the code bit more, I have
realised that this is just a base class, and actually an abstract base
class. The name should reflect this, because right now it takes a lot of
code reading to realize this. Maybe embedding.py -> base_embedding.py and
`class Embedding` -> `class BaseEmbedding`
OK
Post by Gael Varoquaux
   def __init__(self, n_coords, n_neighbors = None, neigh = None,
       self.n_neighbors = n_neighbors if n_neighbors is not None else 9
OK.
Post by Gael Varoquaux
embedding/similarities.py
==========================
In my opinion, a function like 'laplacian_maps' that takes a 'method'
function as an input argument is tedious to use. Any chance that you can
give is a default entry? This is in 'similarities.py', but it is not used
there, only in similarities_mds. I am not sure whether it is supposed to
be an internal use function -in which case, why is it not where it is
used?- or a function to be used by the user -in which case, why is it not
in __all__?.
The similarities/similarities_mds split is not well thoguht, I think.
This function is what creates the similarity graph and then does the
computation. It could be refactored to get clsoer to what you expect
(a function taking a graph as parameter).
Post by Gael Varoquaux
kwargs in most functions are impossible to use without dwelling in the
source code.
If a function has a kwargs, it's that it shouldn't be used without
know the structure of the code. Such functions are mainly builders
that must use additional parameter functions. They are mainly here to
avoid code duplication.
Matthieu (also tired of his very very very slow computer)
--
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
--
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
------------------------------------------------------------------------------
Download new Adobe(R) Flash(R) Builder(TM) 4
The new Adobe(R) Flex(R) 4 and Flash(R) Builder(TM) 4 (formerly
Flex(R) Builder(TM)) enable the development of rich applications that run
across multiple browsers and platforms. Download your free trials today!
http://p.sf.net/sfu/adobe-dev2dev
_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
Matthieu Brucher
2010-10-16 12:48:17 UTC
Permalink
No sweat ;)

Short term (i.e. manifold light branch):
- finish LLE and Diffusion maps tests
- move dist2hd to an appropriate location in utils, as Gaël suggested
- create a Swissroll generator
- add support for pyamg, as the original code from Gaël suggested
- move create_neighborer somewhere appropriate
- write manifold.rst
- fix some not-compliant docstrings
- last will be deleting the embedding and mapping folder to regroup
everything and adapt the different file names

Won't do:
- make Barycenter a subclass of Neighbors (see the reasons I've
mentioned earlier, those reasons also apply for the KNN class which
should not derive from Neighbors either)

There may be some places where the code is not perfect (kwargs that
should not be present), they will be changed when encountered
(scikits.learn current state has some import numpy as N, so I guess if
there are one or two functions that are not perfect, it won't matter).
I have still some questions about self._set_params(**params) which is
unknown to me (I guess there are some discussions I missed about the
Estimator class).

Long/mid term:
- restore the distance-based embedding algorithms
- add the Cython Floyd-Warshall algorithm instead of the numpy-based one
- add PLM as a mapping solution

In the end, a lot will still be needed, as the kernel-based mapping,
more current embedding algorithms, but there is only so much I can do
in my too scarce time (especially now that I'm not in the field
anymore :().

Matthieu
Post by Alexandre Gramfort
Hi Matthieu,
I have not much to say so far except that I'm glad to see you
moving forward on the manifold module :)
Maybe it would be good if you could list the different things
you plan to do and also the ones you don't among the ones
we've mentioned with Gael in our review
Alex
On Fri, Oct 15, 2010 at 11:40 PM, Matthieu Brucher
Post by Matthieu Brucher
Hi everyone,
I got a bit stopped in the refactoring, but I got back at it with some
- PEP 8
- some cleant modules
- using Scaler instead of center_normalized (the function is still
there but uses the scaler)
- renamed the base class for embedding
Other stuff currently in the pipe...
Post by Matthieu Brucher
Post by Gael Varoquaux
So, I am losing Internet connection in a few hours. I couldn't finish my
review. Sorry, it takes a while to dig in the code, and amidst a busy
conference it is a challenge.
Here are my notes on the code. To recap my general feeling: the code is
really going in the right direction. The example look right, although a
bit slow. I am still worried that there is a fairly high level of
indirection in the code: it is nested in terms of API, and has functions
calling functions, ... Finally, another general remark is that there are
many many place where the code looks fairly sub optimal (sparse matrices
with no optimisation on the sparse matrix type, no in-place operations in
core loops), and the code still is much, much slower than nipy's manifold
learning code. The combination of both the remaining complexity of the
code due to its abstraction and the slowness makes me think we shouldn't
integrate it right now, but rather iterate a bit more on it.
There is perhaps one algorithm in both of them, Laplacian Eigenmaps,
which we know will not be optimal until I branch your code ;)
Post by Gael Varoquaux
I am offline for 3 weeks starting now!
Gaël
Post by Matthieu Brucher
As Alexandre and Gaël suggested, I've created a branch with just a few
- the Barycenter Mapping
  - LLE
  - Laplacian Eigenmaps
  - Diffusion Maps
  - Hessian Eigenmaps
easy to use, and fast.
No, the most used algorithm is not available: Isomap.
Post by Gael Varoquaux
Post by Matthieu Brucher
Two tests fail because of the missing parts.
I am not sure what that means? I didn't find time to run the tests.
It was not very clear to me when I reread it as well :)
I have still twio failing tests, one because I need to check the
Matlab version of LLE, and one because I don't exactly know how to
test similarity constraints.
Post by Gael Varoquaux
The code is clearly making huge progress in terms of being accessible to
(semi) naive coders. I am starting to get the big picture of the code.
General remarks
===================
PEP8 say that keyword arguments in function calls should have no spaces
before and after '='.
I would prefer to avoid using numpy.linalg, but rather to favor
scipy.linalg: numpy.linalg is often compiled without proper blas support,
and is thus very slow.
numpy should be imported as 'np', and I would prefer to avoid the 'from
numpy import ...'. This is just a convention, but when it is used by
everybody, it makes life much easier, because anyone can dig in the
codebase without finding out how symbols are imported and what symbols
mean.
All this will be done.
Post by Gael Varoquaux
A lot of the docstrings are not properly formatted as in the numpy
docstring standard.
?? I tried to be compliant, did I miss something? I can't run Sphinx
on it because of the incompatibility between Numpy and the latest
Sphinx.
Post by Gael Varoquaux
It's a pitty that a lot of the functionnality cannot work on a similarity
graph, but rather on samples. I guess this is a general problem that
appears in other places in the scikit, and that the best option is to set
it appart, but we will need to address this in the long term.
It can be extracted, as the inner core algorithm for Laplacian
Eigenmaps and Diffusion Maps is shared. The same function is called
each time.
LLE and Hessian cannot work on given similarity graphs.
Post by Gael Varoquaux
mapping/barycenter.py
=========================
Any reason why the Barycenter is not in the neighbors code? IMHO, it
should be a regression version of the Neighbors classifiers, and could be
named 'NeighborsBarycenter'. The best way to do this would probably be
the split Neighbors in a base class that has no predict, and derive from
this base class the current version of the KNN and your Barycenter. That
would do away with the need to pass in neigh_alternate_arguments.
- from a design point of view, it's not an inheritance, so I strongly
advise against it. I will never personnaly do this. You can do it, but
I will never lay down an eye on this code ever again ;)
- changing the neighbor algorithm is mandatory if the module is
expected to live.
Post by Gael Varoquaux
mapping/builder.py
=====================
Can we just get rid of this. This seems to me as an extra indirection and
I am not sure why it is there.
No, it cannot. I've also given the arguments why it must stay, and
noone countered my arguments.
Post by Gael Varoquaux
Besides, you are explicitely testing for isinstance(BaseEstimator). We
want to avoid this, because it means that it is impossible to use objects
in the scikit that have not been explicitely built for that purpose.
In that case, you have to find a way of getting the same
functionality. It creates a new instance or uses the one provided. the
only way I could write this was to use the BaseEstimator class.
Besides, I don't know why we would give it something that isn't a
BaseEstimator as it must have the correct interface. Unless you want
to get rid of the class in the near future?
Post by Gael Varoquaux
If we do this, and move the barycenter to where the nearest neighbor is,
we can completetly do away with the mapping subdirectory, and thus reduce
the depth of the tree by one, which will make it easier to apprehend.
I'm sorry, I'm a bit tired at the moment (my child keot me awake this
night), but I really would like that anyone wanting to remove parts of
the module at least reads my answer to their arguments.
I have said countless times that I have other mappings.
Post by Gael Varoquaux
embedding/tools.py
=====================
embedding.tools.create_neighborer should not exist. It is an almost empty
function that is doing some hacking around the neighbors class.
+1
Post by Gael Varoquaux
embedding.tools.centered_normalized should be merged with the code that
is in scaler.
+1
Post by Gael Varoquaux
embedding.tools.dist2hd should be merged with
mean_shift.euclidian_distance, and both should be moved somewhere in
utils.
+1
Post by Gael Varoquaux
embedding/barycenters.py
==========================
'barycenter' function should be renamed to 'barycenter_weights', because
right now, it not clear what the function does from its name. The word
'barycenter' is very much overloaded in the codebase. Besides, it tales a
'kwargs' that is unused. Finally, this function should be moved to the
same file as the Barycenter object.
It can make more sense indeed.
Post by Gael Varoquaux
embedding/embedding.py
==========================
I am not happy with the name of this file. It confuses me: I am not sure
anymore what is in the embedding directory anymore, if it is not in the
embedding file. Actually, after reading the code bit more, I have
realised that this is just a base class, and actually an abstract base
class. The name should reflect this, because right now it takes a lot of
code reading to realize this. Maybe embedding.py -> base_embedding.py and
`class Embedding` -> `class BaseEmbedding`
OK
Post by Gael Varoquaux
   def __init__(self, n_coords, n_neighbors = None, neigh = None,
       self.n_neighbors = n_neighbors if n_neighbors is not None else 9
OK.
Post by Gael Varoquaux
embedding/similarities.py
==========================
In my opinion, a function like 'laplacian_maps' that takes a 'method'
function as an input argument is tedious to use. Any chance that you can
give is a default entry? This is in 'similarities.py', but it is not used
there, only in similarities_mds. I am not sure whether it is supposed to
be an internal use function -in which case, why is it not where it is
used?- or a function to be used by the user -in which case, why is it not
in __all__?.
The similarities/similarities_mds split is not well thoguht, I think.
This function is what creates the similarity graph and then does the
computation. It could be refactored to get clsoer to what you expect
(a function taking a graph as parameter).
Post by Gael Varoquaux
kwargs in most functions are impossible to use without dwelling in the
source code.
If a function has a kwargs, it's that it shouldn't be used without
know the structure of the code. Such functions are mainly builders
that must use additional parameter functions. They are mainly here to
avoid code duplication.
Matthieu (also tired of his very very very slow computer)
--
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
--
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
------------------------------------------------------------------------------
Download new Adobe(R) Flash(R) Builder(TM) 4
The new Adobe(R) Flex(R) 4 and Flash(R) Builder(TM) 4 (formerly
Flex(R) Builder(TM)) enable the development of rich applications that run
across multiple browsers and platforms. Download your free trials today!
http://p.sf.net/sfu/adobe-dev2dev
_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
--
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
Matthieu Brucher
2010-11-10 12:51:48 UTC
Permalink
Post by Matthieu Brucher
No sweat ;)
- finish LLE and Diffusion maps tests
LLE is still broken :(
Post by Matthieu Brucher
- move dist2hd to an appropriate location in utils, as Gaël suggested
- create a Swissroll generator
Done with an example (plot_swissroll.py) on the 4 embedding algorithms
Post by Matthieu Brucher
- add support for pyamg, as the original code from Gaël suggested
- move create_neighborer somewhere appropriate
- write manifold.rst
Not finished but more information and references
Post by Matthieu Brucher
- fix some not-compliant docstrings
- last will be deleting the embedding and mapping folder to regroup
everything and adapt the different file names
The list isn't really shorter, but I hope to have more time in the
next few weeks (music can consume a lot of free time!)

Matthieu
--
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
Matthieu Brucher
2010-11-23 22:04:13 UTC
Permalink
LAtest news
Post by Matthieu Brucher
Post by Matthieu Brucher
No sweat ;)
- finish LLE and Diffusion maps tests
LLE is still broken :(
OK now
Post by Matthieu Brucher
Post by Matthieu Brucher
- move dist2hd to an appropriate location in utils, as Gaël suggested
Done
Post by Matthieu Brucher
Post by Matthieu Brucher
- create a Swissroll generator
Done with an example (plot_swissroll.py) on the 4 embedding algorithms
Fixed the return values
Post by Matthieu Brucher
Post by Matthieu Brucher
- add support for pyamg, as the original code from Gaël suggested
Still not done, I can't manage to make it work correctly. According to
the bench, Laplacian Eigenmap is far faster than all the other
methods, and it's the only one that can benefit from pyamg, so I don't
know if it's really worth it.
Post by Matthieu Brucher
Post by Matthieu Brucher
- move create_neighborer somewhere appropriate
Not done at the moment |
Post by Matthieu Brucher
Post by Matthieu Brucher
- write manifold.rst
Not finished but more information and references
Post by Matthieu Brucher
- fix some not-compliant docstrings
More docstrings fixed, but I found other in other modules that do not
comply with the scikit policy.
Post by Matthieu Brucher
Post by Matthieu Brucher
- last will be deleting the embedding and mapping folder to regroup
everything and adapt the different file names
Not done until the pyamg question is solved

There are only three topics left, so the code should be almost release-ready.

Matthieu
--
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
Gael Varoquaux
2010-11-23 23:45:36 UTC
Permalink
Post by Matthieu Brucher
Post by Matthieu Brucher
Post by Matthieu Brucher
- add support for pyamg, as the original code from Gaël suggested
Still not done, I can't manage to make it work correctly. According to
the bench, Laplacian Eigenmap is far faster than all the other
methods, and it's the only one that can benefit from pyamg, so I don't
know if it's really worth it.
I haven't looked at the LLE code (yet), but can't this be formalized as a
Laplace Embedding of a graph. In which case, the laplace_embedding
function in cluster can be used (it works on any graph). We should move
it outside cluster. I am not sure where to, though?
Post by Matthieu Brucher
Post by Matthieu Brucher
Post by Matthieu Brucher
- move create_neighborer somewhere appropriate
Not done at the moment |
I'll have to do the review (going the bed right now: I am wasted), but I
remember that the way the code used to be written, it was hard to follow.
Post by Matthieu Brucher
Post by Matthieu Brucher
Post by Matthieu Brucher
- write manifold.rst
Not finished but more information and references
And pretty pictures (generated by examples)?
Post by Matthieu Brucher
More docstrings fixed, but I found other in other modules that do not
comply with the scikit policy.
File bugs please, we need to fix all this.

Looking forward to merging,

Gael
Matthieu Brucher
2010-11-24 07:16:12 UTC
Permalink
Post by Gael Varoquaux
Post by Matthieu Brucher
Post by Matthieu Brucher
Post by Matthieu Brucher
- add support for pyamg, as the original code from Gaël suggested
Still not done, I can't manage to make it work correctly. According to
the bench, Laplacian Eigenmap is far faster than all the other
methods, and it's the only one that can benefit from pyamg, so I don't
know if it's really worth it.
I haven't looked at the LLE code (yet), but can't this be formalized as a
Laplace Embedding of a graph. In which case, the laplace_embedding
function in cluster can be used (it works on any graph). We should move
it outside cluster. I am not sure where to, though?
LLE can be thought of the square of a laplacian embedding, but the
code may still be different. I should check if I can do the same I did
with Diffusion maps.

I tried to use the function spectral_embedding from the cluster
module, but the results are completely different and cannot be used as
is :|
Post by Gael Varoquaux
Post by Matthieu Brucher
Post by Matthieu Brucher
Post by Matthieu Brucher
- move create_neighborer somewhere appropriate
Not done at the moment |
I'll have to do the review (going the bed right now: I am wasted), but I
remember that the way the code used to be written, it was hard to follow.
The way it is written didn't change. I still have a lot of functions,
but I think their names are explicit enough (and if not, this can be
easilly changed). The overall depth of the current implementation of
Laplacian Eigenmaps is, for instance, the same than the one in cluster
if we add the fact that the similarity matrix must be computed
beforehand.
Post by Gael Varoquaux
Post by Matthieu Brucher
Post by Matthieu Brucher
Post by Matthieu Brucher
- write manifold.rst
Not finished but more information and references
And pretty pictures (generated by examples)?
Yes, I should do this, I have to check how the other packages manage this.
Post by Gael Varoquaux
Post by Matthieu Brucher
More docstrings fixed, but I found other in other modules that do not
comply with the scikit policy.
File bugs please, we need to fix all this.
OK, will do as soon as I find some of these (don't want to pollute my
branch with this cosmits, it is not appropriate, I think).
Post by Gael Varoquaux
Looking forward to merging,
Me too :D

Thanks for staying late :)

Matthieu
--
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
Olivier Grisel
2010-11-24 10:56:24 UTC
Permalink
Post by Matthieu Brucher
Post by Gael Varoquaux
Post by Matthieu Brucher
Post by Matthieu Brucher
Post by Matthieu Brucher
- add support for pyamg, as the original code from Gaël suggested
Still not done, I can't manage to make it work correctly. According to
the bench, Laplacian Eigenmap is far faster than all the other
methods, and it's the only one that can benefit from pyamg, so I don't
know if it's really worth it.
I haven't looked at the LLE code (yet), but can't this be formalized as a
Laplace Embedding of a graph. In which case, the laplace_embedding
function in cluster can be used (it works on any graph). We should move
it outside cluster. I am not sure where to, though?
LLE can be thought of the square of a laplacian embedding, but the
code may still be different. I should check if I can do the same I did
with Diffusion maps.
I tried to use the function spectral_embedding from the cluster
module, but the results are completely different and cannot be used as
is :|
Post by Gael Varoquaux
Post by Matthieu Brucher
Post by Matthieu Brucher
Post by Matthieu Brucher
- move create_neighborer somewhere appropriate
Not done at the moment |
I'll have to do the review (going the bed right now: I am wasted), but I
remember that the way the code used to be written, it was hard to follow.
The way it is written didn't change. I still have a lot of functions,
but I think their names are explicit enough (and if not, this can be
easilly changed). The overall depth of the current implementation of
Laplacian Eigenmaps is, for instance, the same than the one in cluster
if we add the fact that the similarity matrix must be computed
beforehand.
Post by Gael Varoquaux
Post by Matthieu Brucher
Post by Matthieu Brucher
Post by Matthieu Brucher
- write manifold.rst
Not finished but more information and references
And pretty pictures (generated by examples)?
Yes, I should do this, I have to check how the other packages manage this.
If the example file name starts with plot_* and does a call call to
pylab.plot then the result of that plot will be embedded in the
auto_examples section of the documentation (this achieved thanks to a
sphinx plugin, the latest stable version of sphinx might be required
to build them).
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Matthieu Brucher
2010-11-24 11:13:07 UTC
Permalink
Post by Olivier Grisel
Post by Matthieu Brucher
Post by Gael Varoquaux
Post by Matthieu Brucher
Post by Matthieu Brucher
Post by Matthieu Brucher
- add support for pyamg, as the original code from Gaël suggested
Still not done, I can't manage to make it work correctly. According to
the bench, Laplacian Eigenmap is far faster than all the other
methods, and it's the only one that can benefit from pyamg, so I don't
know if it's really worth it.
I haven't looked at the LLE code (yet), but can't this be formalized as a
Laplace Embedding of a graph. In which case, the laplace_embedding
function in cluster can be used (it works on any graph). We should move
it outside cluster. I am not sure where to, though?
LLE can be thought of the square of a laplacian embedding, but the
code may still be different. I should check if I can do the same I did
with Diffusion maps.
I tried to use the function spectral_embedding from the cluster
module, but the results are completely different and cannot be used as
is :|
Post by Gael Varoquaux
Post by Matthieu Brucher
Post by Matthieu Brucher
Post by Matthieu Brucher
- move create_neighborer somewhere appropriate
Not done at the moment |
I'll have to do the review (going the bed right now: I am wasted), but I
remember that the way the code used to be written, it was hard to follow.
The way it is written didn't change. I still have a lot of functions,
but I think their names are explicit enough (and if not, this can be
easilly changed). The overall depth of the current implementation of
Laplacian Eigenmaps is, for instance, the same than the one in cluster
if we add the fact that the similarity matrix must be computed
beforehand.
Post by Gael Varoquaux
Post by Matthieu Brucher
Post by Matthieu Brucher
Post by Matthieu Brucher
- write manifold.rst
Not finished but more information and references
And pretty pictures (generated by examples)?
Yes, I should do this, I have to check how the other packages manage this.
If the example file name starts with plot_* and does a call call to
pylab.plot then the result of that plot will be embedded in the
auto_examples section of the documentation (this achieved thanks to a
sphinx plugin, the latest stable version of sphinx might be required
to build them).
In that case, there isn't much to do, as there are three different
examples. I should perhaps add a link in manifold.rst to these
different examples?

I may add a lot of different examples in the future, as there are
several different classification examples that may benefit from the
module. Is there a way of desactivating their generation?

Thanks for the tip ;)

Matthieu
--
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
Olivier Grisel
2010-11-24 11:22:26 UTC
Permalink
Post by Matthieu Brucher
Post by Olivier Grisel
If the example file name starts with plot_* and does a call call to
pylab.plot then the result of that plot will be embedded in the
auto_examples section of the documentation (this achieved thanks to a
sphinx plugin, the latest stable version of sphinx might be required
to build them).
In that case, there isn't much to do, as there are three different
examples. I should perhaps add a link in manifold.rst to these
different examples?
+1
Post by Matthieu Brucher
I may add a lot of different examples in the future, as there are
several different classification examples that may benefit from the
module. Is there a way of desactivating their generation?
Nope. But there is now a way to build the HTML documentation without
the examples plots as Fabian explained a couple of days ago IIRC.

I think we should try to keep the individual example runtime below 1
minute long if they are used to plot something displayed in the
documentation.
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Matthieu Brucher
2010-11-24 11:26:52 UTC
Permalink
Post by Olivier Grisel
Post by Matthieu Brucher
Post by Olivier Grisel
If the example file name starts with plot_* and does a call call to
pylab.plot then the result of that plot will be embedded in the
auto_examples section of the documentation (this achieved thanks to a
sphinx plugin, the latest stable version of sphinx might be required
to build them).
In that case, there isn't much to do, as there are three different
examples. I should perhaps add a link in manifold.rst to these
different examples?
+1
Post by Matthieu Brucher
I may add a lot of different examples in the future, as there are
several different classification examples that may benefit from the
module. Is there a way of desactivating their generation?
Nope. But there is now a way to build the HTML documentation without
the examples plots as Fabian explained a couple of days ago IIRC.
I think we should try to keep the individual example runtime below 1
minute long if they are used to plot something displayed in the
documentation.
Indeed. I have to check the plot_swissroll.py in that matter (perhaps
simply reducing the number of samples).

Matthieu
--
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
Gael Varoquaux
2010-11-25 22:43:34 UTC
Permalink
OK, here is a quick review.

I am sorry, I couldn't spend too much time on it. I am very tired
currently as I am working on a big deadline.

I hope that I am not too blunt: I am sleep deprieved and I would
appreciate if someone else could do another review.

To summarize my review, I think that the code still needs a lot of
reorganisation, and thus I think that meerging it in before the 0.6
release would be counter productive, as it would lead to API breakage
after the release.

I thought that the branch was very light, but that's still a lot of code.
Some specific comments (not exaustive: I haven't looked at performance).

1. mapping/builder: This file should die. A factory that is not a class is
a antipattern, as far as I am concerned: shows that the class
constructors are hard to use out of the box. They must be refactored,
rather than introducing a factory.

2. mapping/barycenter: should be moved to the scikits.learn.neihgbors
module, and renamed 'NeighborsBarycenter', and be made a subclass of
Neighbors.

3. Once the two above are done, we can get rid of the nested directory
structure, and fold everything in the manifold directory.

4. embedding/barycenter: must die. It's usecase is not clearly different
than the other barycenter. If some refactoring is needed in nthe main
barycenter, that is probably the right solution.

5. Graph structures must all be sparse matrices.

6. embedding/tools: create_graph/create_neighborer: all helper
functions. We need to get rid of them as they are too specific to a
usecase, and thus once we have found the righ abstractions, they
should be writable in a few lines of Python code (each of them have
longer documentation than code, by the way). One thing that might be
useful, as I think that it is resonnably general-purpose, is some
code that creates the neighbors graph. It could be a function
located in the neighbors module, returning a sparse matrix.

7. embedding/tools: centered_normalized: isn't the code in
preprocessing useful for that? If not, we need to fix it.

8. embedding/similarities_mds: numpy -> np.

9. General remark: please no dictionnaries of additional arguments. We
have everything we need in the scikit to avoid this pattern. We have
already had many discussions about the solutions on the list (google
is failing on me right now, so I can't give pointers, but the
corresponding delegation patterns are used in the text-extraction
utilities).


G'night, I am waisted ;)

Gaël
Alexandre Gramfort
2010-11-26 00:13:37 UTC
Permalink
Hi,

I have to admit that I agree the points raised by Gael.
I would add that I find weird not to have file names
that reflect the content of the files.

I said it before but why not having something simple to catch like:

manifold/
|-- hessian_map.py
|-- laplacian_map.py
|-- lle.py
`-- utils.py

A mapper and a builder is not intuitive.

We have to keep things simple to improve usability, facilitate
maintenance, simplify the identification of computational
bottlenecks etc.

Right now the benchmark is not running since dist2hd has been moved.
Also I think I've said it before but a much faster implementation of
dist2hd can be found in the mean_shift.py.

I agree with Gael that the current API make code quite verbose
which is not very pythonic.

I'm not the one to decide but personally I believe the current
module is still overdesigned, and not computationally
efficient. E.g. in similarities.py I read

t = np.eye(len(self.X_), len(self.X_)) - W

to me it means this code won't run with more that 10 000 samples even
if n_features == 2. This is to me not acceptable as spectral clustering
methods when properly implemented fit with thousands of dimensions.

Alex

On Thu, Nov 25, 2010 at 5:43 PM, Gael Varoquaux
Post by Gael Varoquaux
OK, here is a quick review.
I am sorry, I couldn't spend too much time on it. I am very tired
currently as I am working on a big deadline.
I hope that I am not too blunt: I am sleep deprieved and I would
appreciate if someone else could do another review.
To summarize my review, I think that the code still needs a lot of
reorganisation, and thus I think that meerging it in before the 0.6
release would be counter productive, as it would lead to API breakage
after the release.
I thought that the branch was very light, but that's still a lot of code.
Some specific comments (not exaustive: I haven't looked at performance).
 1. mapping/builder: This file should die. A factory that is not a class is
   a antipattern, as far as I am concerned: shows that the class
   constructors are hard to use out of the box. They must be refactored,
   rather than introducing a factory.
 2. mapping/barycenter: should be moved to the scikits.learn.neihgbors
   module, and renamed 'NeighborsBarycenter', and be made a subclass of
   Neighbors.
 3. Once the two above are done, we can get rid of the nested directory
   structure, and fold everything in the manifold directory.
 4. embedding/barycenter: must die. It's usecase is not clearly different
   than the other barycenter. If some refactoring is needed in nthe main
   barycenter, that is probably the right solution.
 5. Graph structures must all be sparse matrices.
 6. embedding/tools: create_graph/create_neighborer: all helper
   functions. We need to get rid of them as they are too specific to a
   usecase, and thus once we have found the righ abstractions, they
   should be writable in a few lines of Python code (each of them have
   longer documentation than code, by the way). One thing that might be
   useful, as I think that it is resonnably general-purpose, is some
   code that creates the neighbors graph. It could be a function
   located in the neighbors module, returning a sparse matrix.
 7. embedding/tools: centered_normalized: isn't the code in
   preprocessing useful for that? If not, we need to fix it.
 8. embedding/similarities_mds: numpy -> np.
 9. General remark: please no dictionnaries of additional arguments. We
   have everything we need in the scikit to avoid this pattern. We have
   already had many discussions about the solutions on the list (google
   is failing on me right now, so I can't give pointers, but the
   corresponding delegation patterns are used in the text-extraction
   utilities).
G'night, I am waisted ;)
Gaël
------------------------------------------------------------------------------
Increase Visibility of Your 3D Game App & Earn a Chance To Win $500!
Tap into the largest installed PC base & get more eyes on your game by
optimizing for Intel(R) Graphics Technology. Get started today with the
Intel(R) Software Partner Program. Five $500 cash prizes are up for grabs.
http://p.sf.net/sfu/intelisp-dev2dev
_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
Gael Varoquaux
2010-11-26 06:28:41 UTC
Permalink
Post by Alexandre Gramfort
manifold/
|-- hessian_map.py
|-- laplacian_map.py
|-- lle.py
`-- utils.py
Yes, I would like that.

G
Matthieu Brucher
2010-11-26 10:56:44 UTC
Permalink
Post by Alexandre Gramfort
Hi,
I have to admit that I agree the points raised by Gael.
I would add that I find weird not to have file names
that reflect the content of the files.
That's because I've started by regrouping all the algorithms with a
similar approach.
Post by Alexandre Gramfort
manifold/
|-- hessian_map.py
|-- laplacian_map.py
|-- lle.py
`-- utils.py
I'll try to be closer to this. It is more obvious than similarity.py
or distance.py, ...
Post by Alexandre Gramfort
A mapper and a builder is not intuitive.
Argument following in a next paragraph.
Post by Alexandre Gramfort
We have to keep things simple to improve usability, facilitate
maintenance, simplify the identification of computational
bottlenecks etc.
I think they are as simple as it can get with all the shared code
between these algorithms.
Post by Alexandre Gramfort
Right now the benchmark is not running since dist2hd has been moved.
I may have forget to add it, I have to check.
Post by Alexandre Gramfort
Also I think I've said it before but a much faster implementation of
dist2hd can be found in the mean_shift.py.
I found it (sorry I forgot you mentioned it, and now that I recall, I
said I would use your version :|), but as it is not my code and I
don't know where it is used, I will only copy it in utils.
Post by Alexandre Gramfort
I agree with Gael that the current API make code quite verbose
which is not very pythonic.
I'm not the one to decide but personally I believe the current
module is still overdesigned, and not computationally
efficient.
This is two different things. The design was created for a complete
manifold package. What you have currently is not half of it. So yes,
it can be seen as overdesigned (not inefficient, because the different
hotspots are localized to specific functions).

As an example, I will talk about create_neighborer and create_graph.
These functions exists because there are issues in Riemannian manifold
(i.e. the one the module tries to address) with short-cuts, and so
that other neighbors can be selected. One year after the first two
articles that really started this field, there was an article to the
issue of short-cuts and to clues as how to minimize them. So not
designing the module for these issues is underdesigning it. These
issues have to be taken into account from the start (I didn't have
much questions in ICIP about my thesis work, but one of them was this
specific issue, which surprised me at the time).

I didn't try to address all manifold algorithms, but I think that
creating a testbed for the obvious issues of manifold learning is the
least that should be done.
Post by Alexandre Gramfort
E.g. in similarities.py I read
t = np.eye(len(self.X_), len(self.X_)) - W
to me it means this code won't run with more that 10 000 samples even
if n_features == 2. This is to me not acceptable as spectral clustering
methods when properly implemented fit with thousands of dimensions.
This is of course an obvious thing to fix.
It has to be reminded that at the time the module was started,
scipy.sparse was not as usable as today. There was also the issue of
ARPACK, which meant that I had to use dense matrices. I've changed it
in some places, I'll try as much as I can to use sparse matrices where
it can be helpful (LLE and Laplacian Eigenmaps).
Also don't forget that a lot of manifold learning algorithms have to
use a dense matrix (unfortunately), namely all distance-based ones,
and some similarity-based as well (as diffusion maps). For them, the
number of samples must be limited.

I concur that they are some efficiency issues in the code, but I
think that the overall design is better than other parts of the
scikit, even if it has still progress to do.

Thanks for the feedback,

Matthieu
--
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
Matthieu Brucher
2010-11-26 10:56:38 UTC
Permalink
HI Gael, I'm sorry, butthere are some points that I strongly disagree
with, and no one gave an agrument against my point of view.
Post by Gael Varoquaux
OK, here is a quick review.
I am sorry, I couldn't spend too much time on it. I am very tired
currently as I am working on a big deadline.
I hope that I am not too blunt: I am sleep deprieved and I would
appreciate if someone else could do another review.
To summarize my review, I think that the code still needs a lot of
reorganisation, and thus I think that meerging it in before the 0.6
release would be counter productive, as it would lead to API breakage
after the release.
The API related to the different embedding algorithms and mappings
will remain stable, as everything is already imported from the
manifold module directly.
It's also difficult for me not to merge from the master, and not
benefit from new stuff coming inside the scikit :(
Post by Gael Varoquaux
I thought that the branch was very light, but that's still a lot of code.
Some specific comments (not exaustive: I haven't looked at performance).
It's far far less than before! I tried to be as conservative as I
could, but the algorithms still are very different (some use
neighbors, other don't, some have a complicated matrix construction,
other don't, ...)
Post by Gael Varoquaux
 1. mapping/builder: This file should die. A factory that is not a class is
   a antipattern, as far as I am concerned: shows that the class
   constructors are hard to use out of the box. They must be refactored,
   rather than introducing a factory.
It's not about the constructor, it's about the fact that this function
is called by the embedding to create or use an instance. It is not
meant to be public either.
Post by Gael Varoquaux
 2. mapping/barycenter: should be moved to the scikits.learn.neihgbors
   module, and renamed 'NeighborsBarycenter', and be made a subclass of
   Neighbors.
If you talk about antipattern, this is an official one, so this will
never be done. Ever. Especially as I have also given algorithms
reasons.
Inheritance was never created for this. I suggest some book from
Bertrand Meyer (father of Eiffel) on this matter, if you want some
additional insight.
NeighborsBarycenter is not a Neighbors, so it cannot be derived from
Neighbors. The rule of thumb for public inheritance for a IS A
relationship is that if you broaden the scope of the parent class,
then you can inherit.
Here, we have a IS IMPLEMENTED IN TERMS OF relationship, because a
Barycenter class does not have the same interface as aNeighbors class.
I hope that with these obvious arguments, this topic can now be closed
(still not used to creating dynamix relationship as they call them at
work).
Post by Gael Varoquaux
 3. Once the two above are done, we can get rid of the nested directory
   structure, and fold everything in the manifold directory.
This is the plan as soon as I figure out how to use pyamg.
Post by Gael Varoquaux
 4. embedding/barycenter: must die. It's usecase is not clearly different
   than the other barycenter. If some refactoring is needed in nthe main
   barycenter, that is probably the right solution.
Indeed, once the Barycenter class is moved one level up.
Post by Gael Varoquaux
 5. Graph structures must all be sparse matrices.
Still not used to sparse matrices :D
Post by Gael Varoquaux
 6. embedding/tools: create_graph/create_neighborer: all helper
   functions. We need to get rid of them as they are too specific to a
   usecase, and thus once we have found the righ abstractions, they
   should be writable in a few lines of Python code (each of them have
   longer documentation than code, by the way). One thing that might be
   useful, as I think that it is resonnably general-purpose, is some
   code that creates the neighbors graph. It could be a function
   located in the neighbors module, returning a sparse matrix.
A function returning the graph of neighbors would be great indeed (it
is something really close to create_graph).
I thought about utils.graph that already have some code close to it?
Post by Gael Varoquaux
 7. embedding/tools: centered_normalized: isn't the code in
   preprocessing useful for that? If not, we need to fix it.
I already use some code of the scikit for this. It is used for the
diffusion map, getting rid of it would mean a free function somewhere
to whitten a signal, or a lambda (and I followed the discussion on
lambdas yesterday).
Post by Gael Varoquaux
 8. embedding/similarities_mds: numpy -> np.
oops...
Post by Gael Varoquaux
 9. General remark: please no dictionnaries of additional arguments. We
   have everything we need in the scikit to avoid this pattern. We have
   already had many discussions about the solutions on the list (google
   is failing on me right now, so I can't give pointers, but the
   corresponding delegation patterns are used in the text-extraction
   utilities).
I'm trying to remove them one by one, but it's not that easy.

I'm also going to be very blunt, but here are my thoughts:
- there are a lot of cases where the scikit does not follow its own
advice (coming into mind are some files in utils, the LARS group of
functions), with even some anti (Python) patterns (dictionary directly
in the function arguments, ...)
- a lot of code in the scikit is far more complicated to follow
because it is not beautiful (function with more than three levels of
loops, more than 10 lines, ...). I prefer small functions with
explicit names (although English is not my mother tongue, which can
lead to unobvious names), as per Clean Code (Martin). It's not more
complicated to follow with a correct IDE than a huge function, it is
simpler.

There are a lot of sensible points that are made in all your reviews,
but at the same time, there is a feeling of misjustice (I know it's
not meant to be so, but this is the feeling that I get) and of
misunderstanding of the manifold (or even software design) field that
is not really helpful. I'd like my reviewers to understand my
arguments as much as I try to understand theirs.

Cheers ;)

Matthieu
--
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
Olivier Grisel
2010-11-26 11:01:58 UTC
Permalink
Post by Matthieu Brucher
HI Gael, I'm sorry, butthere are some points that I strongly disagree
with, and no one gave an agrument against my point of view.
Post by Gael Varoquaux
OK, here is a quick review.
I am sorry, I couldn't spend too much time on it. I am very tired
currently as I am working on a big deadline.
I hope that I am not too blunt: I am sleep deprieved and I would
appreciate if someone else could do another review.
To summarize my review, I think that the code still needs a lot of
reorganisation, and thus I think that meerging it in before the 0.6
release would be counter productive, as it would lead to API breakage
after the release.
The API related to the different embedding algorithms and mappings
will remain stable, as everything is already imported from the
manifold module directly.
It's also difficult for me not to merge from the master, and not
benefit from new stuff coming inside the scikit :(
But you can and should merge *from* the scikit-learn master whenever you want.

It will make the code review much easier too.
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Matthieu Brucher
2010-11-26 11:09:55 UTC
Permalink
Post by Olivier Grisel
Post by Matthieu Brucher
HI Gael, I'm sorry, butthere are some points that I strongly disagree
with, and no one gave an agrument against my point of view.
Post by Gael Varoquaux
OK, here is a quick review.
I am sorry, I couldn't spend too much time on it. I am very tired
currently as I am working on a big deadline.
I hope that I am not too blunt: I am sleep deprieved and I would
appreciate if someone else could do another review.
To summarize my review, I think that the code still needs a lot of
reorganisation, and thus I think that meerging it in before the 0.6
release would be counter productive, as it would lead to API breakage
after the release.
The API related to the different embedding algorithms and mappings
will remain stable, as everything is already imported from the
manifold module directly.
It's also difficult for me not to merge from the master, and not
benefit from new stuff coming inside the scikit :(
But you can and should merge *from* the scikit-learn master whenever you want.
It will make the code review much easier too.
OK, I take your word on this point. I have trouble following the
different policies in the different projects :) (like nipy where you
should not do this, although I don't program with it)
I like merging from master, so the first thing I'll be doing before
tackling Gaël's and Alexandre's points will be doing a merge.

Matthieu
--
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
Alexandre Gramfort
2010-11-26 22:03:31 UTC
Permalink
Matthieu,

I suggest you merge your branch with the current master,
then fix as many things as possible among the points raised
above. Then you should issue a pull request on github so it's easier
to keep track of our comments and your commits. Our current workflow
by emails is definitely not optimal.

Do not hesitate to include a first list of bullet points recalling
all the points raised by our reviews. It took us some time with Gael
to go through your work and give you some feedback. I'd really like
not to do it again.

At the end of the day we need a manifold module with state of the art
performance. I'd really like to help but right now if I touch your
code I'm going to start moving/renaming files, moving some functions
in common locations of the scikit etc. and I want to agree with you
on this before spending some time doing it.

Alex

PS : to me laplacian eigenmaps and diffusion maps work with dense
affinity matrices but can be well approximated using a k-NN graph
to scale to big datasets.

On Fri, Nov 26, 2010 at 6:09 AM, Matthieu Brucher
Post by Matthieu Brucher
Post by Olivier Grisel
Post by Matthieu Brucher
HI Gael, I'm sorry, butthere are some points that I strongly disagree
with, and no one gave an agrument against my point of view.
Post by Gael Varoquaux
OK, here is a quick review.
I am sorry, I couldn't spend too much time on it. I am very tired
currently as I am working on a big deadline.
I hope that I am not too blunt: I am sleep deprieved and I would
appreciate if someone else could do another review.
To summarize my review, I think that the code still needs a lot of
reorganisation, and thus I think that meerging it in before the 0.6
release would be counter productive, as it would lead to API breakage
after the release.
The API related to the different embedding algorithms and mappings
will remain stable, as everything is already imported from the
manifold module directly.
It's also difficult for me not to merge from the master, and not
benefit from new stuff coming inside the scikit :(
But you can and should merge *from* the scikit-learn master whenever you want.
It will make the code review much easier too.
OK, I take your word on this point. I have trouble following the
different policies in the different projects :) (like nipy where you
should not do this, although I don't program with it)
I like merging from master, so the first thing I'll be doing before
tackling Gaël's and Alexandre's points will be doing a merge.
Matthieu
--
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
------------------------------------------------------------------------------
Increase Visibility of Your 3D Game App & Earn a Chance To Win $500!
Tap into the largest installed PC base & get more eyes on your game by
optimizing for Intel(R) Graphics Technology. Get started today with the
Intel(R) Software Partner Program. Five $500 cash prizes are up for grabs.
http://p.sf.net/sfu/intelisp-dev2dev
_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
Matthieu Brucher
2010-11-26 22:45:09 UTC
Permalink
Post by Alexandre Gramfort
Matthieu,
I suggest you merge your branch with the current master,
then fix as many things as possible among the points raised
above. Then you should issue a pull request on github so it's easier
to keep track of our comments and your commits. Our current workflow
by emails is definitely not optimal.
OK. I'm not used to github but this is not an excuse :)
Post by Alexandre Gramfort
Do not hesitate to include a first list of bullet points recalling
all the points raised by our reviews. It took us some time with Gael
to go through your work and give you some feedback. I'd really like
not to do it again.
Of course, it would be too bad, and it is already part of my todo/check list.
Post by Alexandre Gramfort
At the end of the day we need a manifold module with state of the art
performance. I'd really like to help but right now if I touch your
code I'm going to start moving/renaming files, moving some functions
in common locations of the scikit etc. and I want to agree with you
on this before spending some time doing it.
I hope that speed will be enhanced with some other factorization and
resuse of the sparse stuff.
Post by Alexandre Gramfort
PS : to me laplacian eigenmaps and diffusion maps work with dense
affinity matrices but can be well approximated using a k-NN graph
to scale to big datasets.
Laplacian Eigenmaps only use a k-NN graph, but diffusion maps use the
whole whittened affinity matrix. Diffusion maps can use a diffusion
time, but the inner algorithm is strictly identical. So for me, the
only real difference is the sparsity and thus scalability of Lapalcian
Eigenmaps (cf also the two references I've included in the
documentation).

Matthieu
--
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
Alexandre Gramfort
2010-11-27 00:57:41 UTC
Permalink
Post by Matthieu Brucher
Post by Alexandre Gramfort
I suggest you merge your branch with the current master,
then fix as many things as possible among the points raised
above. Then you should issue a pull request on github so it's easier
to keep track of our comments and your commits. Our current workflow
by emails is definitely not optimal.
OK. I'm not used to github but this is not an excuse :)
indeed :)
Post by Matthieu Brucher
Post by Alexandre Gramfort
Do not hesitate to include a first list of bullet points recalling
all the points raised by our reviews. It took us some time with Gael
to go through your work and give you some feedback. I'd really like
not to do it again.
Of course, it would be too bad, and it is already part of my todo/check list.
great
Post by Matthieu Brucher
Post by Alexandre Gramfort
At the end of the day we need a manifold module with state of the art
performance. I'd really like to help but right now if I touch your
code I'm going to start moving/renaming files, moving some functions
in common locations of the scikit etc. and I want to agree with you
on this before spending some time doing it.
I hope that speed will be enhanced with some other factorization and
resuse of the sparse stuff.
Post by Alexandre Gramfort
PS : to me laplacian eigenmaps and diffusion maps work with dense
affinity matrices but can be well approximated using a k-NN graph
to scale to big datasets.
Laplacian Eigenmaps only use a k-NN graph, but diffusion maps use the
whole whittened affinity matrix.
I agree but do you think that diffusion maps results are really affected
when considering distances further than k points to be 0 ?
Post by Matthieu Brucher
Diffusion maps can use a diffusion
time, but the inner algorithm is strictly identical. So for me, the
only real difference is the sparsity and thus scalability of Lapalcian
Eigenmaps (cf also the two references I've included in the
documentation).
ok you're the expert :)

++
Alex
Matthieu Brucher
2010-11-27 22:13:15 UTC
Permalink
Post by Alexandre Gramfort
Post by Matthieu Brucher
Laplacian Eigenmaps only use a k-NN graph, but diffusion maps use the
whole whittened affinity matrix.
I agree but do you think that diffusion maps results are really affected
when considering distances further than k points to be 0 ?
Perhaps, but the issue would be to find the optimal k per point. This
may be a case where selecting another Nieghbors class than the default
one is handy.
Post by Alexandre Gramfort
Post by Matthieu Brucher
Diffusion maps can use a diffusion
time, but the inner algorithm is strictly identical. So for me, the
only real difference is the sparsity and thus scalability of Lapalcian
Eigenmaps (cf also the two references I've included in the
documentation).
ok you're the expert :)
I'm trying to fix the LLE "mistake" you found, but I can get to
retrieve the lowest nonzero eigenvalues of my sparse matrix. I did
find the correct ones with eigh on dense matrices, but eigen_symmetric
is not working :|
Do we have something that can compute the lowest non-zero eigenvalues
(perhaps an option in eigen_symmetric?)

Matthieu
--
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
Matthieu Brucher
2010-11-27 22:15:23 UTC
Permalink
Post by Matthieu Brucher
I'm trying to fix the LLE "mistake" you found, but I can get to
retrieve the lowest nonzero eigenvalues of my sparse matrix. I did
find the correct ones with eigh on dense matrices, but eigen_symmetric
is not working :|
Do we have something that can compute the lowest non-zero eigenvalues
(perhaps an option in eigen_symmetric?)
Just to say that the lowest eigenvalue is always 0. I tried by
shifting the eigenvalues to 1 (solving 1-M instead of M), but
eigen_symmetric fails and gives me only 1s.

Matthieu
--
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
Matthieu Brucher
2010-11-27 22:53:38 UTC
Permalink
Post by Matthieu Brucher
Post by Matthieu Brucher
I'm trying to fix the LLE "mistake" you found, but I can get to
retrieve the lowest nonzero eigenvalues of my sparse matrix. I did
find the correct ones with eigh on dense matrices, but eigen_symmetric
is not working :|
Do we have something that can compute the lowest non-zero eigenvalues
(perhaps an option in eigen_symmetric?)
Just to say that the lowest eigenvalue is always 0. I tried by
shifting the eigenvalues to 1 (solving 1-M instead of M), but
eigen_symmetric fails and gives me only 1s.
Strangely enough, the Matlab code also uses Arpack and doesn't seem to
have the issue I face :|
--
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
Alexandre Gramfort
2010-10-16 15:34:44 UTC
Permalink
sounds like a plan !

cheers,
Alex

On Sat, Oct 16, 2010 at 2:48 PM, Matthieu Brucher
Post by Matthieu Brucher
No sweat ;)
- finish LLE and Diffusion maps tests
- move dist2hd to an appropriate location in utils, as Gaël suggested
- create a Swissroll generator
- add support for pyamg, as the original code from Gaël suggested
- move create_neighborer somewhere appropriate
- write manifold.rst
- fix some not-compliant docstrings
- last will be deleting the embedding and mapping folder to regroup
everything and adapt the different file names
- make Barycenter a subclass of Neighbors (see the reasons I've
mentioned earlier, those reasons also apply for the KNN class which
should not derive from Neighbors either)
There may be some places where the code is not perfect (kwargs that
should not be present), they will be changed when encountered
(scikits.learn current state has some import numpy as N, so I guess if
there are one or two functions that are not perfect, it won't matter).
I have still some questions about self._set_params(**params) which is
unknown to me (I guess there are some discussions I missed about the
Estimator class).
- restore the distance-based embedding algorithms
- add the Cython Floyd-Warshall algorithm instead of the numpy-based one
- add PLM as a mapping solution
In the end, a lot will still be needed, as the kernel-based mapping,
more current embedding algorithms, but there is only so much I can do
in my too scarce time (especially now that I'm not in the field
anymore :().
Matthieu
Post by Alexandre Gramfort
Hi Matthieu,
I have not much to say so far except that I'm glad to see you
moving forward on the manifold module :)
Maybe it would be good if you could list the different things
you plan to do and also the ones you don't among the ones
we've mentioned with Gael in our review
Alex
On Fri, Oct 15, 2010 at 11:40 PM, Matthieu Brucher
Post by Matthieu Brucher
Hi everyone,
I got a bit stopped in the refactoring, but I got back at it with some
- PEP 8
- some cleant modules
- using Scaler instead of center_normalized (the function is still
there but uses the scaler)
- renamed the base class for embedding
Other stuff currently in the pipe...
Post by Matthieu Brucher
Post by Gael Varoquaux
So, I am losing Internet connection in a few hours. I couldn't finish my
review. Sorry, it takes a while to dig in the code, and amidst a busy
conference it is a challenge.
Here are my notes on the code. To recap my general feeling: the code is
really going in the right direction. The example look right, although a
bit slow. I am still worried that there is a fairly high level of
indirection in the code: it is nested in terms of API, and has functions
calling functions, ... Finally, another general remark is that there are
many many place where the code looks fairly sub optimal (sparse matrices
with no optimisation on the sparse matrix type, no in-place operations in
core loops), and the code still is much, much slower than nipy's manifold
learning code. The combination of both the remaining complexity of the
code due to its abstraction and the slowness makes me think we shouldn't
integrate it right now, but rather iterate a bit more on it.
There is perhaps one algorithm in both of them, Laplacian Eigenmaps,
which we know will not be optimal until I branch your code ;)
Post by Gael Varoquaux
I am offline for 3 weeks starting now!
Gaël
Post by Matthieu Brucher
As Alexandre and Gaël suggested, I've created a branch with just a few
- the Barycenter Mapping
  - LLE
  - Laplacian Eigenmaps
  - Diffusion Maps
  - Hessian Eigenmaps
easy to use, and fast.
No, the most used algorithm is not available: Isomap.
Post by Gael Varoquaux
Post by Matthieu Brucher
Two tests fail because of the missing parts.
I am not sure what that means? I didn't find time to run the tests.
It was not very clear to me when I reread it as well :)
I have still twio failing tests, one because I need to check the
Matlab version of LLE, and one because I don't exactly know how to
test similarity constraints.
Post by Gael Varoquaux
The code is clearly making huge progress in terms of being accessible to
(semi) naive coders. I am starting to get the big picture of the code.
General remarks
===================
PEP8 say that keyword arguments in function calls should have no spaces
before and after '='.
I would prefer to avoid using numpy.linalg, but rather to favor
scipy.linalg: numpy.linalg is often compiled without proper blas support,
and is thus very slow.
numpy should be imported as 'np', and I would prefer to avoid the 'from
numpy import ...'. This is just a convention, but when it is used by
everybody, it makes life much easier, because anyone can dig in the
codebase without finding out how symbols are imported and what symbols
mean.
All this will be done.
Post by Gael Varoquaux
A lot of the docstrings are not properly formatted as in the numpy
docstring standard.
?? I tried to be compliant, did I miss something? I can't run Sphinx
on it because of the incompatibility between Numpy and the latest
Sphinx.
Post by Gael Varoquaux
It's a pitty that a lot of the functionnality cannot work on a similarity
graph, but rather on samples. I guess this is a general problem that
appears in other places in the scikit, and that the best option is to set
it appart, but we will need to address this in the long term.
It can be extracted, as the inner core algorithm for Laplacian
Eigenmaps and Diffusion Maps is shared. The same function is called
each time.
LLE and Hessian cannot work on given similarity graphs.
Post by Gael Varoquaux
mapping/barycenter.py
=========================
Any reason why the Barycenter is not in the neighbors code? IMHO, it
should be a regression version of the Neighbors classifiers, and could be
named 'NeighborsBarycenter'. The best way to do this would probably be
the split Neighbors in a base class that has no predict, and derive from
this base class the current version of the KNN and your Barycenter. That
would do away with the need to pass in neigh_alternate_arguments.
- from a design point of view, it's not an inheritance, so I strongly
advise against it. I will never personnaly do this. You can do it, but
I will never lay down an eye on this code ever again ;)
- changing the neighbor algorithm is mandatory if the module is
expected to live.
Post by Gael Varoquaux
mapping/builder.py
=====================
Can we just get rid of this. This seems to me as an extra indirection and
I am not sure why it is there.
No, it cannot. I've also given the arguments why it must stay, and
noone countered my arguments.
Post by Gael Varoquaux
Besides, you are explicitely testing for isinstance(BaseEstimator). We
want to avoid this, because it means that it is impossible to use objects
in the scikit that have not been explicitely built for that purpose.
In that case, you have to find a way of getting the same
functionality. It creates a new instance or uses the one provided. the
only way I could write this was to use the BaseEstimator class.
Besides, I don't know why we would give it something that isn't a
BaseEstimator as it must have the correct interface. Unless you want
to get rid of the class in the near future?
Post by Gael Varoquaux
If we do this, and move the barycenter to where the nearest neighbor is,
we can completetly do away with the mapping subdirectory, and thus reduce
the depth of the tree by one, which will make it easier to apprehend.
I'm sorry, I'm a bit tired at the moment (my child keot me awake this
night), but I really would like that anyone wanting to remove parts of
the module at least reads my answer to their arguments.
I have said countless times that I have other mappings.
Post by Gael Varoquaux
embedding/tools.py
=====================
embedding.tools.create_neighborer should not exist. It is an almost empty
function that is doing some hacking around the neighbors class.
+1
Post by Gael Varoquaux
embedding.tools.centered_normalized should be merged with the code that
is in scaler.
+1
Post by Gael Varoquaux
embedding.tools.dist2hd should be merged with
mean_shift.euclidian_distance, and both should be moved somewhere in
utils.
+1
Post by Gael Varoquaux
embedding/barycenters.py
==========================
'barycenter' function should be renamed to 'barycenter_weights', because
right now, it not clear what the function does from its name. The word
'barycenter' is very much overloaded in the codebase. Besides, it tales a
'kwargs' that is unused. Finally, this function should be moved to the
same file as the Barycenter object.
It can make more sense indeed.
Post by Gael Varoquaux
embedding/embedding.py
==========================
I am not happy with the name of this file. It confuses me: I am not sure
anymore what is in the embedding directory anymore, if it is not in the
embedding file. Actually, after reading the code bit more, I have
realised that this is just a base class, and actually an abstract base
class. The name should reflect this, because right now it takes a lot of
code reading to realize this. Maybe embedding.py -> base_embedding.py and
`class Embedding` -> `class BaseEmbedding`
OK
Post by Gael Varoquaux
   def __init__(self, n_coords, n_neighbors = None, neigh = None,
       self.n_neighbors = n_neighbors if n_neighbors is not None else 9
OK.
Post by Gael Varoquaux
embedding/similarities.py
==========================
In my opinion, a function like 'laplacian_maps' that takes a 'method'
function as an input argument is tedious to use. Any chance that you can
give is a default entry? This is in 'similarities.py', but it is not used
there, only in similarities_mds. I am not sure whether it is supposed to
be an internal use function -in which case, why is it not where it is
used?- or a function to be used by the user -in which case, why is it not
in __all__?.
The similarities/similarities_mds split is not well thoguht, I think.
This function is what creates the similarity graph and then does the
computation. It could be refactored to get clsoer to what you expect
(a function taking a graph as parameter).
Post by Gael Varoquaux
kwargs in most functions are impossible to use without dwelling in the
source code.
If a function has a kwargs, it's that it shouldn't be used without
know the structure of the code. Such functions are mainly builders
that must use additional parameter functions. They are mainly here to
avoid code duplication.
Matthieu (also tired of his very very very slow computer)
--
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
--
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
------------------------------------------------------------------------------
Download new Adobe(R) Flash(R) Builder(TM) 4
The new Adobe(R) Flex(R) 4 and Flash(R) Builder(TM) 4 (formerly
Flex(R) Builder(TM)) enable the development of rich applications that run
across multiple browsers and platforms. Download your free trials today!
http://p.sf.net/sfu/adobe-dev2dev
_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
--
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
Loading...