Discussion:
Suggestion: break up the metrics module
Robert Layton
2014-10-14 19:53:35 UTC
Currently the word "metrics" is overloaded with at least two types of
algorithms in that module. The first is evaluation metrics and the second
is functions dealing with distance metrics.

My suggestion is to:
1) Move the evaluation metrics to a new top level folder called "evaluation"
2) Move the distance metrics to a new top level folder called "distance"
3) Create pointers with deprecation warnings from the metrics folder to the
above two folders.
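
A minimal sketch of what step 3 could look like, assuming a plain wrapper built on the standard `warnings` module. The helper `deprecated_alias`, the stand-in `euclidean` function, and the dotted paths `metrics.euclidean` and `distance.euclidean` are hypothetical names used only for illustration; they are not existing scikit-learn code.

```python
import functools
import warnings


def deprecated_alias(func, old_path, new_path):
    """Return a wrapper that warns about old_path, then calls func."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{old_path} is deprecated; use {new_path} instead.",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not the wrapper
        )
        return func(*args, **kwargs)

    return wrapper


# Stand-in for a function that would live in the hypothetical new location.
def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


# What the old location would keep exporting during the deprecation cycle.
old_euclidean = deprecated_alias(
    euclidean, "metrics.euclidean", "distance.euclidean"
)
```

Calling `old_euclidean` returns the same value as `euclidean` and emits a `DeprecationWarning` naming the new path, so existing code keeps working while users are told where to look.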

This would be a big job -- lots of documentation to fix etc. So I wanted to
get suggestions before I start.

Thoughts?

- Robert
Lars Buitinck
2014-10-14 20:16:42 UTC
Post by Robert Layton
Currently the word "metrics" is overloaded with at least two types of
algorithms in that module. The first is evaluation metrics and the second is
functions dealing with distance metrics.
1) Move the evaluation metrics to a new top level folder called "evaluation"
2) Move the distance metrics to a new top level folder called "distance"
3) Create pointers with deprecation warnings from the metrics folder to the
above two folders.
This would be a big job -- lots of documentation to fix etc. So I wanted to
get suggestions before I start.
Thoughts?
Didn't we already have a plan to move out the evaluation stuff?

Btw., there are also similarity functions in the module. Putting those
in a "distance" module seems a bit strange, so I suggest we just keep
the name for at least the distance stuff. (I know "metric" is the
mathematician's term for distance, but "similarity metric" is common
enough, I think.)
Joel Nothman
2014-10-14 21:01:27 UTC
We had a plan to move out the model selection stuff. At present, that plan
talks about moving scorers, but not necessarily the metrics underlying them.
------------------------------------------------------------------------------
Comprehensive Server Monitoring with Site24x7.
Monitor 10 servers for $9/Month.
Get alerted through email, SMS, voice calls or mobile push notifications.
Take corrective actions from your mobile device.
http://p.sf.net/sfu/Zoho
_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
Gael Varoquaux
2014-10-14 21:08:02 UTC
Post by Robert Layton
Currently the word "metrics" is overloaded with at least two types of
algorithms in that module. The first is evaluation metrics and the
second is functions dealing with distance metrics.
Please, let's just try as much as possible to avoid such changes.

The goal of such a change is to make things prettier, or more logical
according to a certain logic. The benefit is that, to some people, it will
make more sense. What's important to keep in mind is that most users
don't understand the fine details of what the names mean, and
that none of the module names make a huge amount of sense. Documentation
and Google searches are what really sort users out.

By changing module names, or any kind of API, we are making these Google
searches unreliable, so we are actually making it harder for the users.

In addition, we are breaking people's code. Yes we have a deprecation
cycle, but it's costly for everybody to follow our changes.

Thus, for an API change (and that's an API change), there need to be
clear benefits, IMHO.

Gaël
Daniel Vainsencher
2014-10-15 08:42:46 UTC
Google is nice and I certainly use it, but I also often use
- IPython autocomplete
- Browsing the file system
and when they work, they are less of a distraction than Google.

So having *casual user discoverable* categories really is an actual
benefit. Of course breakage is a cost, so changes should happen only if
the resulting categories actually make things easier to discover, and
should be concentrated into few releases so stuff isn't breaking all the
time.

Another "not real" benefit is that stuff that "doesn't make a huge
amount of sense" just looks bad, makes the project look bad, and
discourages users. At least IMHO.

For me at least "metrics" is a pretty confusing name, just because of
the ambiguity mentioned in this thread. How about "quality_measures" and
"distance_measures"?

Python has import XYZ as X, so are long names still bad?
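
To make the aliasing point concrete, here is a small sketch that uses a standard-library module with a long dotted path, bound to a short local name. The module and the XML snippet are picked only for illustration and have nothing to do with the proposed names.

```python
# A long dotted module path bound to a short local name.
import xml.etree.ElementTree as ET

# The short alias is all that call sites need to spell out.
root = ET.fromstring("<thread><post author='Daniel'/></thread>")
first_author = root[0].get("author")
```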

Daniel
Arnaud Joly
2014-10-15 08:49:07 UTC
I totally agree with Gael.

I would welcome improvements in the narrative documentation
of http://scikit-learn.org/stable/modules/metrics.html about
distances and kernels. It feels empty compared to
http://scikit-learn.org/stable/modules/model_evaluation.html

Best regards,
Arnaud
Michael Eickenberg
2014-10-15 08:59:28 UTC
+1 for what Gaël and Arnaud say.

In addition to that, I don't know if a distinction between two groups of
metrics is necessarily straightforward. At how many properties would
one draw the line? The triangle inequality? Is KL divergence an "evaluation
metric" or a "distance metric"? It almost seems to be a matter of opinion.
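
As a small illustration of why the line is hard to draw, the sketch below computes KL divergence in both directions for two made-up discrete distributions. The function name `kl_divergence` and the values of `p` and `q` are assumptions chosen for the example. The two directions differ, so KL divergence already fails symmetry, one of the properties a distance metric in the mathematical sense must have.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as equal-length lists."""
    # Terms with p_i == 0 contribute nothing, by the usual convention.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


p = [0.5, 0.5]
q = [0.9, 0.1]

forward = kl_divergence(p, q)   # KL(p || q)
backward = kl_divergence(q, p)  # KL(q || p)

# A true metric would give the same value in both directions.
is_symmetric = math.isclose(forward, backward)
```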

Michael
Andy
2014-11-02 21:14:11 UTC
Post by Michael Eickenberg
+1 for what Gaël and Arnaud say.
In addition to that, I don't know if a distinction between two groups
of metrics is necessarily straightforward. At how many
properties would one draw the line? The triangle inequality? Is KL
divergence an "evaluation metric" or a "distance metric"? It almost
seems to be a matter of opinion.
One works on Xs and one on ys ;)

(Speaking as a mathematician, these are clearly not disjoint. The use cases
and APIs are, though.)
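
A rough sketch of that split, using plain Python rather than the library's own functions: a distance takes two rows of a feature matrix `X`, while an evaluation score takes true and predicted labels `y`. The helper names and the toy data are made up for illustration.

```python
import math


def euclidean_distance(x_a, x_b):
    """Distance between two feature vectors (rows of X)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_a, x_b)))


def accuracy(y_true, y_pred):
    """Fraction of predicted labels that match the true labels."""
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


X = [[0.0, 0.0], [3.0, 4.0]]  # two samples, two features each
y_true = [1, 0, 1, 1]         # ground-truth labels
y_pred = [1, 0, 0, 1]         # labels from some model

d = euclidean_distance(X[0], X[1])  # works on Xs
acc = accuracy(y_true, y_pred)      # works on ys
```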

I agree we should make as few API changes as possible, but probably
move the evaluation code to a more sensible place.
That probably does not include the actual metric functions, though, I'd
think.
