Mathieu Blondel

2010-11-14 17:23:35 UTC

Hello,

We started a discussion on normalizing the module names:

https://github.com/scikit-learn/scikit-learn/pull/14

I am starting a thread to continue the discussion here on the mailing
list, as it is more convenient and allows everyone to participate.

Currently, there seems to be a consensus for explicit names rather
than acronyms or abbreviations. I think this is a good thing, but we
need to watch out for ridiculously long names.

Some modifications are easy:

* gmm:

from scikits.learn.gaussian_mixture import GaussianMixture

* logistic:

from scikits.learn.glm.logistic_regression import LogisticRegression

Some modifications are more difficult. What should we do with fastica,
pca, lda, qda, hmm, sgd, svm? Here are some I like:

* hmm:

from scikits.learn.hidden_markov import GaussianHMM

* lda:

from scikits.learn.linear_discriminant import LDA

or

from scikits.learn.fisher_discriminant import FisherDiscriminant
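
For illustration only: whichever names are chosen, a rename such as gmm to gaussian_mixture could keep the old import path working for a transition period. The sketch below is one possible way to do that, not something that exists in the code base. The DeprecatedAlias class and the demo_gmm / demo_gaussian_mixture module names are hypothetical and are built in memory so the snippet runs on its own.

```python
import sys
import types
import warnings


class DeprecatedAlias(types.ModuleType):
    """Stand-in for an old module name that forwards to the renamed module."""

    def __init__(self, old_name, new_module):
        super().__init__(old_name)
        self.__dict__["_new_module"] = new_module

    def __getattr__(self, attr):
        # Reached only when normal lookup fails. Leave dunder lookups made by
        # the import machinery (e.g. __path__) alone so they do not warn.
        if attr.startswith("__"):
            raise AttributeError(attr)
        new_module = self.__dict__["_new_module"]
        warnings.warn(
            "%s is deprecated, use %s instead"
            % (self.__name__, new_module.__name__),
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(new_module, attr)


# Hypothetical stand-ins for the real modules, built in memory for the demo.
new_module = types.ModuleType("demo_gaussian_mixture")


class GaussianMixture(object):
    pass


new_module.GaussianMixture = GaussianMixture
sys.modules["demo_gaussian_mixture"] = new_module
sys.modules["demo_gmm"] = DeprecatedAlias("demo_gmm", new_module)

# Importing through the old name still works, but emits a DeprecationWarning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    from demo_gmm import GaussianMixture as OldPathGaussianMixture
```

With an alias like this, code written against the old name keeps running while users are pointed to the new one; whether such a transition period is wanted at all is a separate decision.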

Another question to settle is singular vs. plural, e.g.

from scikits.learn.gaussian_process import GaussianProcess

vs

from scikits.learn.gaussian_processes import GaussianProcess

I'm +1 for singular.

Another question to settle is module grouping. In my opinion, module
grouping can potentially be dangerous, so let's be careful here.

Gael suggests renaming glm to linear_models (or linear_model?). I'm
+1 for linear_model (generalized_linear_model seems too long anyway).

Should LDA, SVM and SGD go into the linear_model group? I'd say yes for
LDA, no for SVM and SGD, as they are quite big modules on their own.

Any opinions?

Please participate in the discussion so we can converge on the best
naming scheme :)

Mathieu
