Discussion:
[Scikit-learn-general] silhouette_score and silhouette_samples
Sebastian Raschka
2015-06-16 03:54:53 UTC
Hi, all,

I am a little bit confused about the two related metrics silhouette_score and silhouette_samples. silhouette_samples calculates the silhouette coefficient for each sample and returns an array of those values. However, I am wondering whether I am interpreting silhouette_score correctly. Based on the documentation at http://scikit-learn.org/stable/modules/generated/sklearn.metrics.silhouette_score.html, I assume that it is just the average of the silhouette coefficients, which can be confirmed by running, e.g.,

np.mean(silhouette_samples(X, y, metric='euclidean'))

Now, I am wondering why silhouette_score has this additional random_state parameter?

Best,
Sebastian
------------------------------------------------------------------------------
Joel Nothman
2015-06-16 04:34:27 UTC
See the sample_size parameter: the silhouette score can be calculated on a
random subset of the data, presumably for efficiency, and random_state
controls how that subset is drawn. Feel free to submit a PR improving the
docstring.
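
For anyone reading along, here is a minimal sketch of both points: that silhouette_score with the default sample_size=None matches the mean of silhouette_samples, and that random_state only matters once sample_size is set. The toy data from make_blobs and the variable names are assumptions for illustration, not part of the original question.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples, silhouette_score

# Toy data, assumed for illustration; any X and cluster labels y would do.
X, y = make_blobs(n_samples=300, centers=3, random_state=0)

# Default sample_size=None: the score is the mean of the per-sample coefficients.
full_score = silhouette_score(X, y, metric='euclidean')
mean_of_samples = np.mean(silhouette_samples(X, y, metric='euclidean'))
print(np.isclose(full_score, mean_of_samples))

# With sample_size set, the score is computed on a random subset of the rows,
# so random_state is what makes the result reproducible.
sub_a = silhouette_score(X, y, metric='euclidean', sample_size=100, random_state=1)
sub_b = silhouette_score(X, y, metric='euclidean', sample_size=100, random_state=1)
sub_c = silhouette_score(X, y, metric='euclidean', sample_size=100, random_state=2)
print(sub_a == sub_b)   # same seed, same subset
print(sub_a, sub_c)     # a different seed may give a slightly different estimate
```

The subsampled value is only an estimate of the full score, so it makes sense to fix random_state when comparing clusterings on the same data.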
_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
Sebastian Raschka
2015-06-16 05:26:43 UTC
Thanks, Joel, it makes total sense now! Updating the docstring sounds like a good idea; I will get to it in the next couple of days.

Best,
Sebastian