Discussion:
[Scikit-learn-general] Interpreting the cluster_centers_ in sklearn KMeans
JAGANADH G
2016-04-29 23:59:36 UTC
Hi,
After performing clustering, the cluster centers can be extracted via
.cluster_centers_.

A sample result is

kmeans.cluster_centers_
array([[ 1.01505989, -0.70632886],
       [ 0.33475124,  0.89126382],
       [-1.287003  , -0.43512572]])

How can I interpret these values?

Can somebody help me understand this bit of the documentation in detail?

cluster_centers_ : array, [n_clusters, n_features]

Coordinates of cluster centers
--
**********************************
JAGANADH G
http://jaganadhg.in
*ILUGCBE*
http://ilugcbe.org.in
Sebastian Raschka
2016-04-30 00:09:23 UTC
Hi Jaganadh,

it looks like you ran k-means with k=3 on a 2-dimensional dataset (i.e., a dataset with 2 feature variables). Each row of the array is therefore the center (or “centroid”) of one of the 3 clusters that k-means attempted to discover. In other words, there are 3 roughly spherical clusters with center points
[ 1.01505989, -0.70632886], [ 0.33475124, 0.89126382], and [-1.287003 , -0.43512572]
and each training point is closest to one of these centroids, which defines the cluster that point has been assigned to. Here’s a figure of how this could look when plotted in a 2D scatterplot; hopefully it makes things clearer: https://raw.githubusercontent.com/rasbt/mlxtend/master/docs/sources/user_guide/cluster/Kmeans_files/Kmeans_17_0.png

Best,
Sebastian
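
For readers following along, here is a minimal sketch of the point above. It assumes synthetic 2-feature data from make_blobs (the blob centers and parameter values are made up for illustration, they are not the original poster's data). It shows that cluster_centers_ has shape (n_clusters, n_features) and checks that each sample is assigned to the centroid nearest to it in Euclidean distance.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Toy data (an assumption for illustration): 300 samples, 2 features,
# drawn around three well-separated, hand-picked blob centers.
X, _ = make_blobs(
    n_samples=300,
    centers=[[-5.0, -5.0], [0.0, 5.0], [5.0, -5.0]],
    cluster_std=0.5,
    random_state=0,
)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
centers = kmeans.cluster_centers_
print(centers.shape)  # one row per cluster, one column per feature

# Euclidean distance from every sample to every centroid,
# giving an array of shape (n_samples, n_clusters).
distances = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
nearest = distances.argmin(axis=1)

# The index of the nearest centroid should match the cluster that
# k-means predicts for each sample.
print(np.array_equal(nearest, kmeans.predict(X)))
```

Each row of centers is a point in the same feature space as the samples, so with 2 features it can be drawn directly on the scatterplot.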
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
JAGANADH G
2016-05-02 18:39:21 UTC
Hi Sebastian,
Thanks for the explanation. But what I am seeing is that if I feed in data
with 3K features/attributes, the cluster_centers_ attribute gives 3K
values for each cluster. My objective is to give a name to each cluster
based on the names of the features closest to the centroids.

Best

Jagan
Sebastian Raschka
2016-05-02 19:30:29 UTC
> Thanks for the explanation. But what I am seeing is that if I feed in data with 3K features/attributes, the cluster_centers_ attribute gives 3K values for each cluster.
Yes, that’s correct!
> My objective is to give a name to each cluster based on the names of the features closest to the centroids.
Hm, it doesn’t sound like k-means is what you are looking for, then. Since you have a 3K-dimensional “coordinate system”, each centroid will also have 3K coordinates; what k-means essentially does is find the samples closest to each centroid, not the features.
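
If the goal is still to describe each cluster in terms of features, one heuristic that is sometimes used (scikit-learn's document-clustering example does something similar for tf-idf terms) is to read each centroid row as a per-feature profile and list the features with the largest coordinates. This is not something proposed in this thread, and it only makes sense when the features are on comparable, non-negative scales. The feature names and data below are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical feature names, one per column of X.
feature_names = ["apple", "banana", "cherry", "engine", "wheel", "brake"]

# Synthetic data: 20 samples with large values on the first three
# features, 20 samples with large values on the last three.
group_a = np.hstack([rng.uniform(0.8, 1.0, (20, 3)), rng.uniform(0.0, 0.1, (20, 3))])
group_b = np.hstack([rng.uniform(0.0, 0.1, (20, 3)), rng.uniform(0.8, 1.0, (20, 3))])
X = np.vstack([group_a, group_b])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# For each centroid, sort feature indices by coordinate, largest first.
top_k = 3
order = np.argsort(kmeans.cluster_centers_, axis=1)[:, ::-1]
top_features = [[feature_names[j] for j in row[:top_k]] for row in order]

for cluster_id, names in enumerate(top_features):
    print(cluster_id, names)
```

With standardized or signed features, a large centroid coordinate no longer means "this feature is prominent in the cluster", so the ranking would need a different criterion.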
Maybe a crazy thing to do (and I haven’t thought about it thoroughly): you could transpose the input matrix so that you feed (n_features, n_samples) instead of (n_samples, n_features). Not sure if it makes sense though.
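
A rough sketch of the transposition idea, under the same caveat that it may or may not make sense for real data: passing X.T to KMeans treats every feature as a "sample" described by its values across the original samples, so the resulting labels group features rather than samples. The data and feature names are again hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical feature names, one per column of X.
feature_names = ["apple", "banana", "cherry", "engine", "wheel", "brake"]

# Synthetic data of shape (n_samples, n_features) = (40, 6).
group_a = np.hstack([rng.uniform(0.8, 1.0, (20, 3)), rng.uniform(0.0, 0.1, (20, 3))])
group_b = np.hstack([rng.uniform(0.0, 0.1, (20, 3)), rng.uniform(0.8, 1.0, (20, 3))])
X = np.vstack([group_a, group_b])

# Transpose so that each row is one feature: shape (n_features, n_samples).
feature_kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X.T)
feature_labels = feature_kmeans.labels_

# Each feature now carries a cluster label of its own.
for name, label in zip(feature_names, feature_labels):
    print(name, label)

# The centroids live in "sample space": one coordinate per original sample.
print(feature_kmeans.cluster_centers_.shape)
```

Note that these are clusters of features, which is a different result from naming the sample clusters found on the untransposed matrix.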