Dženan Softić
2016-02-07 20:12:21 UTC
Hi,
I am doing some experiments with BIRCH. After BIRCH finishes, I would
like to merge subclusters based on some criteria. I do this by calling
the "merge_subcluster" method on the subcluster I want to merge into,
passing it the subcluster object of the second cluster:
cluster1.merge_subcluster(cluster2, self.threshold)
This seems to work, since it correctly updates N, LS and SS (n_samples,
linear_sum, squared_sum). What remains is to remove the merged
subcluster (cluster2) from the subclusters list and to update the
centroids:
import numpy as np

# index of the subcluster that absorbed cluster2 (its centroid must be updated)
ind = leaf.subclusters_.index(cluster1)
# index of the merged subcluster, which must be removed
ind_remove = leaf.subclusters_.index(cluster2)
leaf.init_centroids_[ind] = cluster1.centroid_  # update centroid
leaf.init_sq_norm_[ind] = cluster1.sq_norm_
# remove the centroid of cluster2
leaf.centroids_ = np.delete(leaf.centroids_, ind_remove, 0)
# remove the centroid from the root
self.root_.init_centroids_ = np.delete(self.root_.init_centroids_, ind_remove, 0)
# remove the merged subcluster itself
leaf.subclusters_.remove(cluster2)
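A minimal sketch of the same bookkeeping, written in plain Python so it runs without scikit-learn. The classes `CFSubcluster` and `Leaf` and the method `merge_and_remove` are hypothetical stand-ins, not scikit-learn's private API, and SS is assumed to be stored as a scalar sum of squared norms. Instead of deleting individual rows with `np.delete`, it rebuilds the cached centroids and squared norms from the subcluster list, so row i always describes subcluster i. This matters if, as appears to be the case in scikit-learn, `centroids_` is a view of the first rows of `init_centroids_`: `np.delete` returns a new array and breaks that link. Also, because CF entries are additive, merging two subclusters of the same leaf leaves the leaf's total N, LS and SS unchanged, so the parent's entry for that leaf should not need to change.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CFSubcluster:
    """Hypothetical stand-in for a BIRCH subcluster: CF = (N, LS, SS)."""
    n_samples: int
    linear_sum: List[float]
    squared_sum: float  # assumed scalar: sum of squared norms of member samples

    @property
    def centroid(self) -> List[float]:
        return [x / self.n_samples for x in self.linear_sum]

    @property
    def sq_norm(self) -> float:
        return sum(c * c for c in self.centroid)

    def merge(self, other: "CFSubcluster") -> None:
        # CF additivity: each component of the feature is summed.
        self.n_samples += other.n_samples
        self.linear_sum = [a + b for a, b in zip(self.linear_sum, other.linear_sum)]
        self.squared_sum += other.squared_sum


@dataclass
class Leaf:
    """Hypothetical leaf node: subclusters plus cached per-row centroids/norms."""
    subclusters: List[CFSubcluster] = field(default_factory=list)
    centroids: List[List[float]] = field(default_factory=list)
    squared_norms: List[float] = field(default_factory=list)

    def rebuild_caches(self) -> None:
        # Recompute the caches from the subcluster list so row i always
        # matches subclusters[i], instead of patching rows one by one.
        self.centroids = [sc.centroid for sc in self.subclusters]
        self.squared_norms = [sc.sq_norm for sc in self.subclusters]

    def merge_and_remove(self, keep: CFSubcluster, absorbed: CFSubcluster) -> None:
        keep.merge(absorbed)
        # Remove by identity: dataclass __eq__ compares field values, so
        # list.remove() could drop a different subcluster with equal values.
        self.subclusters = [sc for sc in self.subclusters if sc is not absorbed]
        self.rebuild_caches()


a = CFSubcluster(1, [0.0, 0.0], 0.0)
b = CFSubcluster(1, [2.0, 0.0], 4.0)
c = CFSubcluster(1, [10.0, 10.0], 200.0)
leaf = Leaf([a, b, c])
leaf.rebuild_caches()
leaf.merge_and_remove(a, b)
print(leaf.centroids)      # [[1.0, 0.0], [10.0, 10.0]]
print(leaf.squared_norms)  # [1.0, 200.0]
```

Rebuilding the caches costs O(number of subclusters) per merge, which is negligible next to the clustering itself and avoids index drift between the list and the arrays.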
I am not sure I am doing it the right way. Any suggestions or comments
would be much appreciated.
Thanks,
Dzeno