关键词:kmeans聚类算法优缺点、kmeans聚类算法matlab、kmeans聚类算法流程图、kmeans聚类算法的应用、kmeans聚类算法实现、kmeans聚类算法原理
聚类算法是这样的一种算法:给定一坨样本数据Sample,要求将样本Sample中相似的数据聚到一类。有了这个认识之后,就应该了解了聚类算法要干什么了吧。说白了,就是归类。
首先,我们需要考虑的是,如何衡量数据之间的相似程度?比如说,有一群说不同语言的人,我们一般是根据他们的方言来聚类的(当然,你也可以指定以身高来聚类)。这里,语言的相似性(或者身高)就成了我们衡量相似的量度了。在考虑存在海量数据,如微博上各种用户的关系网,如何根据用户的关注和被关注来聚类,给用户推荐他们感兴趣的用户?这就是聚类算法研究的内容之一了。
Kmeans就是这样的聚类算法中比较简单的算法,给定数据样本集Sample和应该划分的类数K,对样本数据Sample进行聚类,最终形成K个cluster,其相似的度量是 某条数据i与中心点的“距离”(这里所说的距离,不止于二维)。如下图:
算法的过程是这样的:
1) :input Samples and it’s number N, numSample , k ;
2) :randomize select k sample from the Samples as the initialed clusters , set the maxCompute to m ,and the passNum to 0;
3) :for i=0 to N-1
a): for j=0 to K-1
i) : compute the distance between record i and cluster center j;
ii): add the record i to closest cluster;
4) :for j=0 to K-1
a) : reSelect center between the cluster j ,according given rule;
b) : clear all the older merber in the cluster;
5) :passNum++ ; if all the new center same to the older center or passNum >m ,terminate!else continue;
下面给出Kmeans运行的一个实例:
样本数据如下:
2.1 3.0 10.0
4.0 5.2 -1.0
5.1 1.5 2.3
10.5 12.6 10.8
12.1 10.9 11.0
4.2 5.3 -9.8
5.4 1.6 8.7
-1.0 -2.1 -0.9
0.5 0.3 0.4
需要分成三个聚类
用这个数据运行后,程序打印出划分结果:
—- 第1个聚类 —-
聚类中心:3.75,2.3,9.35
2.1,3,10
5.4,1.6,8.7
—- 第2个聚类 —-
聚类中心:4.1,5.25,-5.4
4,5.2,-1
4.2,5.3,-9.8
—- 第3个聚类 —-
聚类中心:1.53333,-0.1,0.6
5.1,1.5,2.3
-1,-2.1,-0.9
0.5,0.3,0.4
—- 第4个聚类 —-
聚类中心:11.3,11.75,10.9
10.5,12.6,10.8
12.1,10.9,11
至此,我们得到了三个相似性很高的聚类
下面,就是c++的粗略实现了:
//the single sample
class Sample{
public:
unsigned int r, g ,b;
Sample(){};
Sample(unsigned int x, unsigned int y , unsigned int z){
r=x ; g= y; b= z;
}
//compute the distance of two sample
int distance(Sample other){
return ((r-other.r)*(r-other.r) +
(g-other.g) * (g-other.g) +
(b-other.b) * (b-other.b));
}
};
//cluster class
class Cluster{
private :
Sample center; //the center of cluster
int iCenter; //the sub of center
int number; //number of the merber
int *merber;// index of the merber, record the sub
public :
Cluster(){merber = NULL; number = 0; iCenter = -1;}
Cluster(int n , Sample c , int *m , int i)
:number(n), center(c) , merber(m) , iCenter(i)
{
}
Sample getCenter(){
return center;
}
void setCenterNumber(int i){
iCenter = i;
}
int getCenterNumber(){
return iCenter;
}
void setCenter(Sample c){
center = c;
}
void setNumber(int n){
number = n;
}
int getMerberNumber (){
return number;
}
void addMerber(int n){
merber[number++] = n;
}
int getNewCenter(){
int newSub=0;
for(int i=0 ; i<number ; i++){
newSub+=merber[i];
}
return newSub/number;
}
};
//Kmeans class
#include <time.h>
#include <stdio>
#include “Sample.h”
#include “Cluster.h”
class Kmeans{
private:
int num , clusterNum ;
Cluster * clusters;
Sample * samples;
public :
Kmeans(int n, int d, int k, Sample * ob)
: num(n)
, clusterNum(k)
, samples(ob)
, clusters(new Cluster[k])
{
//assert(n>1 && k<n) ;
}
Kmeans(){};
~Kmeans(){delete []clusters ; delete []samples ; }
//initial the clusters
void initialCluster(){
int temp = num/clusterNum;
for(int i=0;i<clusterNum ; i++){
clusters[i].setCenter(samples[(temp*i)%num]) ;
clusters[i].setNumber(0);
}
}
bool run(){
bool converged = false; // signing converged
int passNum = 0;
while (!converged && passNum < 999) //if not converged , try again
// the compute times passNum should under 999
{
for(int i=0 ; i<clusterNum ; i++){
//converged = ( clusters[i].getCenter()) == clusters[clusters[i].getNewCenter()].getCenter();
int temp = clusters[i].getCenterNumber();
converged = clusters[i].getNewCenter()!=temp ;
clusters[i].setCenter(samples[temp]);
}
distribute(); //distribute the samples to the closest cluster
//converged = (); //
passNum++;
}
}
void distribute()
{
// clear all the older cluster merber , and reCluster
for(int k=0; k<clusterNum; k++)
clusters[k].setNumber(0);
// compute the new closest cluster
for(int i=0; i<num; i++){
clusters[getClosestCluster(i)].addMerber(i);
}
}
//get closest cluster,return the sub
int getClosestCluster(int i){
int iShortestDistance=0 , sub=0;
for(int j=0;j<clusterNum ; j++){
int temp = samples[i].distance(clusters[j].getCenter());
if(iShortestDistance< temp){
iShortestDistance = temp;
sub=j;
}
}
return sub;
}
};
转载请注明:数据分析 » Kmeans聚类算法_kmeans聚类算法的应用_kmeans聚类算法实现