2019, Vol. 49, Issue S2 (No. 303): 176-184
A Hierarchical Clustering Algorithm Based on Local Features
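Foundation: Supported by the National Key Research and Development Program of China (2016YFC1402000) and the Special Fund for Basic Scientific Research of Central Public Research Institutes (2014T07)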
DOI: 10.16441/j.cnki.hdxb.20170055
Published online: 2019-12-15
Publication date: 2019-12-15
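Abstract (Chinese version, translated):

Clustering algorithms play a very important role in data mining. Among them, the CHAMELEON algorithm has become widely used because it can discover clusters of arbitrary shape. To address problems such as CHAMELEON's poor clustering results when the density varies within a cluster, this paper proposes a hierarchical clustering algorithm based on local features and a grid structure, which adaptively generates a neighbor graph, partitions the neighbor graph based on local features, and merges sub-clusters. The algorithm was tested on two-dimensional data sets and compared with several other clustering algorithms. Experimental results show that the proposed algorithm obtains satisfactory clustering results when the data distribution is complex.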
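The first step named above, building a neighbor graph, can be sketched as a plain k-nearest-neighbor graph, the sparse graph that CHAMELEON itself starts from. This is a minimal sketch under stated assumptions, not the authors' method: the abstract says the proposed algorithm generates its neighbor graph adaptively but does not give the rule, so here `k` is a fixed parameter, and the similarity weight `1 / (1 + distance)` and the function name `knn_graph` are hypothetical choices for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def knn_graph(points: Sequence[Point], k: int) -> Dict[int, List[Tuple[int, float]]]:
    """Build an undirected k-nearest-neighbor graph (hypothetical helper).

    Each point is linked to its k closest points by Euclidean distance.
    Edge weight is an assumed similarity, 1 / (1 + distance). A fixed k is
    an assumption; the paper generates its neighbor graph adaptively.
    """
    n = len(points)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of points")
    graph: Dict[int, Dict[int, float]] = {i: {} for i in range(n)}
    for i in range(n):
        # Sort all other points by (distance, index) and keep the k nearest.
        nearest = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )[:k]
        for d, j in nearest:
            w = 1.0 / (1.0 + d)
            # Symmetrize: keep the edge if either endpoint lists the other.
            graph[i][j] = w
            graph[j][i] = w
    # Return each adjacency list as (neighbor index, weight), sorted by index.
    return {i: sorted(nbrs.items()) for i, nbrs in graph.items()}


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    for node, nbrs in knn_graph(pts, k=2).items():
        print(node, nbrs)
```

With two well-separated groups of three points and `k = 2`, every edge stays inside its own group, so a later graph-partitioning step would find no edges to cut between them.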

Abstract (English version):

Clustering algorithms play a very important role in data mining. Clustering in data mining is a discovery process that groups a set of data so that intracluster similarity is maximized and intercluster similarity is minimized. The discovered clusters can be used to explain the characteristics of the underlying data distribution and thus serve as the foundation for other data mining and analysis techniques. Clustering algorithms are widely applied in image processing, document classification, pattern recognition, spatial data analysis, and economics, and to web log data to discover groups of similar access patterns. Among them, CHAMELEON has become a commonly used algorithm because it can discover clusters of arbitrary shape. To address the deficiency that CHAMELEON does not perform well when the density within clusters varies, this paper introduces a hierarchical clustering algorithm based on local features and a grid structure, which adaptively generates a neighbor graph, partitions the graph, and merges sub-clusters based on local features. Experiments on two-dimensional data sets, with results compared against the CHAMELEON, DBSCAN, and K-means algorithms, show that the proposed algorithm obtains good clustering results for complex distributions, such as varying density within clusters, irregular shapes, and differences in cluster size. The proposed algorithm has broad application prospects.
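The clustering objective stated above (high intracluster similarity, low intercluster similarity) can be checked on a labeled result by comparing average pairwise distances. This is an illustrative sketch, assuming Euclidean distance as the inverse of similarity; `mean_intra_inter` is a hypothetical helper, not an evaluation metric taken from the paper.

```python
import math
from itertools import combinations
from typing import Hashable, List, Sequence, Tuple


def _mean(xs: List[float]) -> float:
    # Average of a list; 0.0 for an empty list (e.g. only one cluster).
    return sum(xs) / len(xs) if xs else 0.0


def mean_intra_inter(
    points: Sequence[Sequence[float]], labels: Sequence[Hashable]
) -> Tuple[float, float]:
    """Return (mean within-cluster distance, mean between-cluster distance)."""
    if len(points) != len(labels):
        raise ValueError("points and labels must have the same length")
    intra: List[float] = []
    inter: List[float] = []
    for (p, a), (q, b) in combinations(zip(points, labels), 2):
        d = math.dist(p, q)
        # Same label means a within-cluster pair; otherwise a between-cluster pair.
        (intra if a == b else inter).append(d)
    return _mean(intra), _mean(inter)


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(mean_intra_inter(pts, [0, 0, 0, 1, 1, 1]))
```

A good partition keeps the first value small relative to the second; for the two separated groups above, the within-cluster mean is (2 + √2) / 3 while every between-cluster distance exceeds 13.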

Basic information:

Chinese Library Classification (CLC) number: TP311.13

Citation:

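[1] WANG Pengyu, WANG Guoyu, JIA Zhen, et al. A hierarchical clustering algorithm based on local features[J]. Periodical of Ocean University of China (Natural Science Edition), 2019, 49(S2): 176-184. DOI: 10.16441/j.cnki.hdxb.20170055.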
