2023, Issue S1, Vol. 53 (No. 350), pp. 184-189
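A Representative Point Estimation Method Based on the k-means Algorithm
Foundation: Supported by the National Natural Science Foundation of China (U1706226)
DOI: 10.16441/j.cnki.hdxb.20230072
Submitted: 2023-02-16
Submission year: 2023
Revised: 2023-03-31
Final review: 2023-05-06
Final review year: 2023
Review cycle (years): 1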
Abstract:

This paper discusses representative point estimation based on the k-means algorithm. By correcting the bias of the k-means estimates, a new method for estimating the representative points of one-dimensional continuous distributions is given (Revised k-means, the RKM method). Taking the one-dimensional normal distribution as an example, the representative points obtained by this algorithm are applied to kernel density estimation. Kernel density estimates are compared across four types of discrete approximations to the distribution: random samples (independent and identically distributed), the modified Monte Carlo method, number-theoretic samples (the quasi-Monte Carlo method), and the RKM method. Among these, the RKM representative points perform best, giving the estimate closest to the original population distribution.
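To make the setting concrete, the following is a minimal sketch of the unrevised starting point only: plain Lloyd-style k-means on a one-dimensional sample, followed by a Gaussian kernel density estimate built on the resulting points. It is not the RKM method, since the abstract does not describe the bias correction. The function names `kmeans_1d` and `gaussian_kde`, the quantile initialization, the equal kernel weights, and the bandwidth `h` are all assumptions made for illustration.

```python
import math
import random


def kmeans_1d(sample, k, iters=100):
    """Plain Lloyd's k-means on a 1-D sample; returns the k centers in ascending order."""
    xs = sorted(sample)
    n = len(xs)
    # Assumption: initialize the centers at evenly spaced sample quantiles.
    centers = [xs[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    for _ in range(iters):
        # In 1-D each cluster is an interval whose boundaries are the
        # midpoints between adjacent centers, so one sorted pass suffices.
        groups = [[] for _ in range(k)]
        j = 0
        for x in xs:
            while j < k - 1 and x > (centers[j] + centers[j + 1]) / 2:
                j += 1
            groups[j].append(x)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        new_centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def gaussian_kde(points, h):
    """Equal-weight Gaussian kernel density estimate with bandwidth h (both are assumptions)."""
    norm = len(points) * h * math.sqrt(2 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - p) / h) ** 2) for p in points) / norm

    return density


random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(20000)]
centers = kmeans_1d(sample, k=2)
```

For the standard normal distribution with k = 2, the theoretical representative points are ±sqrt(2/π) ≈ ±0.798, so the two values in `centers` should lie near those points, up to sampling error. A density estimate can then be built with, for example, `gaussian_kde(kmeans_1d(sample, k=10), h=0.5)`, where the bandwidth is an arbitrary illustrative choice.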


Basic information:

CLC number: O212.1

Citation:

[1] 王世康 (Wang Shikang), 类淑河 (Lei Shuhe). A representative point estimation method based on the k-means algorithm[J], 2023, 53(S1): 184-189. DOI: 10.16441/j.cnki.hdxb.20230072.
