Application of Resampling Method Based on Mahalanobis Distance in Traffic Identification
Shi Hongtao (时鸿涛), Li Hongping (李洪平), Liu Jing (刘竞)
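Abstract:
To address the imbalanced distribution of multi-class data in network traffic identification, this paper proposes a resampling method based on Mahalanobis distance. First, the network traffic data are zero-centered and transformed into principal component space. Next, new samples are generated for the minority classes according to the Mahalanobis distance between each minority-class sample and the center of the sample set. The newly generated samples are then transformed back to the original space and the zero-centering is reversed. Finally, all newly generated samples are returned. Traffic classification experiments on the public network traffic data set from the University of Cambridge show that the method effectively improves the identification accuracy of minority classes and achieves better classification results than existing resampling methods and cost-sensitive methods.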
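The four steps in the abstract can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the abstract does not state the exact rule used to generate new samples, so the rule below is an assumption (each synthetic point is placed in a random direction at the same Mahalanobis distance from the class center as a randomly chosen seed sample). The function name `mahalanobis_oversample` and its parameters are hypothetical.

```python
import numpy as np


def mahalanobis_oversample(X_min, n_new, rng=None):
    """Generate n_new synthetic samples for one minority class.

    X_min : (n, d) array of minority-class samples, n >= 2.
    rng   : seed or numpy Generator (hypothetical convenience parameter).
    """
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)

    # Step 1: zero-center the data and move to principal component space.
    mean = X_min.mean(axis=0)
    Z = X_min - mean
    cov = np.atleast_2d(np.cov(Z, rowvar=False))
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Drop directions with (numerically) zero variance so we can divide by eigvals.
    keep = eigvals > 1e-12 * eigvals.max()
    eigvals, eigvecs = eigvals[keep], eigvecs[:, keep]
    P = Z @ eigvecs  # principal component scores

    # Squared Mahalanobis distance of each sample to the class center,
    # which is the origin in principal component space.
    d2 = np.sum(P ** 2 / eigvals, axis=1)

    # Step 2 (ASSUMED generation rule, not taken from the paper):
    # pick a seed sample, draw a random unit direction in whitened space,
    # and scale it so the new point keeps the seed's Mahalanobis distance.
    seeds = rng.integers(0, len(P), size=n_new)
    U = rng.standard_normal((n_new, eigvals.size))
    U /= np.linalg.norm(U, axis=1, keepdims=True)
    new_P = U * np.sqrt(d2[seeds])[:, None] * np.sqrt(eigvals)

    # Steps 3 and 4: map back to the original space, undo the centering,
    # and return all newly generated samples.
    return new_P @ eigvecs.T + mean
```

Calling `mahalanobis_oversample(X_minority, 200, rng=0)` on an `(n, d)` array would return a `(200, d)` array to append to the training set with the minority-class label. Under the assumed rule, every synthetic point lies on the same Mahalanobis contour as one of the original samples, so the generated data follow the spread of the class rather than straight lines between neighbors as in SMOTE.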
Keywords: Mahalanobis distance; principal component analysis; traffic identification; multi-class imbalance; resampling method
Foundation: Supported by the National High Technology Research and Development Program of China (2013AA09A506-4)
DOI: 10.16441/j.cnki.hdxb.20170150