
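Feature selection method for detecting similar duplicate records
Cite this article: LI Xin, LI Jun, FENG Ji-lin, GAO Fang-ping, LI Zhong. Feature selection method for detecting similar duplicate records [J]. Transducer and Microsystem Technology, 2011, 30(2): 37-40.
Authors: LI Xin, LI Jun, FENG Ji-lin, GAO Fang-ping, LI Zhong
Affiliation: Department of Disaster Information Engineering, Institute of Disaster Prevention Science and Technology, Sanhe, Hebei 065201
Funding: National Key Technology R&D Program of China; Teachers' Scientific Research Fund of the China Earthquake Administration; Natural Science Research Program of the Hebei Provincial Department of Education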
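Abstract: When similar duplicate records are detected and identified in large data sets, the data sources are complex and too many feature attributes characterize each record, so detection accuracy is low and the cost of detection is high. To address these problems, a feature selection method based on grouped fuzzy clustering is proposed. The attributes of the grouped records are processed first, which effectively reduces the dimensionality of the record attributes and yields representative records for each group; a similarity comparison method is then used to detect similar duplicate records within each group. Theoretical analysis and experiments...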

Keywords: feature selection; similar duplicate records; fuzzy clustering; similarity

An optimal feature selection method for approximately duplicate records detecting
LI Xin, LI Jun, FENG Ji-lin, GAO Fang-ping, LI Zhong. An optimal feature selection method for approximately duplicate records detecting [J]. Transducer and Microsystem Technology, 2011, 30(2): 37-40.
Authors: LI Xin, LI Jun, FENG Ji-lin, GAO Fang-ping, LI Zhong
Affiliation: Department of Information Technology, Institute of Disaster Prevention Science and Technology, Sanhe 065201, China
Abstract: When duplicate records are detected and recognized in large data sets, detection precision is low and detection cost is high, because the data sources are complicated and the records have too many feature attributes. To solve these problems, an optimal feature selection method based on fuzzy clustering in groups is proposed. It processes the attributes of the records in each group so as to effectively reduce the dimensionality of the record attributes and to obtain representative records in each group. It detects approximately duplicate...
Keywords: optimal feature selection; approximately duplicate records; fuzzy clustering; similarity
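
The abstract is truncated and does not state the similarity measure or how attributes are weighted, so the following is only an illustrative sketch of the within-group detection step, not the authors' method. It assumes that feature selection has already produced a small set of attributes with weights (the `weights` dictionary here is hypothetical), that each group has one representative record, and that string similarity from Python's standard `difflib` module is an acceptable stand-in for the paper's measure. The threshold of 0.85 is likewise an assumed value.

```python
from difflib import SequenceMatcher


def field_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two field values, ignoring case and outer spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def record_similarity(r1: dict, r2: dict, weights: dict) -> float:
    """Weighted mean of per-field similarities over the selected attributes only."""
    total = sum(weights.values())
    score = sum(w * field_similarity(r1[f], r2[f]) for f, w in weights.items())
    return score / total


def find_duplicates(group: list, representative: dict, weights: dict,
                    threshold: float = 0.85) -> list:
    """Return the records in the group that are close enough to the representative."""
    return [
        r for r in group
        if r is not representative
        and record_similarity(r, representative, weights) >= threshold
    ]


# Hypothetical data: only "name" and "city" were kept by feature selection,
# so the "note" attribute is ignored during comparison.
weights = {"name": 0.7, "city": 0.3}
rep = {"name": "John Smith", "city": "Boston", "note": "original entry"}
near = {"name": "Jon Smith", "city": "Boston", "note": "entered twice"}
other = {"name": "Alice Wong", "city": "Denver", "note": "original entry"}

duplicates = find_duplicates([rep, near, other], rep, weights)
```

Comparing each record only against its group's representative, and only on the selected attributes, is what keeps the number and cost of comparisons small in this sketch; a full pairwise comparison over all attributes would be the expensive baseline the abstract argues against.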
This article is indexed in CNKI, Wanfang Data, and other databases.