Analysis and Comparative Study of Similarity Algorithms |
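Cite this article: | 陈天, 刘文浩 (CHEN Tian, LIU Wen-hao). 相似度算法分析与比较研究 (Analysis and Comparative Study of Similarity Algorithms)[J]. 电脑与微电子技术 (Computer and Microelectronics Technology), 2012(12): 18-20. |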
Authors: | CHEN Tian (陈天), LIU Wen-hao (刘文浩) |
Affiliation: | Department of Software Engineering, Sichuan University, Chengdu 610064 |
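Abstract: | To address the inconvenience caused by redundant information in RSS readers, the texts are first preprocessed with Chinese word segmentation and the TF-IDF algorithm for similarity computation, and three similarity algorithms, Levenshtein distance, cosine similarity and the Jaccard coefficient, are then used to identify redundant items. The characteristics of these methods are discussed in detail, their strengths and weaknesses are analyzed and compared from a practical-application perspective, and the Jaccard algorithm is selected to implement a data-filtering mechanism. |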
Keywords: | computer application technology; TF-IDF; similarity calculation; ICTCLAS |
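The three measures compared in the abstract can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes the text has already been segmented into word tokens (here plain English words stand in for ICTCLAS output), and the TF-IDF weighting uses the smoothed idf ln((1+N)/(1+df)) + 1, one common variant chosen as an assumption so that terms shared by every document do not receive zero weight. All function names are hypothetical.

```python
import math
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions and substitutions turning a into b (two-row DP table)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Normalise edit distance into [0, 1]; 1.0 means identical strings."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def jaccard(tokens_a, tokens_b) -> float:
    """|A intersect B| / |A union B| over the sets of word tokens."""
    sa, sb = set(tokens_a), set(tokens_b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def tfidf_vectors(docs):
    """docs: list of token lists -> list of sparse {term: weight} dicts.
    Weight = raw term count * smoothed idf (assumed variant, see above)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]


def cosine(u: dict, v: dict) -> float:
    """Cosine of the angle between two sparse weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))                        # 3
    print(round(levenshtein_similarity("kitten", "sitting"), 3))  # 0.571
    a = ["apple", "releases", "new", "phone"]
    b = ["apple", "launches", "new", "phone"]
    c = ["stock", "market", "rises"]
    print(jaccard(a, b))                                           # 0.6
    v = tfidf_vectors([a, b, c])
    print(round(cosine(v[0], v[1]), 3))                            # 0.634
    print(cosine(v[0], v[2]))                                      # 0.0
```

Levenshtein works on raw characters and needs no segmentation, while Jaccard and cosine operate on segmented words; cosine additionally depends on the corpus through the idf term.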
Research on the Analysis and Comparison of Similarity Algorithms |
Authors: | CHEN Tian, LIU Wen-hao |
Affiliation: | (School of Software Engineering, Sichuan University, Chengdu 610064) |
Abstract: | To overcome the problems caused by redundant RSS information, this paper first preprocesses the text with Chinese word segmentation and the TF-IDF algorithm, and then compares three similarity algorithms: Levenshtein distance, cosine similarity and the Jaccard coefficient. It discusses the features of these algorithms, compares their strengths and weaknesses, and introduces a simple data-filtering mechanism based on the Jaccard algorithm, which was selected as the most suitable of the three. |
Keywords: | Computer Application Technology; TF-IDF; Similarity Calculation; ICTCLAS (Institute of Computing Technology, Chinese Lexical Analysis System) |
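A Jaccard-based filter of the kind the abstract describes might look like the sketch below. It is an illustration under stated assumptions, not the paper's mechanism: the greedy keep-or-drop rule, the default threshold of 0.6, the lowercase whitespace tokenization (standing in for a Chinese segmenter such as ICTCLAS) and the name `filter_redundant` are all hypothetical.

```python
from typing import Iterable, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|A intersect B| / |A union B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def filter_redundant(items: Iterable[str], threshold: float = 0.6) -> List[str]:
    """Keep an item only if its Jaccard similarity to every item kept so far
    is below `threshold` (hypothetical default). Whitespace splitting is an
    assumed stand-in for real Chinese word segmentation."""
    kept: List[str] = []
    kept_sets: List[Set[str]] = []
    for text in items:
        tokens = set(text.lower().split())
        if all(jaccard(tokens, s) < threshold for s in kept_sets):
            kept.append(text)
            kept_sets.append(tokens)
    return kept


if __name__ == "__main__":
    feed = [
        "Apple releases new phone",
        "apple releases new phone today",  # 4/5 = 0.8 overlap with item 1
        "Stock market rises",
    ]
    print(filter_redundant(feed))
    # ['Apple releases new phone', 'Stock market rises']
```

Comparing each new item against every kept item is quadratic in the feed size, which is acceptable for a single reader's subscriptions; the threshold trades missed duplicates against wrongly discarded items and would need tuning on real data.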
This article is indexed in 维普 (VIP) and other databases.