
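Research on a Paper Plagiarism Detection Model Based on Sentence Similarity
Cite this article: LENG Qiangkui (冷强奎), QIN Yuping (秦玉平), WANG Chunli (王春立). Research on a paper plagiarism detection model based on sentence similarity [J]. Computer Engineering and Applications (计算机工程与应用), 2011, 47(24): 199-201.
Authors: LENG Qiangkui (冷强奎), QIN Yuping (秦玉平), WANG Chunli (王春立)
Affiliations: 1. College of Information Science and Engineering, Bohai University, Jinzhou, Liaoning 121000
2. College of Information Science and Technology, Dalian Maritime University, Dalian, Liaoning 116026
Funding: National Natural Science Foundation of China; Key Laboratory Project of the Education Department of Liaoning Province
Abstract: A paper plagiarism detection model based on sentence similarity is proposed. A local word-frequency fingerprint algorithm is used to screen large document collections quickly and identify suspected plagiarized documents. The similarity between sentences is then computed with a longest sorted common subsequence algorithm, and the plagiarized passages are marked to provide evidence of plagiarism. Experiments on the standard Chinese dataset SOGOU-T show that the model has a strong capacity for mining local information and, to some extent, overcomes the low precision of existing paper plagiarism detection algorithms.

Keywords: sentence similarity; plagiarism detection; local word frequency; longest sorted common subsequence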
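
The abstract describes a fast first pass that uses word-frequency fingerprints to narrow a large collection down to suspect documents, but it does not specify the algorithm. The Python sketch below is only a stand-in for that idea, not the authors' method: it assumes documents are already split into word tokens, takes the most frequent tokens of each fixed-size block as the fingerprint, and compares fingerprints by Jaccard overlap. The names `local_fingerprint`, `fingerprint_overlap`, and `screen`, and the parameters `block_size`, `top_k`, and `threshold`, are all hypothetical.

```python
from collections import Counter


def local_fingerprint(tokens, block_size=50, top_k=5):
    """Union of the top_k most frequent tokens of each block (assumed scheme)."""
    fingerprint = set()
    for start in range(0, len(tokens), block_size):
        counts = Counter(tokens[start:start + block_size])
        # Sort by (-count, token) so ties are broken deterministically.
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        fingerprint.update(tok for tok, _ in ranked[:top_k])
    return fingerprint


def fingerprint_overlap(fp_a, fp_b):
    """Jaccard overlap of two fingerprints, in [0, 1]."""
    if not fp_a or not fp_b:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)


def screen(query_tokens, corpus, threshold=0.3, **fp_kwargs):
    """Return ids of corpus documents whose fingerprint overlaps the query's.

    corpus maps a document id to its token list; threshold is an assumed cutoff.
    """
    query_fp = local_fingerprint(query_tokens, **fp_kwargs)
    suspects = []
    for doc_id, tokens in corpus.items():
        overlap = fingerprint_overlap(query_fp, local_fingerprint(tokens, **fp_kwargs))
        if overlap >= threshold:
            suspects.append(doc_id)
    return suspects
```

Only the documents returned by `screen` would then be passed to the more expensive sentence-level comparison, which is the point of a two-stage design.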

Study on model for plagiarism-detection of scientific papers based on sentence similarity
LENG Qiangkui, QIN Yuping, WANG Chunli. Study on model for plagiarism-detection of scientific papers based on sentence similarity [J]. Computer Engineering and Applications, 2011, 47(24): 199-201.
Authors: LENG Qiangkui, QIN Yuping, WANG Chunli
Affiliation: 1. College of Information Science and Engineering, Bohai University, Jinzhou, Liaoning 121000, China; 2. College of Information Science and Technology, Dalian Maritime University, Dalian, Liaoning 116026, China
Abstract: A new model for identifying plagiarism in scientific papers based on sentence similarity is presented. Large-scale text collections are screened quickly with a Local Word-Frequency Fingerprint (LWFF) to find suspected plagiarized documents. Sentence similarity is then computed from the Longest Sorted Common Subsequence (LSCS) between source texts and destination texts. The algorithm marks the plagiarized details and presents them as evidence. Identification experiments on the SOGOU-T dataset show that the model has a strong capacity for mining local information and partly overcomes the low precision of existing plagiarism identification methods for scientific papers.
Keywords: sentence similarity; plagiarism detection; local word frequency; Longest Sorted Common Subsequence (LSCS)
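
The abstract names the Longest Sorted Common Subsequence (LSCS) as the basis for sentence similarity but does not define it or say how the score is normalized. As a rough illustration only, the Python sketch below substitutes the classic longest common subsequence over word tokens and assumes a Dice-style normalization, `2 * LCS / (len(s1) + len(s2))`. The function names, the 0.7 threshold, and the assumption that sentences arrive already tokenized are all hypothetical rather than taken from the paper.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def sentence_similarity(s1, s2):
    """Assumed normalization: 2 * LCS / (len(s1) + len(s2)), in [0, 1]."""
    if not s1 and not s2:
        return 0.0
    return 2.0 * lcs_length(s1, s2) / (len(s1) + len(s2))


def flag_sentences(source, suspect, threshold=0.7):
    """Pair each suspect sentence with its most similar source sentence.

    Returns (suspect_index, source_index, similarity) for every pair whose
    similarity reaches the assumed threshold, i.e. the marked "evidence".
    """
    hits = []
    for i, sent in enumerate(suspect):
        best_j, best = max(
            ((j, sentence_similarity(sent, src)) for j, src in enumerate(source)),
            key=lambda pair: pair[1],
            default=(None, 0.0),
        )
        if best >= threshold:
            hits.append((i, best_j, best))
    return hits
```

Because a common subsequence preserves word order but tolerates gaps, a measure of this kind still scores highly when a copied sentence has a few words inserted or deleted, which is presumably why an ordered-subsequence measure suits marking plagiarized sentences.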
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.