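A Word Embedding-Based Method for Mining Semantically Related Words in the Software Engineering Domain
Cite this article:HU Wang-sheng. A Word Embedding-Based Method for Mining Semantically Related Words in the Software Engineering Domain[J]. Computer and Modernization, 2017, 0(9): 19.
Author:HU Wang-sheng
Funding:National Natural Science Foundation of China (61572312, 61572313); Science and Technology Commission of Shanghai Municipality research project (15DZ1100305)
Abstract:Code search is a frequent task during software development and maintenance. Keyword-matching code search faces the same problem as traditional information retrieval: the keywords in a user's query often do not match the words used in the code text. To improve code search precision, semantically related words in software need to be mined for query expansion. This paper designs a Word Embedding-based method for mining semantically related words in the software engineering domain and, using documents from the IT technical Q&A site Stack Overflow as the training corpus, obtains a semantically related word list covering 19332 words. Comparative experiments against prior work verify that the semantically related words mined by this method effectively improve code search precision.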

Keywords:code search    query expansion    semantically related words
Received:2017-09-19

Learning Semantically Related Words in Software Through Word Embedding
HU Wang-sheng.Learning Semantically Related Words in Software Through Word Embedding[J].Computer and Modernization,2017,0(9):19.
Authors:HU Wang-sheng
Abstract:Searching for previously written code is an important part of software development and maintenance. As in traditional information retrieval, the inherent difficulty of keyword-based code search is the vocabulary mismatch between the user's query and the code to be retrieved. To improve the accuracy of code search, semantically related words in software need to be learned for query expansion. This paper designs a Word Embedding-based method for learning semantically related words in software and, by training it on Stack Overflow documents, obtains semantically related words for 19332 words. Experimental results show that the learned semantically related words effectively improve code search accuracy.
Keywords:code search  query expansion  semantically related words  
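
The query-expansion idea summarized in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the three-dimensional vectors in `TOY_EMBEDDINGS` are invented for illustration (a real system would use vectors trained on a corpus such as Stack Overflow), and the function names, `top_k`, and `min_sim` are hypothetical choices. The sketch assumes that relatedness is measured by cosine similarity between word vectors.

```python
import math

# Toy vectors invented for illustration only; they are NOT trained embeddings
# and do not come from the paper.
TOY_EMBEDDINGS = {
    "delete": [0.9, 0.1, 0.0],
    "remove": [0.85, 0.15, 0.05],
    "erase": [0.8, 0.2, 0.1],
    "file": [0.1, 0.9, 0.0],
    "document": [0.15, 0.85, 0.1],
    "thread": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def related_words(word, embeddings, top_k=2, min_sim=0.9):
    """Return up to top_k words whose similarity to `word` is at least min_sim."""
    if word not in embeddings:
        return []  # out-of-vocabulary words have no known neighbours
    target = embeddings[word]
    scored = [
        (cosine(target, vec), other)
        for other, vec in embeddings.items()
        if other != word
    ]
    scored.sort(reverse=True)  # most similar first
    return [other for sim, other in scored[:top_k] if sim >= min_sim]


def expand_query(query, embeddings, top_k=2, min_sim=0.9):
    """Append related words after each query term, without duplicates."""
    expanded = []
    for term in query.lower().split():
        for candidate in [term] + related_words(term, embeddings, top_k, min_sim):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


if __name__ == "__main__":
    print(expand_query("delete file", TOY_EMBEDDINGS))
```

With these toy vectors, a query such as "delete file" is widened to include near-synonyms of each term, so a keyword matcher could also retrieve code that uses different wording than the query. The similarity threshold keeps weakly related words out of the expanded query.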