
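Source code summarization technology based on syntactic analysis
Cite this article: WANG Jinshui, XUE Xingsi, WENG Wei. Source code summarization technology based on syntactic analysis[J]. Journal of Computer Applications, 2015, 35(7): 1999-2003. DOI: 10.11772/j.issn.1001-9081.2015.07.1999
Authors: WANG Jinshui  XUE Xingsi  WENG Wei
Affiliation: 1. College of Information Science and Engineering, Fujian University of Technology, Fuzhou 350108; 2. College of Computer and Information Engineering, Xiamen University of Technology, Xiamen, Fujian 361024
Funding: National Natural Science Foundation of China (61402108); Fujian Province Education and Scientific Research Project for Young and Middle-aged Teachers (JA14221); Scientific Research Start-up Fund of Fujian University of Technology (GY-Z13113; GY-Z14068).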
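Abstract: To address the bag-of-words model's disregard for the semantic relationships and conceptual structure among terms, a source code summarization technique based on syntactic analysis is proposed. First, part-of-speech tagging is used to identify the keywords most likely to reflect the characteristics of the code. Next, chunk parsing corrects errors that may have been introduced during part-of-speech tagging. The identified keywords are then denoised to reduce the adverse effects of text noise. Finally, the highest-weighted keywords are selected to form the code summary. Experimental results show that, compared with summarization techniques based on term frequency-inverse document frequency (TF-IDF) and on an extended TF-IDF, the overlap between the summaries generated by the proposed technique and the reference answers improves by at least 9% and 6%, respectively, indicating that the technique generates more accurate code summaries.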

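Keywords: source code summarization   text summarization   syntactic analysis   natural language processing   program comprehension
Received: 2015-01-21
Revised: 2015-03-27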

Source code summarization technology based on syntactic analysis
WANG Jinshui,XUE Xingsi,WENG Wei. Source code summarization technology based on syntactic analysis[J]. Journal of Computer Applications, 2015, 35(7): 1999-2003. DOI: 10.11772/j.issn.1001-9081.2015.07.1999
Authors:WANG Jinshui  XUE Xingsi  WENG Wei
Affiliation:1. College of Information Science and Engineering, Fujian University of Technology, Fuzhou Fujian 350108, China;
2. College of Computer and Information Engineering, Xiamen University of Technology, Xiamen Fujian 361024, China
Abstract: The bag-of-words model ignores the semantic relationships between terms and their conceptual structure. To overcome this drawback, a source code summarization technology based on syntactic analysis is proposed. First, part-of-speech tagging is used to identify the keywords that best characterize the code. Second, chunk parsing is used to correct errors that may be introduced during part-of-speech tagging. Third, noise reduction is applied to the identified keywords to lessen the influence of text noise. Finally, the keywords with the highest weights are selected to compose the summary. In experiments comparing the proposed technology with TF-IDF (Term Frequency-Inverse Document Frequency)-based and extended TF-IDF-based source code summarization technologies, the overlap coefficient between the generated summaries and the golden set improved by at least 9% and 6%, respectively, which indicates that the proposed technology generates more precise source code summaries.
Keywords:source code summarization   text summarization   syntactic analysis   natural language processing   program comprehension
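
The abstract compares against a TF-IDF baseline and reports results as an overlap with a reference set, but gives neither formula. The following Python sketch illustrates one plausible reading under stated assumptions: identifiers are split into terms, a smoothed TF-IDF weight ranks them, and overlap is computed as the Szymkiewicz-Simpson overlap coefficient. The function names, the smoothing, and the choice of overlap definition are all assumptions made for illustration; they are not taken from the paper, and the sketch does not implement the proposed syntactic-analysis pipeline.

```python
import math
import re
from collections import Counter


def split_identifiers(code):
    """Split source text into lowercase terms (hypothetical helper).

    Breaks snake_case at non-letters and camelCase at case boundaries.
    """
    terms = []
    for token in re.findall(r"[A-Za-z]+", code):
        # An uppercase run not followed by lowercase (e.g. "HTTP"),
        # or an optionally capitalized lowercase word (e.g. "Server").
        parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", token)
        terms.extend(p.lower() for p in parts)
    return terms


def tfidf_summary(doc_terms, corpus, k=3):
    """Return the k terms of doc_terms with the highest TF-IDF weight.

    corpus is a list of term lists, one per code unit. The IDF uses
    add-one smoothing (an assumption) so unseen terms do not divide by zero.
    """
    doc_sets = [set(d) for d in corpus]
    n_docs = len(doc_sets)
    tf = Counter(doc_terms)
    weights = {}
    for term, count in tf.items():
        df = sum(1 for d in doc_sets if term in d)
        idf = math.log((1 + n_docs) / (1 + df))
        weights[term] = (count / len(doc_terms)) * idf
    # Sort by descending weight; break ties alphabetically for determinism.
    ranked = sorted(weights, key=lambda t: (-weights[t], t))
    return ranked[:k]


def overlap(generated, reference):
    """Overlap coefficient |A ∩ B| / min(|A|, |B|) of two term collections."""
    g, r = set(generated), set(reference)
    if not g or not r:
        return 0.0
    return len(g & r) / min(len(g), len(r))
```

A summary for one code unit could then be produced with `tfidf_summary(split_identifiers(source), corpus)` and scored against a human-written term list with `overlap`. If the paper defines overlap differently (for example, as the share of generated terms found in the reference), only the denominator in `overlap` would change.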
This article is indexed in Wanfang Data and other databases.