Prediction Method for Question Deletion in Software Question and Answer Community
Citation: JIANG Jing, MIAO Meng, ZHAO Li-Xian, ZHANG Li. Prediction Method for Question Deletion in Software Question and Answer Community[J]. Journal of Software, 2022, 33(5): 1699-1710.
Authors: JIANG Jing  MIAO Meng  ZHAO Li-Xian  ZHANG Li
Affiliation: School of Computer Science and Engineering, Beihang University, Beijing 100191, China
Funding: Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2018AAA0102304); National Natural Science Foundation of China (62177003); Fundamental Research Funds for the Central Universities (YWF-20-BJ-J-1018)
Abstract: Stack Overflow is one of the most popular software question and answer communities, where users can post questions and receive answers from other users. To maintain question quality, the website needs to promptly discover and delete questions that are of low quality or do not fit the community's theme. Currently, Stack Overflow relies mainly on manual inspection to find questions that need to be deleted. However, this approach often cannot guarantee that such questions are discovered and deleted in time, and it increases the burden on community administrators. To find questions that need to be deleted quickly, this paper proposes MulPredictor, a method that automatically predicts question deletion. The method extracts the semantic content features, the semantic statistical features, and the meta features of a question, and uses a random forest classifier to calculate the probability that the question will be deleted. Experimental results show that, compared with the existing methods DelPredictor and NLPPredictor, MulPredictor improves accuracy by 16.34% and 12.78%, respectively, on the balanced test set, and by 12.38% and 14.14%, respectively, on the random test set. In addition, this paper analyzes the important features that affect question deletion and finds that the features of the code segment, the question's title, and the first paragraph of the question's body have an important impact on question deletion.

Keywords: prediction of question deletion  question quality  question classification  software question and answer community  Stack Overflow
Received: 2021/8/10
Revised: 2021/10/9
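
The abstract names the code segment, the title, and the first paragraph of the body as the parts of a question whose features matter most for deletion. As a rough illustration only, the sketch below computes a few simple statistics over those three parts. The feature names, the `Question` type, and the assumption that the body is Stack Overflow-style HTML with `<p>` and `<pre><code>` elements are all hypothetical and are not taken from the paper; MulPredictor's actual feature set is defined in the full text. A vector like this would then be passed to a classifier such as the random forest the abstract mentions.

```python
import re
from dataclasses import dataclass


@dataclass
class Question:
    """Hypothetical container for one question (not the paper's data model)."""
    title: str
    body_html: str  # assumed: HTML body with <p> and <pre><code> elements


# Assumed markup conventions for the question body.
CODE_RE = re.compile(r"<pre><code>(.*?)</code></pre>", re.DOTALL)
PARA_RE = re.compile(r"<p>(.*?)</p>", re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def extract_features(q: Question) -> dict:
    """Compute illustrative statistics for title, first paragraph, and code."""
    code_blocks = CODE_RE.findall(q.body_html)
    # Remove code blocks so they are not counted as prose paragraphs.
    prose_html = CODE_RE.sub("", q.body_html)
    paragraphs = [TAG_RE.sub("", p).strip() for p in PARA_RE.findall(prose_html)]
    first_par = paragraphs[0] if paragraphs else ""
    return {
        "title_words": len(q.title.split()),
        "title_is_question": int(q.title.rstrip().endswith("?")),
        "first_par_words": len(first_par.split()),
        "num_paragraphs": len(paragraphs),
        "num_code_blocks": len(code_blocks),
        "code_lines": sum(len(c.strip().splitlines()) for c in code_blocks),
    }


example = Question(
    title="How do I reverse a list in Python?",
    body_html=(
        "<p>I have a list and want it reversed.</p>"
        "<pre><code>xs = [1, 2, 3]\nprint(xs)</code></pre>"
        "<p>Thanks</p>"
    ),
)
features = extract_features(example)
```

Returning a flat dictionary keeps the sketch independent of any particular machine learning library; the values can be ordered into a numeric vector for whichever classifier is used.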