
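Topic Analysis on Chinese Programming Question and Answer Websites
Cite this article:JIANG Jing, LÜ Jiang-Feng, ZHANG Li. Topic Analysis on Chinese Programming Question and Answer Websites[J]. Journal of Software, 2020, 31(4): 1143-1161
Authors:JIANG Jing  LÜ Jiang-Feng  ZHANG Li
Affiliation:School of Computer Science and Engineering, Beihang University, Beijing 100191 (all three authors)
Funding:National Key Research and Development Program of China (2018YFB1004202); National Natural Science Foundation of China (61672078)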
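Abstract:Software question and answer (Q&A) communities are online platforms where software developers exchange technical knowledge through questions and answers. In recent years, these communities have accumulated a large volume of technical Q&A content discussed by users. Some researchers have carried out topic analysis on English Q&A communities such as Stack Overflow, but analysis of Chinese software Q&A communities is lacking. Topic analysis of Chinese software Q&A communities can guide developers toward a better understanding of technical trends, and it can help administrators improve their communities and attract more participants. OSCHINA ("开源中国") is one of the largest technical communities in China. This study conducts a topic analysis of developer questions on OSCHINA. It collects 92 383 developer questions from OSCHINA and applies a topic analysis method based on the latent Dirichlet allocation (LDA) model to analyze the topic distribution, popularity trends, answer status, and popularity of key technologies. The findings are: (1) The technical topics discussed by developers fall into six categories: front-end development, back-end development, databases, operating systems, general techniques, and others. Of these, front-end development accounts for the largest share of discussion. (2) Among back-end development topics, users' attention has shifted from traditional topics such as project deployment and server configuration to newer topics such as distributed systems. (3) The data presentation topic has the highest proportion of zero-answer questions, and the data types topic has the lowest. (4) Under the technique learning topic, users discuss Java markedly more than Python.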

Keywords:software Q&A community  topic model  empirical study  latent Dirichlet allocation  OSCHINA
Received:2019-07-21
Revised:2019-10-09
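
The abstract names latent Dirichlet allocation (LDA) as the topic analysis method but gives no implementation details. The following is only a minimal sketch of LDA fitted by collapsed Gibbs sampling, one common inference scheme and not necessarily the one the authors used. The function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, the iteration count, and the toy documents are all assumptions made for illustration. Real Chinese question text would first need word segmentation, which is not shown here.

```python
import random

def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Returns (theta, top_words): per-document topic proportions and the
    five highest-count words of each topic. All defaults are illustrative.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]               # tokens of doc d in topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)] # count of word v in topic k
    topic_total = [0] * num_topics                             # tokens assigned to topic k
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic, up to a constant.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed per-document topic proportions.
    theta = [
        [(doc_topic[d][t] + alpha) / (len(doc) + num_topics * alpha)
         for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    top_words = [
        [vocab[v] for v in sorted(range(vocab_size),
                                  key=lambda v: -topic_word[t][v])[:5]]
        for t in range(num_topics)
    ]
    return theta, top_words

# Hypothetical toy corpus (not data from the paper).
docs = [
    ["css", "html", "layout", "css", "browser"],
    ["html", "javascript", "browser", "layout"],
    ["mysql", "index", "query", "table", "mysql"],
    ["query", "table", "index", "join"],
]
theta, top_words = lda_gibbs(docs, num_topics=2)
```

Each row of `theta` is one question's topic mixture. Taking the largest entry as the question's dominant topic is one simple way to prepare the per-topic counts that distribution and trend analyses rely on.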

Topic Analysis on Chinese Programming Question and Answer Websites
JIANG Jing, LÜ Jiang-Feng, ZHANG Li. Topic Analysis on Chinese Programming Question and Answer Websites[J]. Journal of Software, 2020, 31(4): 1143-1161
Authors:JIANG Jing  LÜ Jiang-Feng  ZHANG Li
Affiliation:School of Computer Science and Engineering, Beihang University, Beijing 100191
Abstract:A programming question and answer website is an online platform where software developers exchange technical knowledge by posting and answering questions. With the development of the Internet and the growth in the number of software developers, programming question and answer websites have accumulated extensive discussions of software engineering knowledge. Researchers have applied topic analysis to English question and answer websites in recent years, yet there are few similar studies of Chinese programming question and answer websites. Analyzing this content can help developers learn more about technical trends. It can also help website administrators improve the forum and offer a better user experience. This study applies latent Dirichlet allocation (LDA) to automatically cluster the main topics in 92 383 questions on OSCHINA. Several analyses are then applied to these topics, including trend analysis, difficulty analysis, and keyword analysis. The main findings are as follows: (1) The topics identified in user discussions can be divided into 6 categories: front-end development, back-end development, databases, operating systems, general techniques, and others. Of these categories, front-end development contains the most question posts. (2) Trend analysis shows that, in back-end development, developers are paying more attention to newer and more advanced topics (distributed systems, system design & Web interfaces) than to basic topics (project deployment, server configuration). (3) Data presentation is the most difficult topic, as it has the highest ratio of questions that are never answered while its popularity is above average. (4) The trends of specific techniques within a single topic are analyzed. For instance, the popularity of Java in the technique learning topic is clearly higher than that of Python.
Keywords:programming question and answer websites  topic model  empirical study  latent Dirichlet allocation  OSCHINA
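
The difficulty analysis in the abstract rests on two per-topic quantities: a topic's share of all questions (its popularity) and the ratio of its questions that never receive an answer. A minimal sketch of that bookkeeping follows. It assumes each question has already been assigned a single dominant topic, which the abstract does not confirm. The function name `topic_stats` and the sample records are hypothetical and do not reproduce the paper's data.

```python
from collections import defaultdict

def topic_stats(questions):
    """Compute per-topic share and zero-answer ratio.

    `questions` is an iterable of (topic, answer_count) pairs, where
    `topic` is the question's assumed dominant topic.
    """
    totals = defaultdict(int)
    unanswered = defaultdict(int)
    for topic, answer_count in questions:
        totals[topic] += 1
        if answer_count == 0:
            unanswered[topic] += 1

    num_questions = sum(totals.values())
    return {
        topic: {
            # Fraction of all questions that belong to this topic.
            "share": totals[topic] / num_questions,
            # Fraction of this topic's questions with no answers.
            "zero_answer_ratio": unanswered[topic] / totals[topic],
        }
        for topic in totals
    }

# Hypothetical toy records: (topic, number of answers). Not the paper's data.
sample = [
    ("data presentation", 0), ("data presentation", 0), ("data presentation", 2),
    ("data types", 1), ("data types", 3),
    ("distributed systems", 0), ("distributed systems", 4), ("distributed systems", 1),
]
stats = topic_stats(sample)

# Under this sketch, a topic counts as "difficult" when its zero-answer
# ratio is high even though its share is at or above the average share.
average_share = 1 / len(stats)
hardest = max(stats, key=lambda t: stats[t]["zero_answer_ratio"])
```

Reading the two numbers together matters: a high zero-answer ratio on a rarely asked topic may simply reflect a small audience, whereas the same ratio on a popular topic suggests questions that are genuinely hard to answer.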