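Data Quality Problems in Software Development Activity Data
Cite this article:TU Fei-Fei, ZHOU Ming-Hui. Data Quality Problems in Software Development Activity Data[J]. Journal of Software, 2019, 30(5): 1522-1531.
Authors:TU Fei-Fei  ZHOU Ming-Hui
Affiliation (both authors):Key Laboratory of High Confidence Software Technologies (Peking University), Ministry of Education, Beijing 100871; School of Electronics Engineering and Computer Science, Peking University, Beijing 100871
Funding:National Key Research and Development Program of China (2018YFB1004201); National Natural Science Foundation of China (61432001, 61825201)
Abstract:Software development support tools such as issue tracking systems and version control systems are widely used in the development of open source and commercial software. They produce large amounts of data, known as software development activity data. This data is widely used in scientific research and in development practice, where it supports intelligent development. Data quality has a major impact on such research and practice, yet it has not received enough attention. To better alert data users to potential data quality problems, we carried out a literature review and interviews and analyzed data based on our own experience. From this we identified nine data quality problems covering three phases: data production, data collection, and data use. We further propose methods to help find and resolve data problems. Finding problems means strengthening the understanding of the data context and using statistical analysis and data visualization to discover potential data quality problems. Resolving problems means correcting the data by using redundant data or by mining user behavior patterns.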

Keywords:data quality  data production  data collection  data use  issue tracking data  version control data
Received:2018/8/31
Revised:2018/10/31

Data Quality Problems in Software Development Activity Data
TU Fei-Fei and ZHOU Ming-Hui. Data Quality Problems in Software Development Activity Data[J]. Journal of Software, 2019, 30(5): 1522-1531.
Authors:TU Fei-Fei and ZHOU Ming-Hui
Affiliation (both authors):Key Laboratory of High Confidence Software Technologies (Peking University), Ministry of Education, Beijing 100871, China; School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China
Abstract:Software development tools such as issue tracking systems (ITS) and version control systems (VCS) are widely used in the development of open source and commercial software. In assisting software development, these tools produce a substantial amount of data, which is called software development activity data. As the sources of this data grow richer and its uses wider, data quality has attracted more and more attention. Data is the basis of intelligent development, and its quality influences both research and practice. To alert data users to latent data quality problems in software development activity data, this paper identifies, through a literature review and interviews with data users, three aspects in which such problems may occur. These problems arise in three phases: data production, data collection, and data use. To improve the quality of software development activity data, several recommendations are then proposed, covering both how to find data quality problems and how to solve them. First, researchers should have a clear understanding of the context of the data. Next, they may use statistical analysis and data visualization to find latent data quality problems. Finally, they can try to correct particular problems by using redundant data, or improve data quality through user behavior analysis.
Keywords:data quality  data production  data collection  data use  issue tracking data  version control data