Negative samples reduction in cross-company software defects prediction

Affiliation:	1. School of Information Science and Technology, Nantong University, Nantong, China; 2. Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin, China; 3. College of Intelligence and Computing, Tianjin University, Tianjin, China; 4. Computer School, Beijing Information Science and Technology University, Beijing, China; 5. School of Software Technology, Zhejiang University, Ningbo, China; 1. Key Laboratory of Dependable Service Computing in Cyber Physical Society Ministry of Education, Chongqing University, Chongqing, China; 2. School of Big Data & Software Engineering, Chongqing University, Chongqing, China; 3. Faculty of Information Technology, Monash University, Melbourne, Australia; 4. College of Computer Science and Technology, Zhejiang University, Hangzhou, China

Abstract:	Context: Software defect prediction has been widely studied using various machine-learning algorithms. Previous studies usually focus on within-company defects prediction (WCDP), but the lack of training data in the early stages of software testing limits the effectiveness of WCDP in practice. Recent research has therefore examined cross-company defects prediction (CCDP) as an alternative.
Objective: The difference in distribution between cross-company (CC) data and within-company (WC) data usually makes it difficult to build a high-quality CCDP model. In this paper, a novel algorithm named Double Transfer Boosting (DTB) is introduced to narrow this gap and improve the performance of CCDP by reducing negative samples in CC data.
Method: The proposed DTB model integrates two levels of data transfer. First, the data gravitation method reshapes the whole distribution of CC data to fit WC data. Second, the transfer boosting method uses a small ratio of labeled WC data to eliminate negative instances in CC data.
Results: The empirical evaluation was conducted on 15 publicly available datasets. The CCDP experiments indicated that the proposed model achieved better overall performance than the CCDP models it was compared with. DTB was also compared to WCDP in two different situations. Statistical analysis suggested that DTB performed significantly better than WCDP models trained on limited samples and produced results comparable to WCDP with sufficient training data.
Conclusions: DTB reshapes the distribution of CC data at two levels to improve the performance of CCDP, and the experimental results and analysis demonstrate that it could be an effective model for early software defects detection.
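
The first transfer level named in the Method part of the abstract, data gravitation, can be illustrated with a minimal sketch. The abstract does not give the weighting formula, so the code below assumes one formulation that appears in the transfer-learning defect prediction literature: a CC instance with k features, s of which fall inside the per-feature [min, max] range of the WC data, receives the weight s / (k - s + 1)^2. The function names and the toy data are hypothetical, and the second level (transfer boosting) is not covered here.

```python
from typing import List, Sequence, Tuple


def feature_ranges(wc_data: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Return the (min, max) of each feature over the within-company (WC) instances."""
    columns = list(zip(*wc_data))
    return [(min(col), max(col)) for col in columns]


def gravitation_weights(
    cc_data: Sequence[Sequence[float]],
    wc_data: Sequence[Sequence[float]],
) -> List[float]:
    """Weight each cross-company (CC) instance by its similarity to the WC data.

    Assumed formula (not stated in the abstract): w = s / (k - s + 1)^2,
    where k is the number of features and s counts the features of the
    CC instance that lie inside the WC [min, max] range.
    """
    ranges = feature_ranges(wc_data)
    k = len(ranges)
    weights = []
    for row in cc_data:
        s = sum(1 for value, (lo, hi) in zip(row, ranges) if lo <= value <= hi)
        weights.append(s / (k - s + 1) ** 2)
    return weights


# Toy data: two WC instances and three CC instances with three features each.
wc = [[10.0, 2.0, 0.5], [20.0, 4.0, 0.9]]
cc = [
    [15.0, 3.0, 0.7],  # all three features inside the WC ranges
    [15.0, 9.0, 0.7],  # two features inside
    [99.0, 9.0, 5.0],  # no feature inside
]
weights = gravitation_weights(cc, wc)
print(weights)
```

Under this assumption, CC instances that resemble the WC data get large weights and dissimilar ones get weights near zero, which is one way a training set could be reshaped toward the WC distribution before a boosting stage is applied.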

Keywords:	Cross-company defects prediction; Software fault prediction; Transfer learning
This article is indexed in ScienceDirect and other databases.
