20 similar documents retrieved (search time: 31 ms)
1.
Context
Software defect prediction studies usually build models using within-company data, but very few have focused on prediction models trained with cross-company data. Models built on within-company data are difficult to apply in practice because local data repositories are often unavailable. Recently, transfer learning has attracted increasing attention for building classifiers in a target domain using data from a related source domain. It is very useful when the distributions of training and test instances differ, but is it appropriate for cross-company software defect prediction?
Objective
In this paper, we consider the cross-company defect prediction scenario, where source and target data are drawn from different companies. To harness cross-company data, we exploit transfer learning to build a fast and highly effective prediction model.
Method
Unlike prior work that selects training data similar to the test data, we propose a novel algorithm called Transfer Naive Bayes (TNB), which uses the information of all the proper features in the training data. Our solution estimates the distribution of the test data and transfers cross-company data information into the weights of the training data. The defect prediction model is built on these weighted data.
Results
This article presents a theoretical analysis of the comparative methods and reports experimental results on data sets from different organizations. The results indicate that TNB is more accurate in terms of AUC (the area under the receiver operating characteristic curve) and runs faster than state-of-the-art methods.
Conclusion
When there are too few local training data to train good classifiers, useful knowledge transferred at the feature level from different-distribution training data can help. We are optimistic that our transfer learning method can guide optimal resource allocation strategies, which may reduce software testing cost and increase the effectiveness of the software testing process.
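As a concrete illustration of the Method above, here is a minimal Python sketch of TNB-style instance weighting, assuming the published data-gravitation form (counting how many features of a training instance fall inside the test set's per-feature range) and substituting Gaussian Naive Bayes for the paper's discrete formulation; function names are ours.

```python
# Hedged sketch of TNB-style weighting, not the authors' exact code.
import numpy as np
from sklearn.naive_bayes import GaussianNB

def tnb_weights(X_train, X_test):
    """Weight each training instance by how many of its features fall
    inside the test set's [min, max] range (data gravitation)."""
    lo, hi = X_test.min(axis=0), X_test.max(axis=0)
    k = X_train.shape[1]
    s = ((X_train >= lo) & (X_train <= hi)).sum(axis=1)
    return s / (k - s + 1.0) ** 2

def fit_tnb(X_train, y_train, X_test):
    w = tnb_weights(X_train, X_test)
    model = GaussianNB()
    model.fit(X_train, y_train, sample_weight=w)  # weighted Naive Bayes
    return model
```

The weighted fit lets the cross-company instances that most resemble the target distribution dominate the model, which is the intuition the abstract describes.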
2.
New methodologies and tools have gradually made the software development life cycle more human-independent. Much of the research in this field focuses on defect reduction, defect identification, and defect prediction. Defect prediction is a relatively new research area that draws on methods ranging from artificial intelligence to data mining. Identifying and locating defects in software projects is a difficult task. Measuring software in a continuous and disciplined manner provides many advantages, such as accurate estimation of project costs and schedules and improved product and process quality. This study proposes a model to predict the number of defects in the new version of a software product with respect to the previous stable version. The new version may contain changes related to a new feature, a modification in an algorithm, or bug fixes. Our proposed model predicts the defects introduced into the new version by analyzing the types of changes in an objective and formal manner and by considering the change in lines of code (LOC). Defect predictors are helpful tools for both project managers and developers. Accurate predictors may help reduce test times and guide developers towards implementing higher quality code. Our proposed model can aid software engineers in determining the stability of software before it goes into production. Furthermore, such a model may provide useful insight into the effects of a feature, bug fix, or change on the defect detection process.
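The abstract does not specify the model form; as one hedged reading, a count regressor over the LOC delta and per-type change counts captures the described inputs. All feature names and numbers below are hypothetical.

```python
# Illustrative sketch (not the authors' model): predicting the defect
# count of a new release from LOC delta and change-type counts.
import numpy as np
from sklearn.linear_model import PoissonRegressor

# Hypothetical per-release features: [LOC changed, new features,
# algorithm modifications, bug fixes] for past stable->new transitions.
X = np.array([[1200, 3, 1, 5],
              [400, 1, 0, 2],
              [2500, 6, 2, 9],
              [800, 2, 1, 3]])
y = np.array([14, 4, 31, 9])  # defects later found in each new version

model = PoissonRegressor().fit(X, y)  # Poisson link suits count targets
print(model.predict([[1500, 4, 1, 6]]))  # expected defects, new release
```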
3.
4.
Introduction
Testing and modification of software are repetitive processes. When to release and deploy a qualified software product is an important question. The purpose of residual defect prediction is to keep the number of code defects under an acceptable level during testing. It is very important for a decision maker to estimate the phase of software testing and the achievable objective, and it is significant for the maintenance of delivered software.
1 Software residual defects prediction model
Software…
5.
A novel transfer learning method is proposed in this paper to solve power load forecasting problems in the smart grid. Prediction errors on the target tasks can be greatly reduced by utilizing knowledge transferred from the source tasks. In this work, a source task selection algorithm is developed and a transfer learning model based on Gaussian processes is constructed. Negative knowledge transfer is avoided compared with previous works, and the prediction accuracies are therefore greatly improved. In addition, a fast inference algorithm is developed to accelerate the prediction steps. Experimental results on real-world data are presented.
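A minimal sketch of the pooled-data baseline behind such GP transfer, assuming a standard RBF kernel: a data-rich source task supplements a sparse target task in a single Gaussian process. The paper's source-task selection and fast inference algorithm are not reproduced here, and all data below is synthetic.

```python
# Sketch only: pool source and target load data in one GP regressor.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X_src = rng.uniform(0, 24, (200, 1))            # source task: rich history
y_src = np.sin(X_src[:, 0] * np.pi / 12) + 0.1 * rng.normal(size=200)
X_tgt = rng.uniform(0, 24, (10, 1))             # target task: few points
y_tgt = np.sin(X_tgt[:, 0] * np.pi / 12) + 0.1 * rng.normal(size=10)

gp = GaussianProcessRegressor(RBF() + WhiteKernel(), normalize_y=True)
gp.fit(np.vstack([X_src, X_tgt]), np.concatenate([y_src, y_tgt]))
mean, std = gp.predict([[18.0]], return_std=True)  # forecast + uncertainty
```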
6.
The resources allocated for software quality assurance and improvement have not increased with the ever-increasing need for better software quality. A targeted software quality inspection can detect faulty modules and reduce the number of faults occurring during operations. We present a software fault prediction modeling approach based on case-based reasoning (CBR), a part of the computational intelligence field focusing on automated reasoning processes. A CBR system functions as a software fault prediction model by quantifying, for a module under development, the expected number of faults based on similar modules that were previously developed. Such a system is composed of a similarity function, the number of nearest neighbor cases used for fault prediction, and a solution algorithm. The choice of similarity function and solution algorithm may affect the prediction accuracy of a CBR-based software fault prediction system. This paper presents an empirical study investigating the effects of three different similarity functions and two different solution algorithms on the prediction accuracy of our CBR system. The influence of varying the number of nearest neighbor cases on performance accuracy is also explored. Moreover, the benefits of using metric-selection procedures for our CBR system are also evaluated. Case studies of a large legacy telecommunications system are used for our analysis. We observe that the CBR system using the Mahalanobis distance similarity function and the inverse distance weighted solution algorithm yielded the best fault prediction. In addition, the CBR models perform better than models based on multiple linear regression.
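A hedged sketch of the best-performing configuration the study reports, Mahalanobis similarity with an inverse-distance-weighted solution; the function name and the pseudo-inverse fallback are our choices, not the authors'.

```python
# Sketch of the CBR idea: estimate faults of a new module from its k
# most similar previously developed modules.
import numpy as np

def cbr_predict(X_cases, y_faults, x_new, k=3):
    VI = np.linalg.pinv(np.cov(X_cases, rowvar=False))  # inverse covariance
    d = np.array([np.sqrt((x - x_new) @ VI @ (x - x_new)) for x in X_cases])
    nn = np.argsort(d)[:k]                      # k nearest neighbor cases
    w = 1.0 / (d[nn] + 1e-9)                    # inverse distance weights
    return np.average(y_faults[nn], weights=w)  # weighted fault estimate
```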
Taghi M. Khoshgoftaar is a professor in the Department of Computer Science and Engineering at Florida Atlantic University and the Director of the Empirical Software Engineering Laboratory. His research interests are in software engineering, software metrics, software reliability and quality engineering, computational intelligence, computer performance evaluation, data mining, and statistical modeling. He has published more than 200 refereed papers in these areas. He has been a principal investigator and project leader in a number of projects with industry, government, and other research-sponsoring agencies. He is a member of the Association for Computing Machinery, the IEEE Computer Society, and the IEEE Reliability Society. He served as the general chair of the 1999 International Symposium on Software Reliability Engineering (ISSRE’99) and the general chair of the 2001 International Conference on Engineering of Computer Based Systems. He has also served on the technical program committees of various international conferences, symposia, and workshops. He has served as North American editor of the Software Quality Journal and is on the editorial boards of the journals Empirical Software Engineering, Software Quality, and Fuzzy Systems.
Naeem Seliya received the M.S. degree in Computer Science from Florida Atlantic University, Boca Raton, FL, USA, in 2001. He is currently a Ph.D. candidate in the Department of Computer Science and Engineering at Florida Atlantic University. His research interests include software engineering, computational intelligence, data mining, software measurement, software reliability and quality engineering, software architecture, computer data security, and network intrusion detection. He is a student member of the IEEE Computer Society and the Association for Computing Machinery.
7.
8.
In transfer learning the aim is to solve new learning tasks using fewer examples by exploiting information gained from solving related tasks. Existing transfer learning methods have been used successfully in practice, and PAC analysis of these methods has been developed. But the key notion of relatedness between tasks has not yet been defined clearly, which makes it difficult to understand, let alone answer, questions that naturally arise in the context of transfer, such as how much information to transfer, whether to transfer information, and how to transfer information across tasks. In this paper, we look at transfer learning from the perspective of Algorithmic Information Theory/Kolmogorov complexity theory, and formally solve these problems in the same sense that Solomonoff Induction solves the problem of inductive inference. We define universal measures of relatedness between tasks and use these measures to develop universally optimal Bayesian transfer learning methods. We also derive results in AIT that are interesting in themselves. To address a concern that arises from the theory, we briefly look at the notion of Kolmogorov complexity of probability measures. Finally, we present a simple practical approximation to the theory for doing transfer learning and show that even this is quite effective, allowing us to transfer across tasks that are superficially unrelated. The latter is an experimental feat that has not been achieved before, which shows the theory is also useful for constructing practical transfer algorithms.
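The paper's Kolmogorov-complexity measures are uncomputable; one standard computable stand-in for task relatedness (our illustration, not necessarily the authors' approximation) is the normalized compression distance.

```python
# Normalized compression distance as a practical relatedness proxy.
import zlib

def ncd(a: bytes, b: bytes) -> float:
    ca, cb = len(zlib.compress(a)), len(zlib.compress(b))
    cab = len(zlib.compress(a + b))
    return (cab - min(ca, cb)) / max(ca, cb)  # ~0: related, ~1: unrelated

# Tasks serialized as bytes; transfer more from sources with low NCD.
print(ncd(b"spam filtering corpus", b"spam detection corpus"))
```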
9.
We study the possibility of constructing decision trees with evolutionary algorithms in order to increase their predictive accuracy. We present a self-adapting evolutionary algorithm for the induction of decision trees and describe the principle of decision making based on multiple evolutionarily induced decision trees (a decision forest). The developed model is used as a fault prediction approach to foresee dangerous software modules, whose identification can largely enhance the reliability of software.
10.
Software quality engineering comprises several quality assurance activities, such as testing, formal verification, inspection, fault tolerance, and software fault prediction. Many researchers have developed and validated fault prediction models using machine learning and statistical techniques, applying different kinds of software metrics and diverse feature reduction techniques to improve model performance. However, these studies did not investigate the effects of dataset size, metrics set, and feature selection techniques on software fault prediction. This study focuses on high-performance fault predictors based on machine learning, such as Random Forests, and on algorithms based on a new computational intelligence approach called Artificial Immune Systems. We used public NASA datasets from the PROMISE repository to make our predictive models repeatable, refutable, and verifiable. The research questions address the effects of dataset size, metrics set, and feature selection techniques. To answer them, seven test groups were defined, and nine classifiers were examined on each of the five public NASA datasets. According to this study, Random Forests provides the best prediction performance for large datasets, and Naive Bayes is the best prediction algorithm for small datasets, in terms of the Area Under the Receiver Operating Characteristics Curve (AUC) evaluation parameter. The parallel implementation of the Artificial Immune Recognition Systems (AIRS2Parallel) algorithm is the best Artificial Immune Systems paradigm-based algorithm when method-level metrics are used.
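A sketch of the comparison protocol, assuming cross-validated AUC as in the study; the synthetic imbalanced dataset stands in for the NASA/PROMISE data, which would be loaded separately.

```python
# AUC comparison of Random Forests vs. Naive Bayes (protocol sketch).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

# Stand-in for a method-level metrics dataset (imbalanced classes).
X, y = make_classification(n_samples=500, weights=[0.85], random_state=1)
for clf in (RandomForestClassifier(random_state=1), GaussianNB()):
    auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()
    print(type(clf).__name__, round(auc, 3))
```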
11.
Background
Software fault prediction is the process of developing models that software practitioners can use in the early phases of the software development life cycle to detect faulty constructs such as modules or classes. Various machine learning techniques have been used in the past for predicting faults.
Method
In this study we perform a systematic review of studies in the literature, from January 1991 to October 2013, that use machine learning techniques for software fault prediction. We assess the performance capability of machine learning techniques in existing research for software fault prediction. We also compare the performance of machine learning techniques with that of statistical techniques and other machine learning techniques. Further, the strengths and weaknesses of machine learning techniques are summarized.
Results
We identified 64 primary studies and seven categories of machine learning techniques. The results demonstrate the capability of machine learning techniques for classifying a module or class as fault prone or not fault prone. Models using machine learning techniques to estimate software fault proneness outperform traditional statistical models.
Conclusion
Based on the results obtained from the systematic review, we conclude that machine learning techniques have the ability to predict software fault proneness and can be used by software practitioners and researchers. However, the application of machine learning techniques in software fault prediction is still limited, and more studies should be carried out to obtain well-formed and generalizable results. We provide future guidelines to practitioners and researchers based on the results obtained in this work.
12.
《Information Processing Letters》2014,114(9):469-474
This paper analyzes the ability of requirement metrics to support software defect prediction. Statistical significance tests are used to compare six machine learning algorithms on requirement metrics, design metrics, and the combination of both. The experimental results show the effectiveness of the predictor built on the combination of requirement and design metrics in the early phase of the software development process.
13.
Time series prediction over longer future horizons is of great importance and has increasingly aroused interest among both scholars and practitioners. Compared to one-step-ahead prediction, multi-step-ahead prediction encounters a higher degree of uncertainty arising from various sources, including the accumulation of errors and the lack of information. Many existing studies address the former issue while relatively overlooking the latter. Motivated by this observation, a new multi-task learning algorithm, called MultiTL-KELM for short, is proposed for multi-step-ahead time series prediction in this work, where long-ago data is utilized to provide more information for the current prediction task. The time-varying nature of time-series data usually gives rise to wide variability between data over a long time span, making it difficult to ensure the assumption of identical distribution. How to make the most of, rather than discard, the abundant old data and transfer more useful knowledge to the current prediction is one of the main concerns of the proposed MultiTL-KELM algorithm. Besides, unlike typical iterated or direct strategies, MultiTL-KELM regards predictions at different horizons as different tasks. Knowledge from one task can benefit the others, enabling the algorithm to exploit the relatedness among horizons. By virtue of this design, MultiTL-KELM alleviates the error accumulation problem of iterated strategies and the time consumption of direct strategies. The proposed MultiTL-KELM algorithm has been compared with several state-of-the-art algorithms, and its effectiveness has been numerically confirmed by experiments conducted on four synthetic and two real-world benchmark time series datasets.
14.
To improve the accuracy of software defect prediction, a CS-ANN-based prediction method is proposed that exploits the optimization capability of the Cuckoo Search (CS) algorithm and the nonlinear computing capability of Artificial Neural Networks (ANN). The method first applies an association-rule-based feature selection algorithm to reduce the dimensionality of the data and remove noisy attributes. The Cuckoo Search algorithm is then used to find the weights of the neural network, and the prediction model is built from these weights and the neural network. Finally, the model is used to perform defect prediction. Simulation experiments on public NASA datasets show that the model lowers the false alarm rate and improves prediction accuracy, outperforming existing models on the comprehensive evaluation measures AUC (area under the ROC curve), F1, and G-mean.
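A heavily simplified sketch of the CS-ANN idea: cuckoo-search-style Lévy-flight moves over the weight vector of a tiny one-hidden-layer network. The paper's full cuckoo-search update rules and association-rule feature selection are omitted; all sizes and data below are illustrative.

```python
# Simplified sketch, not the paper's algorithm: Lévy-flight weight search.
import numpy as np

def levy(size, rng, beta=1.5):  # Mantegna's algorithm for Lévy steps
    from math import gamma, sin, pi
    sigma = (gamma(1 + beta) * sin(pi * beta / 2) /
             (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    return rng.normal(0, sigma, size) / np.abs(rng.normal(0, 1, size)) ** (1 / beta)

def loss(w, X, y, h=4):
    """Binary cross-entropy of a 1-hidden-layer net with weights w."""
    W1 = w[:X.shape[1] * h].reshape(X.shape[1], h)
    W2 = w[X.shape[1] * h:].reshape(h)
    p = 1 / (1 + np.exp(-np.tanh(X @ W1) @ W2))
    return -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                   # toy defect metrics
y = (X[:, 0] + X[:, 1] > 0).astype(float)      # toy defect labels
dim = 5 * 4 + 4
nests = rng.normal(size=(15, dim))              # candidate weight vectors
for _ in range(200):
    for i in range(len(nests)):
        cand = nests[i] + 0.1 * levy(dim, rng)  # Lévy-flight move
        if loss(cand, X, y) < loss(nests[i], X, y):
            nests[i] = cand                     # keep the better nest
best = min(nests, key=lambda w: loss(w, X, y))
```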
15.
At complex working sites, bearings, which are important machine components, can simultaneously develop faults at several positions. Consequently, multi-label learning, which fully considers the correlation between the different faulted positions of bearings, has become a popular learning paradigm. Deep reinforcement learning (DRL), which combines the perception ability of deep learning with the decision-making ability of reinforcement learning, is well suited to compound fault diagnosis and has a strong ability to extract fault features from raw data. However, DRL is difficult to converge and easily falls into unstable training. Therefore, this paper integrates the feature extraction ability of DRL with the knowledge transfer ability of transfer learning (TL) and proposes multi-label transfer reinforcement learning (ML-TRL). In detail, the proposed method utilizes improved trust region policy optimization (TRPO) as the basic DRL framework and pre-trains the fixed convolutional networks of ML-TRL using a multi-label convolutional neural network method. In compound fault experiments, the final results demonstrate that the proposed method achieves higher accuracy than other multi-label learning methods. Hence, the proposed method is a remarkable alternative for recognizing compound faults of bearings.
16.
Context
Software defect prediction (SDP) is an important task in software engineering. Along with estimating the number of defects remaining in software systems and discovering defect associations, classifying the defect-proneness of software modules plays an important role in software defect prediction. Several machine-learning methods have been applied to handle the defect-proneness of software modules as a classification problem. This type of “yes” or “no” decision is an important drawback in the decision-making process, and if not precise it may lead to misclassifications. To the best of our knowledge, existing approaches rely on fully automated module classification and do not provide a way to incorporate extra knowledge during the classification process. Such knowledge can help avoid misclassifications in cases where system modules cannot be classified reliably.
Objective
We seek to develop an SDP method that (i) incorporates a reject option in the classifier to improve the reliability of the decision-making process; and (ii) makes it possible to postpone the final decision on rejected modules for expert analysis or for another classifier using extra domain knowledge.
Method
We develop an SDP method called rejoELM and its variant, IrejoELM. Both methods are built upon the weighted extreme learning machine (ELM) with a reject option that makes it possible to postpone the final decision on non-classified (rejected) modules to a later moment. While rejoELM aims to maximize accuracy for a given rejection rate, IrejoELM maximizes the F-measure. Hence, IrejoELM is an alternative for classification with a reject option on imbalanced datasets.
Results
rejoELM and IrejoELM are tested on five datasets of source code metrics extracted from real-world open-source software projects. Results indicate that rejoELM achieves an accuracy, for several rejection rates, that is comparable to some state-of-the-art classifiers with a reject option. Although IrejoELM shows lower accuracies for several rejection rates, it clearly outperforms all other methods when the F-measure is used as the performance metric.
Conclusion
It is concluded that rejoELM is a valid alternative for classification with a reject option when classes are nearly equally represented. On the other hand, IrejoELM is shown to be the best alternative for classification with a reject option on imbalanced datasets. Since SDP problems are usually characterized as imbalanced learning problems, the use of IrejoELM is recommended.
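A minimal sketch of the reject-option mechanism, using a probability band around 0.5 to defer uncertain modules; the underlying classifier here is a plain logistic regression stand-in, not the weighted ELM of rejoELM, and the band width is an assumption.

```python
# Sketch of classification with a reject option (illustrative only).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

def predict_with_reject(model, X, band=0.15):
    """Label a module only when the posterior is far enough from 0.5;
    otherwise defer it (reject) for expert or second-stage review."""
    p = model.predict_proba(X)[:, 1]
    out = np.where(p >= 0.5, 1, 0).astype(object)
    out[np.abs(p - 0.5) < band] = "reject"  # too uncertain: postpone
    return out

X, y = make_classification(n_samples=300, random_state=0)
clf = LogisticRegression().fit(X, y)
print(predict_with_reject(clf, X[:10]))
```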
17.
晏明 《计算机应用与软件》2014,(11)
Software quality is affected not only by the variety of development approaches but also by other factors. For multi-stage projects with continuous development and testing, tracking the overall test quality of the project is important for quality control. The study finds that the time curve of the cumulative number of defects detected in a software development project largely follows the Logistic and Gompertz function curves. Using VBA programming to traverse all three-point combinations of the measured data, the three curve parameters (L, b, a) that best fit the measured data to each of the two function curves (by least squares) can be solved. The L value (i.e., the saturation value) of the Logistic curve can be used to predict the cumulative defect count when the software system stabilizes. By comparing the measured cumulative defect values, during development and after system release, with the values predicted by the function curve determined by the three parameters, the study finds that the curve can be used to predict and monitor software quality both during development and after release.
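A hedged sketch of the curve-fitting step: the paper enumerates three-point combinations in VBA, while here scipy's least-squares curve_fit stands in; the weekly defect counts are invented for illustration.

```python
# Fit a logistic curve to cumulative defects; L predicts saturation.
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, L, b, a):
    return L / (1 + b * np.exp(-a * t))

t = np.arange(1, 13)  # weeks of testing (hypothetical)
defects = np.array([5, 12, 24, 41, 60, 78, 92, 101, 107, 111, 113, 114])
(L, b, a), _ = curve_fit(logistic, t, defects, p0=[120, 50, 0.5])
print(f"predicted defect saturation L = {L:.0f}")
```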
18.
Noise detection for software measurement datasets is a topic of growing interest. The presence of class and attribute noise in software measurement datasets degrades the performance of machine learning-based classifiers, and identifying these noisy modules improves overall performance. In this study, we propose a noise detection algorithm based on software metrics threshold values. The threshold values are obtained from Receiver Operating Characteristic (ROC) analysis. This paper focuses on case studies of five public NASA datasets and details the construction of Naive Bayes-based software fault prediction models both before and after applying the proposed noise detection algorithm. Experimental results show that this noise detection approach is very effective at detecting class noise and that the performance of fault predictors using a Naive Bayes algorithm with a logNum filter improves if the class labels of identified noisy modules are corrected.
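A sketch of one plausible reading of the threshold scheme: derive a cutoff per metric from ROC analysis (Youden's J) and flag modules whose recorded label disagrees with the metrics' majority vote. The voting rule is our assumption; the paper's exact algorithm may differ.

```python
# ROC-threshold-based class noise detection (one plausible reading).
import numpy as np
from sklearn.metrics import roc_curve

def flag_noisy(X, y, min_votes=0.5):
    votes = np.zeros(len(y))
    for j in range(X.shape[1]):
        fpr, tpr, thr = roc_curve(y, X[:, j])
        t = thr[np.argmax(tpr - fpr)]        # Youden's J optimal cutoff
        votes += (X[:, j] >= t)              # metric says "fault-prone"
    predicted = votes >= min_votes * X.shape[1]
    return predicted != y.astype(bool)       # disagreement => class noise
```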
19.
20.
Burak Turhan, Tim Menzies, Ayşe B. Bener, Justin Di Stefano 《Empirical Software Engineering》2009,14(5):540-578
We propose a practical defect prediction approach for companies that do not track defect-related data. Specifically, we investigate the applicability of cross-company (CC) data for building localized defect predictors using static code features. First, we analyze the conditions under which CC data can be used as is; these conditions turn out to be quite few. Then we apply principles of analogy-based learning (i.e., nearest neighbor (NN) filtering) to CC data in order to fine-tune these models for localization. We compare the performance of these models with that of defect predictors learned from within-company (WC) data. As expected, we observe that defect predictors learned from WC data outperform those learned from CC data. However, our analyses also yield defect predictors learned from NN-filtered CC data, with performance close to, but still not better than, WC data. We therefore perform a final analysis to determine the minimum number of local defect reports needed to learn WC defect predictors. We demonstrate that the minimum number of data samples required to build effective defect predictors can be quite small and can be collected within a few months. Hence, for companies with no local defect data, we recommend a two-phase approach that allows them to employ the defect prediction process immediately. In phase one, companies should use NN-filtered CC data to initiate the defect prediction process and simultaneously start collecting WC (local) data. Once enough WC data is collected (i.e., after a few months), organizations should switch to phase two and use predictors learned from WC data.
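A minimal sketch of the NN-filtering step under the stated design: keep, for each local instance, its k nearest cross-company instances in static-code-feature space, and train only on that filtered pool. The function name and the default k are illustrative.

```python
# NN filter: select cross-company (CC) rows nearest to local (WC) data.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def nn_filter(X_cc, X_wc, k=10):
    nn = NearestNeighbors(n_neighbors=k).fit(X_cc)
    _, idx = nn.kneighbors(X_wc)              # k CC neighbors per WC row
    return np.unique(idx.ravel())             # indices of retained CC data
```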
Burak Turhan received his PhD degree from the Department of Computer Engineering at Bogazici University. He recently joined NRC-Canada IIT-SEG as a Research Associate after six years of research assistant experience at Bogazici University. His research interests include all aspects of software quality and are focused on software defect prediction models. He is a member of the IEEE, the IEEE Computer Society, and ACM SIGSOFT. Tim Menzies (tim@menzies.us) has been working on advanced modeling, software engineering, and AI since 1986. He received his PhD from the University of New South Wales, Sydney, Australia and is the author of over 160 refereed papers. A former research chair for NASA, Dr. Menzies is now an associate professor in West Virginia University’s Lane Department of Computer Science and Electrical Engineering. For more information, visit his web page at . Ayşe B. Bener is an assistant professor and a full-time faculty member in the Department of Computer Engineering at Bogazici University. Her research interests are software defect prediction, process improvement, and software economics. Bener has a PhD in information systems from the London School of Economics. She is a member of the IEEE, the IEEE Computer Society, and the ACM. Justin Di Stefano is currently the Software Technical Lead for Delcan, Inc. in Vienna, Virginia, specializing in transportation management and planning. He earned his Master’s degree in Electrical Engineering (with a specialty in Software Engineering) from West Virginia University in 2007. Prior to his current employment he worked as a researcher for the WVU/NASA Space Grant program, where he helped develop a spin-off product based on research into static code metrics and error-prone code prediction. His undergraduate degrees are in Electrical Engineering and Computer Engineering, both from West Virginia University, earned in the fall of 2002. He has numerous publications on software error prediction, static code analysis, and various machine learning algorithms.