20 similar documents retrieved (search time: 31 ms)
1.
Context
Software defect prediction studies usually build models using within-company data, but very few have focused on prediction models trained with cross-company data. Models built on within-company data are difficult to apply in practice because local data repositories are often unavailable. Recently, transfer learning has attracted increasing attention for building classifiers in a target domain using data from a related source domain. It is very useful when the distributions of training and test instances differ, but is it appropriate for cross-company software defect prediction?
Objective
In this paper, we consider the cross-company defect prediction scenario, where source and target data are drawn from different companies. To harness cross-company data, we exploit transfer learning to build a fast and highly effective prediction model.
Method
Unlike prior work that selects training data similar to the test data, we propose a novel algorithm called Transfer Naive Bayes (TNB), which uses the information of all the proper features in the training data. Our solution estimates the distribution of the test data and transfers cross-company data information into the weights of the training data. The defect prediction model is built on these weighted data.
Results
This article presents a theoretical analysis of the comparative methods and reports experimental results on data sets from different organizations. The results indicate that TNB is more accurate in terms of AUC (the area under the receiver operating characteristic curve) and runs faster than state-of-the-art methods.
Conclusion
When there are too few local training data to train good classifiers, useful knowledge transferred at the feature level from different-distribution training data can help. We are optimistic that our transfer learning method can guide optimal resource allocation strategies, which may reduce software testing cost and increase the effectiveness of the software testing process.
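As a concrete illustration of the Method above, here is a minimal Python sketch of TNB-style instance weighting, assuming the published data-gravitation form (counting how many features of a training instance fall inside the test set's per-feature range) and substituting Gaussian Naive Bayes for the paper's discrete formulation; function names are ours.

```python
# Hedged sketch of TNB-style weighting, not the authors' exact code.
import numpy as np
from sklearn.naive_bayes import GaussianNB

def tnb_weights(X_train, X_test):
    """Weight each training instance by how many of its features fall
    inside the test set's [min, max] range (data gravitation)."""
    lo, hi = X_test.min(axis=0), X_test.max(axis=0)
    k = X_train.shape[1]
    s = ((X_train >= lo) & (X_train <= hi)).sum(axis=1)
    return s / (k - s + 1.0) ** 2

def fit_tnb(X_train, y_train, X_test):
    w = tnb_weights(X_train, X_test)
    model = GaussianNB()
    model.fit(X_train, y_train, sample_weight=w)  # weighted Naive Bayes
    return model
```

The weighted fit lets the cross-company instances that most resemble the target distribution dominate the model, which is the intuition the abstract describes.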
2.
New methodologies and tools have gradually made the software development life cycle more human-independent. Much of the research in this field focuses on defect reduction, defect identification, and defect prediction. Defect prediction is a relatively new research area that draws on methods ranging from artificial intelligence to data mining. Identifying and locating defects in software projects is a difficult task. Measuring software in a continuous and disciplined manner provides many advantages, such as accurate estimation of project costs and schedules and improved product and process quality. This study proposes a model to predict the number of defects in the new version of a software product with respect to the previous stable version. The new version may contain changes related to a new feature, a modification in an algorithm, or bug fixes. Our proposed model predicts the defects introduced into the new version by analyzing the types of changes in an objective and formal manner and by considering the change in lines of code (LOC). Defect predictors are helpful tools for both project managers and developers. Accurate predictors may help reduce test times and guide developers towards implementing higher quality code. Our proposed model can aid software engineers in determining the stability of software before it goes into production. Furthermore, such a model may provide useful insight into the effects of a feature, bug fix, or change on the defect detection process.
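The abstract does not specify the model form; as one hedged reading, a count regressor over the LOC delta and per-type change counts captures the described inputs. All feature names and numbers below are hypothetical.

```python
# Illustrative sketch (not the authors' model): predicting the defect
# count of a new release from LOC delta and change-type counts.
import numpy as np
from sklearn.linear_model import PoissonRegressor

# Hypothetical per-release features: [LOC changed, new features,
# algorithm modifications, bug fixes] for past stable->new transitions.
X = np.array([[1200, 3, 1, 5],
              [400, 1, 0, 2],
              [2500, 6, 2, 9],
              [800, 2, 1, 3]])
y = np.array([14, 4, 31, 9])  # defects later found in each new version

model = PoissonRegressor().fit(X, y)  # Poisson link suits count targets
print(model.predict([[1500, 4, 1, 6]]))  # expected defects, new release
```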
3.
4.
Introduction
Testing and modification of software are repetitive processes. When to release and deploy a qualified software product is an important question. The purpose of residual defect prediction is to keep the number of code defects under an acceptable level during testing. It is very important for a decision maker to estimate the phase of software testing and the achievable objective, and it is significant for the maintenance of delivered software.
1 Software residual defects prediction model
Software…
5.
A novel transfer learning method is proposed in this paper to solve power load forecasting problems in the smart grid. Prediction errors on the target tasks can be greatly reduced by utilizing knowledge transferred from the source tasks. In this work, a source task selection algorithm is developed and a transfer learning model based on Gaussian processes is constructed. Negative knowledge transfer is avoided compared with previous works, and the prediction accuracies are therefore greatly improved. In addition, a fast inference algorithm is developed to accelerate the prediction steps. Experimental results on real-world data are presented.
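A minimal sketch of the pooled-data baseline behind such GP transfer, assuming a standard RBF kernel: a data-rich source task supplements a sparse target task in a single Gaussian process. The paper's source-task selection and fast inference algorithm are not reproduced here, and all data below is synthetic.

```python
# Sketch only: pool source and target load data in one GP regressor.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X_src = rng.uniform(0, 24, (200, 1))            # source task: rich history
y_src = np.sin(X_src[:, 0] * np.pi / 12) + 0.1 * rng.normal(size=200)
X_tgt = rng.uniform(0, 24, (10, 1))             # target task: few points
y_tgt = np.sin(X_tgt[:, 0] * np.pi / 12) + 0.1 * rng.normal(size=10)

gp = GaussianProcessRegressor(RBF() + WhiteKernel(), normalize_y=True)
gp.fit(np.vstack([X_src, X_tgt]), np.concatenate([y_src, y_tgt]))
mean, std = gp.predict([[18.0]], return_std=True)  # forecast + uncertainty
```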
6.
The resources allocated for software quality assurance and improvement have not increased with the ever-increasing need for better software quality. A targeted software quality inspection can detect faulty modules and reduce the number of faults occurring during operations. We present a software fault prediction modeling approach based on case-based reasoning (CBR), a part of the computational intelligence field focusing on automated reasoning processes. A CBR system functions as a software fault prediction model by quantifying, for a module under development, the expected number of faults based on similar modules that were previously developed. Such a system is composed of a similarity function, the number of nearest neighbor cases used for fault prediction, and a solution algorithm. The choice of similarity function and solution algorithm may affect the prediction accuracy of a CBR-based software fault prediction system. This paper presents an empirical study investigating the effects of three different similarity functions and two different solution algorithms on the prediction accuracy of our CBR system. The influence of varying the number of nearest neighbor cases on performance accuracy is also explored. Moreover, the benefits of using metric-selection procedures for our CBR system are also evaluated. Case studies of a large legacy telecommunications system are used for our analysis. We observe that the CBR system using the Mahalanobis distance similarity function and the inverse distance weighted solution algorithm yielded the best fault prediction. In addition, the CBR models perform better than models based on multiple linear regression.
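A hedged sketch of the best-performing configuration the study reports, Mahalanobis similarity with an inverse-distance-weighted solution; the function name and the pseudo-inverse fallback are our choices, not the authors'.

```python
# Sketch of the CBR idea: estimate faults of a new module from its k
# most similar previously developed modules.
import numpy as np

def cbr_predict(X_cases, y_faults, x_new, k=3):
    VI = np.linalg.pinv(np.cov(X_cases, rowvar=False))  # inverse covariance
    d = np.array([np.sqrt((x - x_new) @ VI @ (x - x_new)) for x in X_cases])
    nn = np.argsort(d)[:k]                      # k nearest neighbor cases
    w = 1.0 / (d[nn] + 1e-9)                    # inverse distance weights
    return np.average(y_faults[nn], weights=w)  # weighted fault estimate
```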
Taghi M. Khoshgoftaar is a professor in the Department of Computer Science and Engineering at Florida Atlantic University and the Director of the Empirical Software Engineering Laboratory. His research interests are in software engineering, software metrics, software reliability and quality engineering, computational intelligence, computer performance evaluation, data mining, and statistical modeling. He has published more than 200 refereed papers in these areas. He has been a principal investigator and project leader in a number of projects with industry, government, and other research-sponsoring agencies. He is a member of the Association for Computing Machinery, the IEEE Computer Society, and the IEEE Reliability Society. He served as the general chair of the 1999 International Symposium on Software Reliability Engineering (ISSRE’99) and the general chair of the 2001 International Conference on Engineering of Computer Based Systems. He has also served on the technical program committees of various international conferences, symposia, and workshops. He has served as North American editor of the Software Quality Journal and is on the editorial boards of the journals Empirical Software Engineering, Software Quality, and Fuzzy Systems.
Naeem Seliya received the M.S. degree in Computer Science from Florida Atlantic University, Boca Raton, FL, USA, in 2001. He is currently a Ph.D. candidate in the Department of Computer Science and Engineering at Florida Atlantic University. His research interests include software engineering, computational intelligence, data mining, software measurement, software reliability and quality engineering, software architecture, computer data security, and network intrusion detection. He is a student member of the IEEE Computer Society and the Association for Computing Machinery.
7.
8.
In transfer learning the aim is to solve new learning tasks using fewer examples by exploiting information gained from solving related tasks. Existing transfer learning methods have been used successfully in practice, and PAC analysis of these methods has been developed. But the key notion of relatedness between tasks has not yet been defined clearly, which makes it difficult to understand, let alone answer, questions that naturally arise in the context of transfer, such as how much information to transfer, whether to transfer information, and how to transfer information across tasks. In this paper, we look at transfer learning from the perspective of Algorithmic Information Theory/Kolmogorov complexity theory, and formally solve these problems in the same sense that Solomonoff Induction solves the problem of inductive inference. We define universal measures of relatedness between tasks and use these measures to develop universally optimal Bayesian transfer learning methods. We also derive results in AIT that are interesting in themselves. To address a concern that arises from the theory, we briefly look at the notion of Kolmogorov complexity of probability measures. Finally, we present a simple practical approximation to the theory for doing transfer learning and show that even this is quite effective, allowing us to transfer across tasks that are superficially unrelated. The latter is an experimental feat that has not been achieved before, which shows the theory is also useful for constructing practical transfer algorithms.
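The paper's Kolmogorov-complexity measures are uncomputable; one standard computable stand-in for task relatedness (our illustration, not necessarily the authors' approximation) is the normalized compression distance.

```python
# Normalized compression distance as a practical relatedness proxy.
import zlib

def ncd(a: bytes, b: bytes) -> float:
    ca, cb = len(zlib.compress(a)), len(zlib.compress(b))
    cab = len(zlib.compress(a + b))
    return (cab - min(ca, cb)) / max(ca, cb)  # ~0: related, ~1: unrelated

# Tasks serialized as bytes; transfer more from sources with low NCD.
print(ncd(b"spam filtering corpus", b"spam detection corpus"))
```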
9.
We study the possibility of constructing decision trees with evolutionary algorithms in order to increase their predictive accuracy. We present a self-adapting evolutionary algorithm for the induction of decision trees and describe the principle of decision making based on multiple evolutionarily induced decision trees (a decision forest). The developed model is used as a fault prediction approach to foresee dangerous software modules, whose identification can largely enhance the reliability of software.
10.
Software quality engineering comprises several quality assurance activities, such as testing, formal verification, inspection, fault tolerance, and software fault prediction. Many researchers have developed and validated fault prediction models using machine learning and statistical techniques, applying different kinds of software metrics and diverse feature reduction techniques to improve model performance. However, these studies did not investigate the effects of dataset size, metrics set, and feature selection techniques on software fault prediction. This study focuses on high-performance fault predictors based on machine learning, such as Random Forests, and on algorithms based on a new computational intelligence approach called Artificial Immune Systems. We used public NASA datasets from the PROMISE repository to make our predictive models repeatable, refutable, and verifiable. The research questions address the effects of dataset size, metrics set, and feature selection techniques. To answer them, seven test groups were defined, and nine classifiers were examined on each of the five public NASA datasets. According to this study, Random Forests provides the best prediction performance for large datasets, and Naive Bayes is the best prediction algorithm for small datasets, in terms of the Area Under the Receiver Operating Characteristics Curve (AUC) evaluation parameter. The parallel implementation of the Artificial Immune Recognition Systems (AIRS2Parallel) algorithm is the best Artificial Immune Systems paradigm-based algorithm when method-level metrics are used.
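A sketch of the comparison protocol, assuming cross-validated AUC as in the study; the synthetic imbalanced dataset stands in for the NASA/PROMISE data, which would be loaded separately.

```python
# AUC comparison of Random Forests vs. Naive Bayes (protocol sketch).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

# Stand-in for a method-level metrics dataset (imbalanced classes).
X, y = make_classification(n_samples=500, weights=[0.85], random_state=1)
for clf in (RandomForestClassifier(random_state=1), GaussianNB()):
    auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()
    print(type(clf).__name__, round(auc, 3))
```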
11.
Background
Software fault prediction is the process of developing models that software practitioners can use in the early phases of the software development life cycle to detect faulty constructs such as modules or classes. Various machine learning techniques have been used in the past for predicting faults.
Method
In this study we perform a systematic review of studies in the literature, from January 1991 to October 2013, that use machine learning techniques for software fault prediction. We assess the performance capability of machine learning techniques in existing research for software fault prediction. We also compare the performance of machine learning techniques with that of statistical techniques and other machine learning techniques. Further, the strengths and weaknesses of machine learning techniques are summarized.
Results
We identified 64 primary studies and seven categories of machine learning techniques. The results demonstrate the capability of machine learning techniques for classifying a module or class as fault prone or not fault prone. Models using machine learning techniques to estimate software fault proneness outperform traditional statistical models.
Conclusion
Based on the results obtained from the systematic review, we conclude that machine learning techniques have the ability to predict software fault proneness and can be used by software practitioners and researchers. However, the application of machine learning techniques in software fault prediction is still limited, and more studies should be carried out to obtain well-formed and generalizable results. We provide future guidelines to practitioners and researchers based on the results obtained in this work.
12.
《Information Processing Letters》2014,114(9):469-474
This paper analyzes the ability of requirement metrics to support software defect prediction. Statistical significance tests are used to compare six machine learning algorithms on requirement metrics, design metrics, and the combination of both. The experimental results show the effectiveness of the predictor built on the combination of requirement and design metrics in the early phase of the software development process.
13.
Time series prediction over longer future horizons is of great importance and has increasingly aroused interest among both scholars and practitioners. Compared to one-step-ahead prediction, multi-step-ahead prediction encounters a higher degree of uncertainty arising from various sources, including the accumulation of errors and the lack of information. Many existing studies address the former issue while relatively overlooking the latter. Motivated by this observation, a new multi-task learning algorithm, called MultiTL-KELM for short, is proposed for multi-step-ahead time series prediction in this work, where long-ago data is utilized to provide more information for the current prediction task. The time-varying nature of time-series data usually gives rise to wide variability between data over a long time span, making it difficult to ensure the assumption of identical distribution. How to make the most of, rather than discard, the abundant old data and transfer more useful knowledge to the current prediction is one of the main concerns of the proposed MultiTL-KELM algorithm. Besides, unlike typical iterated or direct strategies, MultiTL-KELM regards predictions at different horizons as different tasks. Knowledge from one task can benefit the others, enabling the algorithm to exploit the relatedness among horizons. By virtue of this design, MultiTL-KELM alleviates the error accumulation problem of iterated strategies and the time consumption of direct strategies. The proposed MultiTL-KELM algorithm has been compared with several state-of-the-art algorithms, and its effectiveness has been numerically confirmed by experiments conducted on four synthetic and two real-world benchmark time series datasets.
14.
To improve the accuracy of software defect prediction, a CS-ANN-based prediction method is proposed that exploits the optimization capability of the Cuckoo Search (CS) algorithm and the nonlinear computing capability of Artificial Neural Networks (ANN). The method first applies an association-rule-based feature selection algorithm to reduce the dimensionality of the data and remove noisy attributes. The Cuckoo Search algorithm is then used to find the weights of the neural network, and the prediction model is built from these weights and the neural network. Finally, the model is used to perform defect prediction. Simulation experiments on public NASA datasets show that the model lowers the false alarm rate and improves prediction accuracy, outperforming existing models on the comprehensive evaluation measures AUC (area under the ROC curve), F1, and G-mean.
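A heavily simplified sketch of the CS-ANN idea: cuckoo-search-style Lévy-flight moves over the weight vector of a tiny one-hidden-layer network. The paper's full cuckoo-search update rules and association-rule feature selection are omitted; all sizes and data below are illustrative.

```python
# Simplified sketch, not the paper's algorithm: Lévy-flight weight search.
import numpy as np

def levy(size, rng, beta=1.5):  # Mantegna's algorithm for Lévy steps
    from math import gamma, sin, pi
    sigma = (gamma(1 + beta) * sin(pi * beta / 2) /
             (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    return rng.normal(0, sigma, size) / np.abs(rng.normal(0, 1, size)) ** (1 / beta)

def loss(w, X, y, h=4):
    """Binary cross-entropy of a 1-hidden-layer net with weights w."""
    W1 = w[:X.shape[1] * h].reshape(X.shape[1], h)
    W2 = w[X.shape[1] * h:].reshape(h)
    p = 1 / (1 + np.exp(-np.tanh(X @ W1) @ W2))
    return -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                   # toy defect metrics
y = (X[:, 0] + X[:, 1] > 0).astype(float)      # toy defect labels
dim = 5 * 4 + 4
nests = rng.normal(size=(15, dim))              # candidate weight vectors
for _ in range(200):
    for i in range(len(nests)):
        cand = nests[i] + 0.1 * levy(dim, rng)  # Lévy-flight move
        if loss(cand, X, y) < loss(nests[i], X, y):
            nests[i] = cand                     # keep the better nest
best = min(nests, key=lambda w: loss(w, X, y))
```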
15.
At complex working sites, bearings, which are important machine components, can simultaneously develop faults at several positions. Consequently, multi-label learning, which fully considers the correlation between the different faulted positions of bearings, has become a popular learning paradigm. Deep reinforcement learning (DRL), which combines the perception ability of deep learning with the decision-making ability of reinforcement learning, is well suited to compound fault diagnosis and has a strong ability to extract fault features from raw data. However, DRL is difficult to converge and easily falls into unstable training. Therefore, this paper integrates the feature extraction ability of DRL with the knowledge transfer ability of transfer learning (TL) and proposes multi-label transfer reinforcement learning (ML-TRL). In detail, the proposed method utilizes improved trust region policy optimization (TRPO) as the basic DRL framework and pre-trains the fixed convolutional networks of ML-TRL using a multi-label convolutional neural network method. In compound fault experiments, the final results demonstrate that the proposed method achieves higher accuracy than other multi-label learning methods. Hence, the proposed method is a remarkable alternative for recognizing compound faults of bearings.
16.
Context
Software defect prediction (SDP) is an important task in software engineering. Along with estimating the number of defects remaining in software systems and discovering defect associations, classifying the defect-proneness of software modules plays an important role in software defect prediction. Several machine-learning methods have been applied to handle the defect-proneness of software modules as a classification problem. This type of “yes” or “no” decision is an important drawback in the decision-making process, and if not precise it may lead to misclassifications. To the best of our knowledge, existing approaches rely on fully automated module classification and do not provide a way to incorporate extra knowledge during the classification process. Such knowledge can help avoid misclassifications in cases where system modules cannot be classified reliably.
Objective
We seek to develop an SDP method that (i) incorporates a reject option in the classifier to improve the reliability of the decision-making process; and (ii) makes it possible to postpone the final decision on rejected modules for expert analysis or for another classifier using extra domain knowledge.
Method
We develop an SDP method called rejoELM and its variant, IrejoELM. Both methods are built upon the weighted extreme learning machine (ELM) with a reject option that makes it possible to postpone the final decision on non-classified (rejected) modules to a later moment. While rejoELM aims to maximize accuracy for a given rejection rate, IrejoELM maximizes the F-measure. Hence, IrejoELM is an alternative for classification with a reject option on imbalanced datasets.
Results
rejoELM and IrejoELM are tested on five datasets of source code metrics extracted from real-world open-source software projects. Results indicate that rejoELM achieves an accuracy, for several rejection rates, that is comparable to some state-of-the-art classifiers with a reject option. Although IrejoELM shows lower accuracies for several rejection rates, it clearly outperforms all other methods when the F-measure is used as the performance metric.
Conclusion
It is concluded that rejoELM is a valid alternative for classification with a reject option when classes are nearly equally represented. On the other hand, IrejoELM is shown to be the best alternative for classification with a reject option on imbalanced datasets. Since SDP problems are usually characterized as imbalanced learning problems, the use of IrejoELM is recommended.
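A minimal sketch of the reject-option mechanism, using a probability band around 0.5 to defer uncertain modules; the underlying classifier here is a plain logistic regression stand-in, not the weighted ELM of rejoELM, and the band width is an assumption.

```python
# Sketch of classification with a reject option (illustrative only).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

def predict_with_reject(model, X, band=0.15):
    """Label a module only when the posterior is far enough from 0.5;
    otherwise defer it (reject) for expert or second-stage review."""
    p = model.predict_proba(X)[:, 1]
    out = np.where(p >= 0.5, 1, 0).astype(object)
    out[np.abs(p - 0.5) < band] = "reject"  # too uncertain: postpone
    return out

X, y = make_classification(n_samples=300, random_state=0)
clf = LogisticRegression().fit(X, y)
print(predict_with_reject(clf, X[:10]))
```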
17.
晏明 《计算机应用与软件》2014,(11)
Software quality is affected not only by the variety of development approaches but also by other factors. For multi-stage projects with continuous development and testing, tracking the overall test quality of the project is important for quality control. The study finds that the time curve of the cumulative number of defects detected in a software development project largely follows the Logistic and Gompertz function curves. Using VBA programming to traverse all three-point combinations of the measured data, the three curve parameters (L, b, a) that best fit the measured data to each of the two function curves (by least squares) can be solved. The L value (i.e., the saturation value) of the Logistic curve can be used to predict the cumulative defect count when the software system stabilizes. By comparing the measured cumulative defect values, during development and after system release, with the values predicted by the function curve determined by the three parameters, the study finds that the curve can be used to predict and monitor software quality both during development and after release.
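A hedged sketch of the curve-fitting step: the paper enumerates three-point combinations in VBA, while here scipy's least-squares curve_fit stands in; the weekly defect counts are invented for illustration.

```python
# Fit a logistic curve to cumulative defects; L predicts saturation.
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, L, b, a):
    return L / (1 + b * np.exp(-a * t))

t = np.arange(1, 13)  # weeks of testing (hypothetical)
defects = np.array([5, 12, 24, 41, 60, 78, 92, 101, 107, 111, 113, 114])
(L, b, a), _ = curve_fit(logistic, t, defects, p0=[120, 50, 0.5])
print(f"predicted defect saturation L = {L:.0f}")
```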
18.
Noise detection for software measurement datasets is a topic of growing interest. The presence of class and attribute noise in software measurement datasets degrades the performance of machine learning-based classifiers, and identifying these noisy modules improves overall performance. In this study, we propose a noise detection algorithm based on software metrics threshold values. The threshold values are obtained from Receiver Operating Characteristic (ROC) analysis. This paper focuses on case studies of five public NASA datasets and details the construction of Naive Bayes-based software fault prediction models both before and after applying the proposed noise detection algorithm. Experimental results show that this noise detection approach is very effective at detecting class noise and that the performance of fault predictors using a Naive Bayes algorithm with a logNum filter improves if the class labels of identified noisy modules are corrected.
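A sketch of one plausible reading of the threshold scheme: derive a cutoff per metric from ROC analysis (Youden's J) and flag modules whose recorded label disagrees with the metrics' majority vote. The voting rule is our assumption; the paper's exact algorithm may differ.

```python
# ROC-threshold-based class noise detection (one plausible reading).
import numpy as np
from sklearn.metrics import roc_curve

def flag_noisy(X, y, min_votes=0.5):
    votes = np.zeros(len(y))
    for j in range(X.shape[1]):
        fpr, tpr, thr = roc_curve(y, X[:, j])
        t = thr[np.argmax(tpr - fpr)]        # Youden's J optimal cutoff
        votes += (X[:, j] >= t)              # metric says "fault-prone"
    predicted = votes >= min_votes * X.shape[1]
    return predicted != y.astype(bool)       # disagreement => class noise
```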
19.
20.
Burak Turhan, Tim Menzies, Ayşe B. Bener, Justin Di Stefano 《Empirical Software Engineering》2009,14(5):540-578
We propose a practical defect prediction approach for companies that do not track defect-related data. Specifically, we investigate the applicability of cross-company (CC) data for building localized defect predictors using static code features. First, we analyze the conditions under which CC data can be used as is; these conditions turn out to be quite few. Then we apply principles of analogy-based learning (i.e., nearest neighbor (NN) filtering) to CC data in order to fine-tune these models for localization. We compare the performance of these models with that of defect predictors learned from within-company (WC) data. As expected, we observe that defect predictors learned from WC data outperform those learned from CC data. However, our analyses also yield defect predictors learned from NN-filtered CC data, with performance close to, but still not better than, WC data. We therefore perform a final analysis to determine the minimum number of local defect reports needed to learn WC defect predictors. We demonstrate that the minimum number of data samples required to build effective defect predictors can be quite small and can be collected within a few months. Hence, for companies with no local defect data, we recommend a two-phase approach that allows them to employ the defect prediction process immediately. In phase one, companies should use NN-filtered CC data to initiate the defect prediction process and simultaneously start collecting WC (local) data. Once enough WC data is collected (i.e., after a few months), organizations should switch to phase two and use predictors learned from WC data.
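A minimal sketch of the NN-filtering step under the stated design: keep, for each local instance, its k nearest cross-company instances in static-code-feature space, and train only on that filtered pool. The function name and the default k are illustrative.

```python
# NN filter: select cross-company (CC) rows nearest to local (WC) data.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def nn_filter(X_cc, X_wc, k=10):
    nn = NearestNeighbors(n_neighbors=k).fit(X_cc)
    _, idx = nn.kneighbors(X_wc)              # k CC neighbors per WC row
    return np.unique(idx.ravel())             # indices of retained CC data
```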
Burak Turhan received his PhD degree from the Department of Computer Engineering at Bogazici University. He recently joined NRC-Canada IIT-SEG as a Research Associate after six years of research assistant experience at Bogazici University. His research interests include all aspects of software quality and are focused on software defect prediction models. He is a member of the IEEE, the IEEE Computer Society, and ACM SIGSOFT. Tim Menzies (tim@menzies.us) has been working on advanced modeling, software engineering, and AI since 1986. He received his PhD from the University of New South Wales, Sydney, Australia and is the author of over 160 refereed papers. A former research chair for NASA, Dr. Menzies is now an associate professor in West Virginia University’s Lane Department of Computer Science and Electrical Engineering. For more information, visit his web page at . Ayşe B. Bener is an assistant professor and a full-time faculty member in the Department of Computer Engineering at Bogazici University. Her research interests are software defect prediction, process improvement, and software economics. Bener has a PhD in information systems from the London School of Economics. She is a member of the IEEE, the IEEE Computer Society, and the ACM. Justin Di Stefano is currently the Software Technical Lead for Delcan, Inc. in Vienna, Virginia, specializing in transportation management and planning. He earned his Master’s degree in Electrical Engineering (with a specialty in Software Engineering) from West Virginia University in 2007. Prior to his current employment he worked as a researcher for the WVU/NASA Space Grant program, where he helped develop a spin-off product based on research into static code metrics and error-prone code prediction. His undergraduate degrees are in Electrical Engineering and Computer Engineering, both from West Virginia University, earned in the fall of 2002. He has numerous publications on software error prediction, static code analysis, and various machine learning algorithms.