Similar literature
1.
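Software defect prediction is an effective way to improve software quality, and the performance of a defect prediction method is closely tied to the characteristics of the dataset. To address redundant feature information and excessive dimensionality in defect prediction datasets, and drawing on the strong feature-learning ability of deep learning, this paper proposes a software defect prediction method based on a deep autoencoder network. The method first applies an unsupervised-learning-based sampling method to datasets from six open-source projects to resolve class imbalance, then trains a deep autoencoder network model. The model reduces the feature dimensionality of the dataset and is connected at its final stage to three classifiers; the classifiers are trained on the dimension-reduced training set, and predictions are then made on the test set. Experimental results show that, on datasets with high dimensionality and redundant feature information, the method outperforms both baseline defect prediction models and defect prediction models based on existing feature extraction methods, and that it works with different classification algorithms.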

2.
The software development life cycle generally includes analysis, design, implementation, test and release phases. The testing phase should be operated effectively in order to release bug-free software to end users. In the last two decades, academicians have taken an increasing interest in the software defect prediction problem, and several machine learning techniques have been applied for more robust prediction. A different classification approach for this problem is proposed in this paper. A combination of a traditional Artificial Neural Network (ANN) and the novel Artificial Bee Colony (ABC) algorithm is used in this study. Training of the neural network is performed by the ABC algorithm in order to find optimal weights. The optimization task of the ABC algorithm is defined by the False Positive Rate (FPR) and False Negative Rate (FNR) multiplied by parametric cost coefficients. Software defect data are class-imbalanced by nature because of the skewed distribution of defective and non-defective modules, so conventional error functions of the neural network produce unbalanced FPR and FNR results. The proposed approach was applied to five publicly available datasets from the NASA Metrics Data Program repository. Accuracy, probability of detection, probability of false alarm, balance, Area Under Curve (AUC), and Normalized Expected Cost of Misclassification (NECM) are the main performance indicators of our classification approach. In order to prevent random results, the dataset was shuffled and the algorithm was executed 10 times with the use of n-fold cross-validation in each iteration. Our experimental results showed that a cost-sensitive neural network can be created successfully by using the ABC optimization algorithm for the purpose of software defect prediction.
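
The cost-weighted error described in this abstract can be sketched roughly as follows. The abstract does not give the exact formula, so the additive form `cost_fp * FPR + cost_fn * FNR`, the function name, and the default coefficient values are all assumptions made for illustration; the ABC search that would minimize this quantity is not shown.

```python
def cost_weighted_error(y_true, y_pred, cost_fp=1.0, cost_fn=5.0):
    """Return cost_fp * FPR + cost_fn * FNR for binary labels (1 = defective).

    The additive form and the default costs are illustrative assumptions,
    not values taken from the paper.
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    # Guard against empty classes so the rates stay defined.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return cost_fp * fpr + cost_fn * fnr


if __name__ == "__main__":
    truth = [1, 1, 0, 0, 0, 0]
    preds = [1, 0, 1, 0, 0, 0]
    print(cost_weighted_error(truth, preds))
```

Setting `cost_fn` above `cost_fp` penalizes missed defective modules more heavily, which is one way to counter the skew toward the non-defective class.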

3.
Context: Several issues hinder software defect data, including redundancy, correlation, feature irrelevance and missing samples. It is also hard to ensure balanced distribution between data pertaining to defective and non-defective software. In most experimental cases, data related to the latter software class is dominantly present in the dataset. Objective: The objectives of this paper are to demonstrate the positive effects of combining feature selection and ensemble learning on the performance of defect classification. Along with efficient feature selection, a new two-variant (with and without feature selection) ensemble learning algorithm is proposed to provide robustness to both data imbalance and feature redundancy. Method: We carefully combine selected ensemble learning models with efficient feature selection to address these issues and mitigate their effects on the defect classification performance. Results: Forward selection showed that only a few features contribute to high area under the receiver-operating curve (AUC). On the tested datasets, the greedy forward selection (GFS) method outperformed other feature selection techniques such as Pearson’s correlation. This suggests that features are highly unstable. However, ensemble learners like random forests and the proposed algorithm, average probability ensemble (APE), are not as affected by poor features as in the case of weighted support vector machines (W-SVMs). Moreover, the APE model combined with greedy forward selection (enhanced APE) achieved AUC values of approximately 1.0 for the NASA datasets: PC2, PC4, and MC1. Conclusion: This paper shows that features of a software dataset must be carefully selected for accurate classification of defective components. Furthermore, tackling the software data issues mentioned above with the proposed combined learning model resulted in remarkable classification performance, paving the way for successful quality control.
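
As a rough illustration of greedy forward selection, the sketch below adds one feature at a time, keeping whichever candidate raises a score the most and stopping when no candidate improves it. The `score` callback is a hypothetical stand-in for whatever the paper actually evaluates (for example, cross-validated AUC); the toy gains in the usage example are invented.

```python
def greedy_forward_selection(features, score, max_features=None):
    """Greedily grow a feature subset while the score keeps improving.

    `score` is a caller-supplied function mapping a list of feature names
    to a number (higher is better); it is an assumption of this sketch.
    """
    selected = []
    best = float("-inf")
    remaining = list(features)
    while remaining and (max_features is None or len(selected) < max_features):
        # Evaluate each remaining feature when added to the current subset.
        cand_score, cand = max(
            ((score(selected + [f]), f) for f in remaining),
            key=lambda pair: pair[0],
        )
        if cand_score <= best:
            break  # no candidate improves the score, so stop
        selected.append(cand)
        remaining.remove(cand)
        best = cand_score
    return selected, best


if __name__ == "__main__":
    # Invented per-feature gains, used only to make the example runnable.
    gains = {"loc": 0.3, "cyclomatic": 0.2, "noise": -0.1}
    toy_score = lambda subset: 0.5 + sum(gains[f] for f in subset)
    print(greedy_forward_selection(list(gains), toy_score))
```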

4.
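Software defect prediction has become an important research topic in software engineering. This paper constructs a software defect prediction model based on rough sets and support vector machines. The model uses rough sets to perform attribute reduction on the original sample set, removing attributes that are redundant or irrelevant to defect prediction, and uses particle swarm optimization to select the parameters of the support vector machine. The experimental data come from NASA public datasets; attribute reduction cuts the number of feature attributes from 21 to 5. Experiments show that after attribute reduction the prediction performance of the Bayes classifier, the CART tree, the neural network, and the proposed rough set-support vector machine model all improve, and that the proposed model outperforms the other three.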

5.
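Software defect prediction helps improve software development quality and ensures that testing resources are allocated effectively. To address the difficulty of obtaining class-labeled data and the class-imbalanced distribution in software defect prediction research, a sampling-based semi-supervised support vector machine prediction model is proposed. The model uses an unsupervised sampling technique to ensure that the number of defective samples among the labeled data is not too low, and uses a semi-supervised support vector machine to build the prediction model from a small amount of labeled data together with information from unlabeled data. Simulation experiments were run on the public NASA software defect prediction datasets. The results show that the proposed method outperforms existing semi-supervised methods on both the F-measure and recall, and that, compared with supervised methods, it achieves comparable prediction performance with fewer training samples.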

6.
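To reduce the software defect rate, an existing defect prediction model was optimized, and the orthogonal defect classification method was introduced and improved so that it supports root-cause analysis of defects. Combining defect prediction with the improved orthogonal defect classification method yields a software defect prevention process, which was applied in a real project. Experimental results show that the approach can effectively prevent defects at every stage of the software life cycle and maximize software quality.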

7.
Software health management (SWHM) is an emerging field which addresses the critical need to detect, diagnose, predict, and mitigate adverse events due to software faults and failures. These faults could arise for numerous reasons including coding errors, unanticipated faults or failures in hardware, or problematic interactions with the external environment. This paper demonstrates a novel approach to software health management based on a rigorous Bayesian formulation that monitors the behavior of the software and operating system, performs probabilistic diagnosis, and provides information about the most likely root causes of a failure or software problem. Translation of the Bayesian network model into an efficient data structure, an arithmetic circuit, makes it possible to perform SWHM on resource-restricted embedded computing platforms as found in aircraft, unmanned aircraft, or satellites. SWHM is especially important for safety-critical systems such as aircraft control systems. In this paper, we demonstrate our Bayesian SWHM system on three realistic scenarios from an aircraft control system: (1) aircraft file-system based faults, (2) signal handling faults, and (3) navigation faults due to inertial measurement unit (IMU) failure or compromised Global Positioning System (GPS) integrity. We show that the method successfully detects and diagnoses faults in these scenarios. We also discuss the importance of verification and validation of SWHM systems.

8.
Much current software defect prediction work focuses on the number of defects remaining in a software system. In this paper, we present association rule mining based methods to predict defect associations and defect correction effort. This is to help developers detect software defects and assist project managers in allocating testing resources more effectively. We applied the proposed methods to the SEL defect data consisting of more than 200 projects over more than 15 years. The results show that, for defect association prediction, the accuracy is very high and the false-negative rate is very low. Likewise, for the defect correction effort prediction, the accuracy of both defect isolation effort prediction and defect correction effort prediction is also high. We compared the defect correction effort prediction method with other types of methods (PART, C4.5, and Naive Bayes) and show that accuracy has been improved by at least 23 percent. We also evaluated the impact of support and confidence levels on prediction accuracy, false-negative rate, false-positive rate, and the number of rules. We found that higher support and confidence levels may not result in higher prediction accuracy, and a sufficient number of rules is a precondition for high prediction accuracy.
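
The support and confidence levels mentioned in this abstract are the standard association-rule measures. A minimal sketch of how they are computed is shown below; the defect-type names in the usage example are invented and do not come from the SEL data.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of (antecedent and consequent) divided by support of antecedent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the rule never fires, so report zero confidence
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / base


if __name__ == "__main__":
    # Each set lists the defect types seen together in one (invented) project.
    projects = [
        {"interface", "data"},
        {"interface", "data", "logic"},
        {"interface"},
        {"logic"},
    ]
    print(support(projects, {"interface", "data"}))
    print(confidence(projects, {"interface"}, {"data"}))
```

Raising the minimum support or confidence threshold prunes rules, which is consistent with the abstract's observation that stricter thresholds can leave too few rules for accurate prediction.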

9.
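Software defect detection aims to automatically determine whether program modules contain defects, thereby speeding up software testing and improving the quality of software systems. Because traditional software defect prediction models are confined to limited application scopes, which hurts their accuracy and applicability, a software defect prediction model based on PSO-BP is proposed. The model uses the particle swarm optimization algorithm to optimize the weights and thresholds of a BP neural network. Experiments were conducted with cross-validation and compared against traditional machine learning methods such as J48 and the BP neural network. The results show that the proposed method achieves higher prediction accuracy.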
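
To give a feel for the particle swarm optimization step, here is a minimal global-best PSO sketch that minimizes a generic objective. It is not the paper's PSO-BP model: in that setting the position vector would hold the network's weights and thresholds and the objective would be the training error. The inertia and acceleration values (`w`, `c1`, `c2`), the bounds, and the function name are assumptions chosen for illustration.

```python
import random


def pso_minimize(objective, dim, n_particles=30, iters=200,
                 bounds=(-5.0, 5.0), w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal global-best PSO; all hyperparameter defaults are illustrative."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # best position seen per particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


if __name__ == "__main__":
    sphere = lambda x: sum(v * v for v in x)    # toy objective with minimum at 0
    print(pso_minimize(sphere, dim=2))
```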

10.
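Data imbalance is a serious and unavoidable problem in software defect prediction research. To address it, this paper proposes an algorithm that combines oversampling, in which new samples are synthesized from a distribution function, with random undersampling. The algorithm first fits a distribution function to the principal components obtained after dimensionality reduction, then uses the distribution function to generate random numbers, filters the generated numbers, and finally combines the result with random undersampling. The experimental data are taken from the NASA MDP datasets, and the algorithm is compared with the classical SMOTE plus undersampling method. The G-mean and F-measure values show that the proposed algorithm yields clearly better predictions and higher prediction accuracy than the SMOTE-based method.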
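
G-mean and F-measure, the two metrics used in this comparison, are both computed from the confusion matrix. The sketch below uses their usual definitions (G-mean as the geometric mean of recall and specificity, F-measure as the harmonic mean of precision and recall); the counts in the usage example are invented.

```python
import math


def g_mean_and_f_measure(tp, fp, tn, fn):
    """Return (G-mean, F-measure) from confusion-matrix counts.

    Positive class = defective. Rates fall back to 0.0 when a denominator
    is zero so the function stays defined on degenerate inputs.
    """
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    g_mean = math.sqrt(recall * specificity)
    if precision + recall == 0:
        return g_mean, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return g_mean, f_measure


if __name__ == "__main__":
    print(g_mean_and_f_measure(tp=8, fp=4, tn=36, fn=2))
```

Both metrics stay low when the minority (defective) class is poorly recognized, which is why they are preferred over plain accuracy on imbalanced data.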

11.
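Research on ensemble prediction models for software defects (total citations: 1; self-citations: 0; citations by others: 1)
Defect prediction models built on a single classifier have hit a performance bottleneck, whereas ensemble classifiers often hold a significant performance advantage over single classifiers. With the goal of building effective ensemble defect prediction models, this paper compares the algorithms and characteristics of seven different types of ensemble classifiers. Experiments on 14 benchmark datasets show that some of the ensemble prediction models outperform a single prediction model based on naive Bayes. Among them, the voting-based ensemble framework delivers the best prediction performance, with a statistically significant advantage, followed by the random forest algorithm. The Stacking ensemble framework also generalizes well.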
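
A voting-based ensemble in its simplest form takes the majority label across the base classifiers for each module. The sketch below shows plain (unweighted) majority voting only; the abstract does not say which voting scheme or base classifiers were used, so this is an assumed minimal variant with invented predictions.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one list by majority vote.

    `predictions[k][i]` is classifier k's label for sample i. With an odd
    number of binary classifiers there are no ties.
    """
    return [Counter(column).most_common(1)[0][0] for column in zip(*predictions)]


if __name__ == "__main__":
    # Three hypothetical classifiers voting on four modules (1 = defective).
    votes = [
        [1, 0, 1, 0],
        [1, 1, 0, 0],
        [0, 1, 1, 0],
    ]
    print(majority_vote(votes))
```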

12.
In this paper, we consider the problem of predicting a large scale spatial field using successive noisy measurements obtained by mobile sensing agents. The physical spatial field of interest is discretized and modeled by a Gaussian Markov random field (GMRF) with uncertain hyperparameters. From a Bayesian perspective, we design a sequential prediction algorithm to exactly compute the predictive inference of the random field. The main advantages of the proposed algorithm are: (1) the computational efficiency due to the sparse structure of the precision matrix, and (2) the scalability as the number of measurements increases. Thus, the prediction algorithm correctly takes into account the uncertainty in hyperparameters in a Bayesian way and is also scalable to be usable for mobile sensor networks with limited resources. We also present a distributed version of the prediction algorithm for a special case. An adaptive sampling strategy is presented for mobile sensing agents to find the most informative locations in taking future measurements in order to minimize the prediction error and the uncertainty in hyperparameters simultaneously. The effectiveness of the proposed algorithms is illustrated by numerical experiments.

13.
Discrimination in decision making is prohibited on many attributes (religion, gender, etc.), but is often present in historical decisions. Use of such discriminatory historical decisions as training data can perpetuate discrimination, even if the protected attributes are not directly present in the data. This work focuses on discovering discrimination in instances and preventing discrimination in classification. First, we propose a discrimination discovery method based on modeling the probability distribution of a class using Bayesian networks. This measures the effect of a protected attribute (e.g., gender) in a subset of the dataset using the estimated probability distribution (via a Bayesian network). Second, we propose a classification method that corrects for the discovered discrimination without using protected attributes in the decision process. We evaluate the discrimination discovery and discrimination prevention approaches on two different datasets. The empirical results show that a substantial amount of discrimination identified in instances is prevented in future decisions.

14.
Image interpretation using Bayesian networks (total citations: 2; self-citations: 0; citations by others: 2)
The problem of image interpretation is one of inference with the help of domain knowledge. In this paper, we formulate the problem as the maximum a posteriori (MAP) estimate of a properly defined probability distribution function (PDF). We show that a Bayesian network can be used to represent this PDF as well as the domain knowledge needed for interpretation. The Bayesian network may be relaxed to obtain the set of optimum interpretations

15.
Reforestation planning using Bayesian networks (total citations: 2; self-citations: 0; citations by others: 2)
The aim of this research was to construct a reforestation model for woodland located in the basin of the river Liébana (NW Spain). This is essentially a pattern recognition problem: the class labels are types of woodland, and the variables for each point are environmental coordinates (referring to altitude, slope, rainfall, lithology, etc.). The model trained using data for existing wooded areas will serve as a guideline for the reforestation of deforested areas. Nonetheless, with a view to tackling reforestation from a more informed perspective, of interest is an interpretable model of relationships existing not just between woodland type and environmental variables but also between and among the environmental variables themselves. For this reason we used Bayesian networks, as a tool that is capable of constructing a causal model of the relationships existing between all the variables represented in the model. The prediction results obtained were compared with those for classical linear techniques, neural networks and support vector machines.

16.
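A defect prediction method for measurement and control software is proposed, based on an improved particle swarm optimized support vector machine (PSO-ISVM). A cost penalty coefficient is introduced to define the fitness function of the particle swarm optimization algorithm, and minimizing the fitness value serves as the optimization objective; this filters out a large amount of redundant, interfering information, improves prediction accuracy for defective modules of measurement and control software, and finds the optimal parameters of the support vector machine. A simulation case is used to analyze the method's effectiveness on measurement and control software, and a comparison with commonly used defect prediction methods shows that the model speeds up software defect prediction and improves prediction accuracy for defective modules.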

17.
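A Bayesian network learning algorithm based on prediction relationships (total citations: 2; self-citations: 0; citations by others: 2)
After introducing representative Bayesian network structure learning algorithms, this paper presents the concept of predictive ability between variables and a method for estimating it, and proves that predictive ability is exactly the prediction accuracy. On this basis, a Bayesian network structure learning method based on the prediction relationships between variables is established. Comparative experiments on simulated data show that the algorithm can effectively learn Bayesian network structure.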

18.
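Process or product data that may be related to defects are collected during software development and used to predict the number of software defects, so that software quality can be controlled. LASSO is used for feature selection to determine the best set of influencing factors, and a linear model and a Bayesian network model are each used to make predictions on the sample data. The factor analysis process and the model construction process of the two models are described, and the implementation is coded in R. A comparison of the prediction results confirms that, when the data have undergone secondary subjective processing, the linear model predicts more accurately than the Bayesian network.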

19.
A Bayesian network is a powerful graphical model. It is advantageous for real-world data analysis and finding relations among variables. Knowledge presentation and rule generation, based on a Bayesian approach, have been studied and reported in many research papers across various fields. Since a Bayesian network has both causal and probabilistic semantics, it is regarded as an ideal representation to combine background knowledge and real data. Rare event predictions have been performed using several methods, but remain a challenge. We design and implement a Bayesian network model to forecast daily ozone states. We evaluate the proposed Bayesian network model, comparing it to traditional decision tree models, to examine its utility.

20.
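Defect prediction can effectively improve the efficiency of software testing. Based on naive Bayes theory, this paper proposes LGD-NB, a software defect prediction model that classifies using the geometric relationship between points and lines in a plane. LGD-NB has two working modes. When it makes decisions based on minimum risk, it describes cost more precisely than traditional naive Bayes. Once a geometrically defined high-risk decision region is specified, LGD-NB can act as a meta-classifier, providing an ensemble framework that can integrate other classification models for a second round of classification. Experimental results show that the minimum-risk LGD-NB model predicts better than traditional naive Bayes, and that LGD-NB integrated with the SVM algorithm also shows a clear improvement in predictive ability.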
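
The minimum-risk decision rule that this abstract builds on can be sketched as below: given a posterior probability that a module is defective, choose the label with the lower expected misclassification cost. This shows only the generic Bayes minimum-risk rule, not LGD-NB's point-and-line geometry, and the cost values and function name are illustrative assumptions.

```python
def min_risk_decision(p_defective, cost_fn=5.0, cost_fp=1.0):
    """Return 1 (defective) or 0 (clean), whichever has lower expected cost.

    cost_fn: cost of calling a defective module clean (assumed value).
    cost_fp: cost of calling a clean module defective (assumed value).
    """
    risk_if_clean = cost_fn * p_defective            # expected cost of predicting 0
    risk_if_defective = cost_fp * (1.0 - p_defective)  # expected cost of predicting 1
    return 1 if risk_if_defective < risk_if_clean else 0


if __name__ == "__main__":
    # With these costs the threshold is cost_fp / (cost_fp + cost_fn) = 1/6.
    print(min_risk_decision(0.3), min_risk_decision(0.1))
```

With equal costs the rule reduces to the usual 0.5 posterior threshold; unequal costs shift that threshold toward flagging more modules as defective.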
