Similar Literature
20 similar documents found (search took 31 ms)
1.
(Semi-)automated diagnosis of software faults can drastically increase debugging efficiency, improving reliability and time-to-market. Current automatic diagnosis techniques are predominantly of a statistical nature and, despite typical defect densities, do not explicitly consider multiple faults, as also demonstrated by the popularity of the single-fault benchmark set of programs. We present a reasoning approach, called Zoltar-M(ultiple fault), that yields multiple-fault diagnoses, ranked in order of their probability. Although application of Zoltar-M to programs with many faults requires heuristics (trading-off completeness) to reduce the inherent computational complexity, theory as well as experiments on synthetic program models and multiple-fault program versions available from the software infrastructure repository (SIR) show that for multiple-fault programs this approach can outperform statistical techniques, notably spectrum-based fault localization (SFL). As a side-effect of this research, we present a new SFL variant, called Zoltar-S(ingle fault), that is optimal for single-fault programs, outperforming all other variants known to date.
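The spectrum-based fault localization that Zoltar-S and Zoltar-M build on can be illustrated with a short sketch. The snippet below is a minimal, self-contained illustration of SFL ranking, not the Zoltar implementation, using the Ochiai coefficient as the suspiciousness formula; the spectra and test outcomes are made-up toy data.

```python
import math

# Toy program spectra: rows = test cases, columns = statements.
# spectra[i][j] == 1 if test i executed statement j.
spectra = [
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 1, 1, 1],
    [1, 1, 1, 0],
]
# failed[i] is True if test i failed.
failed = [True, False, True, False]

def ochiai(spectra, failed):
    """Rank statements by the Ochiai suspiciousness coefficient."""
    n_stmts = len(spectra[0])
    total_failed = sum(failed)
    scores = []
    for j in range(n_stmts):
        ef = sum(1 for i, row in enumerate(spectra) if row[j] and failed[i])
        ep = sum(1 for i, row in enumerate(spectra) if row[j] and not failed[i])
        denom = math.sqrt(total_failed * (ef + ep))
        scores.append(ef / denom if denom else 0.0)
    # Most suspicious statements first.
    return sorted(range(n_stmts), key=lambda j: -scores[j]), scores

ranking, scores = ochiai(spectra, failed)
print("statement ranking:", ranking)
```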

2.
The increasing level of automation in critical infrastructures requires development of effective ways for finding faults in safety critical software components. Synchronization in concurrent components is especially prone to errors and, due to the difficulty of exploring all thread interleavings, it is difficult to find synchronization faults. In this paper we present an experimental study demonstrating the effectiveness of model checking techniques in finding synchronization faults in safety critical software when they are combined with a design for verification approach. We based our experiments on an automated air traffic control software component called the Tactical Separation Assisted Flight Environment (TSAFE). We first reengineered TSAFE using the concurrency controller design pattern. The concurrency controller design pattern enables a modular verification strategy by decoupling the behaviors of the concurrency controllers from the behaviors of the threads that use them, via interfaces specified as finite state machines. The behavior of a concurrency controller is verified with respect to arbitrary numbers of threads using the infinite state model checking techniques implemented in the Action Language Verifier (ALV). The threads which use the controller classes are checked for interface violations using the finite state model checking techniques implemented in the Java Path Finder (JPF). We present techniques for thread isolation which enable us to analyze each thread in the program separately during interface verification. We conducted two sets of experiments using these verification techniques. First, we created 40 faulty versions of TSAFE using manual fault seeding. During this exercise we also developed a classification of faults that can be found using the presented design for verification approach. Next, we generated another 100 faulty versions of TSAFE using randomly seeded faults that were created automatically based on this fault classification. We used both infinite and finite state verification techniques for finding the seeded faults. The results of our experiments demonstrate the effectiveness of the presented design for verification approach in eliminating synchronization faults.

3.
Although numerous empirical studies have been conducted to measure the fault detection capability of software analysis methods, few studies have been conducted using programs of similar size and characteristics. Therefore, it is difficult to derive meaningful conclusions on the relative detection ability and cost‐effectiveness of various fault detection methods. In order to compare fault detection capability objectively, experiments must be conducted using the same set of programs to evaluate all methods and must involve participants who possess comparable levels of technical expertise. One such experiment was ‘Conflict1’, which compared voting, a testing method, self‐checks, code reading by stepwise refinement and data‐flow analysis methods on eight versions of a battle simulation program. Since an inspection method was not included in the comparison, the authors conducted a follow‐up experiment ‘Conflict2’, in which five of the eight versions from Conflict1 were subjected to Fagan inspection. Conflict2 examined not only the number and types of faults detected by each method, but also the cost‐effectiveness of each method, by comparing the average amount of effort expended in detecting faults. The primary findings of the Conflict2 experiment are the following. First, voting detected the largest number of faults, followed by the testing method, Fagan inspection, self‐checks, code reading and data‐flow analysis. Second, the voting, testing and inspection methods were largely complementary to each other in the types of faults detected. Third, inspection was far more cost‐effective than the testing method studied. Copyright © 2002 John Wiley & Sons, Ltd.

4.
This paper proposes a strategy for automatically fixing faults in a program by combining the ideas of mutation and fault localization. Statements ranked in order of their likelihood of containing faults are mutated in the same order to produce potential fixes for the faulty program. The proposed strategy is evaluated using 8 mutant operators against 19 programs each with multiple faulty versions. Our results indicate that 20.70% of the faults are fixed using selected mutant operators, suggesting that the strategy holds merit for automatically fixing faults. The impact of fault localization on efficiency of the overall fault-fixing process is investigated by experimenting with two different techniques, Tarantula and Ochiai, the latter of which has been reported to be better at fault localization than Tarantula, and also proves to be better in the context of fault-fixing using our proposed strategy. Further experiments are also presented to evaluate stopping criteria with respect to the mutant examination process and reveal that a significant fraction of the (fixable) faults can be fixed by examining a small percentage of the program code. We also report on the relative fault-fixing capabilities of mutant operators used and present discussions on future work.
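The fix-generation loop described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' tool: `localize` stands for any SFL ranking (e.g. Ochiai), `mutate` for the selected mutant operators, and `run_tests` for the project's test suite; all three are hypothetical placeholders passed in by the caller.

```python
def attempt_fix(program, tests, localize, mutate, run_tests, budget=1000):
    """Mutate statements in suspiciousness order until the test suite passes.

    localize(program, tests)  -> statement indices, most suspicious first
    mutate(program, stmt)     -> iterable of mutant programs for one statement
    run_tests(program, tests) -> True if every test passes
    """
    examined = 0
    for stmt in localize(program, tests):
        for mutant in mutate(program, stmt):
            examined += 1
            if examined > budget:          # stopping criterion over mutants
                return None
            if run_tests(mutant, tests):   # candidate fix: all tests pass
                return mutant
    return None
```

The `budget` cut-off mirrors the paper's finding that most fixable faults are repaired after examining only a small fraction of the candidate mutants.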

5.
A slicing-based framework for measuring coupling in Java
In research on object-oriented metrics, some characteristics have been measured with simple statistical methods and information-source-based methods, such as the basic metrics, the CK metrics, and the Aoki metrics. This paper adopts a program-slicing-based approach to the problem of measuring coupling in Java: by measuring the coupling relations present in Java source code, it obtains a coupling metric that is more precise than traditional methods.

6.
Applying machine learning to software fault-proneness prediction
The importance of software testing to quality assurance cannot be overemphasized. The estimation of a module’s fault-proneness is important for minimizing cost and improving the effectiveness of the software testing process. Unfortunately, no general technique for estimating software fault-proneness is available. The observed correlation between some software metrics and fault-proneness has resulted in a variety of predictive models based on multiple metrics. Much work has concentrated on how to select the software metrics that are most likely to indicate fault-proneness. In this paper, we propose the use of machine learning for this purpose. Specifically, given historical data on software metric values and number of reported errors, an Artificial Neural Network (ANN) is trained. Then, in order to determine the importance of each software metric in predicting fault-proneness, a sensitivity analysis is performed on the trained ANN. The software metrics that are deemed to be the most critical are then used as the basis of an ANN-based predictive model of a continuous measure of fault-proneness. We also view fault-proneness prediction as a binary classification task (i.e., a module can either contain errors or be error-free) and use Support Vector Machines (SVM) as a state-of-the-art classification method. We perform a comparative experimental study of the effectiveness of ANNs and SVMs on a data set obtained from NASA’s Metrics Data Program data repository.
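A minimal sketch of the two model families compared above, using scikit-learn on synthetic data standing in for the NASA MDP set (the feature matrix, error counts, and perturbation size are made-up assumptions):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((200, 10))                                        # 10 metrics per module
errors = X @ rng.random(10) + 0.1 * rng.standard_normal(200)    # reported errors
y_class = (errors > np.median(errors)).astype(int)              # faulty vs. error-free

# Continuous fault-proneness via an ANN, then a sensitivity analysis:
ann = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000,
                   random_state=0).fit(X, errors)
base = ann.predict(X)
sensitivity = []
for j in range(X.shape[1]):
    Xp = X.copy()
    Xp[:, j] += X[:, j].std()            # perturb one metric at a time
    sensitivity.append(np.abs(ann.predict(Xp) - base).mean())
print("most influential metrics:", np.argsort(sensitivity)[::-1][:3])

# Binary fault-proneness classification with an SVM:
svm = SVC(kernel="rbf").fit(X, y_class)
print("training accuracy:", svm.score(X, y_class))
```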

7.
Dynamic slicing algorithms have been considered to aid in debugging for many years. However, as far as we know, no detailed studies on evaluating the benefits of using dynamic slicing for locating real faults present in programs have been carried out. In this paper we study the effectiveness of fault location using dynamic slicing for a set of real bugs reported in some widely used software programs. Our results show that of the 19 faults studied, 12 faults were captured by data slices, 7 required the use of full slices, and none of them required the use of relevant slices. Moreover, it was observed that dynamic slicing considerably reduced the subset of program statements that needed to be examined to locate faulty statements. Interestingly, we observed that all of the memory bugs in the faulty versions were captured by data slices. The dynamic slices that captured faulty code included 0.45 to 63.18% of statements that were executed at least once.
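A dynamic data slice of the kind used above can be computed from an execution trace by chasing data dependences backward. The sketch below assumes a simplified trace format (statement id, variable defined, variables used); it is illustrative, not the authors' slicer, and ignores control dependences, which is what distinguishes data slices from full slices.

```python
def dynamic_data_slice(trace, criterion_vars):
    """Backward dynamic data slice over an execution trace.

    trace: list of (stmt_id, defined_var_or_None, set_of_used_vars),
           in execution order.
    criterion_vars: variables observed to be wrong at the failure point.
    Returns the set of statement ids in the data slice.
    """
    relevant = set(criterion_vars)
    in_slice = set()
    for stmt, defined, used in reversed(trace):
        if defined in relevant:
            in_slice.add(stmt)
            relevant.discard(defined)   # this definition explains the value
            relevant |= set(used)       # its operands become relevant
    return in_slice

# Example: z = x + y at statement 3 produced a wrong z.
trace = [(1, "x", set()), (2, "y", {"x"}), (3, "z", {"x", "y"})]
print(dynamic_data_slice(trace, {"z"}))   # {1, 2, 3}
```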

8.
We compare the effectiveness of four modeling methods—negative binomial regression, recursive partitioning, random forests and Bayesian additive regression trees—for predicting the files likely to contain the most faults for 28 to 35 releases of three large industrial software systems. Predictor variables included lines of code, file age, faults in the previous release, changes in the previous two releases, and programming language. To compare the effectiveness of the different models, we use two metrics—the percent of faults contained in the top 20% of files identified by the model, and a new, more general metric, the fault-percentile-average. The negative binomial regression and random forests models performed significantly better than recursive partitioning and Bayesian additive regression trees, as assessed by either of the metrics. For each of the three systems, the negative binomial and random forests models identified 20% of the files in each release that contained an average of 76% to 94% of the faults.
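The fault-percentile-average can be computed directly from a model's file ranking and the actual fault counts. The sketch below follows one common formulation (average, over all cut-offs m, of the fraction of faults contained in the top m files); the paper's exact definition may differ, so treat this as an illustration.

```python
def fault_percentile_average(predicted, actual):
    """predicted: model scores per file; actual: true fault counts per file."""
    order = sorted(range(len(predicted)), key=lambda i: -predicted[i])
    total = sum(actual)
    covered, running = 0, 0.0
    for i in order:
        covered += actual[i]
        running += covered / total     # fraction of faults in top m files
    return running / len(predicted)

# A perfect ranking scores higher than a poor one:
print(fault_percentile_average([3, 2, 1], [10, 5, 1]))   # ~0.85
print(fault_percentile_average([1, 2, 3], [10, 5, 1]))   # ~0.48
```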

9.
Software testing forms an integral part of the software development life cycle. Since the objective of testing is to ensure the conformity of an application to its specification, a test “oracle” is needed to determine whether a given test case exposes a fault or not. Using an automated oracle to support the activities of human testers can reduce the actual cost of the testing process and the related maintenance costs. In this paper, we present a new concept of using an artificial neural network as an automated oracle for a tested software system. A neural network is trained by the backpropagation algorithm on a set of test cases applied to the original version of the system. The network training is based on the “black‐box” approach, since only inputs and outputs of the system are presented to the algorithm. The trained network can be used as an artificial oracle for evaluating the correctness of the output produced by new and possibly faulty versions of the software. We present experimental results of using a two‐layer neural network to detect faults within mutated code of a small credit approval application. The results appear to be promising for a wide range of injected faults. © 2002 John Wiley & Sons, Inc.
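A toy version of the neural-network-oracle idea, using scikit-learn's MLPRegressor in place of the paper's hand-trained backpropagation network; the tested "program" here is a stand-in function, and the acceptance tolerance is an assumed parameter.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def original(x):          # stand-in for the original, trusted program
    return 2.0 * x[:, 0] + x[:, 1]

def faulty(x):            # stand-in for a mutated, possibly faulty version
    return 2.0 * x[:, 0] + 1.1 * x[:, 1]

rng = np.random.default_rng(1)
X_train = rng.random((500, 2))
oracle = MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000,
                      random_state=1).fit(X_train, original(X_train))

# Flag outputs of the new version that stray too far from the oracle:
X_test = rng.random((100, 2))
tolerance = 0.05                        # assumed acceptance threshold
suspicious = np.abs(faulty(X_test) - oracle.predict(X_test)) > tolerance
print(f"{suspicious.sum()} of {len(X_test)} test cases flagged as failures")
```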

10.
The results of an empirical study of software error detection using self checks and N-version voting are presented. Working independently, each of 24 programmers first prepared a set of self checks using just the requirements specification of an aerospace application, and then each added self checks to an existing implementation of that specification. The modified programs were executed to measure the error-detection performance of the checks and to compare this with error detection using simple voting among multiple versions. The analysis of the checks revealed that there are great differences in the ability of individual programmers to design effective checks. It was found that some checks that might have been effective failed to detect an error because they were badly placed, and there were numerous instances of checks signaling nonexistent errors. In general, specification-based checks alone were not as effective as specification-based checks combined with code-based checks. Self checks made it possible to identify faults that had not been detected previously by voting 28 versions of the program over a million randomly generated inputs. This appeared to result from the fact that the self checks could examine the internal state of the executing program, whereas voting examines only final results of computations. If internal states had to be identical in N-version voting systems, then there would be no reason to write multiple versions
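Majority voting among N versions, as used in the study, takes only a few lines; the three version functions below are hypothetical stand-ins, with the third seeded with a fault.

```python
from collections import Counter

def vote(results):
    """Majority vote over the outputs of N program versions.

    Returns (winner, agreed), where agreed is False when no majority exists.
    """
    winner, count = Counter(results).most_common(1)[0]
    return winner, count > len(results) // 2

versions = [lambda x: x * x,            # version 1
            lambda x: x ** 2,           # version 2
            lambda x: x * x + 1]        # version 3, seeded fault

for x in range(3):
    outputs = [v(x) for v in versions]
    result, agreed = vote(outputs)
    print(x, outputs, "->", result, "majority" if agreed else "no majority")
```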

11.
Objective: Objective evaluation, an important research area in image fusion, is a powerful tool for assessing the performance of fusion algorithms. Dozens of evaluation metrics of different types now exist, yet application domains, including visible–infrared image fusion, still lack a unified basis for choosing among them. To make it easier to compare different fusion algorithms, a general analysis method for objective evaluation metrics is proposed and applied to visible–infrared image fusion. Method: The objective evaluation metrics in visible–infrared benchmark datasets are divided into two classes: metrics based on the fused image alone, and metrics based on both the source images and the fused image. Kendall correlation coefficients are used to analyze the correlations among metrics, which are clustered into groups; Borda count voting is used to compute a composite ranking of the algorithms, and the correlation between single-metric rankings and the composite ranking is analyzed to obtain a set of highly consistent metrics; coefficients of variation are used to analyze how strongly each metric's mean fluctuates across algorithms, so as to select metrics that adequately reflect the differences between algorithms; combining the correlation, consistency, and coefficient-of-variation analyses yields a representative set of recommended metrics. Results: On two groups of source images, 13 color visible–infrared pairs and 8 grayscale visible–infrared pairs, the objective evaluation data of different fusion algorithms were statistically analyzed, yielding a recommended metric set for visible–infrared image fusion (standard deviation and edge preservation) as an important reference for assessing fusion algorithms. Compared with existing methods, the experiments cover 20 fusion algorithms and 13 objective metrics and do not rely on subjective evaluation results. Conclusion...
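The two statistical building blocks of this analysis, Kendall rank correlation between metrics and Borda-count aggregation of per-metric algorithm rankings, can be sketched as follows, with a made-up score table (rows = fusion algorithms, columns = metrics):

```python
import numpy as np
from scipy.stats import kendalltau

# scores[a][m] = value of metric m for algorithm a (toy data)
scores = np.array([[0.82, 0.61, 0.70],
                   [0.75, 0.58, 0.64],
                   [0.90, 0.66, 0.72],
                   [0.70, 0.52, 0.69]])

# Kendall correlation between each pair of metrics:
n_algs, n_metrics = scores.shape
for i in range(n_metrics):
    for j in range(i + 1, n_metrics):
        tau, _ = kendalltau(scores[:, i], scores[:, j])
        print(f"metric {i} vs {j}: tau = {tau:.2f}")

# Borda count: each metric ranks the algorithms; points = n_algs - 1 - rank.
borda = np.zeros(n_algs)
for m in range(n_metrics):
    order = np.argsort(-scores[:, m])        # best algorithm first
    for rank, a in enumerate(order):
        borda[a] += n_algs - 1 - rank
print("composite ranking (best first):", np.argsort(-borda))
```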

12.
Research on spectrum-based dynamic fault localization
Chen Xiang, Ju Xiaolin, Wen Wanzhi, Gu Qing. Journal of Software, 2015, 26(2): 390-412
Spectrum-based dynamic fault localization is an active topic in automated software debugging research: by collecting the program spectra and execution results of test cases, it uses a specific model to locate the likely positions of faulty statements in the program under test. This paper systematically surveys the results obtained by researchers in this area in recent years. First, preliminaries and basic assumptions are given. Then, a fault localization research framework is proposed, and a series of internal factors within the framework that can affect localization effectiveness are identified, including how program spectra are constructed, how test suites are composed and maintained, the number of latent faults, the configuration of test oracles, user feedback, and fault-fix cost. Next, the evaluation measures and benchmark programs used in empirical studies are summarized and analyzed. After that, applications of fault localization methods in specific testing domains are summarized. Finally, promising directions for future research in this area are discussed.

13.
A fault localization method based on a conditional probability model
Shu Ting, Huang Mingxian, Ding Zuohua, Wang Lei, Xia Jinsong. Journal of Software, 2018, 29(6): 1756-1769
Fault localization is an important phase of software debugging, and methods that rely on program spectrum information are currently among the most effective. Spectrum-based fault localization is premised on a latent association between program spectra and execution results. This paper empirically analyzes that association and, borrowing the conditional-probability idea from statistics, constructs a P model that quantifies the strength of the relationship between the two; on this basis, a conditional-probability-based fault localization method is proposed. Taking the 7 programs of the Siemens suite, the Space program, and 3 Unix utility programs as benchmark subjects, the method is compared experimentally with 15 classical fault localization methods. The empirical results show that the proposed method achieves better localization effectiveness overall.
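The conditional-probability idea can be illustrated by estimating, for each statement, the probability of failure given that the statement was covered. The sketch below is one plausible reading of such a model, not the paper's exact P model.

```python
def conditional_suspiciousness(spectra, failed):
    """Score each statement by P(fail | covered), estimated from the spectra."""
    n_stmts = len(spectra[0])
    scores = []
    for j in range(n_stmts):
        covered = [i for i, row in enumerate(spectra) if row[j]]
        if not covered:
            scores.append(0.0)
            continue
        fails = sum(1 for i in covered if failed[i])
        scores.append(fails / len(covered))   # empirical P(fail | stmt covered)
    return scores

spectra = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]
failed = [True, False, True]
print(conditional_suspiciousness(spectra, failed))  # [0.5, 1.0, 0.5]
```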

14.
Debugging is crucial for producing reliable software. One of the effective bug localization techniques is spectral‐based fault localization. It tries to locate a buggy statement by applying an evaluation metric to program spectra and ranking program components on the basis of the score it computes. Here, we propose a restricted class of “hyperbolic” metrics, with a small number of numeric parameters. This class of functions is based on past theoretical and empirical results. We show that optimization methods such as genetic programming and simulated annealing can reliably discover effective metrics over a wide range of data sets of program spectra. We evaluate the performance for both real programs and model programs with single bugs, multiple bugs, “deterministic” bugs, and nondeterministic bugs and find that the proposed class of metrics performs as well as or better than the previous best‐performing metrics over a broad range of data.
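The search over a small numeric parameter space can be sketched with simulated annealing tuning a parameterized suspiciousness formula. The functional form below is a simple stand-in, not the paper's hyperbolic family, and the spectra data are synthetic.

```python
import math, random

random.seed(0)

# Synthetic "datasets": per statement, (ef, ep) = failing/passing tests that
# executed it, plus the index of the actually faulty statement. All made up.
datasets = []
for _ in range(30):
    stmts = [(random.randint(0, 10), random.randint(0, 10)) for _ in range(20)]
    faulty = random.randrange(20)
    ef, ep = stmts[faulty]
    stmts[faulty] = (max(ef, 8), min(ep, 3))   # faulty stmt: high ef, low ep
    datasets.append((stmts, faulty))

def metric(params, ef, ep):
    k1, k2 = params                            # stand-in two-parameter formula
    return ef / (k1 + ep) - k2 / (1 + ef)

def cost(params):
    """Average rank of the faulty statement: lower is better."""
    total = 0
    for stmts, faulty in datasets:
        scores = [metric(params, ef, ep) for ef, ep in stmts]
        total += sum(s >= scores[faulty] for s in scores)  # rank (1 = top)
    return total / len(datasets)

# Simulated annealing over the two numeric parameters:
params, temp = [1.0, 1.0], 1.0
while temp > 1e-3:
    cand = [max(1e-6, p + random.gauss(0, 0.3)) for p in params]
    delta = cost(cand) - cost(params)
    if delta < 0 or random.random() < math.exp(-delta / temp):
        params = cand
    temp *= 0.95
print("tuned parameters:", params, "avg rank:", cost(params))
```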

15.
In this empirical study, we evaluate the extent to which a set of software measures are correlated with the number of faults and the total estimated repair effort for a large software system. The measures we use are basic counts reflecting program size and structure and metrics proposed by McCabe and Halstead. The effect of program size has a major influence on these metrics, and we present a suitable method of adjusting the metrics for size. In modeling faults or repair effort as a function of one variable, a number of measures individually explain approximately one-quarter of the variation observed in the fault data. No one measure does significantly better than size in explaining the variation in faults found across software units, and thus multiple variable models are necessary to find metrics of importance in addition to program size. The “best” multivariate model explains approximately one-half the variation in the fault data. The metrics included in this model (in addition to size) are: the ratio of block comments to total lines of code, the number of decisions per function, and the relative vocabulary of program variables and operators. These metrics have potential for future use in the quality control of software.
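The kind of multivariate model described above can be sketched with ordinary least squares on synthetic data; the size adjustment used here (dividing each raw count by lines of code) is one simple choice, not necessarily the authors' method.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 120
loc = rng.integers(50, 2000, n).astype(float)      # lines of code per unit
decisions = loc * rng.uniform(0.05, 0.15, n)       # raw decision count
comments = loc * rng.uniform(0.01, 0.10, n)        # raw block-comment count
faults = 0.004 * loc + 2.0 * (decisions / loc) + rng.poisson(1.0, n)

# Adjust metrics for size, then fit faults ~ size + adjusted metrics:
X = np.column_stack([np.ones(n), loc, decisions / loc, comments / loc])
coef, *_ = np.linalg.lstsq(X, faults, rcond=None)
pred = X @ coef
r2 = 1 - ((faults - pred) ** 2).sum() / ((faults - faults.mean()) ** 2).sum()
print("coefficients:", coef.round(4), "R^2:", round(r2, 2))
```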

16.
Fault localization is an important part of the software development process, and making full use of a program's structural and behavioral features can improve its efficiency. This paper proposes a fault localization framework based on multivariate logistic regression analysis for locating faults at the class-method level in new program versions as software evolves. First, a set of metrics measuring structural and behavioral features is designed; the feature dataset of the old program version is collected and built through static analysis and testing, and fault information for the old version is obtained from the bug tracking system. Second, based on the feature dataset and fault information, univariate analysis is used to screen out the metrics significantly correlated with faults, and the selected significant metrics are then used in a multivariate analysis to train a multivariate logistic model. Finally, the feature dataset of the new program version is built from the selected significant metrics, the logistic model is used to predict the fault probability of each class method, and class methods are inspected in descending order of fault probability to locate the fault. An empirical study of fault localization on a set of open-source programs shows that the multivariate logistic model can improve the efficiency of fault localization.
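The two-stage screening-then-modeling workflow translates directly into code. The sketch below uses synthetic metric data and scikit-learn; the significance screen is a simple univariate test standing in for the paper's univariate logistic analysis.

```python
import numpy as np
from scipy.stats import mannwhitneyu
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
X_old = rng.random((300, 12))        # 12 metrics per class method, old version
y_old = (X_old[:, 0] + X_old[:, 3]
         + 0.3 * rng.standard_normal(300) > 1.0).astype(int)

# Stage 1: univariate screen -- keep metrics that differ significantly
# between faulty and fault-free class methods.
significant = [j for j in range(X_old.shape[1])
               if mannwhitneyu(X_old[y_old == 1, j],
                               X_old[y_old == 0, j]).pvalue < 0.05]
print("significant metrics:", significant)

# Stage 2: multivariate logistic model on the selected metrics.
model = LogisticRegression(max_iter=1000).fit(X_old[:, significant], y_old)

# New version: rank class methods by predicted fault probability and
# inspect them in descending order.
X_new = rng.random((50, 12))
probs = model.predict_proba(X_new[:, significant])[:, 1]
inspection_order = np.argsort(-probs)
print("inspect first:", inspection_order[:5])
```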

17.
Context: Software quality attributes are assessed by employing appropriate metrics. However, the choice of such metrics is not always obvious and is further complicated by the multitude of available metrics. To assist metrics selection, several properties have been proposed. However, although metrics are often used to assess successive software versions, there is no property that assesses their ability to capture structural changes along evolution. Objective: We introduce a property, Software Metric Fluctuation (SMF), which quantifies the degree to which a metric score varies due to changes occurring between successive system versions. Regarding SMF, metrics can be characterized as sensitive (changes induce high variation on the metric score) or stable (changes induce low variation on the metric score). Method: The SMF property has been evaluated by: (a) a case study on 20 OSS projects to assess the ability of SMF to characterize different metrics differently, and (b) a case study on 10 software engineers to assess SMF's usefulness in the metric selection process. Results: The results of the first case study suggest that different metrics that quantify the same quality attributes differ in their fluctuation. We also provide evidence that an additional factor related to metrics' fluctuation is the function used for aggregating a metric from the micro to the macro level. In addition, the outcome of the second case study suggests that SMF is capable of helping practitioners in metric selection, since: (a) different practitioners have different perceptions of metric fluctuation, and (b) this perception is less accurate than the systematic approach that SMF offers. Conclusions: SMF is a useful metric property that can improve the accuracy of metrics selection. Based on SMF, we can differentiate metrics by their degree of fluctuation. Such results can provide input to researchers and practitioners in their metric selection processes.
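One simple instantiation of the SMF idea is shown below: fluctuation as the mean absolute relative change of a metric's score between successive versions. The exact formula in the paper may differ; this is an assumed form for illustration, with made-up per-version scores.

```python
def smf(scores):
    """Mean absolute relative change of a metric across successive versions."""
    changes = [abs(b - a) / abs(a) for a, b in zip(scores, scores[1:]) if a]
    return sum(changes) / len(changes)

cbo_per_version = [12.0, 12.1, 12.0, 12.2, 12.1]        # a stable metric
churn_per_version = [120.0, 40.0, 210.0, 15.0, 300.0]   # a sensitive metric
print("stable metric SMF:   ", round(smf(cbo_per_version), 4))
print("sensitive metric SMF:", round(smf(churn_per_version), 4))
```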

18.
System analysts often use software fault prediction models to identify fault-prone modules during the design phase of the software development life cycle. The models help predict faulty modules based on the software metrics that are input to the models. In this study, we consider 20 types of metrics to develop a model using an extreme learning machine associated with various kernel methods. We evaluate the effectiveness of the model using a proposed framework based on the cost and efficiency in the testing phases. The evaluation process is carried out by considering case studies for 30 object-oriented software systems. Experimental results demonstrate that the application of a fault prediction model is suitable for projects with the percentage of faulty classes below a certain threshold, which depends on the efficiency of fault identification (low: 47.28%; median: 39.24%; high: 25.72%). We consider nine feature selection techniques to remove the irrelevant metrics and to select the best set of source code metrics for fault prediction.
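An extreme learning machine is simple enough to sketch directly: random hidden-layer weights are fixed, and only the output weights are solved in closed form. The snippet below is a minimal ELM for binary fault prediction on synthetic metric data, not the study's exact kernelized variant.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.random((200, 20))                     # 20 source-code metrics per class
y = (X[:, 0] + X[:, 5] > 1.0).astype(float)   # synthetic faulty/clean labels

# Extreme learning machine: random hidden layer, closed-form output weights.
n_hidden = 50
W = rng.standard_normal((X.shape[1], n_hidden))
b = rng.standard_normal(n_hidden)
H = 1.0 / (1.0 + np.exp(-(X @ W + b)))        # sigmoid hidden activations
beta = np.linalg.pinv(H) @ y                  # least-squares output weights

pred = (H @ beta > 0.5).astype(float)
print("training accuracy:", (pred == y).mean())
```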

19.
This paper reports on a pioneering effort to establish a software composite metric with the key capability of distinguishing among different structures. As part of this effort, most of the previously proposed program control-flow complexity metrics are evaluated. It is observed that most of these metrics are inherently limited in distinguishing capability. However, the concept of composite metrics is potentially useful for the development of a practical metric. This paper presents a methodology for developing a practical composite metric using statistical techniques. The proposed metric differs from all previous metrics in two ways: (1) it is based on an overall structural analysis of a given program in a deeper and broader context, capturing structural measurements taken from all existing structural levels; (2) it unifies a set of 19 important structural metrics, and the compositing model of these metrics is based on statistical techniques rather than on an arbitrary method. Experience with the proposed metric clearly indicates that it distinguishes different structures better than previous metrics.

20.
Software metrics-based quality estimation models can be effective tools for identifying which modules are likely to be fault-prone or not fault-prone. The use of such models prior to system deployment can considerably reduce the likelihood of faults discovered during operations, hence improving system reliability. A software quality classification model is calibrated using metrics from a past release or similar project, and is then applied to modules currently under development. Subsequently, a timely prediction of which modules are likely to have faults can be obtained. However, software quality classification models used in practice may not provide a useful balance between the two misclassification rates, especially when there are very few faulty modules in the system being modeled. This paper presents, in the context of case-based reasoning, two practical classification rules that allow appropriate emphasis on each type of misclassification as per the project requirements. The suggested techniques are especially useful for high-assurance systems where faulty modules are rare. The proposed generalized classification methods emphasize the costs of misclassifications and the unbalanced distribution of the faulty program modules. We illustrate the proposed techniques with a case study that consists of software measurements and fault data collected over multiple releases of a large-scale legacy telecommunication system. In addition to investigating the two classification methods, a brief relative comparison of the techniques is also presented. It is indicated that the level of classification accuracy and model-robustness observed for the case study would be beneficial in achieving high software reliability of its subsequent system releases. Similar observations are made from our empirical studies with other case studies.
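Case-based-reasoning classification with an adjustable emphasis on each misclassification type can be sketched as nearest-neighbor retrieval plus a cost-weighted decision threshold; the cost ratio below is an assumed knob, not the paper's calibrated rule.

```python
import numpy as np

def cbr_classify(cases, labels, query, k=5, cost_ratio=10.0):
    """Classify a module from its k nearest past cases.

    cost_ratio: cost of missing a faulty module relative to a false alarm.
    A high ratio lowers the threshold for predicting 'fault-prone', which
    suits high-assurance systems where faulty modules are rare.
    """
    dists = np.linalg.norm(cases - query, axis=1)
    nearest = np.argsort(dists)[:k]
    faulty_frac = labels[nearest].mean()
    threshold = 1.0 / (1.0 + cost_ratio)       # cost-weighted decision rule
    return int(faulty_frac >= threshold)

rng = np.random.default_rng(5)
cases = rng.random((100, 6))                   # past modules' metrics
labels = (cases[:, 0] > 0.9).astype(int)       # rare faulty modules (~10%)
query = rng.random(6)
print("fault-prone?", cbr_classify(cases, labels, query))
```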
