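Automated testing tools are the primary means of quality assurance for Android applications. As Android version diversity, underlying hardware differences (fragmentation), and logic complexity increase, automated testing faces new challenges. To address these problems, industry has developed a large number of automated testing tools in recent years. However, existing tools are numerous and differ in testing focus, which makes tool selection difficult for testers. To help testers choose the best testing tool and to enable a unified evaluation of automated testing tools, a multi-feature comprehensive evaluation method for Android automated testing tools (comprehensive evaluation of Android automated testing, CEAT) is proposed and implemented as a platform that is convenient for testers to use. CEAT adopts three evaluation metrics widely accepted in the testing field, namely code coverage, exception detection rate, and a fused multi-version compatibility score; on this basis, it further introduces the mutation kill rate, drawing on the idea of mutation testing, and UI widget coverage, from the perspective of user experience. These five metrics make up the CEAT framework and enable a comprehensive, multi-dimensional evaluation of Android automated testing tools. To validate CEAT, a set of applications under test containing 1,089 mutant applications was generated, experiments were deployed on a real-device cluster of 6 mobile devices, and 5 automated testing tools were adapted and used to execute 5,040 test tasks. The final results show that: i) the five metrics evaluate automated testing tools from different angles, ...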

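Mutation Testing: Principles, Optimization and Applications   Total citations: 1, self-citations: 0, citations by others: 1
Mutation testing is a fault-based software testing technique that has attracted wide attention from researchers in China and abroad over nearly four decades and has produced a body of research results. This paper summarizes existing research and divides it into three modules: principles, optimization, and applications of mutation testing. The principles module presents the basic hypotheses of mutation testing, introduces the mutation analysis process, defines its important concepts in turn, and summarizes equivalent mutant detection techniques from the perspectives of static and dynamic detection. The optimization module summarizes existing work from two perspectives: mutant selection optimization and mutant execution optimization. The applications module selects three application areas, namely test suite adequacy evaluation, test case generation, and regression testing, and classifies and summarizes the related research. Finally, future research directions for mutation testing are discussed.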
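
The mutation analysis process summarized in this abstract can be illustrated with a minimal Python sketch. It is not taken from the surveyed paper: the program under test, the two hand-written mutants, and the function names `is_killed` and `mutation_score` are all hypothetical and chosen only for illustration. The score follows the common definition of killed mutants divided by non-equivalent mutants.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical program under test (illustrative only).
def original(a: int, b: int) -> int:
    return a + b

# Hypothetical mutant: arithmetic operator replaced (+ becomes -).
def mutant_sub(a: int, b: int) -> int:
    return a - b

# Hypothetical mutant: operands swapped. Because + is commutative,
# this mutant is equivalent to the original and can never be killed.
def mutant_swap(a: int, b: int) -> int:
    return b + a

TestInput = Tuple[int, int]

def is_killed(mutant: Callable[[int, int], int],
              tests: Sequence[TestInput]) -> bool:
    """A mutant is killed if some test input makes its output differ from the original."""
    return any(mutant(*t) != original(*t) for t in tests)

def mutation_score(mutants: Sequence[Callable[[int, int], int]],
                   tests: Sequence[TestInput],
                   equivalent: int = 0) -> float:
    """Killed mutants divided by the number of non-equivalent mutants."""
    non_equivalent = len(mutants) - equivalent
    if non_equivalent <= 0:
        raise ValueError("no non-equivalent mutants to score against")
    killed = sum(is_killed(m, tests) for m in mutants)
    return killed / non_equivalent
```

With the single test input `(1, 0)` the subtraction mutant survives, since `1 - 0` equals `1 + 0`; adding `(1, 2)` kills it. This is the sense in which mutation analysis points testers toward missing test cases. In practice the number of equivalent mutants is usually unknown, which is why the abstract treats equivalent mutant detection as a topic of its own.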

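Because the structural characteristics of deep learning software differ markedly from those of traditional software, even extensive testing cannot effectively measure how well the test data cover the deep learning software or how adequate the testing is, so a large number of unknown errors may remain in subsequent use. Deep forest is a new type of deep learning model that overcomes some shortcomings of deep neural networks, such as the need for large amounts of training data, high-performance computing platforms, and many hyperparameters. However, no existing work has studied testing methods for deep forest. Based on the structural characteristics of deep forest, a set of test coverage criteria is defined, consisting of random forest node coverage (RFNC), random forest leaf coverage (RFLC), cascade forest class coverage (CFCC), and cascade forest output coverage (CFOC). On this basis, DeepRanger, a coverage-guided automatic test data generation method based on a genetic algorithm, is designed; it automatically generates test data sets that effectively improve model coverage. To validate the proposed coverage criteria, a set of experiments was designed and carried out on the open-source deep forest project gcForest and the MNIST data set. The experimental results show that all four proposed coverage criteria can effectively evaluate the adequacy of a test data set for a deep forest model. In addition, compared with a genetic algorithm based on random selection, the coverage-guided test data generation method DeepRanger achieves higher model coverage.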

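A Review of Test Case Prioritization Techniques in Regression Testing   Total citations: 5, self-citations: 4, citations by others: 1
Chen Xiang, Chen Jihong, Ju Xiaolin, Gu Qing. Journal of Software (软件学报), 2013, 24(8): 1695-1712
Test case prioritization (TCP) is a hot topic in regression testing research. By applying a specific prioritization criterion, test cases are ordered to optimize their execution sequence so as to maximize a prioritization objective, for example the rate of early fault detection of the test suite. TCP is particularly suitable for testing scenarios in which the testing budget is insufficient to execute all test cases. This survey first describes the TCP problem and classifies existing TCP techniques from three perspectives in turn: source code, requirements, and models. It then summarizes existing results on a special class of TCP problems, namely test-resource-aware TCP. Next, it summarizes the evaluation metrics and evaluation data sets commonly used in empirical studies, and the influence of fault types on empirical conclusions. It then introduces applications of TCP techniques in specific testing domains, including combinatorial testing, testing of event-driven applications, Web service testing, and fault localization. Finally, directions for future work are discussed.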
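
The "rate of early fault detection" mentioned in this abstract is commonly quantified with the APFD metric (average percentage of faults detected). The abstract itself does not name a metric, so using APFD here is an assumption. The Python sketch below uses the standard formula APFD = 1 - (TF_1 + ... + TF_m) / (n * m) + 1 / (2n), where n is the number of test cases, m the number of faults, and TF_i the position of the first test that detects fault i. The function name `apfd` and the data layout are illustrative, not taken from the survey.

```python
from typing import Dict, Sequence, Set

def apfd(order: Sequence[str], detects: Dict[str, Set[str]]) -> float:
    """Average percentage of faults detected for one test execution order.

    order   -- test case ids in execution order
    detects -- maps each fault id to the set of test ids that detect it
    Assumes every fault is detected by at least one test in `order`.
    """
    n = len(order)
    m = len(detects)
    # 1-based position of each test case in the execution order.
    position = {test: i + 1 for i, test in enumerate(order)}
    # TF_i: position of the first test case that reveals fault i.
    first_detection = [min(position[t] for t in tests)
                       for tests in detects.values()]
    return 1 - sum(first_detection) / (n * m) + 1 / (2 * n)
```

For example, with faults `{"f1": {"t3"}, "f2": {"t1", "t3"}}`, the order `t1, t2, t3` gives 1 - 4/6 + 1/6 = 0.5, while running `t3` first gives 1 - 2/6 + 1/6, about 0.83, reflecting that both faults are then found by the first test.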

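With the rapid development of software technology, domain-oriented software systems are widely used and at the same time bring new challenges for research and application. Domain applications place high demands on security and reliability, and techniques such as symbolic execution and fuzzing have been developed for decades to ensure the security and reliability of software systems. Many studies and discovered defects demonstrate their effectiveness. However, because the two have different strengths and weaknesses, combining them remains a popular research topic. Current combination approaches let the two assist each other, for example by handing regions that fuzzing cannot reach to symbolic execution for solving. But these approaches can only decide during the run of fuzzing (or symbolic execution) whether to call on symbolic execution (or fuzzing), and cannot exploit the advantages of both at the same time, which leads to insufficient performance. We therefore propose a deep-learning-based hybrid testing method that combines symbolic-execution-based testing with fuzzing. The method aims to determine, before testing starts, the sets of paths suited to fuzzing (or symbolic execution), and thereby to guide fuzzing (or symbolic execution) to the regions suited to it. We also propose a hybrid mechanism for interaction between the two, which further improves overall coverage. Experiments on programs from LAVA-M show that, compared with symbolic execution or fuzzing alone, our method improves branch coverage by more than 20%, increases the number of paths by about 1 to 13 times, and detects 929 more defects.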

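Mutation testing is an effective software testing technique that generates mutants to simulate software defects and helps improve the fault detection capability of existing test suites. Mutant quality has a significant impact on the effectiveness of mutation testing. Traditional mutation testing methods usually generate mutants with manually designed, syntax-rule-based mutation operators, and have achieved a number of research results. In recent years, many studies have begun to incorporate deep learning, generating mutants by learning from the historical code of open-source projects. This new approach has so far achieved preliminary results in mutant generation. The syntax-rule-based and learning-based mutation techniques differ in mechanism, but both aim to improve the fault detection capability of test suites by generating mutants, so a comprehensive comparison of the two is essential for mutation testing and its downstream tasks (such as fault localization and repair). To this end, this paper designs and conducts an empirical study of syntax-rule-based and learning-based mutation techniques, aiming to understand how techniques with different mechanisms perform on mutation testing tasks and how the mutants they generate differ in program semantics. Specifically, using the Defects4J v1.2.0 data set as the experimental subject, the paper compares syntax-rule-based mutation techniques, represented by MAJOR and PIT, with deep-learning-based mutation techniques, represented by DeepMutation, μBERT, and LEAM. The experimental results show that both rule-based and learning-based mutation techniques can effectively support mutation testing practice, but MAJOR achieves the best testing effectiveness, detecting 85.4% of real faults. In terms of semantic representation, MAJOR has the strongest semantic representativeness: test suites constructed from it kill more than 95% of the mutants generated by the other mutation techniques. In terms of defect characterization, both classes of techniques have unique strengths. For example, rule-based techniques mutate code elements more stably, whereas learning-based techniques have a stronger understanding of context.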

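Software defects are unavoidable during software development and maintenance. Software defect reports are important defect description documents in the maintenance process, and high-quality reports can effectively improve the efficiency of defect repair. However, because many developers, testers, and users interact with defect tracking systems and submit defect reports, the same defect may be reported by different people, resulting in a large number of duplicate defect reports. Duplicate reports inevitably increase the workload of manually detecting duplicates, waste human and material resources, and reduce the efficiency of defect repair. Through a systematic literature review, this paper analyzes recent work by researchers in China and abroad on duplicate software defect report detection. It analyzes and summarizes the work mainly in terms of research methods, data set selection, and performance evaluation, and presents the open problems, challenges, and suggestions for follow-up research in this field.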

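To address the low fault detection rate of state-based class testing techniques, a method is proposed for constructing UML state diagrams using functional testing methods such as equivalence class partitioning and boundary value analysis. A test sequence generation strategy based on the W-method is described, and the MuJava mutation tool is used to evaluate the effectiveness of the method. The experimental results show that this testing strategy achieves a high fault detection rate.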

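Research on Static Software Defect Prediction Methods   Total citations: 14, self-citations: 7, citations by others: 7
Static software defect prediction is a research hotspot in the field of software engineering data mining. By analyzing software code or the development process, metrics related to software defects are designed; then, defect prediction data sets are created by mining software history repositories, with the aim of building defect prediction models that predict potentially defective program modules in the project under test, and ultimately of optimizing the allocation of testing resources and improving software product quality. This paper systematically summarizes the results achieved in recent years by researchers in China and abroad in this field. First, a research framework is given and three important factors affecting defect prediction performance are identified: the choice of metrics, the methods for building defect prediction models, and issues related to defect prediction data sets. Next, existing research results on these three factors are summarized in turn. Then, existing work on a special class of software defect prediction problems, namely defect prediction based on code changes, is summarized. Finally, the challenges that future research may face are discussed.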

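The Ethereum Virtual Machine (EVM) is a key component of the Ethereum blockchain; defects in it cause deviations in transaction execution results and bring serious problems to the Ethereum ecosystem. Existing work on EVM defect detection treats the virtual machine only as a standalone smart contract execution tool and does not test its complete workflow, which leaves blind spots in defect detection. To address this problem, a defect detection method covering the entire run of the Ethereum Virtual Machine (ETHCOV) is proposed. ETHCOV first uses a weighting strategy to guide the mutation of smart contracts, contract interface parameter inputs, and transaction sequences at different granularities; it then packages them together with the block state and the world state as test cases; finally, it feeds the test cases into the Ethereum Virtual Machine to trigger execution and cross-checks the execution results, thereby detecting vulnerabilities and defects in the Ethereum Virtual Machine. A prototype system was implemented based on this method, and more than 20,000 real smart contracts were used as input to test the Ethereum Virtual Machine for defects. The experimental results show that, compared with the existing tool EVMFuzzer, ETHCOV improves testing efficiency by 339% and code coverage by 125%, and detects inconsistent outputs for 3 groups of test cases. These results indicate that ETHCOV can effectively detect defects in the Ethereum Virtual Machine.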

A measure of the “goodness” or efficiency of a test suite is used to determine its proficiency. The appropriateness of the test suite is determined through mutation analysis. In mutation analysis, several Finite State Machine (FSM) mutants are produced by injecting errors against hypotheses. These mutants serve as test subjects for the test suite (TS). The effectiveness of the test suite is proportional to the number of eliminated mutants. The most effective test suite is the one that removes the largest number of mutants in the optimal time. It is difficult to determine the fault detection ratio of the system because it is difficult to identify the system’s potential flaws precisely. In mutation testing, the Fault Detection Ratio (FDR) metric is currently used to express the adequacy of a test suite. However, there are some issues with this metric. If two test suites have the same defect detection rate, the smaller of the two is preferred. The same issue affects the test case (TC): of two test cases with identical performance, the smaller is assumed to be superior. Another difficulty involves time. The performance of numerous tools claiming to have a perfect mutant capture time is problematic. Our study developed three metrics to address these issues. In this context, the most widely used test generation tools were examined and tested using the developed metrics. By integrating the missing parameters into the measurement, the metrics we have developed help eliminate these problems in performance measurement.

The empirical assessment of test techniques plays an important role in software testing research. One common practice is to seed faults in subject software, either manually or by using a program that generates all possible mutants based on a set of mutation operators. The latter allows the systematic, repeatable seeding of large numbers of faults, thus facilitating the statistical analysis of fault detection effectiveness of test suites; however, we do not know whether empirical results obtained this way lead to valid, representative conclusions. Focusing on four common control and data flow criteria (block, decision, C-use, and P-use), this paper investigates this important issue based on a mid-size industrial program with a comprehensive pool of test cases and known faults. Based on the data available thus far, the results are very consistent across the investigated criteria, as they show that the use of mutation operators yields trustworthy results: generated mutants can be used to predict the detection effectiveness of real faults. Applying such a mutation analysis, we then investigate the relative cost and effectiveness of the above-mentioned criteria by revisiting fundamental questions regarding the relationships between fault detection, test suite size, and control/data flow coverage. Although such questions have been partially investigated in previous studies, we can use a large number of mutants, which helps decrease the impact of random variation in our analysis and allows us to use a different analysis approach. Our results are then compared with published studies, plausible reasons for the differences are provided, and the research leads us to suggest a way to tune the mutation analysis process to possible differences in fault detection probabilities in a specific environment.

On similarity-awareness in testing-based fault localization   Total citations: 2, self-citations: 0, citations by others: 2
In the process of software development and maintenance, software debugging is an inevitable and time-consuming task. To accelerate software debugging, various approaches have been proposed to automate fault localization. Among them, testing-based fault-localization approaches, which use the execution information of many test cases to localize the faults, are the most promising. However, these existing testing-based fault-localization approaches ignore the similarity between test cases, which may harm their effectiveness according to our previous research. Therefore, in this paper we propose a similarity-aware fault-localization (SAFL) approach, which treats each test case as a fuzzy set to deal with the similarity between test cases and calculates statements’ suspicions based on probability theory. To investigate whether SAFL can address the similarity issue effectively, we manually injected redundant test cases into a test suite and performed an experimental study on the original test suite and the test suite with redundancy, respectively. The experimental results demonstrate that in our experiments SAFL is an effective fault-localization approach, whether or not there is manually injected redundancy in the test suite. To compare SAFL with most existing testing-based fault-localization approaches, we performed another experimental study on the Siemens program suite, which is extensively used in the evaluation of many other testing-based fault-localization approaches. This experimental study confirms the effectiveness of SAFL. Based on the two experimental studies, it seems that in our experiments SAFL can not only deal effectively with test suites containing much redundancy but also perform effectively for test suites without much redundancy. A preliminary version of this paper appears in (Hao et al. 2005a).

Mutation testing has historically been used to assess the fault-finding effectiveness of a test suite or other verification technique. Mutation analysis, rather, entails augmenting a test suite to detect all killable mutants. Concerns about the time efficiency of mutation analysis may prohibit its widespread, practical use. The goal of our research is to assess the effectiveness of the mutation analysis process when used by software testers to augment a test suite to obtain higher statement coverage scores. We conducted two empirical studies and have shown that mutation analysis can be used by software testers to effectively produce new test cases and to improve statement coverage scores in a feasible amount of time. Additionally, we find that our user study participants view mutation analysis as an effective but relatively expensive technique for writing new test cases. Finally, we have shown that the choice of mutation tool and operator set can play an important role in determining how efficient mutation analysis is for producing new test cases.

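Research on the Test Suite Reduction Problem and Its Progress   Total citations: 7, self-citations: 0, citations by others: 7
Test suite reduction is one of the key problems in software testing. Its goal is to satisfy a given set of test objectives using as few test cases as possible, thereby improving testing efficiency and reducing testing cost. After briefly introducing the basic concepts of test suite reduction, this paper summarizes the main methods for solving the problem and analyzes and compares their efficiency and characteristics. It then discusses the fault detection effectiveness of test suites, an issue closely related to test suite reduction, and examines test case prioritization techniques. Finally, directions for further research on test suite reduction are pointed out.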
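
As an illustration of the reduction problem described in this abstract, the Python sketch below applies a plain greedy set-cover heuristic: repeatedly pick the test case that satisfies the most still-unsatisfied test requirements. The abstract does not say which methods the survey covers, so this is a generic example rather than the survey's own algorithm; the function name `greedy_reduce` and the input layout are hypothetical.

```python
from typing import Dict, Hashable, List, Set

def greedy_reduce(coverage: Dict[str, Set[Hashable]]) -> List[str]:
    """Select a subset of tests that satisfies every requirement covered by the full suite.

    coverage -- maps each test id to the set of requirements it satisfies
    Greedy heuristic: the result is not guaranteed to be a minimum-size subset.
    """
    # Everything the full suite satisfies must still be satisfied afterwards.
    uncovered: Set[Hashable] = set().union(*coverage.values())
    selected: List[str] = []
    while uncovered:
        # Iterate in sorted order so that ties are broken deterministically.
        best = max(sorted(coverage),
                   key=lambda t: len(coverage[t] & uncovered))
        selected.append(best)
        uncovered -= coverage[best]
    return selected
```

For the input `{"t1": {1, 2}, "t2": {2, 3}, "t3": {1, 2, 3, 4}, "t4": {5}}` the heuristic keeps `t3` and `t4` and drops the other two tests. Whether the reduced suite still detects the same faults is a separate question, which is the fault detection effectiveness issue the abstract raises.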

Based on (1) research into mutation testing for general purpose programming languages, and (2) spreadsheet errors that have been reported in the literature, we have developed a suite of mutation operators for spreadsheets. We present an evaluation of the mutation adequacy of du-adequate test suites generated by a constraint-based automatic test-case generation system we have developed in previous work. The results of the evaluation suggest additional constraints that can be incorporated into the system to target mutation adequacy. In addition to being useful in mutation testing of spreadsheets, the operators can be used in the evaluation of error-detection tools and also for seeding spreadsheets with errors for empirical studies. We describe two case studies where the suite of mutation operators helped us carry out such empirical evaluations. The main contribution of this paper is a suite of mutation operators for spreadsheets that can be used for carrying out empirical evaluations of spreadsheet tools to indicate ways in which the tools can be improved.

Regression testing is an important activity in the software life cycle, but it can also be very expensive. To reduce the cost of regression testing, software testers may prioritize their test cases so that those which are more important, by some measure, are run earlier in the regression testing process. One potential goal of test case prioritization techniques is to increase a test suite's rate of fault detection (how quickly, in a run of its test cases, that test suite can detect faults). Previous work has shown that prioritization can improve a test suite's rate of fault detection, but the assessment of prioritization techniques has been limited primarily to hand-seeded faults, largely due to the belief that such faults are more realistic than automatically generated (mutation) faults. A recent empirical study, however, suggests that mutation faults can be representative of real faults and that the use of hand-seeded faults can be problematic for the validity of empirical results focusing on fault detection. We have therefore designed and performed two controlled experiments assessing the ability of test case prioritization techniques to improve the rate of fault detection, measured relative to mutation faults. Our results show that prioritization can be effective relative to the faults considered, and they expose ways in which that effectiveness can vary with characteristics of faults and test suites. More importantly, a comparison of our results with those collected using hand-seeded faults reveals several implications for researchers performing empirical studies of test case prioritization techniques in particular and testing techniques in general.

Mutation testing has traditionally been used as a defect injection technique to assess the effectiveness of a test suite as represented by a “mutation score.” Recently, mutation testing tools have become more efficient, and industrial usage of mutation analysis is experiencing growth. Mutation analysis entails adding or modifying test cases until the test suite is sufficient to detect as many mutants as possible and the mutation score is satisfactory. The augmented test suite resulting from mutation analysis may reveal latent faults and provides a stronger test suite to detect future errors which might be injected. Software engineers often look for guidance on how to augment their test suite using information provided by line and/or branch coverage tools. As the use of mutation analysis grows, software engineers will want to know how the emerging technique compares with and/or complements coverage analysis for guiding the augmentation of an automated test suite. Additionally, software engineers can benefit from an enhanced understanding of efficient mutation analysis techniques. To address these needs for additional information about mutation analysis, we conducted an empirical study of the use of mutation analysis on two open source projects. Our results indicate that a focused effort on increasing mutation score leads to a corresponding increase in line and branch coverage to the point that line coverage, branch coverage and mutation score reach a maximum but leave some types of code structures uncovered. Mutation analysis guides the creation of additional “common programmer error” tests beyond those written to increase line and branch coverage. We also found that 74% of our chosen set of mutation operators is useful, on average, for producing new tests. 
The remaining 26% of mutation operators did not produce new test cases because their mutants were immediately detected by the initial test suite, indirectly detected by test suites we added to detect other mutants, or were not able to be detected by any test.
Ben Smith is a second-year Ph.D. student in Computer Science at North Carolina State University working as an RA under Dr. Laurie Williams. He received his Bachelor’s degree in Computer Science in May of 2007 and he hopes to receive his doctorate in 2012. He has begun work on developing SQL Coverage Metrics as a predictive measure of the security of a web application. This fall, he will be beginning the doctoral preliminary exam and working as a Testing Manager for the NCSU CSC Senior Design Center: North Carolina State’s capstone course for Computer Science. Finally, he has designed and maintained the websites for the Center for Open Software Engineering and ESEM 2009. Laurie Williams is an Associate Professor in the Computer Science Department of the College of Engineering at North Carolina State University. She leads the Software Engineering Research group and is also the Director of the North Carolina State University Laboratory for Collaborative System Development and the Center for Open Software Engineering. She is also technical co-director of the Center for Open Software Engineering (COSE) and the area technical director of the Secure Open Systems Initiative (SOSI) at North Carolina State University. Laurie received her Ph.D. in Computer Science from the University of Utah, her MBA from Duke University, and her BS in Industrial Engineering from Lehigh University. She worked for IBM for nine years in Raleigh, NC before returning to academia. Laurie’s research interests include agile software development methodologies and practices, collaborative/pair programming, software reliability and testing, and software engineering for secure systems development.

Web applications have become popular and a preferred means for users to do various crucial tasks such as selling and buying goods, doing short tasks, controlling smart houses, and managing bank accounts. The correctness of all such applications is important and requires thorough testing. Structural testing is widely used to achieve correctness in traditional software; however, for web applications it is challenging because of their dynamic and heterogeneous nature. To achieve the desired structural coverage of web applications, different dynamic coverage criteria are used as quality assessment indicators. However, there is a lack of empirical evidence regarding the effectiveness of the proposed coverage criteria. In this paper, we conduct an empirical evaluation that compares the fault detection effectiveness and efficiency of various dynamic coverage criteria by performing mutation analysis. We conduct a series of experiments to assess and compare four widely used coverage criteria on seven open-source case studies ranging from small to large scale applications. We performed mutation analysis by first generating different faulty versions (mutants) of the case studies and then executing test suites to record the mutation score for each criterion. The results from most of the subject applications show that DOM coverage is the most effective and efficient criterion, followed by the Virtual DOM, HTML Element, and Statement coverage criteria.
