Similar Documents
20 similar documents found (search time: 31 ms)
1.
The aim of software testing is to find faults in a program under test, so generating test data that can expose those faults is very important. To date, studies on generating test data for path coverage have not performed well at detecting low-probability faults on the covered path. The focus of this study is the automatic generation of test data for both path coverage and fault detection using genetic algorithms. To this end, the problem is first formulated as a bi-objective optimization problem with one constraint, whose objectives are the number of faults detected in the traversed path and the risk level of these faults, and whose constraint is that the traversed path must be the target path. An evolutionary algorithm is employed to solve the formulated model, and several types of fault detection methods are given. Finally, the proposed method is applied to several real-world programs and compared with a random method and an evolutionary optimization method in three aspects: the number of generations needed to generate the desired test data, the time consumed, and the success rate of detecting faults. The experimental results confirm that the proposed method can effectively generate test data that not only traverse the target path but also detect faults lying on it.
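As an illustration of the bi-objective formulation described above, here is a minimal sketch in Python. The toy `instrumented_run` program, the seeded fault, and all constants are hypothetical stand-ins for the paper's instrumented programs under test, and the selection/mutation loop is a bare-bones evolutionary algorithm, not the paper's exact method.

```python
import random

# Toy "program under test": returns the executed path (a tuple of branch
# ids) and the set of seeded faults this input triggers. In the paper this
# information comes from instrumentation; this stand-in is illustrative.
def instrumented_run(x):
    path, faults = [], set()
    if x > 0:
        path.append("b1")
        if x % 7 == 0:                      # a low-probability fault site
            faults.add(("fault_1", 3))      # (fault id, risk level)
    else:
        path.append("b2")
    return tuple(path), faults

TARGET_PATH = ("b1",)

def fitness(x):
    path, faults = instrumented_run(x)
    if path != TARGET_PATH:                 # constraint: traverse target path
        return (-1, -1)
    # objective 1: faults detected; objective 2: total risk level
    return (len(faults), sum(risk for _, risk in faults))

# Minimal evolutionary loop: keep the best individuals and mutate them.
pop = [random.randint(-100, 100) for _ in range(50)]
for _ in range(30):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]
    pop = parents + [p + random.randint(-5, 5) for p in parents for _ in range(4)]

best = max(pop, key=fitness)
print("best input:", best, "fitness:", fitness(best))
```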

2.
The growth of geo-technologies and the development of methods for spatial data collection have resulted in large spatial data repositories that require techniques for spatial information extraction in order to transform raw data into useful, previously unknown information. However, due to the high complexity of spatial data mining and the need to understand spatial relationships and their characteristics, efforts have been directed towards improving algorithms so as to increase the performance and quality of results. Several application areas have likewise been addressed by spatial data mining, including environmental management, which is the focus of this paper. The main original contribution of this work is the demonstration of spatial data mining using a novel algorithm with a multi-relational approach, applied to a database of water resources in a region of São Paulo State, Brazil, together with a discussion of the obtained results. Characteristics involving the location of water resources and the profile of those administering water exploration were discovered and discussed.

3.
Outlier detection on data streams is an important task in data mining, and the challenges become even larger when the data are uncertain. This paper studies the problem of outlier detection on uncertain data streams. We propose Continuous Uncertain Outlier Detection (CUOD), which quickly determines the nature of uncertain elements by pruning, improving efficiency. Furthermore, we propose a pruning approach, Probability Pruning for Continuous Uncertain Outlier Detection (PCUOD), to reduce the detection cost. It estimates outlier probabilities and can effectively reduce the amount of computation, and the cost of the incremental PCUOD algorithm satisfies the demands of uncertain data streams. Finally, a new method for parameter-variable queries to CUOD is proposed, enabling the concurrent execution of different queries. To the best of our knowledge, this paper is the first work on outlier detection over uncertain data streams that can handle parameter-variable queries simultaneously. Our methods are verified on both real and synthetic data; the results show that they reduce the required storage and running time.
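A minimal sketch of the pruning idea, assuming each stream element carries an existence probability and using the expected neighbour count within a sliding window as a simplified stand-in for CUOD's probabilistic outlier test; the thresholds and data are invented for illustration.

```python
import random

D, K, WINDOW = 1.0, 3.0, 50   # distance threshold, neighbour mass, window size

def expected_neighbours(e, window):
    # Expected number of neighbours of e within distance D, excluding e's
    # own probability mass.
    v, p = e
    return sum(q for u, q in window if abs(u - v) <= D) - p

def is_outlier(e, window):
    # Cheap pruning bound: if the total probability mass in the window
    # cannot reach K, flag immediately without scanning distances at all.
    if sum(q for _, q in window) < K:
        return True
    return expected_neighbours(e, window) < K

window = []
for _ in range(200):
    e = (random.gauss(0, 1), random.uniform(0.5, 1.0))  # (value, probability)
    window.append(e)
    if len(window) > WINDOW:
        window.pop(0)
    if is_outlier(e, window):
        print("outlier candidate:", e)
```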

4.
DBSCAN (density-based spatial clustering of applications with noise) is an important spatial clustering technique that is widely adopted in numerous applications. As datasets are now extremely large, parallel processing of complex data analyses such as DBSCAN has become indispensable. However, existing parallel DBSCAN algorithms have three major drawbacks. First, they fail to properly balance the load among parallel tasks, especially when data are heavily skewed. Second, their scalability is limited because not all critical sub-procedures are parallelized. Third, most of them are not primarily designed for shared-nothing environments, which makes them less portable to emerging parallel processing paradigms. In this paper, we present MR-DBSCAN, a scalable DBSCAN algorithm using MapReduce. In our algorithm, all critical sub-procedures are fully parallelized, so there is no performance bottleneck caused by sequential processing. Most importantly, we propose a novel data partitioning method based on computation-cost estimation, whose objective is to achieve desirable load balancing even for heavily skewed data. We conduct our evaluation using real datasets with up to 1.2 billion points; the experimental results confirm the efficiency and scalability of MR-DBSCAN.
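The cost-based partitioning idea can be sketched as follows. The grid granularity, the quadratic per-cell cost model, and the greedy largest-first packing are illustrative assumptions, not MR-DBSCAN's exact estimator.

```python
from collections import defaultdict
import random

# Estimate per-cell computation cost from point counts, then assign cells
# to workers greedily so that estimated load stays balanced even when the
# data are skewed.
EPS, WORKERS = 0.1, 4
points = [(random.random(), random.random()) for _ in range(10_000)]

cells = defaultdict(int)
for x, y in points:
    cells[(int(x / EPS), int(y / EPS))] += 1

# Neighbourhood queries dominate, so take cell cost to grow ~ n^2.
costs = sorted(((n * n, c) for c, n in cells.items()), reverse=True)

load = [0] * WORKERS
assignment = {}
for cost, cell in costs:              # largest-first greedy packing
    w = load.index(min(load))
    load[w] += cost
    assignment[cell] = w

print("estimated load per worker:", load)
```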

5.
In data analysis tasks, we are often confronted with very high dimensional data. Based on the purpose of a data analysis study, feature selection finds and selects the relevant subset of features from the original features. Many feature selection algorithms have been proposed in classical data analysis, but very few in symbolic data analysis (SDA), an extension of classical data analysis that uses rich objects instead of simple matrices. A symbolic object, compared to the data used in classical data analysis, can describe not only individuals but, most of the time, a cluster of individuals. In this paper we present an unsupervised feature selection algorithm on probabilistic symbolic objects (PSOs), with the purpose of discrimination. A PSO is a symbolic object that describes a cluster of individuals by modal variables, using the relative frequency distribution associated with each value. This paper presents new dissimilarity measures between PSOs, which are used as feature selection criteria, and explains how to reduce the complexity of the algorithm by using the discrimination matrix.
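To make the notion of a dissimilarity between PSOs concrete, here is a sketch using a mean L1 distance between the per-variable relative frequency distributions; both the measure and the example objects are illustrative stand-ins, not the paper's proposed measures.

```python
# A PSO describes a cluster by, for each modal variable, a relative
# frequency distribution over that variable's categories.
pso_a = {"colour": {"red": 0.6, "blue": 0.4}, "size": {"S": 0.2, "M": 0.8}}
pso_b = {"colour": {"red": 0.1, "blue": 0.9}, "size": {"S": 0.3, "M": 0.7}}

def dissimilarity(a, b):
    """Mean L1 distance between the per-variable frequency distributions."""
    total = 0.0
    for var in a:
        cats = set(a[var]) | set(b[var])
        total += sum(abs(a[var].get(c, 0.0) - b[var].get(c, 0.0)) for c in cats)
    return total / len(a)

# A variable with a large distance discriminates the two clusters well, so
# ranking variables by this score yields a feature-selection criterion.
print(dissimilarity(pso_a, pso_b))
```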

6.
This paper proposes an image steganography scheme in which a secret image is hidden in a cover image using a secret image sharing (SIS) scheme. The scheme takes advantage of the fault tolerance of (k, n)-threshold SIS, in which the secret data can be recovered without ambiguity from any k of the n shares (k ≤ n). To increase the security of the secret information against digital attacks, the proposed steganography algorithm is made resilient to cropping and impulsive noise contamination through the SIS scheme. Among the many SIS schemes proposed to date, Lin and Chan's scheme is selected because of its lossless recovery of a large amount of secret data. Stego-image quality and hiding capacity depend on the prime number used in the polynomial. The proposed scheme is evaluated from several points of view: imperceptibility of the stego-image with respect to the original cover image (measured by PSNR), quality of the extracted secret image, and robustness of the hidden data to cropping, impulsive noise contamination, and the combination of both attacks. The evaluation results show that the secret image extracted from the stego-image retains high quality even when the stego-image has suffered more than 20% cropping and/or high-density noise contamination.
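For reference, the polynomial (k, n)-threshold idea behind SIS can be sketched as follows. This is the basic Shamir-style construction over GF(p), not Lin and Chan's full scheme (which additionally handles image pixel values losslessly); the prime and threshold values are illustrative.

```python
import random

P = 257        # the prime used in the polynomial; here chosen illustratively
K, N = 3, 5    # any K of the N shares recover the secret

def make_shares(secret):
    # Random degree-(K-1) polynomial with the secret as constant term.
    coeffs = [secret] + [random.randrange(P) for _ in range(K - 1)]
    poly = lambda x: sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, poly(x)) for x in range(1, N + 1)]

def recover(shares):
    # Lagrange interpolation at x = 0 over GF(P).
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

shares = make_shares(123)
print(recover(random.sample(shares, K)))   # prints 123 for any K shares
```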

7.
Drug taxonomy can be described as an inherent structure over drugs with different pharmaceutical components. Unfortunately, the literature does not always provide a clear path to defining and classifying adverse drug events. While not a systematic review, this paper uses examples from the literature to illustrate problems that investigators will confront as they develop a conceptual framework for their research. It also proposes a targeted taxonomy that can facilitate a clear and consistent approach to understanding different drugs and can aid comparison with the results of past and future studies. To build the drug taxonomy, symptom information was selected, clustered, and adapted for this purpose. Finally, although national or international agreement on a taxonomy for different drugs is a distant or unachievable goal, individual investigations and the literature as a whole will be improved by prospective, explicit classification of different drugs using this new pharmacy information system (PIS) and by inclusion of each study's approach to classification in publications. The PIS allows users to find information quickly by following the semantic connections that surround every drug linked to the subject, providing faster search and a more intuitive understanding of the topic. This work aims to become a leading provider of encyclopedia services for scientists and educators and to attract the scientific community, including universities and research and development groups.

8.
Abstract: Pedestrian detection techniques are important and challenging, especially in complex real-world scenes. They can be used for ensuring pedestrian safety, in ADASs (advanced driver assistance systems), and in safety surveillance systems. In this paper, we propose a novel approach to multi-person tracking-by-detection using deformable part models in a Kalman filtering framework. The Kalman filter is used to keep track of each person, and a unique label is assigned to each tracked individual, so people can enter and leave the scene at random. We test and demonstrate our results on the Caltech Pedestrian benchmark, which is two orders of magnitude larger than any other existing dataset and contains pedestrians varying widely in appearance, pose, and scale. Complex situations, such as people occluding each other, are handled gracefully, and individuals can be tracked correctly after a group of people splits. Experiments confirm the real-time performance and robustness of our system in complex scenes. Our tracking model achieves a tracking accuracy of 72.8% and a tracking precision of 82.3%, and Kalman filtering further reduces false positives by 2.8%.
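A minimal sketch of the per-person Kalman filter: a constant-velocity state [x, y, vx, vy] updated with the detector's (x, y) position. All noise values and measurements below are illustrative, not the paper's tuned parameters.

```python
import numpy as np

dt = 1.0
F = np.array([[1, 0, dt, 0], [0, 1, 0, dt],
              [0, 0, 1, 0], [0, 0, 0, 1]], float)   # constant-velocity motion
H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], float)   # we observe position only
Q = np.eye(4) * 0.01                                # process noise
R = np.eye(2) * 1.0                                 # measurement noise

x = np.zeros(4)            # initial state
P = np.eye(4) * 10.0       # initial uncertainty

# Each z plays the role of one detection of this person per frame.
for z in [np.array([1.0, 1.1]), np.array([2.1, 2.0]), np.array([3.0, 3.2])]:
    # predict
    x = F @ x
    P = F @ P @ F.T + Q
    # update with the detector's (x, y) measurement
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(4) - K @ H) @ P
    print("estimated position:", x[:2])
```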

9.
When the mobile environment consists of lightweight devices, loss of network connectivity and scarce resources, e.g., low battery power and limited memory, become the primary concerns in efficiently supporting portable wireless devices. In this paper, we propose an index-based peer-to-peer data access method that uses a new Hierarchical Location-Based Sequential (HLBS) index, together with a novel distributed Nearest First Broadcast (NFB) algorithm. Both HLBS and NFB are designed specifically for mobile peer-to-peer service in wireless broadcast environments. The system achieves low response time because a client contacts only a qualified service provider by accessing the HLBS and quickly retrieves the data that answer the query using NFB. HLBS and NFB build the index for spatial objects according to the positions of individual clients and transmit the index in an arranged order, so that a spatial query can be processed even when the user has tuned in to only part of the index; this design supports rapid and energy-efficient service. A performance evaluation compares the proposed algorithms with algorithms based on R-tree and Hilbert-curve air indexes. The results show that the proposed data dissemination algorithm with the HLBS index is scalable and energy efficient for both range queries and nearest neighbor queries.
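The nearest-first idea can be sketched as ordering spatial objects by their distance from the requesting client, so a nearest-neighbour query can be answered from a prefix of the broadcast; the data and the simple sort below are illustrative assumptions, not the exact HLBS/NFB construction.

```python
import math

client = (0.0, 0.0)
objects = [("a", (3, 4)), ("b", (1, 1)), ("c", (0, 2)), ("d", (5, 0))]

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Broadcast objects in increasing distance from the client: a client that
# tunes in to only the first few items already has its nearest neighbours.
broadcast_order = sorted(objects, key=lambda o: dist(client, o[1]))
print([name for name, _ in broadcast_order])   # ['b', 'c', 'a', 'd']
```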

10.
Six Sigma is a rigorous, focused, and highly effective implementation of proven quality principles and techniques. A company's performance is measured by the sigma level of its business processes. Traditionally, companies accepted three- or four-sigma performance levels as the norm; the Six Sigma standard of 3.4 problems per million opportunities is a response to the increasing expectations of customers. DMAIC is an acronym for the five phases of the Six Sigma methodology: Define, Measure, Analyze, Improve, Control. This paper describes the possibility of using a Bayesian network for retraining a data mining model, with a concrete application in the field of churn. Churn, a blend of "change" and "turn", can be defined as the discontinuation of a contract. Data mining methods and algorithms can predict customer behavior, and better results can be obtained using the Six Sigma methodology. The goal of this paper is a proposal for implementing churn prediction (with a Bayesian network) within the phases of the Six Sigma methodology.
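The sigma-level arithmetic behind the 3.4 problems-per-million figure can be checked directly; this sketch assumes the conventional 1.5-sigma long-term shift used in Six Sigma tables.

```python
from statistics import NormalDist

# Sigma level from defects-per-million-opportunities (DPMO), with the
# conventional 1.5-sigma long-term shift; 3.4 DPMO gives ~6 sigma.
def sigma_level(dpmo):
    return NormalDist().inv_cdf(1 - dpmo / 1_000_000) + 1.5

for dpmo in (308_537, 66_807, 6_210, 233, 3.4):
    print(f"{dpmo:>10} DPMO -> {sigma_level(dpmo):.2f} sigma")
```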

11.
Due to the explosive increase in the amount of information in computer systems, we need systems that can process large amounts of data efficiently. Cloud computing is an effective means of achieving this capacity and has spread throughout the world. In our research, we focus on hybrid cloud environments and propose a method, implemented as middleware, for efficiently processing large amounts of data while responding flexibly to performance and cost requirements. For data-intensive jobs using this system, we created a benchmark that can deterministically detect the saturation of system resources; using this benchmark, we can determine the middleware's parameters. The middleware can provide Pareto-optimal cost load balancing based on the needs of the user. The evaluation results indicate the success of the system.
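As a sketch of Pareto-optimal cost load balancing, the following filters candidate placements to those not dominated in both cost and completion time; the option names and numbers are invented for illustration.

```python
# (placement name, cost per run, completion time) -- illustrative values.
options = [("on-prem", 0.0, 120), ("cloud-4vm", 2.0, 60),
           ("cloud-8vm", 4.0, 40), ("cloud-2vm", 2.5, 110)]

def pareto(opts):
    # Keep options that no other option beats on both cost and time.
    front = []
    for name, cost, time in opts:
        if not any(c <= cost and t <= time and (c, t) != (cost, time)
                   for _, c, t in opts):
            front.append((name, cost, time))
    return front

# "cloud-2vm" is dominated by "cloud-4vm" (cheaper in time, barely pricier
# in cost would break dominance, but here it is worse on both counts).
print(pareto(options))
```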

12.
By applying computer technologies, including serial data transmission, SQL Server databases, and networking, to the power system, we achieved wireless monitoring of electric quantities. Our work mainly covers the design of the system structure, serial programming, database programming, and network programming. We upload the current, voltage, and power readings from each meter over a period of time to the Internet, making it convenient for users and power grid companies to monitor the data in real time.

13.
Road traffic accidents have caused a myriad of problems for many countries, ranging from the untimely loss of loved ones to disability and disruption of work. In many cases, when a road traffic accident results in the death of both drivers involved, it is difficult to identify the cause of the accident and the driver at fault. Methods are therefore needed to identify the cause of road traffic accidents in the absence of eyewitnesses or when there is a dispute between those involved. This paper attempts to predict the causes of road accidents from real data collected from the police department in Dubai, United Arab Emirates, using data mining techniques. The results show that the model can predict the cause of a road accident with accuracy greater than 75%.
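A sketch of the kind of classifier such a study might use: a decision tree trained on encoded accident attributes. The features, the synthetic labels, and the resulting accuracy here are invented stand-ins; the paper's model was built on the actual Dubai police records.

```python
import random
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in for accident records: each row holds encoded features
# (road type, weather, speed band, time of day) and a cause label.
random.seed(0)
X = [[random.randrange(3), random.randrange(4),
      random.randrange(5), random.randrange(4)] for _ in range(1000)]
y = [(row[0] + row[2]) % 3 for row in X]   # toy dependence of cause on features

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = DecisionTreeClassifier(max_depth=5).fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```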

14.
High-density oligonucleotide microarrays allow several million genetic markers to be observed in a single experiment. Current bioinformatics tools for quantile normalization of gene-expression data are unable to process such huge data sets. At the same time, the volume of molecular data produced by high-throughput technologies in modern molecular biology has grown at a similar pace, challenging our capacity to process and understand it. On the other hand, the arrival of CUDA (compute unified device architecture) has unveiled the extraordinary power of GPUs (graphics processing units) to accelerate data-intensive general-purpose computing. In this work, we evaluate the use of dynamic parallelism for sorting gene-expression data, where kernel launches can be managed not only by the host but also by the device. Each sample has more than 6.5 million genes. We optimized the parallel Quicksort implementation available in the CUDA 5.5 Toolkit samples and compared its performance with the sequential Quicksort from the GNU C Library (glibc) and with the parallel radix sort available in the CUDPP 2.1 library. The parallel Quicksort implementation is designed to run on the GPU Kepler architecture, which supports dynamic parallelism. The results show that, in the studied application, the GPU version with dynamic parallelism attains speed-ups in the data-sorting step. However, achieving an effective overall speed-up relative to the radix sort requires further optimization of the whole application.
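For context, quantile normalization, whose per-sample sorting step is what the paper offloads to the GPU, can be sketched with NumPy as follows (toy matrix sizes; the real samples exceed 6.5 million values each).

```python
import numpy as np

# Quantile normalization: sort each sample, average across samples
# rank-by-rank, then map the rank means back to the original positions.
data = np.random.rand(8, 5)                 # 8 probes x 5 samples (toy size)

order = np.argsort(data, axis=0)            # per-sample sort permutation
ranks = np.argsort(order, axis=0)           # rank of each value in its sample
means = np.sort(data, axis=0).mean(axis=1)  # mean of each rank across samples
normalized = means[ranks]                   # replace values by rank means
print(normalized)
```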

15.
The graduation project management system is designed and developed in C# on the B/S (browser/server) model. The main workflow of the system is students' selection of projects. The system is designed around three roles: teachers, students, and administrators. It focuses on project application, project auditing, project selection, and the generation of result summaries. It enables instructors to assign projects and students to select them online quickly and conveniently, and instructors can track and manage the graduation design throughout the whole process. The system provides a good interactive platform for graduation design for teachers and students, improving the efficiency and quality of project selection.

16.
This work compares commercial fast data transport approaches over a 10 Gbit/s WAN (wide area network). Common solutions, such as FTP (File Transfer Protocol) based on the TCP/IP stack, are increasingly being replaced by modern protocols built on more efficient stacks. To assess the capabilities of current applications for fast data transport, the following commercial solutions were investigated: Velocity, a data transport application of Bit Speed LLC; TIXstream, a data transport application of Tixel GmbH; FileCatalyst Direct, a data transport application of Unlimi-Tech Software Inc; Catapult Server, a data transport application of XDT PTY LTD; and ExpeDat, a commercial data transport solution of Data Expedition, Inc. The goal of this work is to test the solutions under equal network conditions and thereby compare the transmission performance of recent proprietary alternatives to FTP/TCP within 10 Gbit/s networks with high latency and packet loss. The comparison uses intuitive parameters such as data rate and transmission duration. It reveals that, of all the investigated solutions, TIXstream achieves maximum link utilization in the presence of lightweight impairments; the most stable results were achieved with FileCatalyst Direct, and ExpeDat shows the most accurate output.

17.
Acquiring a set of features that emphasize the differences between normal data points and outliers can drastically facilitate the task of identifying outliers. In this work, we present a novel non-parametric evaluation criterion for filter-based feature selection aimed at the final goal of outlier detection. The proposed method seeks the subset of features that represent the inherent characteristics of the normal dataset while forcing outliers to stand out, making them more easily distinguished by outlier detection algorithms. Experimental results on real datasets show the advantage of our feature selection algorithm over popular and state-of-the-art methods. We also show that the proposed algorithm is able to overcome the small sample space problem and performs well on highly imbalanced datasets. Furthermore, because the feature selection is highly parallelizable, we implemented the algorithm on a graphics processing unit (GPU) and gained a significant speedup over the serial version. The benefits of the GPU implementation are twofold: its performance scales very well with both the number of features and the number of data points.
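An illustrative filter-style criterion (not the paper's exact non-parametric measure): score each feature by how far candidate points deviate from the normal data relative to the normal spread, then rank features by that score.

```python
import numpy as np

rng = np.random.default_rng(0)
normal = rng.normal(0, 1, size=(500, 4))   # points assumed to be normal
normal[:, 2] *= 0.1                        # feature 2 is tightly clustered
candidates = rng.normal(3, 1, size=(10, 4))  # points to be tested as outliers

def score(f):
    # Mean standardized deviation of the candidates from the normal data on
    # feature f: high scores mean this feature makes outliers stand out.
    mu, sd = normal[:, f].mean(), normal[:, f].std()
    return np.abs((candidates[:, f] - mu) / sd).mean()

scores = [score(f) for f in range(normal.shape[1])]
print("feature ranking (best first):", np.argsort(scores)[::-1])
```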

18.
The development of translational science research requires life sciences researchers to update their approach to systems integration and data integration. RESTful APIs have proven very useful for integrating massive amounts of data and tools. This work presents an approach to biological systems analysis based on the semantic web and RESTful APIs. Integrating this approach into the BioExtract Server engine as a WMS (workflow management system) allows us to link several biological levels (metabolic pathways, metabolic network models, genes, and publications). This work shows how HPC infrastructure and RESTful APIs can be an opportunity for systems biology and life science research. We present a use case in the BioExtract Server as an integrated workflow management system; the use case shows how data can be extracted to build a mathematical model of a specific biological system. The two workflow examples provided in this work enhance workflow flexibility and reusability, so that any other researcher can customize the same workflow for a different gene name or biological process.

19.
Wireless sensor networks (WSNs) have been applied in a variety of application areas. Most WSN systems, once deployed, are intended to operate unattended for a long period. During their lifetime, it is necessary to fix bugs, reconfigure system parameters, and upgrade the software in order to achieve reliable performance. However, manually collecting all nodes and reconfiguring them through serial connections to a computer is infeasible: it is labor-intensive and inconvenient given the harsh deployment environments. Hence, multi-hop data dissemination is desired to facilitate such tasks. This survey discusses the requirements and challenges of data dissemination in WSNs, reviews existing work, introduces relevant techniques, presents performance metrics and comparisons of the state of the art, and suggests possible future directions in data dissemination research. It elaborates and compares existing approaches in two categories, structure-less and structure-based schemes, classified by whether network structure information is used during dissemination. In the existing literature, the categories have definite boundaries, with limited analysis of the trade-offs between them, and no survey discusses emerging techniques such as constructive interference (CI) even though they have the potential to change the framework of data dissemination. In short, despite the many efforts made so far, data dissemination in WSNs still needs further work to embrace new techniques and to improve its efficiency and practicability.

20.
Existing methods for visualizing volumetric data are mostly based on piecewise linear models, and all analyses built on them must fall back on coarse interpolations, so both the accuracy and the reliability of the traditional framework for visualizing and analyzing volumetric data fall far short of what is needed to extract the information implicit in volumetric data fields. In this paper, we propose a novel framework based on a C2-continuous seven-directional box spline, under which reconstruction is highly accurate and the differential computations required for analysis of the reconstruction model are accurate as well. We introduce a polynomial differential operator to improve the reconstruction accuracy. To overcome the difficulty of evaluating the seven-directional box spline, we convert it into Bézier form and propose effective theories and algorithms for extracting iso-surfaces, critical points, and curvatures. Numerous examples illustrate that the novel framework is suitable for analysis, that the improved reconstruction method has high accuracy, and that our algorithms are fast and stable.
