Similar literature
1.
In recent years we have witnessed several applications of frequent sequence mining, such as feature selection for protein sequence classification and mining block correlations in storage systems. In typical applications such as clustering, it is not the complete set but only a subset of discriminating frequent subsequences that is of interest. One approach to discovering the subset of useful frequent subsequences is to apply any existing frequent sequence mining algorithm to find the complete set of frequent subsequences, and then identify a subset of interesting subsequences from it. Unfortunately, mining the complete set of frequent subsequences is very time consuming for large sequence databases. In this paper, we propose a new algorithm, CONTOUR, which efficiently mines a subset of high-quality subsequences directly in order to cluster the input sequences. We focus mainly on designing effective search space pruning methods to accelerate the mining process, and we discuss how to construct an accurate clustering algorithm based on the result of CONTOUR. We conducted an extensive performance study to evaluate the efficiency and scalability of CONTOUR, and the accuracy of the frequent subsequence-based clustering algorithm.

2.
In this paper, a hierarchical algorithm, HierarchyScan, is proposed to efficiently locate one-dimensional subsequences within a collection of sequences of arbitrary length. The proposed algorithm performs correlation between the stored sequences and the template pattern in the transformed domain to identify subsequences in a scale- and phase-independent fashion. This is in contrast to approaches based on computing Euclidean distance in the transformed domain. In the proposed hierarchical algorithm, the transformed domain representation of each original sequence is divided into multiple groups of coefficients. The matching is performed hierarchically from the group with the greatest filtering capability to the group with the lowest filtering capability. Only those subsequences whose maximum correlation value is higher than a predefined threshold are selected for additional screening. This approach is compared with sequential scanning, and an order-of-magnitude speedup is observed.

3.
A binary decision diagram based approach for mining frequent subsequences   Total citations: 2, self-citations: 1, citations by others: 1
Sequential pattern mining is an important problem in data mining. State-of-the-art techniques for mining sequential patterns, such as frequent subsequences, are often based on the pattern-growth approach, which recursively projects conditional databases. Explicitly creating database projections is thought to be a major computational bottleneck, but we show in this paper that it can be beneficial when the appropriate data structure is used. Our technique uses a canonical directed acyclic graph as the sequence database representation, which can be represented as a binary decision diagram (BDD). In this paper, we introduce a new type of BDD, namely a sequence BDD (SeqBDD), and show how it can be used for efficiently mining frequent subsequences. A novel feature of the SeqBDD is its ability to share results between similar intermediate computations and avoid redundant computation. We perform an experimental study to compare the SeqBDD technique with existing pattern-growth techniques that are based on other data structures such as prefix trees. Our results show that a SeqBDD can be half as large as a prefix tree, especially when many similar sequences exist. In terms of mining time, it can be substantially more efficient when the support is low, the number of patterns is large, or the input sequences are long and highly similar.

4.
Quasi-Monte Carlo (QMC) methods are now widely used in scientific computation, especially in estimating integrals over multidimensional domains. One advantage of QMC is that its applications are easy to parallelize, so the success of any parallel QMC application depends crucially on the quality of the parallel quasirandom sequences used. Much of the recent work on parallel QMC methods has been aimed at splitting a single quasirandom sequence into many subsequences. In contrast to this focus on breaking up one sequence, this paper proposes an alternative approach to generating parallel sequences for QMC: it generates parallel sequences of quasirandom numbers via scrambling. The exact meaning of scrambling depends on the type of parallel quasirandom numbers. In general, we seek to randomize the generator matrix for each quasirandom number generator. Specifically, this paper discusses how to parallelize the Halton sequence via scrambling. The proposed scheme for generating parallel random number streams is especially good for heterogeneous and unreliable computing environments.
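To make the idea of per-stream scrambling concrete, the following is a minimal sketch of a Halton sequence whose digits are permuted before being reflected about the radix point. It is not the scheme from the paper above: the function names (`radical_inverse`, `make_digit_perms`, `scrambled_halton`), the use of one random digit permutation per base, and the choice to keep digit 0 fixed are all assumptions made for illustration. Each parallel stream would use a different `seed`.

```python
import random


def radical_inverse(index, base, perm=None):
    """Reflect the base-`base` digits of `index` about the radix point.

    If `perm` is given, each digit d is replaced by perm[d] first.
    Assumption: perm[0] == 0, so the infinite run of leading zero
    digits contributes nothing to the result.
    """
    result = 0.0
    scale = 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        if perm is not None:
            digit = perm[digit]
        result += digit * scale
        scale /= base
    return result


def make_digit_perms(bases, seed):
    """Build one random digit permutation per base (hypothetical helper).

    Digit 0 stays fixed; for base 2 this leaves the digits unchanged.
    """
    rng = random.Random(seed)
    perms = []
    for base in bases:
        nonzero = list(range(1, base))
        rng.shuffle(nonzero)
        perms.append([0] + nonzero)
    return perms


def scrambled_halton(n_points, bases, seed):
    """Return n_points points; each stream passes its own seed."""
    perms = make_digit_perms(bases, seed)
    return [
        tuple(radical_inverse(i, b, p) for b, p in zip(bases, perms))
        for i in range(1, n_points + 1)
    ]
```

With `perm=None` the function reproduces the plain Halton radical inverse, so `radical_inverse(3, 2)` gives 0.75. Calling `scrambled_halton(n, bases, seed)` with distinct seeds per worker yields differently scrambled streams without splitting a single sequence.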

5.
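Sun Tao, Zhu Xiaoming. 《计算机科学》 (Computer Science), 2017, 44(2): 270-274
The longest common subsequence (LCS) of multiple sequences captures the information those sequences share and has important applications in many fields, such as information retrieval and gene sequence matching. Finding the LCS of multiple sequences is a well-known NP-hard problem, and it is inherently a multi-solution problem. Some approximation algorithms have low time complexity but return only a single solution; for sequence sets that have multiple solutions, the result loses a considerable amount of information. A new approximation algorithm is therefore proposed for the LCS problem. The algorithm introduces the algebraic structure "lattice": it uses dynamic programming to compute the common lattice of two sequences, then recursively computes the common lattice of the current lattice and the current sequence. The paths in the common lattice preserve multiple common subsequences, so the algorithm ultimately returns multiple longest common subsequences. The relevant theorems are proved, and experiments verify the correctness of the algorithm.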
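The multi-solution nature of the LCS problem can be seen already for two sequences. The sketch below is the textbook dynamic-programming table followed by a backtrack that collects every distinct LCS string; it does not implement the lattice structure described above, and the function name `all_lcs` is an assumption for illustration.

```python
from functools import lru_cache


def all_lcs(a, b):
    """Return the set of all distinct longest common subsequences of a and b."""
    n, m = len(a), len(b)
    # table[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    @lru_cache(maxsize=None)
    def collect(i, j):
        # No common symbols remain: the only LCS is the empty string.
        if i == n or j == m or table[i][j] == 0:
            return frozenset({""})
        if a[i] == b[j]:
            # Every LCS of these suffixes must start with the shared symbol.
            return frozenset(a[i] + rest for rest in collect(i + 1, j + 1))
        found = set()
        # Follow every branch that preserves the optimal length.
        if table[i + 1][j] == table[i][j]:
            found |= collect(i + 1, j)
        if table[i][j + 1] == table[i][j]:
            found |= collect(i, j + 1)
        return frozenset(found)

    return set(collect(0, 0))
```

For example, `all_lcs("AB", "BA")` returns both `"A"` and `"B"`, whereas a single-solution algorithm would report only one of them. The backtrack is recursive, so this sketch is meant for short inputs.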

6.
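An optimized design scheme for encoding sequences in DNA computing*   Total citations: 1, self-citations: 1, citations by others: 0
An optimized design scheme is proposed. Each of its evaluation metrics is better than the best results obtainable with the methods given in previous literature. In particular, the proposed Hamming distance measure further ensures that the free energy produced by specific hybridization is far greater than that produced by non-specific hybridization. This facilitates the design and selection of DNA encoding sequences and provides reliable, effective encoding sequences for controllable DNA computing.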

7.
Severe market competition has driven enterprises to produce a wider variety of products to meet consumers' needs. However, frequent changes in product specifications and growing product complexity make assembly sequence planning more and more complicated. As a result, assembly sequence planning for complex products has become a problem worthy of attention. In this study, a methodology for assembly sequence planning of complex components is presented, which consists of three phases: assembly-based modular design, generation of assembly subsequences for each module, and merging of assembly sequences. The nested partitions (NP) method is used to merge the assembly subsequences. Merging assembly sequences makes full use of the subsequence information of the modules and simplifies assembly sequence planning for complex products. A desk lamp is used as an example to validate the feasibility of this research.

8.
9.
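The structure and function of a genome are closely related, and its function is expressed mainly through DNA subsequences, so studying DNA sequence structure is of great significance to bioinformatics. This paper studies the problem of counting how often length-k DNA subsequences occur in a full DNA sequence, and designs and implements an internal counting algorithm and an external counting algorithm for length-k DNA subsequences. The algorithm uses a hash function to map each length-k DNA subsequence to an integer key, which turns the frequency-counting problem into one of counting repeated integer keys, so that the classic B-tree algorithm can be used to solve it. Three improvements targeted at this problem are proposed to further raise the algorithm's performance.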
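The mapping from length-k subsequences to integer keys can be sketched as follows. The abstract does not specify the hash function, so the base-4 encoding used here (A=0, C=1, G=2, T=3) is an assumption, and an in-memory `Counter` stands in for the B-tree purely to keep the example short. The names `BASE_CODE`, `kmer_key`, and `count_kmers` are hypothetical.

```python
from collections import Counter

# Assumed encoding: each base becomes one base-4 digit.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_key(kmer):
    """Map a DNA string to a unique integer by reading it as a base-4 number."""
    key = 0
    for base in kmer:
        key = key * 4 + BASE_CODE[base]
    return key


def count_kmers(sequence, k):
    """Count occurrences of every length-k subsequence, keyed by integer."""
    counts = Counter()
    for start in range(len(sequence) - k + 1):
        counts[kmer_key(sequence[start:start + k])] += 1
    return counts
```

For the sequence `"ACGTACGT"` with `k = 2`, the subsequence `AC` maps to key 1 and is counted twice, while `TA` maps to key 12 and is counted once. Because the encoding is injective for a fixed k, equal keys always mean equal subsequences, which is what lets an ordered integer index replace string comparison.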

10.
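Unsupervised action sequence segmentation based on Segmental-DTW   Total citations: 4, self-citations: 0, citations by others: 4
Wu Xiaojie, Hu Zhanyi, Wu Yihong. 《软件学报》 (Journal of Software), 2008, 19(9): 2285-2292
Action sequence segmentation is the first and most basic step in action analysis and recognition. An unsupervised action sequence segmentation algorithm is proposed, with the following main steps: (1) coarsely segment the video sequence using overlapping time windows of equal length; (2) compare the coarsely segmented video clips pairwise, using the Segmental-DTW algorithm to extract the most similar action fragments from each pair of clips; (3) convert the similarity between action fragments into an adjacency graph and cluster the extracted fragments with a graph clustering method. The algorithm follows a coarse-to-fine segmentation strategy: it accurately extracts fragments of the actions that occur frequently in the video sequence and groups fragments of the same action into one class. The segmentation results can be used directly for action modeling and recognition. Experimental results also show that the extracted action fragments are representative and effective.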
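Two of the building blocks named in the steps above can be sketched briefly: cutting a sequence into equal-length overlapping windows, and comparing two windows with dynamic time warping. The code below uses classic whole-sequence DTW, not the Segmental-DTW variant from the paper, and it assumes each frame is reduced to a single number. The function names and the `window` and `step` parameters are illustrative assumptions.

```python
def overlapping_windows(n_frames, window, step):
    """Return (start, end) frame ranges of equal length; step < window gives overlap."""
    return [(s, s + window) for s in range(0, n_frames - window + 1, step)]


def dtw_distance(x, y):
    """Classic DTW cost between two 1-D feature sequences."""
    inf = float("inf")
    n, m = len(x), len(y)
    # cost[i][j] is the best alignment cost of x[:i] with y[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # x advances alone
                cost[i][j - 1],      # y advances alone
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

With 10 frames, `overlapping_windows(10, 4, 2)` gives four windows starting at frames 0, 2, 4, and 6. The pairwise `dtw_distance` values between windows could then populate the adjacency graph used in the clustering step.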
