首页 | 官方网站   微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 83 毫秒
1.
The rigorous alignment of multiple protein sequences becomes impractical even with a modest number of sequences, since computer memory and time requirements increase as the product of the lengths of the sequences. We have devised a strategy to approach such an optimal alignment, which modifies the intensive computer storage and time requirements of dynamic programming. Our algorithm randomly divides a group of unaligned sequences into two subgroups, between which an optimal alignment is then obtained by a Needleman-Wunsch style of algorithm. Our algorithm uses a matrix with dimensions corresponding to the lengths of the two aligned sequence subgroups. The pairwise alignment process is repeated using different random divisions of the whole group into two subgroups. Compared with the rigorous approach of solving the n-dimensional lattice by dynamic programming, our iterative algorithm results in alignments that match or are close to the optimal solution, on a limited set of test problems. We have implemented this algorithm in a computer program that runs on the IBM PC class of machines, together with a user-friendly environment for interactively selecting sequences or groups of sequences to be aligned either simultaneously or progressively.  相似文献   

2.
Markovian models with hidden state are widely-used formalisms for modeling sequential phenomena. Learnability of these models has been well studied when the sample is given in batch mode, and algorithms with PAC-like learning guarantees exist for specific classes of models such as Probabilistic Deterministic Finite Automata (PDFA). Here we focus on PDFA and give an algorithm for inferring models in this class in the restrictive data stream scenario: Unlike existing methods, our algorithm works incrementally and in one pass, uses memory sublinear in the stream length, and processes input items in amortized constant time. We also present extensions of the algorithm that (1) reduce to a minimum the need for guessing parameters of the target distribution and (2) are able to adapt to changes in the input distribution, relearning new models when needed. We provide rigorous PAC-like bounds for all of the above. Our algorithm makes a key usage of stream sketching techniques for reducing memory and processing time, and is modular in that it can use different tests for state equivalence and for change detection in the stream.  相似文献   

3.
A program to calculate optimum alignment between two sequences, which may be DNA, amino acid or other information, has been written in PASCAL. The Sellers' algorithm for calculating distance between sequences has been modified to reduce its demands on microcomputer memory space by more than half. Gap penalties and mismatch scores are user-adjustable. In 48 K of memory the program aligns sequences up to 170 elements in length; optimum alignment and total distance between a pair of sequences are displayed. The program aligns longer sequences by subdivision of both sequences into corresponding, overlapping sections. Section length and amount of section overlap are user-defined. More importantly, extension of this modification of Sellers' algorithm to align longer sequences, given hardware and compilers/languages capable of using a larger memory space (e.g. 640 K), shows that it is now possible to align, without subdivision, sequences with up to 700 elements each. The increase in computation time for this program with increasing sequence lengths aligned without subdivision is curvilinear, but total times are essentially dependent on hardware/language/compiler combinations. The statistical significance of an alignment is examined with conventional Monte Carlo approaches.  相似文献   

4.
In this paper, we present an efficient technique for mapping a backpropagation (BP) learning algorithm for multilayered neural networks onto a network of workstations (NOW's). We adopt a vertical partitioning scheme, where each layer in the neural network is divided into p disjoint partitions, and map each partition onto an independent workstation in a network of p workstations. We present a fully distributed version of the BP algorithm and also its speedup analysis. We compare the performance of our algorithm with a recent work involving the vertical partitioning approach for mapping the BP algorithm onto a distributed memory multiprocessor. Our results on SUN 3/50 NOW's show that we are able to achieve better speedups by using only two communication sets and also by avoiding some redundancy in the weights computation for one training cycle of the algorithm.  相似文献   

5.
We consider garbage collection (GC) in dynamic real-time systems. We consider the time-based GC approach of running the collector as a separate, concurrent thread, and focus on real-time scheduling to obtain assurances on mutator timing behavior, while ensuring that memory is never exhausted. We present a scheduling algorithm called GCUA. The algorithm considers mutator activities that are subject to time/utility function time constraints, variable execution time demands, the unimodal arbitrary arrival model that allows a strong adversary, and resource overloads. We establish several properties of GCUA including probabilistically-satisfied utility lower bounds for each mutator activity, a lower bound on the system-wide total accrued utility, bounded sensitivity for the assurances to variations in mutator execution time demand estimates, and no memory exhaustion at all times. Our simulation experiments validate our analytical results and confirm the algorithm’s effectiveness and superiority.  相似文献   

6.
We present a memory and time efficient surface reconstruction algorithm for large sets of points, without normal information, that may not fit within the main memory. Our algorithm treats the input points as forming layers of surface patches, and then performs reconstruction by working on multiple layers of surface patches concurrently based on an out-of-core octree. Additionally, the memory usage of the algorithm can be pre-estimated through a few simple quantities in relation to the spatial coherence of the input points. Tests on the algorithm with large point sets verify that it produces good outputs too.  相似文献   

7.
We consider garbage collection (GC) in dynamic, multiprocessor real-time systems. We consider the time-based, concurrent GC approach and focus on real-time scheduling to obtain mutator timing assurances, despite memory allocation and garbage collection. We present a scheduling algorithm called GCMUA. The algorithm considers mutator activities that are subject to time/utility function time constraints, stochastic execution-time and memory demands, and overloads. We establish that GCMUA probabilistically lower bounds each mutator activity's accrued utility, lower bounds the system-wide total accrued utility, and upper bounds the timing assurances' sensitivity to variations in mutator execution-time and memory demand estimates. Our simulation experiments validate our analytical results and confirm GCMUA's effectiveness.  相似文献   

8.
Sequence segmentation has gained popularity in bioinformatics and particularly in studying DNA sequences. Information theoretic models have been used in providing accurate solutions in the segmentation of DNA sequences. Existing dynamic programming approaches provide optimal solution to the segmentation problem. However, their quadratic time complexity prohibits their applicability to long sequences. In this paper, we propose a parallel approach to improve the performance of a quasilinear sequence segmentation algorithm. The target segmentation technique is a divide-and-conquer recursive algorithm that is based on information theory principles and models. We present three parallel implementations that aim at reducing the segmentation time. The first implementation uses the multithreading capabilities of CPUs. The second one is a hybrid implementation that utilizes the synergy between the CPU and the multithreading power of GPUs. The third implementation is a variation of the hybrid approach where it utilizes the concept of unified memory between the CPU and the GPU instead of the standard memory copy approach. We demonstrate the applicability of the parallel implementations by testing them on real DNA sequences and randomly generated sequences with different lengths and different number of unique elements. The results show that the hybrid CPU-GPU approach outperforms the sequential implementation with a speedup of up to 5.9X while the CPU parallel implementation provides a poor speedup of only 1.7X.  相似文献   

9.
In this paper, we address the scalability problem of periodicity detection for time series and sequence databases. We present time and space efficient periodicity detection method that efficiently uses external memory (disk) when the series cannot be processed inside the available main memory. Our approach uses suffix tree to facilitate periodicity detection. We consider two cases, namely in-core and out of core. First, we optimize storage requirements of the suffix tree to be able to fit larger suffix trees in-core. This guarantees the ability to mine larger sequences when everything can be kept in-core, which is what the current periodicity detection approaches are able to mine. Second, when the data structures go out of core, we extend the suffix tree construction part to use external memory. We are able to achieve this while maintaining linear time complexity. As a result, when we go out of core, we can mine databases that require considerably larger space to keep the data structures compared to the available main memory. For the out-of-core periodicity detection part, the proposed method allows the required data structures to be kept on external memory partially when a memory overflow situation occurs. Various pruning strategies are also proposed to allow the proposed approach to process large sequences within reasonable amount of time. Additionally, we introduced the notion of “emulated tree traversal” for fast suffix tree traversal. Due to all these special considerations, we are able to mine much larger sequences compared to other existing periodicity detection algorithms. To demonstrate the applicability, power, and effectiveness of the proposed framework, we present results of periodicity detection up to 500 MB of time sequence data, which (to the best of our knowledge) is the largest reported sequence mined for periodicity detection ever.  相似文献   

10.
Hierarchical data are often modelled as trees. An interesting query identifies pairs of similar trees. The standard approach to tree similarity is the tree edit distance, which has successfully been applied in a wide range of applications. In terms of runtime, the state-of-the-art algorithm for the tree edit distance is RTED, which is guaranteed to be fast independent of the tree shape. Unfortunately, this algorithm requires up to twice the memory of its competitors. The memory is quadratic in the tree size and is a bottleneck for the tree edit distance computation.In this paper we present a new, memory efficient algorithm for the tree edit distance, AP-TED (All Path Tree Edit Distance). Our algorithm runs at least as fast as RTED without trading in memory efficiency. This is achieved by releasing memory early during the first step of the algorithm, which computes a decomposition strategy for the actual distance computation. We show the correctness of our approach and prove an upper bound for the memory usage. The strategy computed by AP-TED is optimal in the class of all-path strategies, which subsumes the class of LRH strategies used in RTED. We further present the AP-TED+ algorithm, which requires less computational effort for very small subtrees and improves the runtime of the distance computation. Our experimental evaluation confirms the low memory requirements and the runtime efficiency of our approach.  相似文献   

11.
We propose a lossless, single‐rate triangle mesh topology codec tailored for fast data‐parallel GPU decompression. Our compression scheme coherently orders generalized triangle strips in memory. To unpack generalized triangle strips efficiently, we propose a novel parallel and scalable algorithm. We order vertices coherently to further improve our compression scheme. We use a variable bit‐length code for additional compression benefits, for which we propose a scalable data‐parallel decompression algorithm. For a set of standard benchmark models, we obtain (min: 3.7, med: 4.6, max: 7.6) bits per triangle. Our CUDA decompression requires only about 15% of the time it takes to render the model even with a simple shader.  相似文献   

12.
We present a new method to accelerate the process of finding nearest ray-object intersections in ray tracing. The algorithm consumes an amount of memory more or less linear in the number of objects. The basic ideas can be characterized with a modified BSP tree and plane traversal. Plane traversal is a fast linear time algorithm to find the closest intersection point in a list of bounding volumes hit by a ray. We use plane traversal at every node of the high outdegree BSP tree. Our implementation is competitive to fast ray-tracing programs. We present a benchmark suite that allows for an extensive comparison of ray-tracing algorithms.  相似文献   

13.
We present an efficient and accurate algorithm for self‐collision detection in deformable models. Our approach can perform discrete and continuous collision queries on triangulated meshes. We present a simple and linear time algorithm to perform the normal cone test using the unprojected 3D vertices, which reduces to a sequence point‐plane classification tests. Moreover, we present a hierarchical traversal scheme that can significantly reduce the number of normal cone tests and the memory overhead using front‐based normal cone culling. The overall algorithm can reliably detect all (self) collisions in models composed of hundreds of thousands of triangles. We observe considerable performance improvement over prior continuous collision detection algorithms.  相似文献   

14.
生物序列的k-mer频次统计是生物信息处理中一个非常基础且重要的问题. 本文针对多序列在对齐模式下,不同偏移处一段长度范围内的k-mer频次统计问题进行了研究. 提出了一种逆向遍历k-mer计数算法BTKC. 该算法能够充分利用长度的k-mer统计信息,快速得到长度的k-mer统计信息,从而避免了统计任意长度的k-mer频次信息时都需要对所有序列进行遍历. 算法的时间复杂度分析及实验结果表明,相比于传统的前向遍历FTKC算法,BTKC算法性能提升非常明显,且其时间复杂度与k-mer长度的变化范围无关,非常适合于在k-mer长度变化范围较大的情况下使用.  相似文献   

15.
Recently, many organisms have had their DNA entirely sequenced. This reality presents the need for comparing long DNA sequences, which is a challenging task due to its high demands for computational power and memory. Sequence comparison is a basic operation in DNA sequencing projects, and most sequence comparison methods currently in use are based on heuristics, which are faster but offer no guarantees of producing the best alignments possible. In order to alleviate this problem, Smith–Waterman proposed an algorithm. This algorithm obtains the best local alignments but at the expense of very high computing power and huge memory requirements. In this article, we present and evaluate our experiments involving three strategies to run the Smith–Waterman algorithm in a cluster of workstations using a Distributed Shared Memory System. Our results on an eight-machine cluster presented very good speed-up and indicate that impressive improvements can be achieved depending on the strategy used. In addition, we present a number of theoretical remarks concerning how to reduce the amount of memory used.  相似文献   

16.
Sequence alignment is a fundamental operation for homology search in bioinformatics. For two DNA or protein sequences of length m and n, full-matrix (FM), dynamic programming alignment algorithms such as Needleman-Wunsch and Smith-Waterman take O(m × n) time and use a possibly prohibitive O(m × n) space. Hirschberg's algorithm reduces the space requirements to O(min(m, n)), but requires approximately twice the number of operations required by the FM algorithms. The Fast Linear-Space Alignment (FastLSA) algorithm adapts to the amount of space available by trading space for operations. FastLSA can effectively adapt to use either linear or quadratic space, depending on the specific machine. Our experiments show that, in practice, due to memory caching effects, FastLSA is always as fast or faster than the Hirschberg and FM algorithms. To improve the performance of FastLSA further, we have parallelized it using a simple but effective form of wavefront parallelism. Our experimental results show that Parallel FastLSA exhibits good speedups, almost linear for eight processors or less, and also that the efficiency of Parallel FastLSA increases with the size of the sequences that are aligned. Consequently, parallel and sequential FastLSA can be flexibly and effectively used with high performance in situations where space and the number of parallel processors can vary greatly.  相似文献   

17.
The Hamming distance with shifts was introduced by Bookstein et al. as a generalization of the traditional Hamming distance to allow a tunable degree of fuzziness when comparing two binary sequences of the same length. We present a linear-time algorithm for computing this distance. The previous best time bound was quadratic.  相似文献   

18.
We present an efficient algorithm for dynamic adaptive color quantization of 24 bit image (video) sequences, important in multimedia applications. Besides producing hi fidelity 8 bit imagery, our algorithm runs with minimal computational cost and the generated colormaps are robust to small differences in consecutive images. Apart from the two standard color quantization tasks, colormap design and quantizer mapping, our algorithm includes colormap filling-an operation unique to dynamic color quantization. This task solves the problem of screen flicker, a serious problem in dynamic quantization of image sequences, resulting from rapid changes in display of colormaps. Our solution is based on two ideas: including in the current colormap a small set of color representatives from the previous image; assigning representatives to the colormap entries in an order that reduces the difference between contents of equal entries in consecutive colormaps. Our algorithm runs in near real time on medium range workstations  相似文献   

19.
Acyclic colorings of subcubic graphs   总被引:1,自引:0,他引:1  
It is known that the acyclic chromatic number of a subcubic graph is at most four, and its acyclic edge chromatic number is at most five. We present algorithms that prove these two facts. Let n be the number of vertices of a graph. Our first algorithm takes O(n) time and uses four colors to properly color the vertices of any subcubic graph so that there is no 2-colored cycle. Our second algorithm takes O(n) time and uses five colors to properly color the edges of any subcubic graph so that there is no 2-colored cycle. Both are the first linear-time algorithms for the problems they solve.  相似文献   

20.
We describe a fast computer algorithm for identifying consensus patterns in DNA sequences. The method requires no prior assumptions about the consensus pattern other than its length. In particular no previous knowledge of the frequency or spacing of consensus patterns is required. However, a priori information about the shape of the consensus pattern, or invariability of individual positions, or the overall conservation level, can be utilized to enhance the selectivity and sensitivity of search. As the number of all possible consensus words increases very rapidly with length, comprehensive searches have usually been restricted to a maximum of 10-12 nucleotides, even when large mainframes are used. Our algorithm enables searching for consensus patterns of this order on current mid-range and powerful microcomputers. Searches may be conducted on single, long sequences or a set of possibly aligned shorter sequences. We give examples of identified consensus patterns in both prokaryotic and eukaryotic DNA sequences, along with some typical program timings.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号