Similar Literature
1.
蒋华 (Jiang Hua), 殷波 (Yin Bo). 《计算机应用》 (Journal of Computer Applications), 2009, 29(2): 403-405
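To address the deduplication of duplicate web pages, two algorithms for extracting repeated words and sentences are systematically analyzed and compared. The STC algorithm performs very well in time cost, while the inverted-index method for repeated sequences is better in space complexity. The repeated-sequence method is improved by combining it with the STC algorithm: for duplicate web pages that arise from topic-oriented reposting, repeated strings are extracted first and then used as the index for STC-based duplicate extraction. Experimental results show that the improved algorithm greatly increases time efficiency while preserving the original space characteristics.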

2.
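Analysis of repetitive sequences in non-coding regions plays an important role in genome research; its foundation is identifying and locating all repeat structures in non-coding DNA sequences. In computer science, repeat identification is mainly a string-matching problem. Building on an analysis of suffix-tree and suffix-array string-matching algorithms, the paper describes in detail a method for identifying exact tandem repeats based on suffix arrays. Experiments show that the method is suitable for analyzing non-coding DNA sequences.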
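The abstract above does not give the paper's algorithm, so the following is only a minimal sketch of one way a suffix array can support exact tandem repeat detection. It assumes that a tandem repeat of period p at position i means s[i:i+p] == s[i+p:i+2p], which holds exactly when the longest common prefix of suffixes i and i+p is at least p. The function names, the naive sort-based construction, and the linear-scan range minimum are all illustrative choices, not the authors' implementation.

```python
def build_suffix_array(s):
    # Naive construction by sorting suffixes; fine for a short illustration.
    return sorted(range(len(s)), key=lambda i: s[i:])


def build_rank_and_lcp(s, sa):
    # Kasai's algorithm: lcp[r] is the longest common prefix length of the
    # suffixes at sorted positions r-1 and r (lcp[0] is 0 by convention).
    n = len(s)
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and s[i + h] == s[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1
        else:
            h = 0
    return rank, lcp


def suffix_lcp(i, j, rank, lcp):
    # LCP of two distinct suffixes = minimum of the LCP array between their ranks.
    lo, hi = sorted((rank[i], rank[j]))
    return min(lcp[lo + 1:hi + 1])


def tandem_repeats(s):
    # Report every (start, period) with s[start:start+p] == s[start+p:start+2p].
    n = len(s)
    sa = build_suffix_array(s)
    rank, lcp = build_rank_and_lcp(s, sa)
    found = []
    for p in range(1, n // 2 + 1):
        for i in range(n - 2 * p + 1):
            if suffix_lcp(i, i + p, rank, lcp) >= p:
                found.append((i, p))
    return found


if __name__ == "__main__":
    print(tandem_repeats("ACGACGT"))
```

A production version would replace the linear-scan minimum with a sparse table or similar range-minimum structure so that each LCP query takes constant time.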

4.
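DNA sequence analysis is one of the core topics of bioinformatics. The gene-related and extragenic regions of a genome contain large numbers of repetitive sequences; although the function of most of them has not yet been established, they already play an important role in genetic analysis. Mining DNA repeats has therefore become key to DNA sequence analysis. Bottom-up mining algorithms generate many short patterns, even single-character ones, in their intermediate steps, which lowers mining efficiency. On the other hand, current sequential pattern mining algorithms are efficient on multiple sequences, but the limits of a single-support definition prevent them from also finding the repeats within a single DNA sequence during mining, so they are not well suited to DNA repeat mining. Based on a new multi-support sequential pattern mining framework, this paper proposes DnaReSM, a new algorithm that combines bottom-up and top-down strategies to mine DNA repeats; its results provide a basis for related biological experiments. Experimental results show that the DnaReSM detection algorithm mines DNA repeats effectively.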

5.
An LPR Finding Algorithm for DNA Sequences Based on a Successor Array (后继数组) Index
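Repeated fragments in DNA sequences are of great biological significance in human genetic research, so finding the repeats in a given DNA sequence is an important topic in biological sequence analysis. Based on the pattern of repeated fragments, the paper proposes a new definition of repeats, LPR (largest pattern repetition), together with the concept of a pattern unit. For a DNA sequence of length n, the number of LPRs is O(n), yet they carry the same repeat information as tandem repeats, whose number can reach n^2/4. Based on pattern units, a new index for repeat finding, the successor array, is designed. The successor array effectively reduces index space and overcomes the index-space bottleneck in repeat finding. On the successor array, all atomic patterns that make up LPRs can be found through pattern units, and LPRs are found by checking whether the same pattern occurs consecutively in the original sequence. Theoretical analysis and experimental results both show that the time and space complexity of the LPR finding algorithm are O(n).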
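The n^2/4 figure for tandem repeats can be checked on a small case. The sketch below is a brute-force counter, not the LPR algorithm or the successor array from the paper. It assumes that each (start, period) pair with two adjacent identical copies counts as one tandem repeat, and it uses a single-letter sequence as the worst case. For such a sequence of length n the count is the sum of (n - 2p + 1) over periods p from 1 to floor(n/2), which equals floor(n^2/4).

```python
def count_tandem_repeats(s):
    # Count (start, period) pairs where two adjacent copies of length
    # `period` are identical. Brute force, for illustration only.
    n = len(s)
    count = 0
    for p in range(1, n // 2 + 1):
        for i in range(n - 2 * p + 1):
            if s[i:i + p] == s[i + p:i + 2 * p]:
                count += 1
    return count


if __name__ == "__main__":
    for n in (4, 7, 10, 20):
        # A homopolymer such as "AAAA..." makes every candidate a repeat.
        print(n, count_tandem_repeats("A" * n), n * n // 4)
```

The quadratic growth of this count is the reason a representation with only O(n) entries, such as the LPRs described in the abstract, is attractive.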

6.
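By analyzing the operating rules and characteristics of pushdown automata, the paper introduces the concept of repeatable sequences of context-free languages and divides them into three classes: balanced, increasing, and decreasing repeatable sequences. It studies how these three classes appear in the state transition graph of a pushdown automaton and what properties they have, and, by analyzing the relationship between labeled cycles in the state transition graph and repeatable sequences, gives a method for computing repeatable sequences. It proves how the different classes of repeatable sequences affect the properties of context-free languages, uses repeatable sequences to reveal the essence of the pumping lemma for context-free languages, and gives a necessary and sufficient condition for deciding whether a language is regular.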

7.
霍红卫 (Huo Hongwei), 白帆 (Bai Fan). 《计算机学报》 (Chinese Journal of Computers), 2008, 31(2): 214-219
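Most current repeat identification algorithms either depend on a database of already annotated repeats or define a repeat as two similar sequences of maximal length, with no rigorous definition that balances repeat length against frequency. To address these problems, the paper proposes RepeatSearcher, a fast repeat identification algorithm that is based on a variant of the local sequence alignment algorithm BLAST and supports gaps. The algorithm defines exact repeat boundaries and identifies repeats by progressively extending a consensus sequence. It was tested on the C. briggsae genome sequence and compared with the widely used repeat identification algorithm RECON and the more recent RepeatScout. The results show that RepeatSearcher gives every repeat sequence an exact boundary and, relative to the other algorithms, shortens running time without loss of accuracy.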

8.
Bag Automata
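The paper proposes the bag automaton model and the concept of bag languages, and gives the state transition graph of a bag automaton. It analyzes how the repeated sequences of a bag language appear in the state transition graph, classifies them into invariant, increasing, decreasing, and transfer repeated sequences, and gives the structural properties of bag languages. It also studies the relationship between the class of bag languages and the language classes of the Chomsky hierarchy, proving that the regular languages are a proper subset of the bag languages and that the bag languages are a proper subset of the context-sensitive languages, whereas the bag languages and the context-free languages intersect but neither contains the other: some bag languages are not context-free, and some context-free languages cannot be generated by any bag automaton.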

9.
黄亚佳 (Huang Yajia), 倪磊 (Ni Lei), 金帆 (Jin Fan), 杨光 (Yang Guang). 《集成技术》 (Journal of Integration Technology), 2019, 8(6): 31-38
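Direct repeats are widespread in eukaryotic and prokaryotic genomes and are associated with a variety of diseases (such as hereditary neuromuscular and neurodegenerative diseases), so quantifying the deletion of repeats has become very important. Combining high-throughput microscopic imaging with analysis techniques, this paper designs a method based on a three-color fluorescent reporter system to quantify the occurrence of repeat deletion. The results show that in Pseudomonas aeruginosa the repeat deletion frequency is markedly lower in a recA deletion mutant, whereas loss of the RadA and UvrD proteins raises the deletion frequency, and that repeat deletion is independent of factors such as bacterial growth rate and promoter. The study helps deepen understanding of problems related to direct repeats and provides a new method for quantifying direct repeat deletion.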

10.
State Transition Graphs of Petri Nets
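The paper gives a state transition graph model for Petri nets and uses it as an analysis tool. It analyzes how the repeatable firing sequences of a Petri net appear in the state transition graph, gives the relationship between labeled paths (cycles) and firing sequences (repeatable firing sequences) along with the conditions for deciding it, and gives a method for computing the basic repeatable firing sequences. It defines dependence and degree of dependence between repeatable sequences, giving a precise formal description of the dependency relations among them. It analyzes the structural characteristics of Petri net languages and proves that every Petri net language is the synchronization of a regular language expression with the α-closure of the net's repeatable firing sequences.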

11.
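Algorithms for identifying repeated patterns in sequences, and their applications, are an important problem in data mining and one of the main means of extracting useful information from sequences. In recent years there have been many results on the definitions of various repeated patterns, on the design of effective identification algorithms, and on the application of these algorithms in related fields. This paper describes the types and characteristics of repeated patterns in sequences and discusses the data structures commonly used in identification algorithms. It reviews and summarizes, by category, recent applications of repeated patterns in several related fields and the design ideas and techniques of the associated algorithms, and discusses them in terms of the domain knowledge and constraints incorporated, the identification results and algorithm extensibility, and the main open problems. The applications covered include web information extraction, Web document feature extraction and clustering algorithms, and related Uyghur-language information processing. Finally, the paper discusses the challenges facing applied research on sequence repeated-pattern identification in these fields and explores future research directions.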

12.
The software commonly used for assembly of shotgun sequence data has several limitations. One such limitation becomes obvious when repetitive sequences are encountered. Shotgun assembly is a difficult task, even for non-repetitive regions, but the use of quality assessments of the data and efficient matching algorithms has made it possible to assemble most sequences efficiently. In the case of highly repetitive sequences, however, these algorithms fail to distinguish between sequencing errors and single base differences in regions containing nearly identical repeats. None of the currently available fragment assembly programs are able to correctly assemble highly similar repetitive data, and we, therefore, present a novel shotgun assembly program, Tandem Repeat Assembly Program (TRAP). The main feature of this program is the ability to separate long repetitive regions from each other by distinguishing single base substitutions as well as insertions/deletions from sequencing errors. This is accomplished by using a novel multiple-alignment based analysis method. Since repeats are a common complication in most sequencing projects, this software should be of use for the whole sequencing community.

13.
We apply the Minimal Length Encoding Principle to formalize inference about the evolution of macromolecular sequences. The Principle is shown to imply a combination of Weighted Parsimony and Compatibility methods that have long been used by biologists because of their good practical performance. The background assumptions are expressed as an encoding scheme for the observed data and as heuristic rules for selection of diagnostic positions in the sequences. The Principle was applied to discover new subfamilies of Alu sequences, the most numerous family of repetitive DNA sequences in the human genome.

14.
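Microsatellites are short tandem repeats distributed widely across eukaryotic genomes. Microsatellite instability (MSI) refers to the insertion or deletion of repeat units in microsatellite regions caused by a failure of the DNA mismatch repair system. Detecting MSI is important for early tumor diagnosis and for prognosis. Clinically, MSI is detected with the experimental methods MSI-PCR and MMR-IHC; with the development of next-generation sequencing, MSI detection methods and software based on high-throughput sequencing data have gradually emerged. This paper introduces current MSI detection methods from two perspectives, biological experimental methods and computational methods, and discusses the strengths and limitations of each.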
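Since microsatellites are short tandem repeats, a first computational step in any sequence-based analysis is locating candidate loci. The sketch below shows one simple way to do that with a regular expression. It is not taken from any of the MSI detection tools the review covers; the function name `find_microsatellites`, the motif length limit of 6, and the minimum of 4 copies are illustrative assumptions, and it only locates perfect repeats rather than assessing instability.

```python
import re


def find_microsatellites(seq, max_motif=6, min_copies=4):
    # Match a motif of 1..max_motif bases followed by at least
    # (min_copies - 1) further exact copies. The lazy quantifier makes the
    # regex try the shortest motif first at each start position.
    pattern = re.compile(
        rf"([ACGT]{{1,{max_motif}}}?)\1{{{min_copies - 1},}}"
    )
    hits = []
    for m in pattern.finditer(seq):
        motif = m.group(1)
        copies = len(m.group(0)) // len(motif)
        hits.append((m.start(), motif, copies))
    return hits


if __name__ == "__main__":
    example = "GGT" + "CA" * 5 + "TTG"
    print(find_microsatellites(example))
```

Detecting instability would additionally require comparing repeat lengths at each locus between samples, which this sketch does not attempt.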

15.
Detection and Recognition of Periodic, Nonrigid Motion
The recognition of nonrigid motion, particularly that arising from human movement (and by extension from the locomotory activity of animals) has typically made use of high-level parametric models representing the various body parts (legs, arms, trunk, head etc.) and their connections to each other. Such model-based recognition has been successful in some cases; however, the methods are often difficult to apply to real-world scenes, and are severely limited in their generalizability. The first problem arises from the difficulty of acquiring and tracking the requisite model parts, usually specific joints such as knees, elbows or ankles. This generally requires some prior high-level understanding and segmentation of the scene, or initialization by a human operator. The second problem, with generalization, is due to the fact that the human model is not much good for dogs or birds, and for each new type of motion, a new model must be hand-crafted. In this paper, we show that the recognition of human or animal locomotion, and, in fact, any repetitive activity can be done using low-level, non-parametric representations. Such an approach has the advantage that the same underlying representation is used for all examples, and no individual tailoring of models or prior scene understanding is required. We show in particular, that repetitive motion is such a strong cue, that the moving actor can be segmented, normalized spatially and temporally, and recognized by matching against a spatio-temporal template of motion features. We have implemented a real-time system that can recognize and classify repetitive motion activities in normal gray-scale image sequences. Results on a number of real-world sequences are described.

16.
Many video sequences consist of a locally dynamic background containing moving foreground subjects. In this paper we propose a novel way of re‐displaying these sequences, by giving the user control over a virtual camera frame. Based on video mosaicing, we first compute a static high quality background panorama. After segmenting and removing the foreground subjects from the original video, the remaining elements are merged into a dynamic background panorama, which seamlessly extends the original video footage. We then re‐display this augmented video by warping and cropping the panorama. The virtual camera can have an enlarged field‐of‐view and a controlled camera motion. Our technique is able to process videos with complex camera motions, reconstructing high quality panoramas without parallax artefacts, visible seams or blurring, while retaining repetitive dynamic elements.

17.
18.
Computers & Chemistry, 1996, 20(1): 119-121
CENSOR is a program designed to identify and eliminate fragments of DNA sequences homologous to any chosen reference sequences, in particular to repetitive elements. CENSOR is based on two principal algorithms of Smith & Waterman (1981) [J. Mol. Biol. 147, 195] and Wilbur & Lipman (1983) [Proc. Natl Acad. Sci. U.S.A. 80, 726]. It includes several pre-set sensitivity levels based on both biological and statistical criteria which help to distinguish between aligned pairs of homologous and non-homologous sequences. CENSOR has been implemented in C/C++ in the SUN/UNIX environment.
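For readers unfamiliar with the first of the two algorithms named above, the following is a minimal sketch of the Smith-Waterman local alignment recurrence, computing only the best local score. It is not CENSOR's code: the function name and the scoring values (match 2, mismatch -1, linear gap penalty -2) are assumptions chosen for illustration, and CENSOR's own scoring and sensitivity levels are not described in the abstract.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    # H[i][j] holds the best score of a local alignment ending at
    # a[i-1] and b[j-1]; scores never drop below zero, which is what
    # makes the alignment local rather than global.
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap      # gap in b
            left = H[i][j - 1] + gap    # gap in a
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    # The shared core "ACGT" is found despite unrelated flanking bases.
    print(smith_waterman_score("GGACGTCC", "TTACGTAA"))
```

A tool that masks repeats would also trace back through the matrix to recover the aligned coordinates, then compare the score against a threshold to decide whether the fragment counts as homologous to the reference.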
