Similar Literature
1.
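Traditional searchable encryption schemes support only exact-match search and are ill-suited to cloud computing in terms of both efficiency and performance. This work builds an index on an R+-tree that supports multiple string-similarity operations, enabling fuzzy keyword search over encrypted data in the cloud. Edit distance is used to quantify keyword similarity, and a file retrieval method is proposed that returns the files closest to the query keyword. String clustering further improves the efficiency of fuzzy keyword search.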
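As a rough illustration of using edit distance to rank keywords by closeness, the Python sketch below computes Levenshtein distance with the classic dynamic program and returns the stored keywords within a threshold, closest first. It works on plaintext only and leaves out the encryption, the R+-tree index, and the clustering step described in the abstract; the function names `edit_distance` and `rank_keywords` are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the row-by-row dynamic program."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def rank_keywords(query, keywords, max_dist):
    """Hypothetical helper: (distance, keyword) pairs within max_dist, closest first."""
    hits = [(edit_distance(query, k), k) for k in keywords]
    return sorted(h for h in hits if h[0] <= max_dist)


if __name__ == "__main__":
    print(rank_keywords("cloud", ["cloud", "clout", "clown", "crowd", "index"], 1))
```

A real system would avoid scanning every keyword; the linear scan here only shows how the distance orders the results.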

2.
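The core of cloud computing is to provide users with dynamic, easily scalable computing resources over the Internet on the basis of virtualization technology. Managing a large number of cloud resources with a central-server computing model makes the central server the bottleneck of the whole system, which hinders large-scale adoption of cloud computing. Peer-to-peer networking has therefore been proposed for building distributed systems that index, store, and query cloud resources, but structured topologies are complex to maintain and generally do not support queries with complex search conditions. This paper proposes a multi-keyword cloud resource search algorithm. It improves the routing algorithm of a cloud resource search algorithm based on hierarchical super nodes, with the aim of supporting exact multi-keyword queries. The generation, splitting, and storage of multiple keywords are described in detail, and an effective dataset-based index search strategy is proposed that enables efficient and accurate queries containing three or more keywords. Analysis of the experimental results shows that the algorithm clearly improves the hit rate of resource search; in particular, as the number of keywords grows, it maintains the hit rate while greatly increasing recall.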

3.
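This paper modifies the existing Chord protocol and proposes O-Chord, an extended Chord resource index model that supports fuzzy string matching. The model uses a one-dimensional fingerprint as the key for resource information and supports fuzzy matching for multi-keyword and semantic queries. By using reverse fingerprints, reverse finger table entries, and a dedicated pre-judgment step, it improves query efficiency and balances the query load across the system. Experimental results show that O-Chord achieves high recall and good load balance.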

4.
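Li Ting, Cheng Haitao. 《计算机科学》 (Computer Science), 2017, 44(9): 216-221, 226
Most research on keyword queries over precise XML documents is based on LCA semantics or its variants (SLCA, ELCA, and so on), returning the most compact XML subtree fragments that contain all the keywords as query results. However, results produced under LCA semantics usually contain a large amount of redundant information, and the real world contains a great deal of uncertain and fuzzy information, so how to retrieve high-quality keyword query results from fuzzy XML documents is a problem worth studying. This paper studies approximate keyword queries over fuzzy XML documents. By introducing the concept of the minimum connecting tree (MCT), it formulates the all-GDMCTs problem for keyword queries over fuzzy XML documents and gives a stack-based algorithm, All fuzzy GDMCTs, to solve it. The algorithm returns all GDMCTs that satisfy user-specified thresholds on subtree size and possibility. Experiments show that the algorithm obtains high-quality keyword query results on fuzzy XML documents.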

5.
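With the rapid development of e-health, medical institutions must spend substantial resources managing their own large, isolated collections of electronic medical records, and data sharing between institutions is difficult. To address this, a ciphertext retrieval scheme suited to e-health environments is proposed. The scheme enables unified, effective management and use of the data, and supports fault-tolerant, verifiable multi-keyword ciphertext retrieval in the cloud. The multi-keyword fault-tolerance mechanism is based on a fuzzy extractor, which improves retrieval effectiveness and practicality; the verification mechanism is built on a bilinear-pairing accumulation tree and allows the reliability of search results to be checked. Security analysis shows that the scheme preserves the confidentiality of user data and the privacy of query requests. Search performance analysis demonstrates the effectiveness of its multi-keyword search.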

6.
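Wang Yi. 《计算机应用》 (Journal of Computer Applications), 2004, 24(10): 121-124
A new method for similar-string queries is proposed. Its goals are to improve the efficiency of queries based on similar-string matching over large string databases and to support string queries with wildcards. The method organizes the database in a trie and applies edit-distance-based similar-string matching directly on the trie, yielding the set of candidate words within similarity K. Experiments show that the method is highly efficient for K ≤ 2.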
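One common way to combine a trie with edit distance is to carry one row of the dynamic-programming table down each trie path and abandon a branch once every entry in the row exceeds K. The Python sketch below follows that idea. It is not taken from the paper: `TrieNode`, `build_trie`, and `search` are hypothetical names, wildcard queries are not handled, and an empty stored string is ignored.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # next character -> TrieNode
        self.word = None    # set when a stored string ends at this node


def build_trie(words):
    root = TrieNode()
    for w in words:
        node = root
        for ch in w:
            node = node.children.setdefault(ch, TrieNode())
        node.word = w
    return root


def search(root, query, k):
    """Return sorted (distance, word) pairs whose edit distance to query is <= k."""
    results = []

    def walk(node, ch, prev_row):
        # Extend the DP table by one row for the character on this trie edge.
        row = [prev_row[0] + 1]
        for j in range(1, len(query) + 1):
            cost = 0 if query[j - 1] == ch else 1
            row.append(min(row[j - 1] + 1, prev_row[j] + 1, prev_row[j - 1] + cost))
        if node.word is not None and row[-1] <= k:
            results.append((row[-1], node.word))
        if min(row) <= k:  # otherwise no descendant can be within k
            for nxt, child in node.children.items():
                walk(child, nxt, row)

    first_row = list(range(len(query) + 1))
    for ch, child in root.children.items():
        walk(child, ch, first_row)
    return sorted(results)


if __name__ == "__main__":
    trie = build_trie(["trie", "tree", "try", "tries", "true"])
    print(search(trie, "tree", 1))
```

Because strings sharing a prefix share the DP rows for that prefix, the work per query depends on how much of the trie survives pruning, which is smallest for small K.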

7.
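Legal cases are highly diverse, and legal provisions usually contain a large amount of specialized legal terminology, which makes it very difficult to analyze a case by reading the provisions directly. This paper studies the design and implementation of a Web-based legal information service platform that collects and classifies the scattered, unordered legal information on the Internet and provides a large number of cases for users to read, giving users more valuable legal information. The main functions of the system are: business practice queries, related information queries, user permission management, management of cases and their categories, management of laws and regulations and their categories, a simple forum, case search, law and regulation queries, and similar-case lookup. The core functions are similar-case lookup and keyword search, both of which typically rely on Lucene for word segmentation. Similar-case lookup applies the idea of the KNN algorithm in reverse: after segmentation, each case is turned into a vector in a vector space, and the similarity of two cases is computed from their vectors. For keyword search, Lucene is used to build a dynamic full-text index over the cases in the database, which improves search speed and precision.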
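The vector-space comparison of two segmented cases can be sketched as follows. The abstract does not say which vector weighting or similarity measure the system uses, so this Python example assumes raw term-frequency vectors and cosine similarity, and it takes already-segmented token lists as input instead of calling Lucene. The names `cosine_similarity` and `most_similar` are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine of the angle between two term-frequency vectors (assumed measure)."""
    va, vb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(query_tokens, cases, top_n=3):
    """Hypothetical helper: rank named cases by similarity to the query case."""
    scored = [(cosine_similarity(query_tokens, toks), name) for name, toks in cases.items()]
    return sorted(scored, reverse=True)[:top_n]


if __name__ == "__main__":
    cases = {
        "case-1": ["contract", "breach", "damages"],
        "case-2": ["theft", "property", "sentence"],
    }
    print(most_similar(["contract", "damages", "payment"], cases, top_n=1))
```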

8.
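Ontology-based information retrieval is an effective route to knowledge retrieval. Because the formal concepts supported by current ontologies are insufficient to represent incomplete knowledge, an information retrieval method based on a Rough ontology is proposed, in which the ontology is represented as an ontology information system. After the user submits a keyword query, the method first uses keyword-based retrieval to search for documents in a predefined semantic document space, then uses associative search to find the sets of individuals and properties associated with the keywords. The property set is used as equivalence classes to construct the approximation space of the Rough ontology, and finally the similarity between the individual set and the document set is computed in that approximation space and the documents are ranked by similarity. Experiments show that the method achieves higher precision than keyword-based methods and methods based on classical ontologies.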

9.
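This paper proposes a Web-based fuzzy matching method for strings. A given source string S and target string T are matched unit by unit, using pre-split string units, to obtain the degree of similarity between the two strings. The method differs from pattern matching on strings.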

10.
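An effective parallel similarity retrieval technique for Chinese characters/strings   (Total citations: 1; self-citations: 0; citations by others: 1)
Wang Suqin, Zou Xukai. 《软件学报》 (Journal of Software), 1995, 6(8): 463-467
This paper proposes an effective parallel similarity retrieval technique for Chinese characters and strings. By introducing a search state vector and a character-pattern matching vector, the technique turns string comparison into simple integer bit operations, and by searching the string in opposite directions it achieves parallel similarity retrieval on multiple processors. The paper also gives a parallel implementation of the algorithm and analyzes its complexity.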
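To show how a state vector and per-character match vectors turn string comparison into integer bit operations, the Python sketch below uses the well-known Shift-And (bitap) algorithm. This is a swapped-in technique chosen for illustration: the abstract does not state that the paper's vectors are defined this way, and the sketch covers only sequential exact matching, not the similarity search or the two-direction multiprocessor scheme. The names `build_masks` and `shift_and_search` are hypothetical.

```python
def build_masks(pattern):
    """Per-character match vectors: bit i is set if pattern[i] == ch."""
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)
    return masks


def shift_and_search(text, pattern):
    """Return start offsets of all exact occurrences of pattern in text."""
    if not pattern:
        return []
    masks = build_masks(pattern)
    accept = 1 << (len(pattern) - 1)  # bit that marks a full match
    state = 0                         # bit i set: pattern[:i + 1] ends at this position
    hits = []
    for pos, ch in enumerate(text):
        # Extend every partial match by one character, and start a new one.
        state = ((state << 1) | 1) & masks.get(ch, 0)
        if state & accept:
            hits.append(pos - len(pattern) + 1)
    return hits


if __name__ == "__main__":
    print(shift_and_search("abracadabra", "abra"))
    print(shift_and_search("汉字检索与汉字匹配", "汉字"))
```

Python integers are unbounded, so the pattern length is not limited by a machine word here; a fixed-width implementation would need the pattern to fit in one word or split it across several.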

11.
In this paper, we study edit similarity query processing to find strings similar to a query string from a collection of strings. To solve the problem, many algorithms have been proposed under a filter-and-verification framework, where candidate strings are generated and refined using a few filters and then verified to find true matches. A major focus of those algorithms has been on generating as few candidates as possible at an early stage of query processing. A typical approach to generating candidates is to extract some signatures from a query and take the union of the string ids in the inverted lists of the extracted signatures. However, the number of candidates generated by existing techniques is far larger than the number of answer strings, and the cost of refinement and verification is high. To address the problem, we propose an intersection-based candidate generation scheme, which generates a substantially smaller number of candidates. Given some signatures of a query, the proposed scheme first categorizes the signatures into several groups. Then, it takes the intersection of the string ids in the inverted lists of the signatures in each group. Finally, it takes the union of the intersections to generate candidates. To minimize the number of candidates under our scheme, we propose a novel algorithm which judiciously selects an optimal signature group. We show through experiments that our technique is very effective in reducing the number of candidates and significantly improves performance.
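The difference between union-based and intersection-based candidate generation can be seen on a toy inverted index. The Python sketch below assumes 2-grams as signatures and takes the signature groups as given; how the paper chooses signatures and selects groups (which is what guarantees no true match is lost) is not described in the abstract, so those parts are left out. All function names are hypothetical.

```python
from collections import defaultdict


def qgrams(s, q=2):
    """Set of overlapping q-grams of s (assumed signature type)."""
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def build_inverted_index(strings, q=2):
    """Map each q-gram to the set of ids of strings containing it."""
    index = defaultdict(set)
    for sid, s in enumerate(strings):
        for g in qgrams(s, q):
            index[g].add(sid)
    return index


def union_candidates(index, signatures):
    """Typical approach: union of the inverted lists of all signatures."""
    out = set()
    for sig in signatures:
        out |= index.get(sig, set())
    return out


def intersection_candidates(index, groups):
    """Intersect the lists inside each group, then union across groups."""
    out = set()
    for group in groups:
        lists = [index.get(sig, set()) for sig in group]
        if lists:
            out |= set.intersection(*lists)
    return out


if __name__ == "__main__":
    idx = build_inverted_index(["similar", "simple", "milan", "polar"])
    print(union_candidates(idx, ["si", "mi", "la", "ar"]))
    print(intersection_candidates(idx, [["si", "mi"], ["la", "ar"]]))
```

On this toy data the same four signatures yield every string under the union scheme but only two candidates once they are grouped and intersected.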

12.
A novel online approach to exact string matching and filtering of large databases is presented. String matching/filtering is based on artificial neural networks and operates in two stages: initially, a self‐organizing map retrieves the cluster of database strings that are most similar to the query string; subsequently, a harmony theory network compares the retrieved strings with the query string and determines whether an exact match exists. The similarity measure is configured to the specific characteristics of the database so as to expose overall string similarity rather than character coincidence at homologous string locations. The experimental results demonstrate foolproof, fast, and practically database‐size independent operation that is especially robust to database modifications. The proposed approach is put forward for general‐purpose (directory, catalogue, glossary search) as well as Internet‐oriented (e‐mail blocking, URL, username classification) applications. © 2010 Wiley Periodicals, Inc.

13.
In this paper, we propose an efficient encoding and labeling scheme for XML, called EXEL, which is a variant of the region labeling scheme using ordinal and insert-friendly bit strings. We devise a binary encoding method to generate the ordinal bit strings, and an algorithm to insert a new bit string between existing bit strings without affecting the order of the preexisting bit strings. This binary encoding method and bit string insertion algorithm form the basis of efficient query processing and the complete avoidance of re-labeling for updates. We present query processing and update processing methods based on EXEL. In addition, the Stack-Tree-Desc algorithm is used for an efficient structural join, and String B-tree indexing is utilized to improve the join performance. Finally, the experimental results show that EXEL enables complete avoidance of re-labeling for updates while providing fairly reasonable query processing performance.

14.
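Building on region encoding and prefix encoding, an improved region encoding scheme, RSD (region-string-dinary), is proposed. It uses a binary encoding strategy so that bit strings can be inserted in an order-friendly way, together with a new bit string insertion algorithm that produces ordered bit strings without affecting the order of existing ones. The method for determining structural relationships between nodes in RSD is described. The binary encoding scheme and the bit string insertion algorithm are the basis for efficient query processing and for avoiding re-encoding on updates. Experiments show that RSD completely avoids re-encoding on updates while delivering reasonable query processing performance.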

15.
Roland H. C. Yap. Constraints, 2001, 6(2-3): 157-172
Approximate matching techniques based on string alignment are important tools for investigating similarities between strings, such as those representing DNA and protein sequences. We propose a constraint-based approach for parametric sequence alignment which allows for more general string alignment queries where the alignment cost can itself be parameterized as a query with some initial constraints. Thus, the costs need not be fixed in a parametric alignment query, unlike the case in normal alignment. The basic dynamic programming string edit distance algorithm is generalized to a naive algorithm which uses inequalities to represent the alignment score. The naive algorithm is rather costly, and the remainder of the paper develops an improvement which prunes alternatives where it can and approximates the alternatives otherwise. This reduces the number of inequalities significantly and strengthens the constraint representation with equalities. We present some preliminary results using parametric alignment on some general alignment queries.

16.
A fuzzy extractor is a powerful but theoretical tool that can be used to extract uniform strings from (discrete) noisy sources. However, when using a fuzzy extractor in practice, extra features are needed, such as the renewability of the extracted strings and the ability to use the fuzzy extractor directly on continuous input data instead of discrete data. Our contribution is threefold. Firstly, we propose a fuzzy embedder as a generalization of the fuzzy extractor. A fuzzy embedder naturally supports renewability, as it allows a string to be embedded instead of extracted. It also supports direct analysis of quantization effects, as it makes no limiting assumptions about the nature of the input source. Secondly, we give a general construction for fuzzy embedders based on the technique of quantization index modulation (QIM). We show that the performance measures of a QIM, as proposed by the watermarking community, translate directly to the security properties of the corresponding fuzzy embedder. Finally, we show that from the perspective of the length of the embedded string, quantization in two dimensions is optimal. We present two practical constructions for a fuzzy embedder in two-dimensional space. The first construction is optimal from a reliability perspective, and the second construction is optimal in the length of the embedded string.

17.
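A string matching algorithm supporting wildcards   (Total citations: 1; self-citations: 0; citations by others: 1)
This paper studies string matching when the query string contains the wildcards "*" and "?", where "*" stands for a string of any length and "?" stands for any single character of the alphabet. Because the gram index structure is advantageous in both space and query efficiency, it is applied to string matching with wildcards. By decomposing a query string with wildcards into several wildcard-free query fragments, the complex wildcard query is reduced to simple exact substring matching without wildcards. Length filtering, position filtering, and count filtering are applied during fragment queries to speed them up.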
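The decomposition into wildcard-free fragments can be sketched in Python as below. The sketch scans the strings directly instead of using a gram index, applies only a length check and an in-order fragment check as filters, and then verifies survivors with a regular expression; the paper's position and count filters are not reproduced. The names `fragments`, `passes_filters`, `verify`, and `wildcard_search` are hypothetical.

```python
import re


def fragments(pattern):
    """Split a wildcard pattern into its wildcard-free fragments."""
    return [f for f in re.split(r"[*?]", pattern) if f]


def passes_filters(s, pattern):
    """Cheap necessary conditions: length bound and fragments occurring in order."""
    min_len = sum(1 for c in pattern if c != "*")  # every non-'*' symbol needs one character
    if len(s) < min_len or ("*" not in pattern and len(s) != min_len):
        return False
    pos = 0
    for frag in fragments(pattern):
        pos = s.find(frag, pos)  # exact substring match for this fragment
        if pos < 0:
            return False
        pos += len(frag)
    return True


def verify(s, pattern):
    """Full check: '*' matches any string, '?' matches exactly one character."""
    regex = "".join(".*" if c == "*" else "." if c == "?" else re.escape(c) for c in pattern)
    return re.fullmatch(regex, s, flags=re.DOTALL) is not None


def wildcard_search(strings, pattern):
    return [s for s in strings if passes_filters(s, pattern) and verify(s, pattern)]


if __name__ == "__main__":
    words = ["database", "datatype", "date", "update", "data"]
    print(wildcard_search(words, "dat*a?e"))
    print(wildcard_search(words, "da?a"))
```

The filters never reject a true match, so they only reduce how many strings reach the more expensive verification step.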

18.
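An index structure supporting block edit distance   (Total citations: 1; self-citations: 0; citations by others: 1)
In approximate string matching, the traditional edit distance does not measure similarity well for data such as personal names and addresses, whereas block edit distance does. Supporting block edit distance efficiently in approximate string query processing is therefore important. Computing the block edit distance between two strings is NP-complete, so effective methods are needed that strengthen filtering and reduce the false-positive rate. A novel index structure, SHV-Trie, is designed to support move edit distance. Based on the properties of move edit distance operations, letter frequencies are used as a lower bound for the distance, and a corresponding query filtering algorithm is proposed. To address the large space overhead of SHV-Trie, an index structure with an optimized letter ordering and a compressed index structure are proposed, along with their query filtering algorithms. Experiments and analysis on real data sets show that the proposed index structures filter well and improve query efficiency by reducing the false-positive rate.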
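A letter-frequency lower bound can be illustrated as follows, assuming the distance allows single-character insertion, deletion, and substitution plus block moves at unit cost. A block move leaves the character counts unchanged, and each of the other operations reduces the count surplus of one string over the other by at most one, so the larger of the two surpluses is a lower bound on the distance. The abstract does not give the exact bound used by SHV-Trie, so this Python sketch is an assumption; `frequency_lower_bound` and `filter_candidates` are hypothetical names.

```python
from collections import Counter


def frequency_lower_bound(a, b):
    """Lower bound on move edit distance from character counts alone."""
    ca, cb = Counter(a), Counter(b)
    surplus = sum((ca - cb).values())  # characters a has that b lacks
    deficit = sum((cb - ca).values())  # characters b has that a lacks
    return max(surplus, deficit)


def filter_candidates(query, strings, k):
    """Keep only strings whose lower bound does not already exceed k."""
    return [s for s in strings if frequency_lower_bound(query, s) <= k]


if __name__ == "__main__":
    names = ["john smith", "jon smith", "john smythe", "mary jones"]
    print(filter_candidates("smith john", names, 1))
```

Strings that pass the filter still need the real distance computed; the bound only rules out strings that cannot be within k.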

19.
A method for learning the membership of strings belonging to finite languages is proposed. The learning is based on sets of strings of fuzzy linguistic variables. These strings belong to languages, each one of which describes a class of phenomena. The learning algorithm attempts to maximize the number of times a string reaches the highest possibility value for the language representing the class of phenomena containing the sample described by the string. Application to automatic speech recognition is described, and experimental results are presented showing the benefits of the proposed method.

20.
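Edit-distance-based approximate string query algorithms generally take a threshold k and compute the results whose edit distance to the query string is at most k. For approximate substring queries, however, many of the results overlap and are meaningless. The concept of locally optimal matching is therefore proposed: only results that satisfy the threshold and are locally optimal are computed, which avoids overlapping results and greatly reduces time overhead. The paper defines approximate substring queries with locally optimal matching, proposes a gram-index-based algorithm for them, analyzes the regularities of the approximate substring matching process, studies boundary limits and filtering strategies based on locally optimal matching, and gives a filter-optimized locally optimal approximate substring query algorithm that improves query efficiency.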
