Similar Documents
20 similar documents found (search time: 15 ms)
1.
Bloom filters provide space-efficient storage of sets at the cost of a probability of false positives on membership queries. The size of the filter must be defined a priori, based on the number of elements to store and the desired false positive probability, making it impossible to store extra elements without increasing the false positive probability. This typically leads to a conservative assumption regarding maximum set size, possibly off by orders of magnitude, and consequently to wasted space. This paper proposes Scalable Bloom Filters, a variant of Bloom filters that can adapt dynamically to the number of elements stored while assuring a maximum false positive probability.
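The growth strategy described above can be sketched as follows: keep a list of plain Bloom filters and, when the current one fills, append a larger one with a tighter error target so the compound false positive probability stays bounded. This is a minimal illustration, not the paper's implementation; the sizing formulas are the standard Bloom filter ones, while the class names, growth factor, tightening ratio `r`, and SHA-256-based hashing are my own illustrative choices.

```python
import hashlib
import math

class BloomFilter:
    """Plain Bloom filter sized for `capacity` items at error rate `p`."""
    def __init__(self, capacity, p):
        self.capacity = capacity
        self.p = p
        # Standard sizing: m = -n ln p / (ln 2)^2, k = (m/n) ln 2.
        self.m = max(8, int(-capacity * math.log(p) / math.log(2) ** 2))
        self.k = max(1, round(self.m / capacity * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)
        self.count = 0

    def _indexes(self, item):
        for i in range(self.k):  # k salted hashes per item
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for idx in self._indexes(item):
            self.bits[idx // 8] |= 1 << (idx % 8)
        self.count += 1

    def __contains__(self, item):
        return all(self.bits[idx // 8] >> (idx % 8) & 1
                   for idx in self._indexes(item))

class ScalableBloomFilter:
    """Grows by appending filters; stage i gets error p*r^(i+1), so the
    compound false positive probability is bounded by p*r/(1-r)."""
    def __init__(self, initial_capacity=128, p=0.01, growth=2, r=0.5):
        self.p, self.r, self.growth = p, r, growth
        self.filters = [BloomFilter(initial_capacity, p * r)]

    def add(self, item):
        f = self.filters[-1]
        if f.count >= f.capacity:  # current stage is full: scale up
            f = BloomFilter(f.capacity * self.growth, f.p * self.r)
            self.filters.append(f)
        f.add(item)

    def __contains__(self, item):
        return any(item in f for f in self.filters)
```

A query probes every stage, so there are never false negatives; the geometric tightening of each stage's error rate is what keeps the total bound fixed as the filter grows.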

2.
A Bloom filter is a space-efficient data structure used for probabilistic set membership testing. When testing an object for set membership, a Bloom filter may give a false positive. The analysis of the false positive rate is key to understanding the Bloom filter and the applications that use it. We show experimentally that the classic analysis of the false positive rate is wrong. We formally derive a correct formula using a balls-and-bins model and show how to compute the new, correct formula numerically in a stable manner. We also prove that the new formula always predicts a greater false positive rate than the classic formula. The correct formula is compared numerically to the classic one for relative error: for a small Bloom filter, the predicted false positive rate will be in error when the classic formula is used.
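The two analyses can be compared numerically. Below, `classic_fp` is Bloom's textbook estimate (1 - (1 - 1/m)^kn)^k, and `exact_fp` is the balls-and-bins expectation E[(I/m)^k], where I is the number of distinct bits set after n insertions with k hashes each. The function names are mine, and the exact form follows the published correction rather than anything quoted in this abstract; exact rational arithmetic keeps the small-filter computation stable.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb, factorial

def classic_fp(m, n, k):
    """Bloom's classic estimate of the false positive rate."""
    return (1.0 - (1.0 - 1.0 / m) ** (k * n)) ** k

@lru_cache(maxsize=None)
def stirling2(n, k):
    """Stirling numbers of the second kind, by the standard recurrence."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

def exact_fp(m, n, k):
    """Expected false positive rate from the balls-and-bins model:
    E[(I/m)^k].  comb(m, i) * i! * S(kn, i) counts the ways the kn
    insertion hashes leave exactly i of the m bits set."""
    total = 0
    for i in range(1, m + 1):
        total += i ** k * comb(m, i) * factorial(i) * stirling2(k * n, i)
    return Fraction(total, m ** (k * (n + 1)))
```

For k = 1 the two formulas coincide (the expectation is linear), while for k ≥ 2 the exact expectation exceeds the classic estimate, which is the direction the paper proves in general.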

3.
Detecting long flows is important for network measurement and management. This paper proposes a long-flow detection algorithm based on sampling and Bloom filters. Packets are first sampled and then hashed through Bloom filters; a temporary table and a flow-information table in memory are used to identify flows that reach a given threshold and to maintain their state. The algorithm meets the requirements of long-flow detection in high-speed networks, guaranteeing measurement accuracy while effectively controlling resource consumption. Experimental analysis shows that, compared with existing methods, it is simple to implement and keeps resource usage under control.

4.
Undoubtedly, dealing with security issues is one of the most important and complex tasks networks face today. A large number of security algorithms have been proposed to enhance security in various types of networks. Many of these solutions are based, directly or indirectly, on the Bloom filter (BF), a space- and time-efficient probabilistic data structure introduced by Burton Bloom in 1970. Bloom filters and their variants are receiving more and more attention in the network security area. This paper provides an up-to-date survey of the application of BFs and their variants to improve the performance of approaches proposed to address security problems in different types of networks.

5.
Duplicate web pages are widespread on today's Internet and increase the complexity of data mining and search. To address shortcomings of existing techniques, this paper applies a deduplication algorithm based on Bloom filters to duplicate web pages. An existing noise-removal algorithm is used to preprocess each page, and the Bloom filter structure greatly reduces the time and space complexity of deduplication. Long sentences that characterize a page are extracted, turning page deduplication into a search over those long sentences, with the Bloom filter reducing the time complexity of the search.
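The long-sentence fingerprint idea can be sketched as follows: extract a page's long sentences as features, then use a Bloom filter to test how many of them have been seen before. The sentence-length and duplicate-ratio thresholds, the filter parameters, and all names here are illustrative choices, not values from the paper.

```python
import hashlib
import re

def long_sentences(text, min_len=20):
    """Split page text into sentences and keep the long ones as the
    page's fingerprint features (threshold is illustrative)."""
    parts = re.split(r"[。.!?!?]+", text)
    return [s.strip() for s in parts if len(s.strip()) >= min_len]

class SentenceBloom:
    """Compact Bloom filter over sentence strings."""
    def __init__(self, m=1 << 16, k=4):
        self.m, self.k = m, k
        self.bits = bytearray(m // 8)

    def _idx(self, s):
        for i in range(self.k):
            d = hashlib.md5(f"{i}|{s}".encode()).digest()
            yield int.from_bytes(d[:8], "big") % self.m

    def add(self, s):
        for i in self._idx(s):
            self.bits[i // 8] |= 1 << (i % 8)

    def seen(self, s):
        return all(self.bits[i // 8] >> (i % 8) & 1 for i in self._idx(s))

def is_duplicate(page_text, bloom, threshold=0.8):
    """Flag the page as a duplicate if most of its long sentences were
    already recorded, then record its sentences for future pages."""
    feats = long_sentences(page_text)
    if not feats:
        return False
    hits = sum(bloom.seen(s) for s in feats)
    dup = hits / len(feats) >= threshold
    for s in feats:
        bloom.add(s)
    return dup
```

Checking a page costs one Bloom membership query per extracted sentence, which is the time-complexity saving over comparing pages against a stored corpus.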

6.
This paper proposes a lightweight, scalable broadcast authentication protocol for sensor networks. It applies compressed Bloom filters to multi-level μTESLA, replacing the Merkle-tree binding of the multi-level μTESLA key chains. The protocol provides a long lifetime and self-healing; by allocating appropriate storage at each receiver, it greatly reduces the communication load. Combined with an elliptic curve cryptography (ECC) algorithm, a public-key cryptography (PKC) technique, implementation and analysis show that the protocol guarantees long-term security while reducing energy consumption.

7.
We give two algorithms to randomly permute a linked list of length n in place using O(n log n) time and O(log n) stack space in both the expected case and the worst case. The first algorithm uses well-known sequential random sampling, and the second uses inverted sequential random sampling.

8.
On the false-positive rate of Bloom filters
Bloom filters are a randomized data structure for membership queries dating back to 1970. Bloom filters sometimes give erroneous answers to queries, called false positives. Bloom analyzed the probability of such erroneous answers, called the false-positive rate, and Bloom's analysis has appeared in many publications throughout the years. We show that Bloom's analysis is incorrect and give a correct analysis.

9.
Research on an adaptive mechanism for Bloom Filters in skewed data streams
When Counting Bloom Filters (CBF) are used to measure flow frequencies in skewed network data streams, their fixed-width counters overflow easily. To address this, we propose Adaptive Bloom Filters (ABF), which replace the fixed-size physical counters of CBF with extensible logical counters. A logical counter consists of a dynamically varying number of physical counters; initially a logical counter coincides with a single physical counter, but when its value overflows it adaptively expands to cover neighbouring physical counters, increasing its capacity and preserving measurement accuracy. Experiments show that ABF adapts better to variations in the measured frequencies without significantly increasing the false positive rate, giving it an advantage over other Counting Bloom Filters when measuring frequencies over skewed data.
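The overflow problem that motivates ABF can be demonstrated with a plain Counting Bloom Filter. The sketch below contrasts fixed 4-bit saturating counters with unbounded ones on a skewed stream; it does not reproduce ABF's logical-counter expansion, and all parameters and names are illustrative.

```python
import hashlib

class CountingBloom:
    """Counting Bloom filter with `width`-bit saturating counters
    (width=None means unbounded).  Fixed-width counters are exactly
    what overflows on skewed streams, the problem ABF targets."""
    def __init__(self, m=1024, k=3, width=4):
        self.m, self.k = m, k
        self.cap = None if width is None else (1 << width) - 1
        self.counters = [0] * m

    def _idx(self, item):
        for i in range(self.k):
            d = hashlib.md5(f"{i}|{item}".encode()).digest()
            yield int.from_bytes(d[:4], "big") % self.m

    def add(self, item):
        for i in self._idx(item):
            if self.cap is None or self.counters[i] < self.cap:
                self.counters[i] += 1

    def estimate(self, item):
        # Conservative estimate: minimum counter over the k slots.
        return min(self.counters[i] for i in self._idx(item))

# A skewed stream: one heavy flow plus many light ones.
stream = ["heavy"] * 500 + [f"light{i}" for i in range(50)]
fixed = CountingBloom(width=4)         # counters saturate at 15
unbounded = CountingBloom(width=None)  # stand-in for ABF's expanding counters
for item in stream:
    fixed.add(item)
    unbounded.add(item)
```

After the stream, the fixed-width filter reports the heavy flow at the saturation value 15, while the unbounded counters report at least its true frequency; ABF achieves the latter behaviour while still using small physical counters.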

10.
The length-constrained heaviest path (LCHP) in a weighted tree T, where each edge is assigned a weight and a length, is the path P in T with maximum total path weight and total path length bounded by a given value B. This paper presents an O(n log n) time LCHP algorithm which utilizes a data structure constructed from the spine decomposition of the input tree. This is an improvement over the existing algorithm by Wu et al. (1999), which runs in O(n log² n) time. Our method also improves on a previous O(n log n) time algorithm by Kim (2005) for the special case of finding a longest nonnegative path in a constant-degree tree, in that we can handle trees of arbitrary degree within the same time bounds.

11.
We give an algorithm to compute the subset partial order (called the subset graph) for a family F of sets containing k sets with N elements in total and domain size n. Our algorithm requires O(nk²/log k) time and space on a Pointer Machine. When F is dense, i.e. N = Θ(nk), the algorithm requires O(N²/log² N) time and space. We give a construction for a dense family whose subset graph is of size Θ(N²/log² N), indicating the optimality of our algorithm for dense families. The subset graph can be dynamically maintained when F undergoes set insertions and deletions in O(nk/log k) time per update (that is, sub-linear in N for dense families). If we assume words of b ≤ k bits, allow bits to be packed in words, and use bitwise operations, the above running time and space requirements can be reduced by factors of b·log(k/b+1)/log k and b²·log(k/b+1)/log k, respectively.

12.
The Data Distribution Service (DDS) middleware has recently been standardized by the OMG. Prior to data communication, a discovery protocol has to locate remote DDS entities and obtain their attributes. Specifically, DDS discovery matches the DataWriter (DW) and DataReader (DR) entities (Endpoints) situated in different network nodes. The DDS specification does not specify how this discovery is translated “into the wire”. To provide interoperability and transparency between different DDS implementations, the OMG has standardized the DDS Interoperability Wire Protocol (DDS-RTPS). Any compliant DDS-RTPS implementation must support at least the Simple Discovery Protocol (SDP). The SDP works in relatively small or medium networks, but it may not scale as the number of DDS Endpoints increases. This paper addresses the design and evaluation of an SDP alternative that uses Bloom Filters (BF) to increase DDS scalability. BFs use hash functions for space-efficient probabilistic representation of data sets. We provide both analytical and experimental studies. Results show that our approach can improve the discovery process (in terms of network load and node resource consumption), especially in scenarios with large Endpoint-per-Participant ratios.

13.
Double hashing with bucket capacity one is augmented with multiple passbits to obtain a significant reduction in unsuccessful search lengths. This improves the analysis of Martini et al. [P.M. Martini, W.A. Burkhard, Double hashing with multiple passbits, Internat. J. Found. Theoret. Comput. Sci. 14 (6) (2003) 1165-1188] by providing a closed-form expression for the expected unsuccessful search length.
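The underlying scheme, double hashing with bucket capacity one, probes slot (h1(x) + i·h2(x)) mod m on the i-th attempt. The sketch below shows that probe sequence and the unsuccessful-search length the passbit scheme aims to shorten; the passbits themselves are not reproduced, and the hash construction and table size are illustrative.

```python
import hashlib

class DoubleHashTable:
    """Open addressing with double hashing, bucket capacity one.
    Probe i lands at (h1 + i*h2) mod m; h2 is forced odd so the probe
    sequence visits every slot when m is a power of two."""
    def __init__(self, m=64):
        self.m = m
        self.slots = [None] * m

    def _h(self, key):
        d = hashlib.sha1(str(key).encode()).digest()
        h1 = int.from_bytes(d[:8], "big") % self.m
        h2 = int.from_bytes(d[8:16], "big") % self.m | 1  # force odd
        return h1, h2

    def insert(self, key):
        h1, h2 = self._h(key)
        for i in range(self.m):
            j = (h1 + i * h2) % self.m
            if self.slots[j] is None or self.slots[j] == key:
                self.slots[j] = key
                return j
        raise RuntimeError("table full")

    def search(self, key):
        """Returns (found, probes); for an absent key, `probes` is the
        unsuccessful search length that passbits cut down."""
        h1, h2 = self._h(key)
        for i in range(self.m):
            j = (h1 + i * h2) % self.m
            if self.slots[j] is None:
                return False, i + 1
            if self.slots[j] == key:
                return True, i + 1
        return False, self.m
```

A passbit scheme would store a few extra bits per slot recording where displaced keys went, letting many unsuccessful searches stop before hitting an empty slot.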

14.
It is well known how to preprocess a rooted tree in linear time to yield the lowest common ancestor of any given pair of nodes in constant time. We generalize these algorithms for graphs called Arbitrarily Directed Trees, or ADTs. Such trees are obtained by reversing the directions of some edges in trees with the customary “root-to-leaf” orientation of edges.
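The rooted-tree building block the paper generalizes can be sketched with the classic Euler tour plus sparse table scheme. This variant uses O(n log n) preprocessing for O(1) queries (the linear-time schemes refine the table construction); it handles ordinary rooted trees only, not ADTs, and the representation of the tree as a child dictionary is my own choice.

```python
def preprocess_lca(children, root=0):
    """Euler tour + sparse table over depths.  `children` maps each
    node to a list of its children; returns an O(1) lca(u, v) query."""
    euler, depth, first = [], [], {}
    stack = [(root, 0, iter(children.get(root, ())))]
    while stack:  # iterative DFS emitting the Euler tour
        node, d, it = stack[-1]
        first.setdefault(node, len(euler))
        euler.append(node)
        depth.append(d)
        child = next(it, None)
        if child is None:
            stack.pop()
        else:
            stack.append((child, d + 1, iter(children.get(child, ()))))

    m = len(euler)
    sp = [list(range(m))]  # sparse table of argmin-depth positions
    j = 1
    while (1 << j) <= m:
        prev, half = sp[-1], 1 << (j - 1)
        sp.append([prev[i] if depth[prev[i]] <= depth[prev[i + half]]
                   else prev[i + half]
                   for i in range(m - (1 << j) + 1)])
        j += 1

    def lca(u, v):
        # The LCA is the shallowest node between the first occurrences
        # of u and v on the Euler tour.
        l, r = sorted((first[u], first[v]))
        jj = (r - l + 1).bit_length() - 1
        a, b = sp[jj][l], sp[jj][r - (1 << jj) + 1]
        return euler[a] if depth[a] <= depth[b] else euler[b]

    return lca
```

The range-minimum query over depths is what an ADT generalization would have to reinterpret, since reversing edge directions changes which occurrence counts as the "shallowest common" node.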

15.
Given a sequence of n elements, we introduce the notion of an almost-increasing subsequence: the longest subsequence that can be converted to an increasing subsequence by adding to each element a value that is at most a fixed constant. We show how to construct such a subsequence optimally, where k is the length of the output subsequence.

16.
We consider the following problem: given an unsorted array of n elements and a sequence of intervals in the array, compute the median in each of the subarrays defined by the intervals. We describe a simple algorithm which needs O(n log k + k log n) time to answer k such median queries. This improves previous algorithms by a logarithmic factor and matches a comparison lower bound for k = O(n). The space complexity of our simple algorithm is O(n log n) in the pointer machine model and O(n) in the RAM model. In the latter model, a more involved O(n) space data structure can be constructed in O(n log n) time, where the time per query is reduced to O(log n / log log n). We also give efficient dynamic variants of both data structures, achieving O(log² n) query time using O(n log n) space in the comparison model and O((log n / log log n)²) query time using O(n log n / log log n) space in the RAM model, and show that in the cell-probe model any data structure which supports updates in O(log^O(1) n) time must have Ω(log n / log log n) query time. Our approach naturally generalizes to higher-dimensional range median problems, where element positions and query ranges are multidimensional: it reduces a range median query to a logarithmic number of range counting queries.
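The reduction from range median to range counting can be sketched with a merge-sort tree: a median query binary-searches over the sorted values, and each step asks how many elements in the interval are at most a candidate value. This illustrates the reduction only, not the paper's optimal bounds (it spends O(log² n) per counting query and O(n log n) space); the class and method names are mine.

```python
import bisect

class MergeSortTree:
    """Segment tree whose nodes store their range sorted, supporting
    count-of-elements <= x in a[l..r] in O(log^2 n)."""
    def __init__(self, a):
        self.n = len(a)
        size = 1
        while size < self.n:
            size *= 2
        self.size = size
        self.tree = [[] for _ in range(2 * size)]
        for i, x in enumerate(a):
            self.tree[size + i] = [x]
        for i in range(size - 1, 0, -1):
            self.tree[i] = sorted(self.tree[2 * i] + self.tree[2 * i + 1])

    def count_le(self, l, r, x):
        """Number of elements <= x in a[l..r] (inclusive)."""
        res = 0
        l += self.size
        r += self.size + 1
        while l < r:
            if l & 1:
                res += bisect.bisect_right(self.tree[l], x)
                l += 1
            if r & 1:
                r -= 1
                res += bisect.bisect_right(self.tree[r], x)
            l >>= 1
            r >>= 1
        return res

    def median(self, l, r):
        """Lower median of a[l..r]: binary search over sorted values,
        each step answered by one range counting query."""
        need = (r - l) // 2 + 1
        vals = self.tree[1]  # root holds all values, sorted
        lo, hi = 0, len(vals) - 1
        while lo < hi:
            mid = (lo + hi) // 2
            if self.count_le(l, r, vals[mid]) >= need:
                hi = mid
            else:
                lo = mid + 1
        return vals[lo]
```

Because only counting queries touch the interval, the same outer binary search works unchanged for multidimensional ranges once a multidimensional range counting structure is substituted.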

17.
For a tree T, the Level-Ancestor problem is the following: given a node x and an integer depth d, find the ancestor of x in T that is at depth d in T. We formulate this problem precisely for the Pure Pointer Machine model, i.e., pointer machines without arithmetic capabilities. We then provide a solution for the problem in the fully dynamic case that runs in worst-case O(lg n) time per operation, and prove a matching lower bound of Ω(lg n).

18.
The identification of significant attributes is of major importance to the performance of a variety of Learning Classifier Systems, including the newly emerged Bioinformatics-oriented Hierarchical Evolutionary Learning (BioHEL) algorithm. However, BioHEL fails to deliver on a set of synthetic datasets, checkerboard data mixed with Gaussian noise, because the significant attributes are not successfully recognised. To address this issue, a univariate Estimation of Distribution Algorithm (EDA) technique is introduced into BioHEL, which builds a probabilistic model upon the outcome of the generalization and specialization operations. The probabilistic model, which estimates the significance of each attribute, provides guidance for the exploration of the problem space. Experimental evaluations showed that the proposed BioHEL systems achieve performance comparable to the conventional one on a number of small-scale real-world datasets. Research effort was also devoted to finding the optimal parameters for the traditional and proposed BioHEL systems.
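The univariate EDA machinery involved here can be sketched in isolation with UMDA: fit an independent Bernoulli model to the best individuals, then sample the next population from it, so that informative positions drift toward their good values. This does not reproduce the BioHEL integration; the toy onemax fitness and all parameters (population size, truncation, clamping) are illustrative.

```python
import random

def umda(fitness, n_bits, pop=60, keep=20, gens=40, seed=0):
    """Univariate EDA (UMDA): each generation samples `pop` bitstrings
    from a per-bit Bernoulli model, then re-fits the model to the
    `keep` fittest individuals."""
    rng = random.Random(seed)
    p = [0.5] * n_bits  # univariate probabilistic model
    best = None
    for _ in range(gens):
        popn = [[1 if rng.random() < p[i] else 0 for i in range(n_bits)]
                for _ in range(pop)]
        popn.sort(key=fitness, reverse=True)
        if best is None or fitness(popn[0]) > fitness(best):
            best = popn[0]
        elite = popn[:keep]
        # Re-estimate each marginal from the elite; clamping keeps a
        # little exploration alive in every position.
        for i in range(n_bits):
            freq = sum(ind[i] for ind in elite) / keep
            p[i] = min(0.95, max(0.05, freq))
    return best

# Toy problem: onemax, where every bit is a "significant attribute".
best = umda(sum, 20)
```

In the BioHEL setting the marginals play the role of attribute-significance estimates: attributes whose marginals stay near 0.5 carry no signal, which is the guidance the proposed system feeds back into generalization and specialization.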

19.
Universal hash functions for an infinite universe and hash trees
In this note we describe the adaptation of the universal hashing idea to an infinite universe, and to prefix hash trees. Prefix hash trees are a structure underlying all extendible hashing methods, which have up to now only been studied under the uniform hashing assumption.
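The starting point, Carter-Wegman universal hashing, picks h(x) = ((a·x + b) mod p) mod m for random a, b. One common way to extend it toward an unbounded universe is to first fold arbitrary-length keys to an integer mod p with a random-base polynomial; that folding step is a standard technique, not necessarily the note's construction, and the names and the choice of prime are mine.

```python
import random

P = (1 << 61) - 1  # a Mersenne prime comfortably above the hashed values

def make_universal_hash(m, seed=None):
    """Draw one hash function h: keys -> [0, m) from a Carter-Wegman
    family, h(x) = ((a*x + b) mod P) mod m."""
    rng = random.Random(seed)
    a = rng.randrange(1, P)
    b = rng.randrange(P)
    base = rng.randrange(256, P)

    def fold(key):
        # Polynomial rolling fold of a string key to an integer mod P.
        if isinstance(key, int):
            return key % P
        x = 0
        for byte in str(key).encode():
            x = (x * base + byte) % P
        return x

    def h(key):
        return ((a * fold(key) + b) % P) % m

    return h
```

Each call to `make_universal_hash` draws an independent function from the family; a prefix hash tree would apply such functions level by level to successively longer key prefixes.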

20.
In this paper we present a concise O(n) implementation of Cleary's algorithm for generating a sequence of restricted rotations between any two binary trees. The algorithm is described directly in terms of the binary trees, without using any intermediate representation.
