Similar Documents
 20 similar documents found; search took 31 ms
1.
A generalization of binary search trees and binary split trees is developed that takes advantage of two-way key comparisons: the two-way comparison tree. The two-way comparison tree has little use for dynamic situations but is an improvement over the optimal binary search tree and the optimal binary split tree for static data sets. An O(n) time and space algorithm is presented for constructing an optimal two-way comparison tree when access probabilities are equal, and an exact formula for the optimal cost is developed. The construction of the optimal two-way comparison tree for unequal access frequencies, both successful and unsuccessful, is computable in O(n^5) time and O(n^3) space using algorithms similar to those for the optimal binary split tree. The optimal two-way comparison tree can improve search cost by up to 50% over the optimal binary search tree.

2.
Andrews, Bender, Zhang 《Algorithmica》2002,32(2):277-301
Abstract. Processor speed and memory capacity are increasing several times faster than disk speed. This disparity suggests that disk I/O performance could become an important bottleneck. Methods are needed for using disks more efficiently. Past analysis of disk scheduling algorithms has largely been experimental, and little attempt has been made to develop algorithms with provable performance guarantees. We consider the following disk scheduling problem. Given a set of requests on a computer disk and a convex reachability function that determines how fast the disk head travels between tracks, our goal is to schedule the disk head so that it services all the requests in the shortest time possible. We present a 3/2-approximation algorithm (with a constant additive term). For the special case in which the reachability function is linear we present an optimal polynomial-time solution. The disk scheduling problem is related to the special case of the Asymmetric Traveling Salesman Problem with the triangle inequality (ATSP-Δ) in which all distances are either 0 or some constant α. We show how to find the optimal tour in polynomial time and describe how this gives another approximation algorithm for the disk scheduling problem. Finally, we consider the on-line version of the problem in which uniformly distributed requests arrive over time. We present an algorithm related to the above ATSP-Δ.

3.
In the single rent-to-buy decision problem, without a priori knowledge of the amount of time a resource will be used, we need to decide when to buy the resource, given that we can rent it for $1 per unit time or buy it once and for all for $c. In this paper we study algorithms that make a sequence of single rent-to-buy decisions, using the assumption that the resource use times are independently drawn from an unknown probability distribution. Our study of this rent-to-buy problem is motivated by important systems applications, specifically, problems arising from deciding when to spin down disks to conserve energy in mobile computers [4], [13], [15], thread blocking decisions during lock acquisition in multiprocessor applications [7], and virtual circuit holding times in IP-over-ATM networks [11], [19]. We develop a provably optimal and computationally efficient algorithm for the rent-to-buy problem. Its expected cost for the t-th resource use converges to optimal as t → ∞, for any bounded probability distribution on the resource use times. Alternatively, using O(1) time and space, the algorithm almost converges to optimal. We describe experimental results for the application of our algorithm to one of the motivating systems problems: the question of when to spin down a disk to save power in a mobile computer. Simulations using disk access traces obtained from an HP workstation environment suggest that our algorithm yields significantly improved power/response-time performance over the nonadaptive 2-competitive algorithm that is optimal in the worst-case competitive analysis model. Received October 22, 1996; revised September 25, 1997.
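The nonadaptive baseline mentioned at the end of the abstract is easy to state in code. Below is a minimal sketch (our illustration, not the paper's adaptive algorithm) of the classic 2-competitive threshold rule: rent until the accumulated rent equals the purchase price c, then buy.

```python
def rent_to_buy_cost(use_time: float, c: float) -> float:
    """Cost of the nonadaptive 2-competitive rule: rent for the first c
    time units, then buy.  Never pays more than twice the offline
    optimum min(use_time, c)."""
    if use_time <= c:
        return use_time   # resource released before the threshold: rent only
    return 2 * c          # c units of rent paid, then the purchase price c

# In the disk spindown application, "renting" is keeping the disk spinning
# and c models the cost (in energy) of spinning the disk back up.
```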

4.
Andersson [1] presented a search algorithm for binary search trees that uses only two-way key comparisons by deferring equality comparisons until the leaves are reached. The use of a different search algorithm means that the optimal tree for the traditional search algorithm, which has been shown to be computable in O(n^2) time by Knuth [3], is not optimal with respect to the different search algorithm. This paper shows that the optimal binary search tree for Andersson's search algorithm can be computed in O(n log n) time using existing algorithms for the special case of zero successful access frequencies, such as the Hu-Tucker algorithm [2].
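For orientation, here is a sketch of the two-way-comparison search itself (the node layout is our assumption): the search descends using only `<` tests, remembers the last key at which it went right, and defers the single equality test to the very end.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    key: int
    left: "Optional[Node]" = None
    right: "Optional[Node]" = None

def search_two_way(root: Optional[Node], x: int) -> bool:
    """Andersson-style search: only '<' comparisons on the way down;
    the equality comparison is deferred until the bottom is reached."""
    candidate = None
    node = root
    while node is not None:
        if x < node.key:
            node = node.left
        else:                   # x >= node.key: remember it, go right
            candidate = node.key
            node = node.right
    return candidate == x       # exactly one equality test per search
```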

5.
Faster Approximate String Matching
We present a new algorithm for on-line approximate string matching. The algorithm is based on the simulation of a nondeterministic finite automaton built from the pattern and using the text as input. This simulation uses bit operations on a RAM machine with word length w = Ω(log n) bits, where n is the text size. This is essentially the model used in Wu and Manber's work, although we improve the search time by packing the automaton states differently. The running time achieved is O(n) for small patterns (i.e., whenever mk = O(log n)), where m is the pattern length and k < m is the number of allowed errors. This contrasts with the result of Wu and Manber, which is O(kn) for m = O(log n). Longer patterns can be processed by partitioning the automaton into many machine words, at O((mk/w)n) search cost. We allow generalizations in the pattern, such as classes of characters, gaps, and others, at essentially the same search cost. We then explore other novel techniques to cope with longer patterns. We show how to partition the pattern into short subpatterns which can be searched with fewer errors using the simple automaton, to obtain a reduced average cost. Moreover, we allow the superimposition of many subpatterns in a single automaton, reducing the average complexity further as a function of the alphabet size σ. We perform a complete analysis of all the techniques and show how to combine them in an optimal form, also obtaining new tighter bounds for the probability of an approximate occurrence in random text. Finally, we show experimental results comparing our algorithms against previous work. These experiments show that our algorithms are among the fastest for typical text searching, being the fastest in some cases. Although we aim mainly at text searching, we believe that our ideas can be successfully applied to other areas such as computational biology. Received November 22, 1996; revised October 13 and December 5, 1997.
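As a hedged sketch of the underlying technique (the Wu-Manber bit-parallel formulation that this paper's state packing improves on, not the authors' packing itself): row R[j] holds, as a bitmask, the NFA states reachable with at most j errors, and each text character updates all k+1 rows with a constant number of word operations.

```python
def approx_search(pattern: str, text: str, k: int) -> list[int]:
    """Bit-parallel simulation of the approximate-matching NFA
    (Shift-And form of the Wu-Manber recurrence).  Returns the end
    positions of occurrences with at most k errors.  Python's big
    integers stand in for the machine words of the RAM model."""
    m = len(pattern)
    B: dict[str, int] = {}
    for i, c in enumerate(pattern):
        B[c] = B.get(c, 0) | (1 << i)
    R = [(1 << j) - 1 for j in range(k + 1)]  # row j: <= j errors
    ends = []
    for pos, c in enumerate(text):
        mask = B.get(c, 0)
        old = R[0]
        R[0] = ((R[0] << 1) | 1) & mask       # exact-match row
        for j in range(1, k + 1):
            new = (((R[j] << 1) & mask)       # match a pattern character
                   | old                      # error: consume text char only
                   | ((old | R[j - 1]) << 1)  # substitution / pattern deletion
                   | 1)
            old, R[j] = R[j], new
        if R[k] & (1 << (m - 1)):
            ends.append(pos)                  # an occurrence ends here
    return ends

# approx_search("abc", "xxabd", 1) reports, among others, the position of
# the 'd' ending "abd", which matches "abc" with one substitution.
```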

6.
This paper describes a magnetic/optical access structure for append-only temporal databases. We formally define the properties of an access structure, called the Monotonic B+-Tree (MBT), that is well suited for write-once, read-many (WORM) optical disks. We present an insertion algorithm for the MBT that does not require splitting of index nodes and give the time analysis for this algorithm. We also describe a storage architecture where optical disks work in tandem with magnetic disks. Magnetic disks are used for storing current versions and recent past versions, whereas optical disks are dedicated to archiving older past versions. Our archiving techniques (1) allow temporal data and the MBT access structure to span magnetic and optical disks; (2) minimize the overhead of the migration process by taking advantage of the append-only nature of temporal databases; (3) gracefully handle object versions with very long time intervals so that the delay in the migration process is kept to a minimum; and (4) ensure that no false magnetic or optical disk address lookup is performed during search operations, by duplicating some closed versions on both magnetic and optical disks. To validate our claims for the efficiency of the migration techniques, we analyze the performance of temporal access structures partitioned between magnetic and optical disks. We show that the migration process has a minimal effect on search time. Our simulation identifies important parameters and shows how they affect the performance of the temporal access structures. These include the mean version lifespan, block size, query time interval length, and total number of versions.

7.
Abstract. The construction of full-text indexes on very large text collections is nowadays a hot topic. The suffix array [32] is one of the most attractive full-text indexing data structures because of its simplicity, space efficiency, and the powerful, fast search operations it supports. In this paper we analyze, both theoretically and experimentally, the I/O complexity and the working space of six algorithms for constructing large suffix arrays. Three of them are state-of-the-art; the other three are our new proposals. We perform a set of experiments based on three different data sets (English texts, amino-acid sequences, and random texts) and give a precise hierarchy of these algorithms according to their working-space versus construction-time tradeoff. Given the current trends in model design [12], [32] and disk technology [29], [30], we pay particular attention to differentiating between "random" and "contiguous" disk accesses, in order to explain some practical I/O phenomena related to the experimental behavior of these algorithms that would otherwise be meaningless in the light of other, simpler external-memory models. We also address two other issues. The first concerns the problem of building word indexes; we show that our results can be successfully applied to this case too, without any loss in efficiency and without compromising simplicity of programming, thus achieving a uniform, simple, and efficient approach to both indexing models. The second issue is the intriguing and apparently counterintuitive "contradiction" between the effective practical performance of the well-known Baeza-Yates-Gonnet-Snider algorithm [17], verified in our experiments, and its unappealing worst-case behavior. We devise a new external-memory algorithm that follows the basic philosophy underlying that algorithm but in a significantly different manner, resulting in a novel approach that combines good worst-case bounds with efficient practical performance.
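For reference, the object all six algorithms construct is simply the lexicographic order of the text's suffixes. The naive in-memory sketch below makes the definition concrete; it is of course hopeless at the scales the paper targets, which is exactly why external-memory construction algorithms are needed.

```python
def suffix_array(s: str) -> list[int]:
    """Naive suffix-array construction: sort suffix start positions by
    the suffixes themselves.  Fine for toy inputs only -- O(n^2 log n)
    time and far too much random access for external memory."""
    return sorted(range(len(s)), key=lambda i: s[i:])

# suffix_array("banana") == [5, 3, 1, 0, 4, 2]
```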

8.
Although efficient identification of user access sessions from very large web logs is an unavoidable data preparation task for the success of higher-level web log mining, little attention has been paid to the algorithmic study of this problem. In this paper we consider two types of user access sessions: interval sessions and gap sessions. We design two efficient algorithms for finding these two types of sessions, respectively, with the help of some proposed data structures. We present theoretical analysis of the algorithms and prove that both have optimal time complexity as well as certain error-tolerant properties. We conduct empirical performance analysis of the algorithms with web logs ranging from 100 megabytes to 500 megabytes. The empirical analysis shows that the algorithms take only several seconds more than the baseline time, i.e., the time needed to read the web log once sequentially from disk to RAM, test whether each user access record is valid, and write each valid record back to disk. The empirical analysis also shows that our algorithms are substantially faster than sorting-based session-finding algorithms. Finally, optimal algorithms for finding user access sessions from distributed web logs are also presented.
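A minimal sketch of the gap-session notion (the record shape and the 30-minute threshold are our assumptions; the paper's algorithms additionally avoid sorting and tolerate certain errors in the log):

```python
from collections import defaultdict

def gap_sessions(records, gap=1800):
    """Split (user, timestamp) records into gap sessions: a user starts
    a new session after being idle for more than `gap` seconds."""
    by_user = defaultdict(list)
    for user, ts in records:
        by_user[user].append(ts)
    sessions = []
    for user, times in by_user.items():
        times.sort()                      # the paper's algorithms avoid this
        current = [times[0]]
        for ts in times[1:]:
            if ts - current[-1] > gap:    # idle too long: close the session
                sessions.append((user, current))
                current = []
            current.append(ts)
        sessions.append((user, current))
    return sessions
```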

9.
Abstract. We present an optimal parallel randomized algorithm for the Voronoi diagram of a set of n nonintersecting (except possibly at endpoints) line segments in the plane. Our algorithm runs in O(log n) time with high probability using O(n) processors on a CRCW PRAM. This algorithm is optimal in terms of work done, since the sequential time bound for this problem is Ω(n log n). Our algorithm improves by an O(log n) factor the previously best known deterministic parallel algorithm, given by Goodrich, Ó'Dúnlaing, and Yap, which runs in O(log^2 n) time using O(n) processors. We obtain this result by using a new "two-stage" random sampling technique. By choosing large samples in the first stage of the algorithm, we avoid the hurdle of problem-size "blow-up" that is typical in recursive parallel geometric algorithms. We combine the two-stage sampling technique with efficient search and merge procedures to obtain an optimal algorithm. This technique gives an alternative optimal algorithm for the Voronoi diagram of points as well (all other optimal parallel algorithms for this problem use the transformation to three-dimensional half-space intersection).

10.
Our purpose is to study the optimal way to merge n initially sorted runs, stored on a disk-like device, into a unique sorted file. This problem is equivalent to finding a tree with n leaves which minimizes a certain cost function (see Knuth [1]). We study some properties of those optimal trees, in the hope of finding efficient ways of constructing them. In particular, if all the initial runs have the same length, an algorithm for constructing the best merge pattern is described; its running time is proportional to n^2 and it requires space proportional to n. A special case is also analyzed, in which the problem is solved in time and space proportional to n, and which provides some insight into the asymptotic behaviour of optimal merge trees.
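One familiar reference point, sketched below: if any two runs may be merged and a merge costs the total length of the runs involved, then repeatedly merging the two shortest runs (a Huffman tree over the run lengths) minimizes total cost. The disk-like device in the paper changes the cost function, which is what makes the structure of optimal merge trees worth studying.

```python
import heapq

def two_way_merge_cost(run_lengths: list[int]) -> int:
    """Minimum total cost of merging runs two at a time when one merge
    costs the sum of the merged lengths: the classic Huffman argument."""
    heap = list(run_lengths)
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        a, b = heapq.heappop(heap), heapq.heappop(heap)
        total += a + b
        heapq.heappush(heap, a + b)   # the merged run goes back in
    return total
```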

11.
Abstract. This paper concerns the online list accessing problem. In the first part of the paper we present two new families of list accessing algorithms. The first family consists of optimal, 2-competitive, deterministic online algorithms. This family, called the MRI (MOVE-TO-RECENT-ITEM) family, includes as members the well-known MOVE-TO-FRONT (MTF) algorithm and the recent, more "conservative" algorithm TIMESTAMP due to Albers. So far MOVE-TO-FRONT and TIMESTAMP were the only algorithms known to be optimal in terms of their competitive ratio. This new family contains a sequence of algorithms {A(i)}_{i ≥ 1}, where A(1) is equivalent to TIMESTAMP and the limit element A(∞) is MTF. Further, in this class, for each i, the algorithm A(i) is more conservative than algorithm A(i+1) in the sense that it is more reluctant to move an accessed item to the front, thus giving a gradual transition from the conservative TIMESTAMP to the "reckless" MTF. The second new family, called the PRI (PASS-RECENT-ITEM) family, is also infinite and contains TIMESTAMP. We show that most algorithms in this family attain a competitive ratio of 3. In the second, experimental, part of the paper we report the results of an extensive empirical study of the performance of a large set of online list accessing algorithms (including members of our MRI and PRI families). The algorithms' access cost performance was tested with respect to a number of different request sequences. These include sequences of independent requests generated by probability distributions and sequences generated by Markov sources to examine the influence of locality. It turns out that the degree of locality has a considerable influence on the algorithms' absolute and relative costs, as well as on their rankings. In another experiment we tested the algorithms' performance as data compressors in two variants of the compression scheme of Bentley et al. In both experiments, members of the MRI and PRI families were found to be among the best performing algorithms.
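To make the cost model concrete, here is a sketch of the standard list-accessing cost of MTF, the limit element A(∞) of the MRI family (free exchanges and the paper's full accounting are omitted):

```python
def mtf_cost(initial: list, requests: list) -> int:
    """Total access cost of MOVE-TO-FRONT: accessing the item at
    position i (1-based) costs i, and the item then moves to the front."""
    lst = list(initial)
    cost = 0
    for x in requests:
        i = lst.index(x)           # position of the requested item
        cost += i + 1
        lst.insert(0, lst.pop(i))  # move it to the front
    return cost

# mtf_cost(["a", "b", "c"], ["c", "c", "c"]) == 3 + 1 + 1 == 5
```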

12.
Abstract. We consider the problem of designing a minimum cost access network to carry traffic from a set of endnodes to a core network. Trunks are available in K types, reflecting economies of scale. A trunk type with a high initial overhead cost has a low cost per unit bandwidth, and a trunk type with a low overhead cost has a high cost per unit bandwidth. We formulate the problem as an integer program. We first use a primal-dual approach to obtain a solution whose cost is within O(K^2) of optimal. Typically the value of K is small. This is the first combinatorial algorithm with an approximation ratio that is polynomial in K and independent of the network size and the total traffic to be carried. We also explore linear program rounding techniques and prove a better approximation ratio of O(K). Both bounds are obtained under weak assumptions on the trunk costs. Our primal-dual algorithm is motivated by the work of Jain and Vazirani on facility location [7]. Our rounding algorithm is motivated by the facility location algorithm of Shmoys et al. [12].

13.

In machine learning, searching for the optimal feature subset in the original datasets is a challenging and prominent task. Metaheuristic algorithms are used to find the relevant, important features that enhance classification accuracy and save resource time. Most such algorithms have shown excellent performance in solving feature selection problems. A recently developed metaheuristic, the gaining-sharing knowledge-based optimization algorithm (GSK), is considered for finding the optimal feature subset. GSK was proposed over a continuous search space; therefore, a total of eight S-shaped and V-shaped transfer functions are employed to map the problem into a binary search space. Additionally, a population reduction scheme is employed with the transfer functions to enhance the performance of the proposed approaches: because the population size is updated in every iteration, the scheme explores the search space efficiently and deletes the worst solutions. The proposed approaches are tested on twenty-one benchmark datasets from the UCI repository. The results are compared with state-of-the-art metaheuristic algorithms, including the binary differential evolution algorithm, binary particle swarm optimization, the binary bat algorithm, the binary grey wolf optimizer, the binary ant lion optimizer, the binary dragonfly algorithm, and the binary salp swarm algorithm. Among the eight transfer functions, the V4 transfer function with population reduction on the binary GSK algorithm outperforms the other optimizers in terms of accuracy, fitness values, and the minimal number of features. To investigate the results statistically, two non-parametric statistical tests are conducted, confirming the superiority of the proposed approach.

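A sketch of the transfer-function idea (the concrete S- and V-shaped forms below are common choices from the binary-metaheuristics literature, not necessarily the paper's eight): an S-shaped function maps a continuous solution component to the probability that the corresponding bit is 1, while a V-shaped function maps it to the probability of flipping the current bit.

```python
import math
import random

def s_shaped(x: float) -> float:
    """A common S-shaped transfer function (sigmoid): P(bit = 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def v_shaped(x: float) -> float:
    """A common V-shaped transfer function: P(flip the current bit)."""
    return abs(math.tanh(x))

def binarize(x: float, current_bit: int, use_v: bool = True) -> int:
    """Turn one continuous component of a GSK-style solution into a bit."""
    r = random.random()
    if use_v:
        return 1 - current_bit if r < v_shaped(x) else current_bit
    return 1 if r < s_shaped(x) else 0
```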

14.
In this paper we describe algorithms for computing the Burrows-Wheeler Transform (BWT) and for building (compressed) indexes in external memory. The innovative feature of our algorithms is that they are lightweight, in the sense that, for an input of size n, they use only n bits of working space on disk, while all previous approaches use Θ(n log n) bits. This is achieved by building the BWT directly, without passing through the construction of the Suffix Array/Tree data structure. Moreover, our algorithms access disk data only via sequential scans, so they take full advantage of modern disk features that make sequential disk accesses much faster than random accesses. We also present a scan-based algorithm for inverting the BWT that uses Θ(n) bits of working space, and a lightweight internal-memory algorithm for computing the BWT which is the fastest in the literature when the available working space is o(n) bits. Finally, we prove lower bounds on the complexity of computing and inverting the BWT via sequential scans in terms of the classic product internal-memory space × number of passes over the disk data, showing that our algorithms are within an O(log n) factor of optimal.
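For contrast with the external-memory construction, the textbook in-memory definition of the transform and its inversion is sketched below (the sentinel character is our assumption); both are quadratic or worse and full of random access, which is precisely what the paper's scan-based algorithms avoid.

```python
def bwt(s: str, eos: str = "\0") -> str:
    """Textbook BWT: last column of the sorted rotations of s + eos."""
    s += eos
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)

def ibwt(last: str, eos: str = "\0") -> str:
    """Invert the BWT by repeated prepend-and-sort (naive but correct)."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(last[i] + table[i] for i in range(len(last)))
    row = next(r for r in table if r.endswith(eos))
    return row[:-1]             # drop the sentinel

# ibwt(bwt("banana")) == "banana"
```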

15.
We consider the problem of dynamically allocating and deallocating local memory resources among multiple users in a parallel or distributed system. Given a group of independent users and a collection of interconnected local memory devices, we want to render the fragmentation of the memory resources irrelevant by allowing any user to allocate space for his or her purposes as long as there is space available anywhere in the system. In effect, we would like it to appear to the users as though they are allocating memory from a single central pool, even though the space is distributed throughout the system. Our goal is to devise an on-line allocation algorithm that minimizes two cost measures: first, the fraction of unused space, which arises due to fragmentation of the memory; second, the slowdown needed by the system to service user requests, which arises due to contention for access to the memory devices. We solve this distributed dynamic allocation problem in near-optimal fashion by devising an algorithm that allows the memory to be used to 100% of capacity despite the fragmentation and guarantees that service delays will always be within a constant factor of optimal. The algorithm is completely on-line (no foreknowledge of user activity is assumed) and can accommodate any sequence of allocations and deallocations by the users that does not violate global memory bounds. We also consider the distributed dynamic allocation problem in the more restrictive setting where the local memory devices are connected by a low-degree fixed-connection network, rather than being fully interconnected. In this case, communication costs must be considered more explicitly in our allocation algorithms. We give allocation algorithms for butterfly and hypercube networks, and prove necessary and sufficient conditions on the total amount of memory space needed for near-optimal algorithms to exist. Received November 5, 1996; revised December 10, 1997.

16.
We describe algorithms for constructing optimal binary search trees in which the access cost of a key depends on the k preceding keys reached on the path to it. This problem has applications to searching on secondary memory and robotics. Two kinds of optimal trees are considered, namely optimal worst-case trees and weighted average-case trees. The time and space complexities of both algorithms are O(n^{k+2}) and O(n^{k+1}), respectively. The algorithms are based on a convenient decomposition and characterizations of sequences of keys which are paths of special kinds in binary search trees. Finally, using generating functions, we present an exact analysis of the number of steps performed by the algorithms.
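The k = 0 special case is the classic optimal-BST dynamic program; the sketch below handles successful searches only (our simplification). The paper generalizes this recurrence to access costs that depend on the k preceding keys on the path, which is where the O(n^{k+2}) time bound comes from.

```python
def optimal_bst_cost(weights: list[float]) -> float:
    """Weighted average-case optimal BST, k = 0, successful searches
    only: cost[i][j] is the optimal cost for keys i..j-1.  O(n^3) here;
    Knuth's monotonicity speedup brings this case down to O(n^2)."""
    n = len(weights)
    prefix = [0.0] * (n + 1)
    for i, w in enumerate(weights):
        prefix[i + 1] = prefix[i] + w
    cost = [[0.0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            best = min(cost[i][r] + cost[r + 1][j] for r in range(i, j))
            cost[i][j] = (prefix[j] - prefix[i]) + best   # every key pays w
    return cost[0][n]
```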

17.
Scheduling tasks onto the processors of a parallel system is a crucial part of program parallelisation. Due to the NP-hard nature of the task scheduling problem, scheduling algorithms are based on heuristics that try to produce good rather than optimal schedules. Nevertheless, in certain situations optimal schedules are desirable, for example for time-critical systems or to evaluate scheduling heuristics. This paper investigates the task scheduling problem using the A* search algorithm, a best-first state space search. The adaptation of A* to the task scheduling problem is referred to as the A* scheduling algorithm. The A* scheduling algorithm can produce optimal schedules in reasonable time for small to medium sized task graphs with several tens of nodes. In comparison to a previous approach, the A* scheduling algorithm presented here has a significantly reduced search space, due to a much improved consistent and admissible cost function f(s) and additional pruning techniques. Experimental results show that the cost function and the various pruning techniques are very effective for the workload. Last but not least, the results show that the proposed A* scheduling algorithm significantly outperforms the previous approach.
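The guarantee exploited here is the generic one: with an admissible cost function f(s) = g(s) + h(s), the first goal state A* expands is optimal. A minimal, domain-independent sketch follows (the scheduling-specific state encoding, cost function, and pruning are the paper's contribution and are not reproduced):

```python
import heapq
import itertools

def a_star(start, neighbours, h, is_goal):
    """Generic A*: `neighbours(s)` yields (successor, edge_cost) pairs
    and `h` is an admissible heuristic (never overestimates remaining
    cost), so the first goal popped from the queue has optimal cost."""
    tie = itertools.count()               # tie-breaker: never compare states
    frontier = [(h(start), 0, next(tie), start)]
    best_g = {start: 0}
    while frontier:
        f, g, _, s = heapq.heappop(frontier)
        if is_goal(s):
            return g                      # optimal by admissibility of h
        if g > best_g.get(s, float("inf")):
            continue                      # stale queue entry: prune
        for t, c in neighbours(s):
            if g + c < best_g.get(t, float("inf")):
                best_g[t] = g + c
                heapq.heappush(frontier, (g + c + h(t), g + c, next(tie), t))
    return None                           # no goal reachable
```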

18.
Andrews, Bender, Zhang 《Algorithmica》2002,32(2):277-301
Processor speed and memory capacity are increasing several times faster than disk speed. This disparity suggests that disk I/O performance could become an important bottleneck. Methods are needed for using disks more efficiently. Past analysis of disk scheduling algorithms has largely been experimental, and little attempt has been made to develop algorithms with provable performance guarantees. We consider the following disk scheduling problem. Given a set of requests on a computer disk and a convex reachability function that determines how fast the disk head travels between tracks, our goal is to schedule the disk head so that it services all the requests in the shortest time possible. We present a 3/2-approximation algorithm (with a constant additive term). For the special case in which the reachability function is linear we present an optimal polynomial-time solution. The disk scheduling problem is related to the special case of the Asymmetric Traveling Salesman Problem with the triangle inequality (ATSP-Δ) in which all distances are either 0 or some constant α. We show how to find the optimal tour in polynomial time and describe how this gives another approximation algorithm for the disk scheduling problem. Finally, we consider the on-line version of the problem in which uniformly distributed requests arrive over time. We present an algorithm related to the above ATSP-Δ.

19.
Irani 《Algorithmica》2002,33(3):384-409
Abstract. We consider the paging problem where the pages have varying size. This problem has applications to page replacement policies for caches containing World Wide Web documents. We consider two models for the cost of an algorithm on a request sequence. In the first (the FAULT model) the goal is to minimize the number of page faults. In the second (the BIT model) the goal is to minimize the total number of bits that have to be read into the cache. We show offline algorithms for both cost models that obtain approximation factors of O(log k), where k is the ratio of the size of the cache to the size of the smallest page. We show randomized online algorithms for both cost models that are O(log^2 k)-competitive. In addition, if the input sequence is generated by a known distribution, we show an algorithm for the FAULT model whose expected cost is within a factor of O(log k) of any other online algorithm.
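The two cost models are easy to state operationally. Below is a baseline LRU simulator over variable-size pages (our illustration, not one of the paper's algorithms) that reports cost in both the FAULT and BIT models; every page is assumed to fit in the cache on its own.

```python
from collections import OrderedDict

def simulate_lru(requests, sizes, capacity):
    """LRU with variable-size pages.  Returns (faults, bits): cost in
    the FAULT model (number of misses) and in the BIT model (total
    bits read into the cache)."""
    cache = OrderedDict()              # page -> size, least recent first
    used = faults = bits = 0
    for p in requests:
        if p in cache:
            cache.move_to_end(p)       # hit: refresh recency, no cost
            continue
        faults += 1                    # FAULT model charges 1
        bits += sizes[p]               # BIT model charges the page size
        while used + sizes[p] > capacity:
            _, s = cache.popitem(last=False)   # evict least recently used
            used -= s
        cache[p] = sizes[p]
        used += sizes[p]
    return faults, bits
```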

20.
Improved Binary Search (改进的二分法查找)
王海涛, 朱洪 《计算机工程》2006,32(10):60-62,118
Many search algorithms exist; for searching a sorted sequence, binary search is the most commonly used. Using binary search, finding an element in a sorted sequence of n elements requires at most ⌊log n⌋ + 1 comparisons. In many situations, much information about the distribution of the sorted sequence is known before searching; for example, if an upper bound on the maximum difference between adjacent elements is known, a search algorithm more efficient than binary search is possible. This paper presents such an algorithm, called improved binary search. Its performance is clearly better than that of plain binary search: depending on the distribution of the sequence, the worst-case number of comparisons needed to find an element lies between 1 and ⌊log n⌋ + 1, compared with binary search's ⌊log n⌋ + 1. In practical applications, improved binary search can greatly improve search efficiency.
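A sketch along these lines (our reconstruction from the abstract, not necessarily the authors' exact algorithm): for a sorted integer array whose adjacent elements differ by at most d, a[i] == x forces i ≥ lo + ⌈(x − a[lo])/d⌉ and i ≤ hi − ⌈(a[hi] − x)/d⌉, so the search window can be tightened before every probe. On favorable distributions this finds the element in far fewer than ⌊log n⌋ + 1 probes.

```python
def improved_binary_search(a: list[int], x: int, d: int) -> int:
    """Binary search over a sorted integer array whose adjacent elements
    differ by at most d.  The gap bound d lets us shrink [lo, hi] before
    each probe; with smooth data the window collapses almost at once.
    Returns the index of x, or -1 if absent."""
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        if x < a[lo] or x > a[hi]:
            return -1
        lo += (x - a[lo] + d - 1) // d    # a[i] == x needs i this far right
        hi -= (a[hi] - x + d - 1) // d    # ...and this far left
        if lo > hi:
            return -1
        mid = (lo + hi) // 2
        if a[mid] == x:
            return mid
        if a[mid] < x:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

# With a = [0, 2, 4, 6, 8] and d = 2, improved_binary_search(a, 6, 2)
# tightens the window to the single index 3 and probes once.
```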
