首页 | 官方网站   微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
何婧  吴跃  杨帆  尹春雷  周维 《计算机应用》2014,34(11):3218-3221
针对云存储系统大多基于键值对模型存储数据,多维查询需要对整个数据集进行完全扫描,查询效率较低的问题,提出了一种基于KD树和R树的多维索引结构(简称KD-R索引)。KD-R索引采用双层索引模式,在全局服务器建立基于KD树的多维全局索引,在局部数据节点构建R树多维本地索引。基于性能损耗模型,选取索引代价较小的R树节点发布到全局KD树,从而优化多维查询性能。实验结果表明:与全局分布式R树索引相比,KD-R索引能够有效提高多维范围查询性能,并且在出现服务器节点失效的情况下,KD-R索引同样具有高可用性。  相似文献   

2.
何龙  陈晋川  杜小勇 《软件学报》2017,28(3):502-513
SOH(SQL over HDFS)系统通常将数据存储于分布式文件系统HDFS中,采用Map/Reduce或分布式查询引擎来处理查询任务。得益于HDFS以及Map/Reduce的容错能力和可扩展性,SOH系统可以很好地应对数据规模的飞速增长,完成分析型查询处理。然而,在处理选择型查询或交互式查询时,这类系统暴露出性能上的缺陷。本文提出一个通用的索引技术,可以应用于SOH系统中,以提高其查询处理的效率。分析了SOH系统访问HDFS文件的过程,指出了其中影响数据加载时间的关键因素;提出了split层和split内部双层索引机制;设计并实现了聚集索引和非聚集索引。最后,在标准数据集上进行了大量实验,并与现有基于HDFS的索引技术进行了比较。实验结果表明,所提出的索引技术可以有效地提高查询处理的效率。  相似文献   

3.
The RDF-3X engine for scalable management of RDF data   总被引:1,自引:0,他引:1  
RDF is a data model for schema-free structured information that is gaining momentum in the context of Semantic-Web data, life sciences, and also Web 2.0 platforms. The “pay-as-you-go” nature of RDF and the flexible pattern-matching capabilities of its query language SPARQL entail efficiency and scalability challenges for complex queries including long join paths. This paper presents the RDF-3X engine, an implementation of SPARQL that achieves excellent performance by pursuing a RISC-style architecture with streamlined indexing and query processing. The physical design is identical for all RDF-3X databases regardless of their workloads, and completely eliminates the need for index tuning by exhaustive indexes for all permutations of subject-property-object triples and their binary and unary projections. These indexes are highly compressed, and the query processor can aggressively leverage fast merge joins with excellent performance of processor caches. The query optimizer is able to choose optimal join orders even for complex queries, with a cost model that includes statistical synopses for entire join paths. Although RDF-3X is optimized for queries, it also provides good support for efficient online updates by means of a staging architecture: direct updates to the main database indexes are deferred, and instead applied to compact differential indexes which are later merged into the main indexes in a batched manner. Experimental studies with several large-scale datasets with more than 50 million RDF triples and benchmark queries that include pattern matching, manyway star-joins, and long path-joins demonstrate that RDF-3X can outperform the previously best alternatives by one or two orders of magnitude.  相似文献   

4.
5.
随着大数据时代的到来,传统的计算机因为单机资源有限、运行速度慢、分布式处理支持差,已满足不了现行的医疗体系中的大数据处理需求,基于时空数据的移动医疗呼叫系统方法可以很好地解决这些问题。在移动云计算环境下研究[k]最近邻查询算法是当前一个热点问题,支持可扩展和分布式的空间数据索引对于kNN查询的效率影响很大,目前已有的查询算法不适合并行化或者会导致内容冗余。将MapReduce分布式处理技术与空间kNN查询方法相结合,设计可以快速检索到满足用户查询需求的医生位置信息的移动医疗呼叫算法。提出并构建了一个新的分布式空间数据索引方法:倒排Voronoi图索引,它将倒排索引和Voronoi图索引进行结合;提出了一种基于MapReduce的利用Voronoi图来处理kNN查询的高效算法,其在分布式环境下可以有效提高查询效率;用真实的和仿真的数据集来进行大量实验评估,实验结果表明所提出的方法具有良好的高效性和可扩展性。  相似文献   

6.
一种基于HBase的高效空间关键字查询策略   总被引:2,自引:0,他引:2  
随着移动定位技术的发展以及智能手机的普及,互联网中空间文本对象的数量正在急速增长,如何在规模庞大且动态增长的空间文本对象中进行高效的空间关键字查询成为了许多空间关键字查询应用所关心的问题.现有的方法通常利用基于R树和倒排索引的混合索引结构来处理空间关键字查询,然而,面对数量巨大而且不断增长的空间文本对象,这些方法往往难以为空间关键字查询的高效性和扩展性提供支持.对此,提出一种基于HBase的空间文本数据索引结构SK-HBase.SK-HBase以HBase作为数据存储,通过有效的数据分配策略对空间文本对象的空间信息和文本信息同时进行索引.在SK-HBase的基础上,本文提出了两种空间关键字查询算法,以保证不同空间范围下的空间关键字查询的高效性和可扩展性.实验证明,我们的方法能够在海量数据下进行高效的空间关键字查询并具有良好的可扩展性.  相似文献   

7.
In this paper, we develop a Q-hash index structure to efficiently store the position of moving objects. An environment of moving objects contains continuously changing locations which are hard to index using traditional index structures such as R-trees, QuadTrees and their variants. In order to answer the queries accurately, one of the problems faced in storing these positions is the number of updates that have to be made to the database whenever locations change. The high maintenance overhead on updates leads to performance degradation of these index structures; additionally, it makes the database very bulky which results in very poor performance in terms of query execution time. One of the main objectives of the structure we propose is to minimize the number of updates to the database to an optimal number so that the accuracy and response time of the query result are not compromised and at the same time the number of wireless communications can be reduced. The indexing is done using a hashing technique where the hashing function makes use of a region based QuadTree structure. To improve the efficiency of the query processing our index structure helps us define constraints over speed, direction and location of the moving object at the device level which controls the number of updates. In addition, in order to answer different query types efficiently at all levels we propose a three-tier (moving object, regional server, central repository) architecture. Our extensive performance evaluation and comparison of the proposed technique concludes that our scheme outperforms existing Q + R-tree and QuadTree in terms of range query execution time by a high order of magnitude.  相似文献   

8.
卢秉亮  刘娜 《计算机应用》2011,31(11):3078-3083
扩展了一种支持路网中移动对象的位置相关查询框架的功能,利用存在磁盘上的R树来存储网络连通性和一种基于内存的网格结构来维持移动对象的位置更新,提出了基于范围查询(MNDR)的快照K近邻查询算法(SKNN),对空间中的任意一条边,分析可能受影响的最大数量和最小数量的网格单元格,说明用于快照范围查询处理的搜索空间的最大范围,预估包含查询结果的子空间,使用这个子空间作为范围调用MNDR来有效地计算路网中查询点的KNN POI,降低I/O成本,缩短查询时间。通过实验对比,当规模扩展到数十万的移动对象时,SKNN比种有效查询处理空间网络数据的预计算方法S-GRID有更好大的系统吞吐量。  相似文献   

9.
Approximate similarity retrieval with M-trees   总被引:4,自引:0,他引:4  
Motivated by the urgent need to improve the efficiency of similarity queries, approximate similarity retrieval is investigated in the environment of a metric tree index called the M-tree. Three different approximation techniques are proposed, which show how to forsake query precision for improved performance. Measures are defined that can quantify the improvements in performance efficiency and the quality of approximations. The proposed approximation techniques are then tested on various synthetic and real-life files. The evidence obtained from the experiments confirms our hypothesis that a high-quality approximated similarity search can be performed at a much lower cost than that needed to obtain the exact results. The proposed approximation techniques are scalable and appear to be independent of the metric used. Extensions of these techniques to the environments of other similarity search indexes are also discussed. Received July 7, 1998 / Accepted October 13, 1998  相似文献   

10.
一种支持多维数据范围查询的对等计算索引框架   总被引:1,自引:0,他引:1  
如何有效地支持多维数据范围查询是传统数据管理领域的研究热点之一.但是,在大规模分布式系统中,这仍然是一个具有挑战性的研究工作.VBI-tree是一个对等计算环境下基于平衡树的索引架构,在该架构上可以实现集中式环境下的多种支持多维数据索引的层次化树结构,例如R-tree,X-tree和M-tree等.VBI-tree设计的查询算法保证查询可以从树的任意位置开始,而不是像集中式环境下层次化树结构那样采用从树的根节点开始查询的方法,从而成功地避免了根节点引起的系统性能瓶颈问题.对于有N个节点的网络,索引方法可以保证查询效率是O(log N).VBI-tree提出了基于AVL-tree旋转的网络重构负载均衡策略可以有效地均衡负栽.另外,在数据操作频繁的情况下,为了提高索引的性能,在VBI-tree上建立特殊的祖先-子孙链接形成VBI-tree的结构.通过使用祖先-子孙链接,可保证对于相关查询区域的探索尽量发生在同层节点之间,而不是一直往根节点方向发送,从而减轻上层节点的查询负担,并且显著地降低了更新代价.模拟实验验证了提出的方法的有效性.  相似文献   

11.
云数据管理索引技术研究   总被引:7,自引:3,他引:4  
马友忠  孟小峰 《软件学报》2015,26(1):145-166
数据的爆炸式增长给传统的关系型数据库带来了巨大的挑战,使其在扩展性、容错性等方面遇到了瓶颈.而云计算技术依靠其高扩展性、高可用性、容错性等特点,成为大规模数据管理的有效方案.然而现有的云数据管理系统也存在不足之处,其只能支持基于主键的快速查询,因缺乏索引、视图等机制,所以不能提供高效的多维查询、join等操作,这限制了云计算在很多方面的应用.主要对云数据管理中的索引技术的相关工作进行了深入调研,并作了对比分析,指出了其各自的优点和不足;对在云计算环境下针对海量物联网数据的多维索引技术研究工作进行了简单介绍;最后指出了在云计算环境下针对大数据索引技术的若干挑战性问题.  相似文献   

12.
The RUM-tree: supporting frequent updates in R-trees using memos   总被引:1,自引:0,他引:1  
The problem of frequently updating multi-dimensional indexes arises in many location-dependent applications. While the R-tree and its variants are the dominant choices for indexing multi-dimensional objects, the R-tree exhibits inferior performance in the presence of frequent updates. In this paper, we present an R-tree variant, termed the RUM-tree (which stands for R-tree with update memo) that reduces the cost of object updates. The RUM-tree processes updates in a memo-based approach that avoids disk accesses for purging old entries during an update process. Therefore, the cost of an update operation in the RUM-tree is reduced to the cost of only an insert operation. The removal of old object entries is carried out by a garbage cleaner inside the RUM-tree. In this paper, we present the details of the RUM-tree and study its properties. We also address the issues of crash recovery and concurrency control for the RUM-tree. Theoretical analysis and comprehensive experimental evaluation demonstrate that the RUM-tree outperforms other R-tree variants by up to one order of magnitude in scenarios with frequent updates.  相似文献   

13.
移动对象数据具有规模大、更新频繁的特点,对数据可视化具有较高的性能要求。当数据规模增大时,实时加载数据进行可视化的性能效率会随之降低。为了提高移动对象可视化的效率,提出了GPU环境下的移动对象更新方法,并结合移动对象特征设计出并行查询方案。同时,优化了移动对象的更新函数,通过比较临近的两次可视化查询的时间区间,找出需要更新的时间片,对其进行相应的更新,从而避免了整个时间区间的更新。实验使用了数据规模为400万到1?000万的合成数据集,和包含约960万个采样点的真实出租车数据集。实验结果表明,与CPU上的R-Tree查询、GPU上的R-Tree查询和CPU上更新函数中的串行索引查询方法相比,所提方法具有较好的查询性能,加速比最高可达18.48。移动对象更新函数优化后,当临近的两次可视化查询时间区间完全重叠时,加速效率接近100%。  相似文献   

14.
Efficient Bulk Operations on Dynamic R-Trees   总被引:8,自引:0,他引:8  
In recent years there has been an upsurge of interest in spatial databases. A major issue is how to manipulate efficiently massive amounts of spatial data stored on disk in multidimensional spatial indexes (data structures). Construction of spatial indexes (bulk loading ) has been studied intensively in the database community. The continuous arrival of massive amounts of new data makes it important to update existing indexes (bulk updating ) efficiently. In this paper we present a simple, yet efficient, technique for performing bulk update and query operations on multidimensional indexes. We present our technique in terms of the so-called R-tree and its variants, as they have emerged as practically efficient indexing methods for spatial data. Our method uses ideas from the buffer tree lazy buffering technique and fully utilizes the available internal memory and the page size of the operating system. We give a theoretical analysis of our technique, showing that it is efficient both in terms of I/ O communication, disk storage, and internal computation time. We also present the results of an extensive set of experiments showing that in practice our approach performs better than the previously best known bulk update methods with respect to update time, and that it produces a better quality index in terms of query performance. One important novel feature of our technique is that in most cases it allows us to perform a batch of updates and queries simultaneously. To be able to do so is essential in environments where queries have to be answered even while the index is being updated and reorganized. Received November 15, 1998; revised May 3, 2001.  相似文献   

15.
Efficient cost models for spatial queries using R-trees   总被引:11,自引:0,他引:11  
Selection and join queries are fundamental operations in database management systems (DBMS). Support for nontraditional data, including spatial objects, in an efficient manner is of ongoing interest in database research. Toward this goal, access methods and cost models for spatial queries are necessary tools for spatial query processing and optimization. We present analytical models that estimate the cost (in terms of node and disk accesses) of selection and join queries using R-tree-based structures. The proposed formulae need no knowledge of the underlying R-tree structure(s) and are applicable to uniform-like and nonuniform data distributions. In addition, experimental results are presented which show the accuracy of the analytical estimations when compared to actual runs on both synthetic and real data sets  相似文献   

16.
陈井爽  陈珂  寿黎但  江大伟  陈刚 《软件学报》2022,33(12):4688-4703
学习型索引通过学习数据分布可以准确地预测数据存取的位置,在保持高效稳定的查询下,显著降低索引的内存占用.现有的学习型索引主要针对只读查询进行优化,而对插入和更新支持不足.针对上述挑战,设计了一种基于Radix Tree的工作负载自适应学习型索引ALERT.ALERT使用Radix Tree来管理不定长的分段,段内采用具有最大误差界的线性插值模型进行预测.同时,ALERT使用一种高效的插入缓冲来降低数据插入更新的代价.针对点查询和范围查询提出两种自适应重组优化方法,通过对工作负载进行感知,动态地调整插入缓冲的组织结构.经实验验证,ALERT与业界流行的学习型索引相比,构建时间平均降低了81%,内存占用平均降低了75%,在保持了优秀读性能的同时,使插入延迟平均降低了50%;此外,ALERT使用自适应重组优化能有效感知查询工作负载特征,与不使用自适应重组优化相比,查询延迟平均降低了15%.  相似文献   

17.
In this paper, we propose a distributed method to control the view divergence of data freshness for clients in replicated database systems whose facilitating or administrative roles are equal. Our method provides data with statistically defined freshness to clients when updates are initially accepted by any of the replicas, and then, asynchronously propagated among the replicas that are connected in a tree structure. To provide data with freshness specified by clients, our method selects multiple replicas using a distributed algorithm so that they statistically receive all updates issued up to a specified time before the present time. We evaluated by simulation the distributed algorithm to select replicas for the view divergence control in terms of controlled data freshness, time, message, and computation complexity. The simulation showed that our method achieves more than 36.9 percent improvement in data freshness compared with epidemic-style update propagation.  相似文献   

18.
The significant overhead related to frequent location updates from moving objects often results in poor performance. As most of the location updates do not affect the query results, the network bandwidth and the battery life of moving objects are wasted. Existing solutions propose lazy updates, but such techniques generally avoid only a small fraction of all unnecessary location updates because of their basic approach (e.g., safe regions, time or distance thresholds). Furthermore, most prior work focuses on a simplified scenario where queries are either static or rarely change their positions. In this study, two novel efficient location update strategies are proposed in a trajectory movement model and an arbitrary movement model, respectively. The first strategy for a trajectory movement environment is the Adaptive Safe Region (ASR) technique that retrieves an adjustable safe region which is continuously reconciled with the surrounding dynamic queries. The communication overhead is reduced in a highly dynamic environment where both queries and data objects change their positions frequently. In addition, we design a framework that supports multiple query types (e.g., range and c-kNN queries). In this framework, our query re-evaluation algorithms take advantage of ASRs and issue location probes only to the affected data objects, without flooding the system with many unnecessary location update requests. The second proposed strategy for an arbitrary movement environment is the Partition-based Lazy Update (PLU, for short) algorithm that elevates this idea further by adopting Location Information Tables (LITs) which (a) allow each moving object to estimate possible query movements and issue a location update only when it may affect any query results and (b) enable smart server probing that results in fewer messages. We first define the data structure of an LIT which is essentially packed with a set of surrounding query locations across the terrain and discuss the mobile-side and server-side processes in correspondence to the utilization of LITs. Simulation results confirm that both the ASR and PLU concepts improve scalability and efficiency over existing methods.  相似文献   

19.
Distributed hash tables (DHTs) are very efficient for querying based on key lookups. However, building huge term indexes, as required for IR-style keyword search, poses a scalability challenge for plain DHTs. Due to the large sizes of document term vocabularies, peers joining the network cause huge amounts of key inserts and, consequently, a large number of index maintenance messages. Thus, the key to exploiting DHTs for distributed information retrieval is to reduce index maintenance costs. Various approaches in this direction have been pursued, including the use of hybrid infrastructures, or changing the granularity of the inverted index to peer level. We show that indexing costs can be significantly reduced further by letting peers form groups in a self-organized fashion. Instead of each individual peer submitting index information separately, all peers of a group cooperate to publish the index updates to the DHT in batches. Our evaluation shows that this approach reduces index maintenance cost by an order of magnitude, while still keeping a complete and correct term index for query processing.  相似文献   

20.
大规模分布式系统中的多属性查询处理   总被引:4,自引:0,他引:4  
大规模分布式系统中的复杂查询处理是将对等计算技术运用于关键应用中的重要问题,是学术界与工业界所共同关注的研究问题.文中介绍了一种高效、可伸缩的通用的基于类Chord协议的多属性查询处理技术GChord.它既支持匹配查询也支持范围查询.和现有其它技术相比,对于任何数据元组,GChord只需要对其编码和索引一次,且能将查询处理的代价限制在一个很小的范围内.因此,它能在索引维护代价和查询效率之间达到平衡.GChord还提供优化技术以进一步提升性能.实验证实了GChord具有较高的查询处理效率以及较低的索引维护代价.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号