首页 | 官方网站   微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 892 毫秒
1.
Z树:一个高维度的数据索引结构   总被引:3,自引:0,他引:3       下载免费PDF全文
张强  赵政 《计算机工程》2007,33(15):49-51
Z树能够高效地处理对高维度数据集的矩形区域查询和最邻近搜索。它按照节点的形状变化量优化数据的插入位置,使节点形状趋于合理。文章给出了一个新的无重叠分裂算法,减少超级节点的产生。引入了动态剪枝和重新插入策略,压缩超级节点的数量和体积。提出了矩形节点的球形化方法和最优子树搜索算法。实验表明Z树的矩形区域查询和最邻近搜索的效率远远高于X树和SR树。  相似文献   

2.
Due to the famous dimensionality curse problem, search in a high-dimensional space is considered as a "hard" problem. In this paper, a novel composite distance transformation method, which is called CDT, is proposed to support a fast k-nearest-neighbor (k-NN) search in high-dimensional spaces. In CDT, all (n) data points are first grouped into some clusters by a k-Means clustering algorithm. Then a composite distance key of each data point is computed. Finally, these index keys of such n data points are inserted by a partition-based B -tree. Thus, given a query point, its k-NN search in high-dimensional spaces is transformed into the search in the single dimensional space with the aid of CDT index. Extensive performance studies are conducted to evaluate the effectiveness and efficiency of the proposed scheme. Our results show-that this method outperforms the state-of-the-art high-dimensional search techniques, such as the X-Tree, VA-file, iDistance and NB-Tree.  相似文献   

3.
Range nearest-neighbor query   总被引:6,自引:0,他引:6  
A range nearest-neighbor (RNN) query retrieves the nearest neighbor (NN) for every point in a range. It is a natural generalization of point and continuous nearest-neighbor queries and has many applications. In this paper, we consider the ranges as (hyper)rectangles and propose efficient in-memory processing and secondary memory pruning techniques for RNN queries in both 2D and high-dimensional spaces. These techniques are generalized for kRNN queries, which return the k nearest neighbors for every point in the range. In addition, we devise an auxiliary solution-based index EXO-tree to speed up any type of NN query. EXO-tree is orthogonal to any existing NN processing algorithm and, thus, can be transparently integrated. An extensive empirical study was conducted to evaluate the CPU and I/O performance of these techniques, and the study showed that they are efficient and robust under various data sets, query ranges, numbers of nearest neighbors, dimensions, and cache sizes.  相似文献   

4.
对顺序索引方法进行了研究,提出一种基于向量近似的高维顺序索引结构,该结构顺序访问部分文件就能完成k近邻查询。在查询过程中依据投影值来终止查询过程,依据距离来排除不匹配的数据。为进一步降低数据访问率,采用椭圆体聚类算法对数据集进行划分。新索引结构支持以多个顺序访问过程完成k近邻查询,能够同时降低查询过程中的I/O开销和CPU开销。在大型高维图像特征库上的实验表明,新的高维索引结构的查询性能优于其他高维索引方法。  相似文献   

5.
Indexing high-dimensional data for efficient in-memory similarity search   总被引:3,自引:0,他引:3  
In main memory systems, the L2 cache typically employs cache line sizes of 32-128 bytes. These values are relatively small compared to high-dimensional data, e.g., >32D. The consequence is that existing techniques (on low-dimensional data) that minimize cache misses are no longer effective. We present a novel index structure, called /spl Delta/-tree, to speed up the high-dimensional query in main memory environment. The /spl Delta/-tree is a multilevel structure where each level represents the data space at different dimensionalities: the number of dimensions increases toward the leaf level. The remaining dimensions are obtained using principal component analysis. Each level of the tree serves to prune the search space more efficiently as the lower dimensions can reduce the distance computation and better exploit the small cache line size. Additionally, the top-down clustering scheme can capture the feature of the data set and, hence, reduces the search space. We also propose an extension, called /spl Delta//sup +/-tree, that globally clusters the data space and then partitions clusters into small regions. The /spl Delta//sup +/-tree can further reduce the computational cost and cache misses. We conducted extensive experiments to evaluate the proposed structures against existing techniques on different kinds of data sets. Our results show that the /spl Delta//sup +/-tree is superior in most cases.  相似文献   

6.
Batch Nearest Neighbor Search for Video Retrieval   总被引:2,自引:0,他引:2  
To retrieve similar videos to a query clip from a large database, each video is often represented by a sequence of high- dimensional feature vectors. Typically, given a query video containing m feature vectors, an independent nearest neighbor (NN) search for each feature vector is often first performed. After completing all the NN searches, an overall similarity is then computed, i.e., a single content-based video retrieval usually involves m individual NN searches. Since normally nearby feature vectors in a video are similar, a large number of expensive random disk accesses are expected to repeatedly occur, which crucially affects the overall query performance. Batch nearest neighbor (BNN) search is stated as a batch operation that performs a number of individual NN searches. This paper presents a novel approach towards efficient high-dimensional BNN search called dynamic query ordering (DQO) for advanced optimizations of both I/O and CPU costs. Observing the overlapped candidates (or search space) of a pervious query may help to further reduce the candidate sets of subsequent queries, DQO aims at progressively finding a query order such that the common candidates among queries are fully utilized to maximally reduce the total number of candidates. Modelling the candidate set relationship of queries by a candidate overlapping graph (COG), DQO iteratively selects the next query to be executed based on its estimated pruning power to the rest of queries with the dynamically updated COG. Extensive experiments are conducted on real video datasets and show the significance of our BNN query processing strategy.  相似文献   

7.
The MaxBRNN problem is to find an optimal region such that setting up a new service within this region might attract the maximum number of customers by proximity. The MaxBRNN problem has many practical applications such as service location planning and emergency schedule. In typical real-life applications the data volume of the problem is huge, thus an efficient solution is highly desired. In this paper, we propose two efficient algorithms, namely, OptRegion, and 3D-OptRegion to tackle the MaxBRNN problem and MaxBRkNN in two- and three-dimensional spaces, especially for the 3D-OptRegion, we propose a powerful pruning strategy Fine-grained Pruning Strategy to reduce the searching space. Our method employs three optimization techniques, i.e., sweep line (sweep plane in a three-dimensional space), pruning strategy (based on upper bound estimation), and influence value computation (of candidate points), to improve the search performance. In a three-dimensional space, we additionally use a fine-grained pruning strategy to further improve the pruning effect. Extensive experimental evaluation using both real and synthetic datasets confirms that both OptRegion and 3D-OptRegion outperform the existing algorithms significantly under all problem instances.  相似文献   

8.
基于矢量量化的快速图像检索   总被引:7,自引:0,他引:7  
叶航军  徐光祐 《软件学报》2004,15(5):712-719
传统索引方法对高维数据存在"维数灾难"的困难.而对数据分布的精确描述及对数据空间的有效划分是高维索引机制中的关键问题.提出一种基于矢量量化的索引方法.该方法使用高斯混合模型描述数据的整体分布,并训练优化的矢量量化器划分数据空间.高斯混合模型能更好地描述真实图像库的数据分布;而矢量量化的划分方法可以充分利用维之间的统计相关性,能够对数据向量构造出更加精确的近似表示,从而提高索引结构的过滤效率并减少需要访问的数据向量.在大容量真实图像库上的实验表明,该方法显著减少了支配检索时间的I/O开销,提高了索引性能.  相似文献   

9.
In a database, a similar information search means finding data records which contain the majority of search keywords. Due to the rapid accumulation of information nowadays, the size of databases has increased dramatically. An efficient information searching scheme can speed up information searching and retrieve all relevant records. This paper proposes a Hilbert curve-based similarity searching scheme (HCS). HCS considers a database to be a multidimensional space and each data record to be a point in the multidimensional space. By using a Hilbert space filling curve, each point is projected from a high-dimensional space to a low-dimensional space, so that the points close to each other in the high-dimensional space are gathered together in the low-dimensional space. Because the database is divided into many clusters of close points, a query is mapped to a certain cluster instead of searching the entire database. Experimental results prove that HCS dramatically reduces the search time latency and exhibits high effectiveness in retrieving similar information.  相似文献   

10.
A distance-preserving method is presented to map high-dimensional data sequentially to low-dimensional space. It preserves exact distances of each data point to its nearest neighbor and to some other near neighbors. Intrinsic dimensionality of data is estimated by examining the preservation of interpoint distances. The method has no user-selectable parameter. It can successfully project data when the data points are spread among multiple clusters. Results of experiments show its usefulness in projecting high-dimensional data.  相似文献   

11.
基于单元区域的高维数据聚类算法   总被引:1,自引:0,他引:1  
高维数据空间维数较高,数据点分布稀疏、密度平均,从中发现数据聚类比较困难,而用基于距离的方法进行高维数据聚类,维数的增多会使得计算对象间距离的时间开销增大. CAHD(clustering algorithm of high-dimensional data)算法首先采用双向搜索策略在指定的n维空间或其子空间上发现数据点密集的单元区域,然后采用逐位与的方法为这些密集单元区域进行聚类分析.双向搜索策略能够有效地减少搜索空间,从而提高算法效率,同时,聚类密集单元区域只用到逐位与和位移两种机器指令,使得算法效率得到进一步提高.算法CAHD可以有效地处理高维数据的聚类问题.基于数据集的实验表明,算法具有很好的有效性.  相似文献   

12.
K‐nearest neighbors (KNN) search in a high‐dimensional vector space is an important paradigm for a variety of applications. Despite the continuous efforts in the past years, algorithms to find the exact KNN answer set at high dimensions are outperformed by a linear scan method. In this paper, we propose a technique to find the exact KNN image objects to a given query object. First, the proposed technique clusters the images using a self‐organizing map algorithm and then it projects the found clusters into points in a linear space based on the distances between each cluster and a selected reference point. These projected points are then organized in a simple, compact, and yet fast index structure called array‐index. Unlike most indexes that support KNN search, the array‐index requires a storage space that is linear in the number of projected points. The experiments show that the proposed technique is more efficient and robust to dimensionality as compared to other well‐known techniques because of its simplicity and compactness. © 2011 Wiley Periodicals, Inc.  相似文献   

13.
The paper proposes a novel symmetrical encoding-based index structure, which is called EDD-tree (for encoding-based dual distance tree), to support fast k-nearest neighbor (k-NN) search in high-dimensional spaces. In the EDD-tree, all data points are first grouped into clusters by a k-means clustering algorithm. Then the uniform ID number of each data point is obtained by a dual-distance-driven encoding scheme, in which each cluster sphere is partitioned twice according to the dual distances of start- and centroid-distance. Finally, the uniform ID number and the centroid-distance of each data point are combined to get a uniform index key, the latter is then indexed through a partition-based B^+-tree. Thus, given a query point, its k-NN search in high-dimensional spaces can be transformed into search in a single dimensional space with the aid of the EDD-tree index. Extensive performance studies are conducted to evaluate the effectiveness and efficiency of our proposed scheme, and the results demonstrate that this method outperforms the state-of-the-art high-dimensional search techniques such as the X-tree, VA-file, iDistance and NB-tree, especially when the query radius is not very large.  相似文献   

14.
核模糊C-均值聚类KFCM是利用核函数将数据映射到高维空间,通过计算数据点与聚类中心的隶属度对数据进行聚类的算法,拥有高效、快捷的特点而被广泛应用于各领域,然而KFCM算法存在对聚类中心的初始值敏感和不能自适应确定聚类数两个局限性。针对这两个问题,提出一种局部搜索自适应核模糊聚类方法,该方法引入核方法提高数据的可分性,并构造基于核函数的评价函数来确定最优的聚类数目和利用部分样本数据进行局部搜索以寻找初始聚类中心。人工数据和UCI数据集上的实验结果验证了该算法的有效性。  相似文献   

15.
Nearest neighbor (NN) search is emerging as an important search paradigm in a variety of applications in which objects are represented as vectors of d numeric features. However, despite decades of efforts, except for the filtering approach such as the VA-file, the current solutions to find exact kNNs are far from satisfactory for large d. The filtering approach represents vectors as compact approximations and by first scanning these smaller approximations, only a small fraction of the real vectors are visited. In this paper, we introduce the local polar coordinate file (LPC-file) using the filtering approach for nearest-neighbor searches in high-dimensional image databases. The basic idea is to partition the vector space into rectangular cells and then to approximate vectors by polar coordinates on the partitioned local cells. The LPC information significantly enhances the discriminatory power of the approximation. To demonstrate the effectiveness of the LPC-file, we conducted extensive experiments and compared the performance with the VA-file and the sequential scan by using synthetic and real data sets. The experimental results demonstrate that the LPC-file outperforms both of the VA-file and the sequential scan in total elapsed time and in the number of disk accesses and that the LPC-file is robust in both "good" distributions (such as random) and "bad" distributions (such as skewed and clustered)  相似文献   

16.
杨舒  陈浩  李军  景宁 《智能系统学报》2017,12(5):653-660
随着航天科技的飞速发展,逐渐出现了由多种异构卫星组成的卫星集群。相比于传统的卫星系统,卫星集群具有规模大、平台多、载荷异构的特点,传统的卫星任务规划方法难以适用。针对卫星集群任务规划中的关键问题——面向任务的卫星Agent团队构建问题,建立了数学模型,提出了基于分支限界的精确搜索算法,并对其时间复杂度进行了分析。针对精确算法时间复杂度较高的缺点,引入了启发式剪枝机制,并按照任务集合排序策略的不同设计了3种启发式卫星团队构建算法。最后,通过多组实验分析了卫星团队构建精确搜索算法与启发式剪枝搜索算法的性能,验证了我们提出算法的有效性和实用性。  相似文献   

17.
Z曲线网格划分的最近邻查询   总被引:1,自引:0,他引:1       下载免费PDF全文
为了解决高维空间最近邻查询问题,在网格划分的基础上,利用Z曲线对网格排序并将二维空间中的点映射到一维空间中。考虑到点的分布和网格形状对查询的影响,提出最小查询层和方向变换的概念。只要给出查询点与任意点之间的方向变换,即可求出该点所在的网格Z值,从而求出任意查询层的所有网格Z值。证明了最近邻查询只需访问至最小查询层后再访问两层。基于此提出了最近邻查询算法,它适用于数据点任意分布的情况,该算法能够得到精确解。  相似文献   

18.
提出一种在椭圆体聚类上进行主分量排序的高维索引方法, 线性访问较少的数据点就可完成k近邻搜索过程。该方法对数据集进行椭圆体聚类划分,在KL变换域上建立近似向量。在k近邻搜索过程中,采用部分失真搜索算法,按照距离下界由小到大的顺序依次搜索各个椭圆体聚类。在大型高维图像特征库上的实验表明,与其他向量近似方法相比,该索引结构降低近似向量的访问数量,能够较显著提高k近邻搜索速度。  相似文献   

19.
基于广义超曲面树的相似性搜索算法   总被引:2,自引:0,他引:2  
张兆功  李建中 《软件学报》2002,13(10):1969-1976
相似性搜索是数据挖掘的主要领域之一.它在数据库中检索出相似的数据,发现数据间的相似性.它可以应用于图像数据库、空间数据库和时间序列分析.对于欧氏空间(一种特殊的度量空间),相似性搜索算法中基于R-tree的方法,在低维时是高效的,当维数增加时,R-tre e的方法将退化为线性扫描.该现象被称为维数灾难(dimensionality curse),主要原因是存在数据重复.当数据量很大且维数很高时,距离计算和I/O操作将非常费时.提出了度量空间上新的空间分割方法和索引结构rgh-tree,利用数据库的数据对象与很少几个固定参考对象的距离信息进行数据分割和分布,产生一个各节点没有数据重复的平衡树.另外,在rgh-tree的基础上提出了相应的相似性搜索算法,该算法具有较小的I/O代价和距离计算次数,平均复杂性近似为o(n0.58).解决了目前算法存在的一些问题.  相似文献   

20.
Due to the famous dimensionality curse problem, search in a high-dimensional space is considered as a "hard" problem. In this paper, a novel composite distance transformation method, which is called CDT, is proposed to support a fast κ-nearest-neighbor (κ-NN) search in high-dimensional spaces. In CDT, all (n) data points are first grouped into some clusters by a κ-Means clustering algorithm. Then a composite distance key of each data point is computed. Finally, these index keys of such n data points are inserted by a partition-based B^+-tree. Thus, given a query point, its κ-NN search in high-dimensional spaces is transformed into the search in the single dimensional space with the aid of CDT index. Extensive performance studies are conducted to evaluate the effectiveness and efficiency of the proposed scheme. Our results show that this method outperforms the state-of-the-art high-dimensional search techniques, such as the X-Tree, VA-file, iDistance and NB-Tree.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号