首页 | 官方网站   微博 | 高级检索  
相似文献
 共查询到19条相似文献,搜索用时 265 毫秒
1.
提出一种在椭圆体聚类上进行主分量排序的高维索引方法, 线性访问较少的数据点就可完成k近邻搜索过程。该方法对数据集进行椭圆体聚类划分,在KL变换域上建立近似向量。在k近邻搜索过程中,采用部分失真搜索算法,按照距离下界由小到大的顺序依次搜索各个椭圆体聚类。在大型高维图像特征库上的实验表明,与其他向量近似方法相比,该索引结构降低近似向量的访问数量,能够较显著提高k近邻搜索速度。  相似文献   

2.
传统索引方法对高维数据进行近邻搜索时会面临维数灾难问题,向量近似方法是一种有效的高维检索方法。提出一种 Hadamard 变换域上的向量近似方法,在变换域能量最大的分量上建立顺序索引,然后建立近似向量文件。同时提出低維过滤算法,可以在近邻搜索过程中高效排除不匹配近似向量,减少 I/O 访问时间,提高查询效率。在大型高维图像特征库上的实验表明,该方法性能优于小波变换域的向量近似方法。  相似文献   

3.
张少敏  蔡盼  李翠平  陈红 《软件学报》2023,34(5):2413-2426
在数据量与数据复杂度不断增加的时代,大数据处理与分析成为当前的热门研究内容,高维空间数据的使用越来越频繁,数据检索和访问速度成了衡量数据处理系统性能的重要指标.因此,如何设计实现一种高效的高维索引结构,提高查询访问速率、降低内存占用,变得至关重要.近年, Kraska等人提出了学习型索引的方法.实验证明该方法在真实数据集上表现良好.之后机器学习与深度学习在数据库系统中的运用越来越广泛.众多研究者尝试在高维数据上构建学习型索引,来提升高维数据的查询速度.但是目前的高维学习型索引采用的方法并不能将数据分布的信息有效利用起来,而且过于复杂的深度学习模型使得索引初始化开销过大.结合空间区域划分与降维两种技术,提出一种新颖的高维学习型索引.它能更有效地利用数据分布信息提高索引的查询效率,并利用多段线性模型在保证查找精确度的前提下尽可能减少索引初始化的开销.分别在随机生成的数据集和开源街区地图数据集上进行实验验证.结果表明,与现有的高维索引相比,其在索引构建、查询效率、以及内存占用方面都有显著提高.  相似文献   

4.
文中提出一种支持概率k近邻查询的不确定高维索引结构--ISU-Tree.在高维空间,首先对n个不确定数据对象进行k平均聚类,然后分别对每个不确定超球进行初始"切片",并对其进行多特征编码得到对应的统一化索引键值,并且用B+树建立索引.这样,高维空间的概率查询就转变成对一维空间的启发式的范围查询及求精运算.理论及实验分析表明ISU-Tree索引能更有效地缩小搜索空间,减少积分计算的代价.在查询效率方面要明显优于其它的索引方法,尤其适合海量高维不确定数据的概率查询.  相似文献   

5.
提出一种在网格环境下的k近邻查询方法——GkNN.到目前为止,尚未有文献提出数据网格环境下的k近邻查询算法.当用户在查询节点提交一个查询向量和k,首先以一个较小的查询半径。在数据节点进行基于双重距离尺度的向量缩减,然后将缩减后的向量按照向量“打包”传输的方式发送到执行节点,在执行节点并行地对这些候选向量进行距离(求精)运算.最终将结果向量返回到查询节点.当返回的向量个数小于k时,扩大半径值,继续循环直到得到k个最近邻向量为止.理论分析和实验证明该方法在减少网络通信开销、增加I/O和CPU并行、降低-向应时间方面具有较好的性能,非常适合海量高维数据的查询.  相似文献   

6.
高维索引作为基于内容检索和模式识别等领域的一项关键技术,其性能直接影响整个系统的查询速度和准确率,但高维情况下的 “维度灾难”一直制约着相应检索性能的提高。通过分析小世界模型,提出了完整的逐跳逼近索引算法,该算法仅维护点与点在度量空间上的局部邻近关系,通过将查询过程的“关注点”逐步往查询命中区域跳跃逼近来实现高维空间数据点间的范围查询和近似近邻查询。实验证明该方法在不依赖索引数据的先验分布情况下能有效地处理高维数据向量的检索,且具有良好的可维护性与拓展性。  相似文献   

7.
高维索引技术作为高维空间数据的快速查询手段,对使用高维数据的基于内容图像检索有着广泛的应用。本文提出以Guttm an提出的R树结构建立存储图像的特征值的高维索引结构来提高图像检索效率。首先对R树的结构进行介绍,然后通过对比相同情况下使用线性查询和R树查询各自的查询次数和查询时间分析R树查询的优势。实验结果表明,利用R树结构可以减少图像检索的查询次数和查询时间,明显地提高图像检索的效率。  相似文献   

8.
遥感高光谱数据是一种具有空间聚集特性的高维数据。对PT方法进行改进使之与iDistance的索引机制相适应,并融合这两种不同的空间划分策略,提出一种适用于高光谱数据的索引结构。该索引是一种度量空间的高维索引,采用两级空间划分,在处理光谱相似性查询时可同时完成针对距离和空间方位的数据过滤。实验证明该索引可以有效降低I/O和距离计算次数,具有较高的剪枝效率,适用于高光谱数据相似性查询。  相似文献   

9.
针对MapReduce数据块处理机制、高维数据分布特征和KNN查询需求,本文设计一种基于B 树的高维索引结构(iPartition),创新性提出基于主成分区分度的优化数据划分策略和邻接数据域分散存储等原则,将数据均匀划分到不同的Slave节点,使尽可能多的数据域对计算共同贡献,提升MapReduce任务处理并行性;利用B 树构造分布式的双层索引实现查询时数据范围快速过滤,降低高维计算代价。实验表明,iPartition在高维数据近似查询环境下,具有良好的性能和扩展性。  相似文献   

10.
真实世界中,常存在很多障碍物,影响空间对象到查询点的可见性及距离,可见k近邻查询查找距查询点最近的k个可见对象,是时空查询领域的一类重要算法.由于度量设备误差以及通信开销的限制等因素,空间对象位置不确定因素广泛存在.文中拟对不确定对象执行可见k近邻查询,提出了概率可见k近邻(PVkNN)查询,即查找前k个成为查询点最近邻居概率最大的节点.为了高效地执行这一查询,文中提出了k-界限剪枝方法,基于可见质心的紧缩过滤以及对不可见对象的剪枝策略,从空间角度过滤掉不符合条件的对象.为避免对候选集合中每个对象的概率都进行精确计算,从概率角度提出了根据概率上下限来对候选集合进行进一步的求精方法,采用近似采样技术来获取可见区域的比例,实现了对PVkNN的高效计算.采用真实和模拟数据集设计实验,充分验证了算法的效率和精度.  相似文献   

11.
Efficient high-dimensional indexing by sorting principal component   总被引:1,自引:0,他引:1  
The vector approximation file (VA-file) approach is an efficient high-dimensional indexing method for image retrieval in large database. Some extensions of VA-file have been proposed towards better query performance. However, all of these methods applying sequential scan need read the whole vector approximation file. In this paper, we present a new indexing structure based on vector approximation method, in which only a small part of approximation file need be accessed. First, principal component analysis is used to map multidimensional points to a 1D line. Then a B+-tree is built to index the approximate vector according to principal component. When performing k-nearest neighbor search, the partial distortion searching algorithm is used to reject the improper approximate vectors. Only a small set of approximate vectors need to be sequentially scanned during the search, which can reduce the CPU cost and I/O cost dramatically. Experiment results on large image databases show that the new approach provides a faster search speed than the other VA-file approaches.  相似文献   

12.
Multilevel indexes have long been used for accessing records in sorted files. Given the access cost at each level, the total cost of retrieving a record from the file can be substantially reduced by selecting the proper size of the index at each level. Organizations involving a variable number of levels are covered and binary searching is compared to sequential searching.  相似文献   

13.
Vertical partitioning is a design technique for reducing the number of disk accesses to execute a given set of queries by minimizing the number of irrelevant instance variables accessed. This is accomplished by grouping the frequently accessed instance variables as vertical class fragments. The complexity of object-oriented database models due to subclass hierarchy and class composition hierarchy complicates the definition and representation of vertical partitioning of the classes, which makes the problem of vertical partitioning in OODBs very challenging. In this paper, we develop a comprehensive analytical cost model for processing of queries on vertically partitioned OODB classes. A set of analytical evaluation results is presented to show the effect of vertical partitioning, and to study the trade-off between the projection ratio versus selectivity factor vis-a-vis sequential versus index access. Furthermore, an empirical experimental prototype supporting vertical class partitioning has been implemented on a commercial OODB tool kit to validate our analytical cost model.  相似文献   

14.
赵奇  陈燕  何云  徐敬东 《计算机工程》2007,33(6):147-149
提出一种提高无结构型对等网络查询效率的机制。在该机制下,节点根据地理位置自动聚类,类之间用Chord方式组合起来,从而减轻了逻辑网络与物理网络拓扑结构的不匹配。为了进一步提高查询效率,引入了一种类间索引技术。该技术使得查询消息不需要遍历所有的类就能获得全局搜索结果。与Gnutella中的洪泛滥查询相比,在TTL=5的情况下,该机制最多能减少超过80%的资源开销,最多可以将响应时间缩短59%。  相似文献   

15.
In order to improve the performance and efficiency of truss structure optimization, this paper presents a general framework that embeds and seamlessly integrates commercial CAD and CAE software through common programming languages and application programming interface (API). Along with the automatic CAD/CAE integration, an adaptive metamodel-based optimization called sequential radial basis function (SRBF) is applied to truss structure optimization involving sizing, geometry and topology variables. SRBF distinguishingly features two-loops searching strategy, the “inner loop” and the “outer loop”. The “inner loop” aims to search a feasible point through updating the factors of the augmented Lagrangian function. With the improved significant sampling space (ISSS) method, the “outer loop” sequentially generates new additional samples to update the RBF model. The continuous relaxation method is developed to deal with the mixed-discrete variables during the truss structure optimization. Applied to practical truss structure optimization problems from small scale to large scale, the proposed framework demonstrates feasibility of the CAD/CAE integration system during the structure modeling and analysis, and facilitates the truss structure optimization process. The comparison results between the SRBF and other approaches show that SRBF improves merit of searching global optimum and reduces the computation cost.  相似文献   

16.
本文给出一种以词语为索引项的索引文件存储结构,以及基于这种结构的索引查询算法.首先分析中文索引库的分布规律,接着在此基础上设计了一种逆序存储的三层索引结构,这种结构在创建索引时能根据词语频率自动调整存储顺序,最后给出一种基于自动机和逆向最大匹配的索引查询算法.实验系统TIFS将三层索引结构与B树、哈希方法在时间和空间复杂度方面进行对比,结果表明,对于大规模的中文文本检索,三层索引结构的综合效果最好.  相似文献   

17.
传统的多模式匹配算法是用树型结构的有限自动机实现的 ,它具有很多缺点 .本文提出的多模式匹配算法是基于有序二叉树的多模式匹配算法 .实验证明 ,本文算法不但具有和传统算法相当的查找速度 ,而且构造速度快、内存耗费少 .因此 ,本文提出的算法特别适用于要求动态构造自动机的情况  相似文献   

18.
在大规模多媒体数据库中进行基于内容的检索,高维数据牵引结构的研究是重要问题,提出了一种有效的高维索引结构-自适应近似树,阐述了它的结构,给出了构建和检索算法,它结合了树结构和顺序检索的共同优点,针对不同的数据分布情况可以自适应地调整结构,维数较低或数据分布偏斜较大时它呈现树的结构,高维或数据分布密集时呈现顺序扫描的结构,以达到更优的检索效率,在结构上,对MBR使用了压缩存储的方法以节省存储空间,在算法中充分利用了空间划分是MBS和MBR共存的特点,减少了大量复杂的计算,从而大大提高检索效率。  相似文献   

19.
This article deals with the group interview problem, in which each group contains several alternatives and each group of alternatives is presented and evaluated sequentially over time. We derive the optimal selection strategy for the group interview problem with a general utility function. Among the various types of utility function, we focus on the best choice problem, in which our utility is one if we successfully select the best choice and zero otherwise. We derive a simple selection rule called the optimal partitioning strategy in which the decision-maker divides the entire groups into two disjoint sets and, after evaluating the choices in the first set, chooses the relatively best choice available for the first time in the second set. Because the selected choice is not necessarily the absolutely best choice, we also consider the probability distribution of the actual rank of the choice selected under the partitioning strategy.

Scope and purpose

In many managerial decision situations such as buying a car, selling a house, or searching for a job, several alternatives are presented sequentially and an accept-or-reject decision is made immediately after evaluating each alternative. The classical secretary problem and its extensions have been successfully applied to such a sequential search and selection problem. This article deals with a more generalized version of the secretary problem, called the group interview problem, in which several groups of alternatives are presented and evaluated sequentially over time. Based on a stochastic dynamic programming approach, we propose the optimal selection strategy for the group interview problem with various types of the decision-maker's utility function. There are many potential applications of the group interview problem, including consumer search and purchase process, job search problem, sequential assignment of batch jobs, and so on.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号