期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

A hyperplane based indexing technique for high-dimensional data 总被引：1，自引：0，他引：1

Guoren Wang Xiangmin Zhou Bin Wang Baiyou Qiao Donghong Han 《Information Sciences》2007,177(11):2255-2268

In this paper, we propose a novel hyperplane based indexing method to support efficient processing of similarity search queries in high-dimensional spaces. The main idea of the proposed index is to improve data partitioning efficiency in a high-dimensional space by using a hyperplane, which further partitions a subspace and can also take advantage of the twin node concept used in the key dimension based index. Compared with the key dimension concept, the hyperplane is more effective in data filtering. High space utilization is achieved by dynamically performing data reallocation between twin nodes. In addition, a post processing step is used after index building to ensure effective filtration. Extensive experiments based on two types of real data sets are conducted and the results illustrate a significantly improved filtering efficiency. Because of the feature of hyperplane, the proposed indexing method is only suitable to Euclidean spaces. 相似文献

2.

Indexing high-dimensional data for main-memory similarity search 总被引：1，自引：0，他引：1

Xiaohui Yu Junfeng Dong 《Information Systems》2010

As RAM gets cheaper and larger, in-memory processing of data becomes increasingly affordable. In this paper, we propose a novel index structure, the CSR⁺-tree, to support efficient high-dimensional similarity search in main memory. We introduce quantized bounding spheres (QBSs) that approximate bounding spheres (BSs) or data points. We analyze the respective pros and cons of both QBSs and the previously proposed quantized bounding rectangles (QBRs), and take the best of both worlds by carefully incorporating both of them into the CSR⁺-tree. We further propose a novel distance computation scheme that eliminates the need for decompressing QBSs or QBRs, which results in significant cost savings. We present an extensive experimental evaluation and analysis of the CSR⁺-tree, and compare its performance against that of other representative indexes in the literature. Our results show that the CSR⁺-tree consistently outperforms other index structures. 相似文献

3.

Indexing very high-dimensional sparse and quasi-sparse vectors for similarity searches

Changzhou Wang X. Sean Wang 《The VLDB Journal The International Journal on Very Large Data Bases》2001,9(4):344-361

Similarity queries on complex objects are usually translated into searches among their feature vectors. This paper studies indexing techniques for very high-dimensional (e.g., in hundreds) vectors that are sparse or quasi-sparse, i.e., vectors each having only a small number (e.g., ten) of non-zero or significant values. Based on the R-tree, the paper introduces the xS-tree that uses lossy compression of bounding regions to guarantee a reasonable minimum fan-out within the allocated storage space for each node. In addition, the paper studies the performance and scalability of the xS-tree via experiments. Received: 3 May 1999 / Accepted: 23 October 2000 Published online: 27 April 2001 相似文献

4.

Searching in metric spaces by spatial approximation 总被引：5，自引：0，他引：5

Gonzalo Navarro 《The VLDB Journal The International Journal on Very Large Data Bases》2002,11(1):28-46

We propose a new data structure to search in metric spaces. A metric space is formed by a collection of objects and a distance function defined among them which satisfies the triangle inequality. The goal is, given a set of objects and a query, retrieve those objects close enough to the query. The complexity measure is the number of distances computed to achieve this goal. Our data structure, called sa-tree (“spatial approximation tree”), is based on approaching the searched objects spatially, that is, getting closer and closer to them, rather than the classic divide-and-conquer approach of other data structures. We analyze our method and show that the number of distance evaluations to search among n objects is sublinear. We show experimentally that the sa-tree is the best existing technique when the metric space is hard to search or the query has low selectivity. These are the most important unsolved cases in real applications. As a practical advantage, our data structure is one of the few that does not need to tune parameters, which makes it appealing for use by non-experts. Edited by R. Sacks-Davis Received: 17 April 2001 / Accepted: 24 January 2002 / Published online: 14 May 2002 相似文献

5.

Surface approximation to scanned data 总被引：6，自引：0，他引：6

L.A. Piegl W. Tiller 《The Visual computer》2000,16(7):386-395

A method to approximate scanned data points with a B-spline surface is presented. The data are assumed to be organized in the form of Q _i,j, i=0,…,n; j=0,…,m _i, i.e., in a row-wise fashion. The method produces a C ^{(p-1, q-1)} continuous surface (p and q are the required degrees) that does not deviate from the data by more than a user-specified tolerance. The parametrization of the surface is not affected negatively by the distribution of the points in each row, and it can be influenced by a user-supplied knot vector. 相似文献

6.

Efficient similarity search for market basket data 总被引：2，自引：0，他引：2

Alexandros Nanopoulos Yannis Manolopoulos 《The VLDB Journal The International Journal on Very Large Data Bases》2002,11(2):138-152

Several organizations have developed very large market basket databases for the maintenance of customer transactions. New applications, e.g., Web recommendation systems, present the requirement for processing similarity queries in market basket databases. In this paper, we propose a novel scheme for similarity search queries in basket data. We develop a new representation method, which, in contrast to existing approaches, is proven to provide correct results. New algorithms are proposed for the processing of similarity queries. Extensive experimental results, for a variety of factors, illustrate the superiority of the proposed scheme over the state-of-the-art method. Edited by R. Ng. Received: August 6, 2001 / Accepted: May 21, 2002 Published online: September 25, 2002 相似文献

7.

Efficient high-dimensional indexing by sorting principal component 总被引：1，自引：0，他引：1

Jiangtao Cui Shuisheng Zhou Junding Sun 《Pattern recognition letters》2007,28(16):2412-2418

The vector approximation file (VA-file) approach is an efficient high-dimensional indexing method for image retrieval in large database. Some extensions of VA-file have been proposed towards better query performance. However, all of these methods applying sequential scan need read the whole vector approximation file. In this paper, we present a new indexing structure based on vector approximation method, in which only a small part of approximation file need be accessed. First, principal component analysis is used to map multidimensional points to a 1D line. Then a B⁺-tree is built to index the approximate vector according to principal component. When performing k-nearest neighbor search, the partial distortion searching algorithm is used to reject the improper approximate vectors. Only a small set of approximate vectors need to be sequentially scanned during the search, which can reduce the CPU cost and I/O cost dramatically. Experiment results on large image databases show that the new approach provides a faster search speed than the other VA-file approaches. 相似文献

8.

Combining multi-visual features for efficient indexing in a large image database

Anne H.H. Ngu Quan Z. Sheng Du Q. Huynh Ron Lei 《The VLDB Journal The International Journal on Very Large Data Bases》2001,9(4):279-293

The optimized distance-based access methods currently available for multidimensional indexing in multimedia databases have been developed based on two major assumptions: a suitable distance function is known a priori and the dimensionality of the image features is low. It is not trivial to define a distance function that best mimics human visual perception regarding image similarity measurements. Reducing high-dimensional features in images using the popular principle component analysis (PCA) might not always be possible due to the non-linear correlations that may be present in the feature vectors. We propose in this paper a fast and robust hybrid method for non-linear dimensions reduction of composite image features for indexing in large image database. This method incorporates both the PCA and non-linear neural network techniques to reduce the dimensions of feature vectors so that an optimized access method can be applied. To incorporate human visual perception into our system, we also conducted experiments that involved a number of subjects classifying images into different classes for neural network training. We demonstrate that not only can our neural network system reduce the dimensions of the feature vectors, but that the reduced dimensional feature vectors can also be mapped to an optimized access method for fast and accurate indexing. Received 11 June 1998 / Accepted 25 July 2000 Published online: 13 February 2001 相似文献

9.

MKL-tree: an index structure for high-dimensional vector spaces

Annalisa Franco Alessandra Lumini Dario Maio 《Multimedia Systems》2007,12(6):533-550

In this work, a novel hierarchical data structure for high dimensional data indexing is proposed. MKL-tree is based on dimensionality reduction operated by means of the MKL transform, a multi-space generalization of the KL transform. A local dimensionality reduction is performed at each node of the tree, allowing more selective features to be extracted and thus increasing the discriminating power of the index. The mathematical foundation for nodes and leaves representation and for the techniques aimed to manage the structure is detailed. Moreover, the algorithms for bulk loading MKL-tree (i.e., for creating the tree given a large number of objects simultaneously), for updating and splitting nodes after the insertion of new objects and for performing similarity searches are described. Results are reported for the comparison of MKL-tree with other well-known access methods in terms of I/O and CPU costs and precision of the result in the execution of similarity queries. 相似文献

10.

An efficient algorithm for hyperspherical range query processing in high-dimensional data space

Dong-Ho Lee Shin Heu 《Information Processing Letters》2002,83(2):115-123

相似文献

11.

Dynamic vp-tree indexing for n-nearest neighbor search given pair-wise distances 总被引：1，自引：0，他引：1

Ada Wai-chee Fu Polly Mei-shuen Chan Yin-Ling Cheung Yiu Sang Moon 《The VLDB Journal The International Journal on Very Large Data Bases》2000,9(2):154-173

Abstract. For some multimedia applications, it has been found that domain objects cannot be represented as feature vectors in a multidimensional space. Instead, pair-wise distances between data objects are the only input. To support content-based retrieval, one approach maps each object to a k-dimensional (k-d) point and tries to preserve the distances among the points. Then, existing spatial access index methods such as the R-trees and KD-trees can support fast searching on the resulting k-d points. However, information loss is inevitable with such an approach since the distances between data objects can only be preserved to a certain extent. Here we investigate the use of a distance-based indexing method. In particular, we apply the vantage point tree (vp-tree) method. There are two important problems for the vp-tree method that warrant further investigation, the n-nearest neighbors search and the updating mechanisms. We study an n-nearest neighbors search algorithm for the vp-tree, which is shown by experiments to scale up well with the size of the dataset and the desired number of nearest neighbors, n. Experiments also show that the searching in the vp-tree is more efficient than that for the -tree and the M-tree. Next, we propose solutions for the update problem for the vp-tree, and show by experiments that the algorithms are efficient and effective. Finally, we investigate the problem of selecting vantage-point, propose a few alternative methods, and study their impact on the number of distance computation. Received June 9, 1998 / Accepted January 31, 2000 相似文献

12.

Clustering categorical data: an approach based on dynamical systems

David Gibson Jon Kleinberg Prabhakar Raghavan 《The VLDB Journal The International Journal on Very Large Data Bases》2000,8(3-4):222-236

We describe a novel approach for clustering collections of sets, and its application to the analysis and mining of categorical data. By “categorical data,” we mean tables with fields that cannot be naturally ordered by a metric – e.g., the names of producers of automobiles, or the names of products offered by a manufacturer. Our approach is based on an iterative method for assigning and propagating weights on the categorical values in a table; this facilitates a type of similarity measure arising from the co-occurrence of values in the dataset. Our techniques can be studied analytically in terms of certain types of non-linear dynamical systems. Received February 15, 1999 / Accepted August 15, 1999 相似文献

13.

Analysis,management and indexing of distributed multimedia performances based on audience feedback

Nikitas M. Sgouros 《Multimedia Systems》2003,8(6):470-481

Interactive performance systems allow multiple, networked users to take part in a performance either as players or audience members and influence the performance's development in real time. Among the most prominent research issues concerning the development of these systems are the provision of effective interaction capabilities and adequate means of expression to the audience. This article describes a set of methods for the detection, analysis, synchronization and rendering of audience behavior in real-time during such an event. Furthermore, it describes ways in which audience feedback is utilized to dynamically index the contents of a performance so as to embed this event in large-scale multimedia services. All these methods have been applied in the creation of MISSION, a multi-user game involving both players and audience. This article presents and analyzes the design of the system and the results of user trials with it. 相似文献

14.

UnQL: a query language and algebra for semistructured data based on structural recursion 总被引：5，自引：0，他引：5

Peter Buneman Mary Fernandez Dan Suciu 《The VLDB Journal The International Journal on Very Large Data Bases》2000,9(1):76-110

Abstract. This paper presents structural recursion as the basis of the syntax and semantics of query languages for semistructured data and XML. We describe a simple and powerful query language based on pattern matching and show that it can be expressed using structural recursion, which is introduced as a top-down, recursive function, similar to the way XSL is defined on XML trees. On cyclic data, structural recursion can be defined in two equivalent ways: as a recursive function which evaluates the data top-down and remembers all its calls to avoid infinite loops, or as a bulk evaluation which processes the entire data in parallel using only traditional relational algebra operators. The latter makes it possible for optimization techniques in relational queries to be applied to structural recursion. We show that the composition of two structural recursion queries can be expressed as a single such query, and this is used as the basis of an optimization method for mediator systems. Several other formal properties are established: structural recursion can be expressed in first-order logic extended with transitive closure; its data complexity is PTIME; and over relational data it is a conservative extension of the relational calculus. The underlying data model is based on value equality, formally defined with bisimulation. Structural recursion is shown to be invariant with respect to value equality. Received: July 9, 1999 / Accepted: December 24, 1999 相似文献

15.

Wavelet-based video indexing and querying

Xiaodong Wen Theodore D. Huffmire Helen H. Hu Adam Finkelstein 《Multimedia Systems》1999,7(5):350-358

We present several algorithms suitable for analysis of broadcast video. First, we show how wavelet analysis of frames of video can be used to detect transitions between shots in a video stream, thereby dividing the stream into segments. Next we describe how each segment can be inserted into a video database using an indexing scheme that involves a wavelet-based “signature.” Finally, we show that during a subsequent broadcast of a similar or identical video clip, the segment can be found in the database by quickly searching for the relevant signature. The method is robust against noise and typical variations in the video stream, even global changes in brightness that can fool histogram-based techniques. In the paper, we compare experimentally our shot transition mechanism to a color histogram implementation, and also evaluate the effectiveness of our database-searching scheme. Our algorithms are very efficient and run in realtime on a desktop computer. We describe how this technology could be employed to construct a “smart VCR” that was capable of alerting the viewer to the beginning of a specific program or identifying 相似文献

16.

Approximate similarity retrieval with M-trees 总被引：4，自引：0，他引：4

Pavel Zezula Pasquale Savino Giuseppe Amato Fausto Rabitti 《The VLDB Journal The International Journal on Very Large Data Bases》1998,7(4):275-293

Motivated by the urgent need to improve the efficiency of similarity queries, approximate similarity retrieval is investigated in the environment of a metric tree index called the M-tree. Three different approximation techniques are proposed, which show how to forsake query precision for improved performance. Measures are defined that can quantify the improvements in performance efficiency and the quality of approximations. The proposed approximation techniques are then tested on various synthetic and real-life files. The evidence obtained from the experiments confirms our hypothesis that a high-quality approximated similarity search can be performed at a much lower cost than that needed to obtain the exact results. The proposed approximation techniques are scalable and appear to be independent of the metric used. Extensions of these techniques to the environments of other similarity search indexes are also discussed. Received July 7, 1998 / Accepted October 13, 1998 相似文献

17.

Polygonal boundary approximation for a 2D general sweep based on envelope and boolean operations

Joo-Haeng Lee Sung Je Hong Myung-Soo Kim 《The Visual computer》2000,16(3-4):208-240

相似文献

18.

Automatic document classification and indexing in high-volume applications

E. Appiani F. Cesarini A.M. Colla M. Diligenti M. Gori S. Marinai G. Soda 《International Journal on Document Analysis and Recognition》2001,4(2):69-83

In this paper a system for analysis and automatic indexing of imaged documents for high-volume applications is described. This system, named STRETCH (STorage and RETrieval by Content of imaged documents), is based on an Archiving and Retrieval Engine, which overcomes the bottleneck of document profiling bypassing some limitations of existing pre-defined indexing schemes. The engine exploits a structured document representation and can activate appropriate methods to characterise and automatically index heterogeneous documents with variable layout. The originality of STRETCH lies principally in the possibility for unskilled users to define the indexes relevant to the document domains of their interest by simply presenting visual examples and applying reliable automatic information extraction methods (document classification, flexible reading strategies) to index the documents automatically, thus creating archives as desired. STRETCH offers ease of use and application programming and the ability to dynamically adapt to new types of documents. The system has been tested in two applications in particular, one concerning passive invoices and the other bank documents. In these applications, several classes of documents are involved. The indexing strategy first automatically classifies the document, thus avoiding pre-sorting, then locates and reads the information pertaining to the specific document class. Experimental results are encouraging overall; in particular, document classification results fulfill the requirements of high-volume application. Integration into production lines is under execution. Received March 30, 2000 / Revised June 26, 2001 相似文献

19.

Indexing of now-relative spatio-bitemporal data

Simonas Šaltenis Christian S. Jensen 《The VLDB Journal The International Journal on Very Large Data Bases》2002,11(1):1-16

Real-world entities are inherently spatially and temporally referenced, and database applications increasingly exploit databases that record the past, present, and anticipated future locations of entities, e.g., the residences of customers obtained by the geo-coding of addresses. Indices that efficiently support queries on the spatio-temporal extents of such entities are needed. However, past indexing research has progressed in largely separate spatial and temporal streams. Adding time dimensions to spatial indices, as if time were a spatial dimension, neither supports nor exploits the special properties of time. On the other hand, temporal indices are generally not amenable to extension with spatial dimensions. This paper proposes the first efficient and versatile index for a general class of spatio-temporal data: the discretely changing spatial aspect of an object may be a point or may have an extent; both transaction time and valid time are supported, and a generalized notion of the current time, now, is accommodated for both temporal dimensions. The index is based on the R-tree and provides means of prioritizing space versus time, which enables it to adapt to spatially and temporally restrictive queries. Performance experiments are reported that evaluate pertinent aspects of the index. Edited by T. Sellis. Received: 7 December 2000 / Accepted: 1 September 2001 Published online: 18 December 2001 相似文献

20.

Efficient CAD-based triangulation of unordered 3D data

Mihailo Ristic Djordje Brujic Surya Handayani 《Machine Vision and Applications》2000,12(2):98-106

Abstract. The paper proposes a new method for efficient triangulation of large, unordered sets of 3D points using a CAD model comprising NURBS entities. It is primarily aimed at engineering applications involving analysis and visualisation of measured data, such as inspection, where a model of the object in question is available. Registration of the data to the model is the necessary first step, enabling the triangulation to be efficiently performed in 2D, on the projections of the measured points onto the model entities. The derived connectivity is then applied to the original 3D data. Improvement of the generated 3D mesh is often necessary, involving mesh smoothing, constraint-based elimination of redundant triangles and merging of mesh patches. Examples involving random measurements on aerospace and automotive free-form components are presented. Received: 30 August 1999 / Accepted: 10 January 2000 相似文献