共查询到20条相似文献,搜索用时 0 毫秒
1.
A hyperplane based indexing technique for high-dimensional data 总被引:1,自引:0,他引:1
Guoren Wang Xiangmin Zhou Bin Wang Baiyou Qiao Donghong Han 《Information Sciences》2007,177(11):2255-2268
In this paper, we propose a novel hyperplane based indexing method to support efficient processing of similarity search queries in high-dimensional spaces. The main idea of the proposed index is to improve data partitioning efficiency in a high-dimensional space by using a hyperplane, which further partitions a subspace and can also take advantage of the twin node concept used in the key dimension based index. Compared with the key dimension concept, the hyperplane is more effective in data filtering. High space utilization is achieved by dynamically performing data reallocation between twin nodes. In addition, a post processing step is used after index building to ensure effective filtration. Extensive experiments based on two types of real data sets are conducted and the results illustrate a significantly improved filtering efficiency. Because of the feature of hyperplane, the proposed indexing method is only suitable to Euclidean spaces. 相似文献
2.
Indexing high-dimensional data for main-memory similarity search 总被引:1,自引:0,他引:1
As RAM gets cheaper and larger, in-memory processing of data becomes increasingly affordable. In this paper, we propose a novel index structure, the CSR+-tree, to support efficient high-dimensional similarity search in main memory. We introduce quantized bounding spheres (QBSs) that approximate bounding spheres (BSs) or data points. We analyze the respective pros and cons of both QBSs and the previously proposed quantized bounding rectangles (QBRs), and take the best of both worlds by carefully incorporating both of them into the CSR+-tree. We further propose a novel distance computation scheme that eliminates the need for decompressing QBSs or QBRs, which results in significant cost savings. We present an extensive experimental evaluation and analysis of the CSR+-tree, and compare its performance against that of other representative indexes in the literature. Our results show that the CSR+-tree consistently outperforms other index structures. 相似文献
3.
Changzhou Wang X. Sean Wang 《The VLDB Journal The International Journal on Very Large Data Bases》2001,9(4):344-361
Similarity queries on complex objects are usually translated into searches among their feature vectors. This paper studies
indexing techniques for very high-dimensional (e.g., in hundreds) vectors that are sparse or quasi-sparse, i.e., vectors each having only a small number (e.g., ten) of non-zero or significant values. Based on the R-tree, the paper introduces the xS-tree
that uses lossy compression of bounding regions to guarantee a reasonable minimum fan-out within the allocated storage space
for each node. In addition, the paper studies the performance and scalability of the xS-tree via experiments.
Received: 3 May 1999 / Accepted: 23 October 2000 Published online: 27 April 2001 相似文献
4.
Searching in metric spaces by spatial approximation 总被引:5,自引:0,他引:5
Gonzalo Navarro 《The VLDB Journal The International Journal on Very Large Data Bases》2002,11(1):28-46
We propose a new data structure to search in metric spaces. A metric space is formed by a collection of objects and a distance function defined among them which satisfies the triangle inequality. The goal is, given a set of objects and a query, retrieve those
objects close enough to the query. The complexity measure is the number of distances computed to achieve this goal. Our data
structure, called sa-tree (“spatial approximation tree”), is based on approaching the searched objects spatially, that is, getting closer and closer
to them, rather than the classic divide-and-conquer approach of other data structures. We analyze our method and show that
the number of distance evaluations to search among n objects is sublinear. We show experimentally that the sa-tree is the best existing technique when the metric space is hard to search or the query has low selectivity. These are the most
important unsolved cases in real applications. As a practical advantage, our data structure is one of the few that does not
need to tune parameters, which makes it appealing for use by non-experts.
Edited by R. Sacks-Davis Received: 17 April 2001 / Accepted: 24 January 2002 / Published online: 14 May 2002 相似文献
5.
Surface approximation to scanned data 总被引:6,自引:0,他引:6
A method to approximate scanned data points with a B-spline surface is presented. The data are assumed to be organized in
the form of Q
i,j, i=0,…,n; j=0,…,m
i, i.e., in a row-wise fashion. The method produces a C
(p-1, q-1) continuous surface (p and q are the required degrees) that does not deviate from the data by more than a user-specified tolerance. The parametrization
of the surface is not affected negatively by the distribution of the points in each row, and it can be influenced by a user-supplied
knot vector. 相似文献
6.
Efficient similarity search for market basket data 总被引:2,自引:0,他引:2
Alexandros Nanopoulos Yannis Manolopoulos 《The VLDB Journal The International Journal on Very Large Data Bases》2002,11(2):138-152
Several organizations have developed very large market basket databases for the maintenance of customer transactions. New
applications, e.g., Web recommendation systems, present the requirement for processing similarity queries in market basket
databases. In this paper, we propose a novel scheme for similarity search queries in basket data. We develop a new representation
method, which, in contrast to existing approaches, is proven to provide correct results. New algorithms are proposed for the
processing of similarity queries. Extensive experimental results, for a variety of factors, illustrate the superiority of
the proposed scheme over the state-of-the-art method.
Edited by R. Ng. Received: August 6, 2001 / Accepted: May 21, 2002 Published online: September 25, 2002 相似文献
7.
The vector approximation file (VA-file) approach is an efficient high-dimensional indexing method for image retrieval in large database. Some extensions of VA-file have been proposed towards better query performance. However, all of these methods applying sequential scan need read the whole vector approximation file. In this paper, we present a new indexing structure based on vector approximation method, in which only a small part of approximation file need be accessed. First, principal component analysis is used to map multidimensional points to a 1D line. Then a B+-tree is built to index the approximate vector according to principal component. When performing k-nearest neighbor search, the partial distortion searching algorithm is used to reject the improper approximate vectors. Only a small set of approximate vectors need to be sequentially scanned during the search, which can reduce the CPU cost and I/O cost dramatically. Experiment results on large image databases show that the new approach provides a faster search speed than the other VA-file approaches. 相似文献
8.
Anne H.H. Ngu Quan Z. Sheng Du Q. Huynh Ron Lei 《The VLDB Journal The International Journal on Very Large Data Bases》2001,9(4):279-293
The optimized distance-based access methods currently available for multidimensional indexing in multimedia databases have
been developed based on two major assumptions: a suitable distance function is known a priori and the dimensionality of the
image features is low. It is not trivial to define a distance function that best mimics human visual perception regarding
image similarity measurements. Reducing high-dimensional features in images using the popular principle component analysis
(PCA) might not always be possible due to the non-linear correlations that may be present in the feature vectors. We propose
in this paper a fast and robust hybrid method for non-linear dimensions reduction of composite image features for indexing
in large image database. This method incorporates both the PCA and non-linear neural network techniques to reduce the dimensions
of feature vectors so that an optimized access method can be applied. To incorporate human visual perception into our system,
we also conducted experiments that involved a number of subjects classifying images into different classes for neural network
training. We demonstrate that not only can our neural network system reduce the dimensions of the feature vectors, but that
the reduced dimensional feature vectors can also be mapped to an optimized access method for fast and accurate indexing.
Received 11 June 1998 / Accepted 25 July 2000 Published online: 13 February 2001 相似文献
9.
In this work, a novel hierarchical data structure for high dimensional data indexing is proposed. MKL-tree is based on dimensionality
reduction operated by means of the MKL transform, a multi-space generalization of the KL transform. A local dimensionality
reduction is performed at each node of the tree, allowing more selective features to be extracted and thus increasing the
discriminating power of the index. The mathematical foundation for nodes and leaves representation and for the techniques
aimed to manage the structure is detailed. Moreover, the algorithms for bulk loading MKL-tree (i.e., for creating the tree
given a large number of objects simultaneously), for updating and splitting nodes after the insertion of new objects and for
performing similarity searches are described. Results are reported for the comparison of MKL-tree with other well-known access
methods in terms of I/O and CPU costs and precision of the result in the execution of similarity queries. 相似文献
10.
11.
Ada Wai-chee Fu Polly Mei-shuen Chan Yin-Ling Cheung Yiu Sang Moon 《The VLDB Journal The International Journal on Very Large Data Bases》2000,9(2):154-173
Abstract. For some multimedia applications, it has been found that domain objects cannot be represented as feature vectors in a multidimensional
space. Instead, pair-wise distances between data objects are the only input. To support content-based retrieval, one approach
maps each object to a k-dimensional (k-d) point and tries to preserve the distances among the points. Then, existing spatial access index methods such as the R-trees
and KD-trees can support fast searching on the resulting k-d points. However, information loss is inevitable with such an approach since the distances between data objects can only
be preserved to a certain extent. Here we investigate the use of a distance-based indexing method. In particular, we apply
the vantage point tree (vp-tree) method. There are two important problems for the vp-tree method that warrant further investigation,
the n-nearest neighbors search and the updating mechanisms. We study an n-nearest neighbors search algorithm for the vp-tree, which is shown by experiments to scale up well with the size of the dataset
and the desired number of nearest neighbors, n. Experiments also show that the searching in the vp-tree is more efficient than that for the -tree and the M-tree. Next, we propose solutions for the update problem for the vp-tree, and show by experiments that the algorithms are
efficient and effective. Finally, we investigate the problem of selecting vantage-point, propose a few alternative methods,
and study their impact on the number of distance computation.
Received June 9, 1998 / Accepted January 31, 2000 相似文献
12.
David Gibson Jon Kleinberg Prabhakar Raghavan 《The VLDB Journal The International Journal on Very Large Data Bases》2000,8(3-4):222-236
We describe a novel approach for clustering collections of sets, and its application to the analysis and mining of categorical
data. By “categorical data,” we mean tables with fields that cannot be naturally ordered by a metric – e.g., the names of
producers of automobiles, or the names of products offered by a manufacturer. Our approach is based on an iterative method
for assigning and propagating weights on the categorical values in a table; this facilitates a type of similarity measure
arising from the co-occurrence of values in the dataset. Our techniques can be studied analytically in terms of certain types
of non-linear dynamical systems.
Received February 15, 1999 / Accepted August 15, 1999 相似文献
13.
Nikitas M. Sgouros 《Multimedia Systems》2003,8(6):470-481
Interactive performance systems allow multiple, networked users to take part in a performance either as players or audience
members and influence the performance's development in real time. Among the most prominent research issues concerning the
development of these systems are the provision of effective interaction capabilities and adequate means of expression to the
audience. This article describes a set of methods for the detection, analysis, synchronization and rendering of audience behavior
in real-time during such an event. Furthermore, it describes ways in which audience feedback is utilized to dynamically index
the contents of a performance so as to embed this event in large-scale multimedia services. All these methods have been applied
in the creation of MISSION, a multi-user game involving both players and audience. This article presents and analyzes the
design of the system and the results of user trials with it. 相似文献
14.
UnQL: a query language and algebra for semistructured data based on structural recursion 总被引:5,自引:0,他引:5
Peter Buneman Mary Fernandez Dan Suciu 《The VLDB Journal The International Journal on Very Large Data Bases》2000,9(1):76-110
Abstract. This paper presents structural recursion as the basis of the syntax and semantics of query languages for semistructured data
and XML. We describe a simple and powerful query language based on pattern matching and show that it can be expressed using
structural recursion, which is introduced as a top-down, recursive function, similar to the way XSL is defined on XML trees.
On cyclic data, structural recursion can be defined in two equivalent ways: as a recursive function which evaluates the data
top-down and remembers all its calls to avoid infinite loops, or as a bulk evaluation which processes the entire data in parallel
using only traditional relational algebra operators. The latter makes it possible for optimization techniques in relational
queries to be applied to structural recursion. We show that the composition of two structural recursion queries can be expressed
as a single such query, and this is used as the basis of an optimization method for mediator systems. Several other formal
properties are established: structural recursion can be expressed in first-order logic extended with transitive closure; its
data complexity is PTIME; and over relational data it is a conservative extension of the relational calculus. The underlying
data model is based on value equality, formally defined with bisimulation. Structural recursion is shown to be invariant with
respect to value equality.
Received: July 9, 1999 / Accepted: December 24, 1999 相似文献
15.
Xiaodong Wen Theodore D. Huffmire Helen H. Hu Adam Finkelstein 《Multimedia Systems》1999,7(5):350-358
We present several algorithms suitable for analysis of broadcast video. First, we show how wavelet analysis of frames of
video can be used to detect transitions between shots in a video stream, thereby dividing the stream into segments. Next we
describe how each segment can be inserted into a video database using an indexing scheme that involves a wavelet-based “signature.”
Finally, we show that during a subsequent broadcast of a similar or identical video clip, the segment can be found in the
database by quickly searching for the relevant signature. The method is robust against noise and typical variations in the
video stream, even global changes in brightness that can fool histogram-based techniques. In the paper, we compare experimentally
our shot transition mechanism to a color histogram implementation, and also evaluate the effectiveness of our database-searching
scheme. Our algorithms are very efficient and run in realtime on a desktop computer. We describe how this technology could
be employed to construct a “smart VCR” that was capable of alerting the viewer to the beginning of a specific program or identifying 相似文献
16.
Approximate similarity retrieval with M-trees 总被引:4,自引:0,他引:4
Pavel Zezula Pasquale Savino Giuseppe Amato Fausto Rabitti 《The VLDB Journal The International Journal on Very Large Data Bases》1998,7(4):275-293
Motivated by the urgent need to improve the efficiency of similarity queries, approximate similarity retrieval is investigated
in the environment of a metric tree index called the M-tree. Three different approximation techniques are proposed, which
show how to forsake query precision for improved performance. Measures are defined that can quantify the improvements in performance
efficiency and the quality of approximations. The proposed approximation techniques are then tested on various synthetic and
real-life files. The evidence obtained from the experiments confirms our hypothesis that a high-quality approximated similarity
search can be performed at a much lower cost than that needed to obtain the exact results. The proposed approximation techniques
are scalable and appear to be independent of the metric used. Extensions of these techniques to the environments of other
similarity search indexes are also discussed.
Received July 7, 1998 / Accepted October 13, 1998 相似文献
17.
18.
E. Appiani F. Cesarini A.M. Colla M. Diligenti M. Gori S. Marinai G. Soda 《International Journal on Document Analysis and Recognition》2001,4(2):69-83
In this paper a system for analysis and automatic indexing of imaged documents for high-volume applications is described.
This system, named STRETCH (STorage and RETrieval by Content of imaged documents), is based on an Archiving and Retrieval Engine, which overcomes the bottleneck of document profiling bypassing some limitations of existing pre-defined indexing schemes.
The engine exploits a structured document representation and can activate appropriate methods to characterise and automatically
index heterogeneous documents with variable layout. The originality of STRETCH lies principally in the possibility for unskilled
users to define the indexes relevant to the document domains of their interest by simply presenting visual examples and applying
reliable automatic information extraction methods (document classification, flexible reading strategies) to index the documents
automatically, thus creating archives as desired. STRETCH offers ease of use and application programming and the ability to
dynamically adapt to new types of documents. The system has been tested in two applications in particular, one concerning
passive invoices and the other bank documents. In these applications, several classes of documents are involved. The indexing
strategy first automatically classifies the document, thus avoiding pre-sorting, then locates and reads the information pertaining
to the specific document class. Experimental results are encouraging overall; in particular, document classification results
fulfill the requirements of high-volume application. Integration into production lines is under execution.
Received March 30, 2000 / Revised June 26, 2001 相似文献
19.
Simonas Šaltenis Christian S. Jensen 《The VLDB Journal The International Journal on Very Large Data Bases》2002,11(1):1-16
Real-world entities are inherently spatially and temporally referenced, and database applications increasingly exploit databases
that record the past, present, and anticipated future locations of entities, e.g., the residences of customers obtained by
the geo-coding of addresses. Indices that efficiently support queries on the spatio-temporal extents of such entities are
needed. However, past indexing research has progressed in largely separate spatial and temporal streams. Adding time dimensions
to spatial indices, as if time were a spatial dimension, neither supports nor exploits the special properties of time. On
the other hand, temporal indices are generally not amenable to extension with spatial dimensions. This paper proposes the
first efficient and versatile index for a general class of spatio-temporal data: the discretely changing spatial aspect of
an object may be a point or may have an extent; both transaction time and valid time are supported, and a generalized notion
of the current time, now, is accommodated for both temporal dimensions. The index is based on the R-tree and provides means of prioritizing space versus time, which enables it to adapt to spatially and temporally restrictive
queries. Performance experiments are reported that evaluate pertinent aspects of the index.
Edited by T. Sellis. Received: 7 December 2000 / Accepted: 1 September 2001 Published online: 18 December 2001 相似文献
20.
Abstract. The paper proposes a new method for efficient triangulation of large, unordered sets of 3D points using a CAD model comprising
NURBS entities. It is primarily aimed at engineering applications involving analysis and visualisation of measured data, such
as inspection, where a model of the object in question is available. Registration of the data to the model is the necessary
first step, enabling the triangulation to be efficiently performed in 2D, on the projections of the measured points onto the
model entities. The derived connectivity is then applied to the original 3D data. Improvement of the generated 3D mesh is
often necessary, involving mesh smoothing, constraint-based elimination of redundant triangles and merging of mesh patches.
Examples involving random measurements on aerospace and automotive free-form components are presented.
Received: 30 August 1999 / Accepted: 10 January 2000 相似文献