共查询到20条相似文献,搜索用时 46 毫秒
1.
Similarity searching in metric spaces has a vast number of applications in several fields like multimedia databases, text
retrieval, computational biology, and pattern recognition. In this context, one of the most important similarity queries is
the k nearest neighbor (k-NN) search. The standard best-first k-NN algorithm uses a lower bound on the distance to prune objects during the search. Although optimal in several aspects,
the disadvantage of this method is that its space requirements for the priority queue that stores unprocessed clusters can
be linear in the database size. Most of the optimizations used in spatial access methods (for example, pruning using MinMaxDist)
cannot be applied in metric spaces, due to the lack of geometric properties. We propose a new k-NN algorithm that uses distance estimators, aiming to reduce the storage requirements of the search algorithm. The method stays optimal, yet it can significantly prune
the priority queue without altering the output of the query. Experimental results with synthetic and real datasets confirm
the reduction in storage space of our proposed algorithm, showing savings of up to 80% of the original space requirement.
Benjamin Bustos is an assistant professor in the Department of Computer Science at the University of Chile. He is also a researcher at the Millennium Nucleus Center for Web Research. His research interests are similarity searching and multimedia information retrieval. He has a doctoral degree in natural sciences from the University of Konstanz, Germany. Contact him at bebustos@dcc.uchile.cl. Gonzalo Navarro earned his PhD in Computer Science at the University of Chile in 1998, where he is now Full Professor. His research interests include similarity searching, text databases, compression, and algorithms and data structures in general. He has coauthored a book on string matching and around 200 international papers. He has (co)chaired international conferences SPIRE 2001, SCCC 2004, SPIRE 2005, SIGIR Posters 2005, IFIP TCS 2006, and ENC 2007 Scalable Pattern Recognition track; and belongs to the Editorial Board of Information Retrieval Journal. He is currently Head of the Department of Computer Science at University of Chile, and Head of the Millenium Nucleus Center for Web Research, the largest Chilean project in Computer Science research. 相似文献
Gonzalo NavarroEmail: |
Benjamin Bustos is an assistant professor in the Department of Computer Science at the University of Chile. He is also a researcher at the Millennium Nucleus Center for Web Research. His research interests are similarity searching and multimedia information retrieval. He has a doctoral degree in natural sciences from the University of Konstanz, Germany. Contact him at bebustos@dcc.uchile.cl. Gonzalo Navarro earned his PhD in Computer Science at the University of Chile in 1998, where he is now Full Professor. His research interests include similarity searching, text databases, compression, and algorithms and data structures in general. He has coauthored a book on string matching and around 200 international papers. He has (co)chaired international conferences SPIRE 2001, SCCC 2004, SPIRE 2005, SIGIR Posters 2005, IFIP TCS 2006, and ENC 2007 Scalable Pattern Recognition track; and belongs to the Editorial Board of Information Retrieval Journal. He is currently Head of the Department of Computer Science at University of Chile, and Head of the Millenium Nucleus Center for Web Research, the largest Chilean project in Computer Science research. 相似文献
2.
Nearest Neighbor (NN) search has been in the core of spatial and spatiotemporal database research during the last decade.
The literature on NN query processing algorithms so far deals with either stationary or moving query points over static datasets
or future (predicted) locations over a set of continuously moving points. With the increasing number of Mobile Location Services
(MLS), the need for effective k-NN query processing over historical trajectory data has become the vehicle for data analysis, thus improving existing or
even proposing new services. In this paper, we investigate mechanisms to perform NN search on R-tree-like structures storing
historical information about moving object trajectories. The proposed (depth-first and best-first) algorithms vary with respect
to the type of the query object (stationary or moving point) as well as the type of the query result (historical continuous
or not), thus resulting in four types of NN queries. We also propose novel metrics to support our search ordering and pruning
strategies. Using the implementation of the proposed algorithms on two members of the R-tree family for trajectory data (namely,
the TB-tree and the 3D-R-tree), we demonstrate their scalability and efficiency through an extensive experimental study using
large synthetic and real datasets.
相似文献
Yannis Theodoridis (Corresponding author)Email: URL: http://dke.cti.gr http://isl.cs.unipi.gr/db |
3.
Similarity search implemented via k-nearest neighbor—
k-NN queries on multidimensional indices is an extremely useful paradigm for content-based image retrieval. As the dimensionality
of feature vectors increases the curse of dimensionality sets in, i.e., the performance of k-NN search of disk-resident indices in the R-tree family degrades rapidly due to the overlap in index pages in high dimensions.
This problem is dealt with in this study by utilizing the double filtering effect of clustering and indexing. The clustering
algorithm ensures that the largest cluster fits into main memory and that only clusters closest to a query point need to be
searched and hence loaded into main memory. We organize the data in each cluster according to the ordered-partition—OP-tree
main memory resident index, which is not prone to the curse of dimensionality and highly efficient for processing k-NN queries. We serialize an OP-tree by writing its dynamically allocated nodes into contiguous memory locations, optimize
its parameters, and make it persistent by writing it to disk. The time to read and write clusters constituting an OP-tree
with a single sequential access to disk benefits from higher data transfer rates of modern disk drives. The performance of
the index is further improved by applying the Karhunen–Loève transformation—KLT to the dataset, since this results in a more efficient computation of distances for k-NN queries. We compare OP-trees and sequential scans with and without a KL-transformation and with and without using a shortcut
method in calculating Euclidean distances. A comparison against the OMNI-sequential scan is also reported. We finally compare
a clustered and persistent version of the OP-tree against a clustered version of the SR-tree and the VA-file method. CPU time
is measured and elapsed time is estimated in this study. It is observed that the OP-tree index outperforms the other two methods
and that the improvement increases with the number of dimensions.
相似文献
Lijuan ZhangEmail: |
4.
RRSi: indexing XML data for proximity twig queries 总被引:2,自引:2,他引:0
Twig query pattern matching is a core operation in XML query processing. Indexing XML documents for twig query processing
is of fundamental importance to supporting effective information retrieval. In practice, many XML documents on the web are
heterogeneous and have their own formats; documents describing relevant information can possess different structures. Therefore
some “user-interesting” documents having similar but non-exact structures against a user query are often missed out. In this
paper, we propose the RRSi, a novel structural index designed for structure-based query lookup on heterogeneous sources of XML documents supporting
proximate query answers. The index avoids the unnecessary processing of structurally irrelevant candidates that might show
good content relevance. An optimized version of the index, oRRSi, is also developed to further reduce both space requirements and computational complexity. To our knowledge, these structural
indexes are the first to support proximity twig queries on XML documents. The results of our preliminary experiments show
that RRSi and oRRSi based query processing significantly outperform previously proposed techniques in XML repositories with structural heterogeneity.
相似文献
Vincent T. Y. NgEmail: |
5.
Krishnaprasad Thirunarayan Trivikram Immaneni 《Journal of Intelligent Information Systems》2009,32(2):139-162
Text search engines are inadequate for indexing and searching XML documents because they ignore metadata and aggregation structure
implicit in the XML documents. On the other hand, the query languages supported by specialized XML search engines are very
complex. In this paper, we present a simple yet flexible query language, and develop its semantics to enable intuitively appealing
extraction of relevant fragments of information while simultaneously falling back on retrieval through plain text search if necessary. Our approach combines and generalizes several available techniques to obtain precise
and coherent results.
相似文献
Trivikram ImmaneniEmail: URL: http://www.cs.wright.edu/~tkprasad |
6.
Exploiting maximal redundancy to optimize SQL queries 总被引:1,自引:1,他引:0
Detecting and dealing with redundancy is an ubiquitous problem in query optimization, which manifests itself in many areas
of research such as materialized views, multi-query optimization, and query-containment algorithms. In this paper, we focus
on the issue of intra-query redundancy, redundancy present within a query. We present a method to detect the maximal redundancy present between a main (outer) query block and a subquery block.
We then use the method for query optimization, introducing query plans and a new operator that take full advantage of the
redundancy discovered. Our approach can deal with redundancy in a wider spectrum of queries than existing techniques. We show
experimental evidence that our approach works under certain conditions, and compares favorably to existing optimization techniques
when applicable.
相似文献
Antonio BadiaEmail: |
7.
In spite of significant improvements in video data retrieval, a system has not yet been developed that can adequately respond
to a user’s query. Typically, the user has to refine the query many times and view query results until eventually the expected
videos are retrieved from the database. The complexity of video data and questionable query structuring by the user aggravates
the retrieval process. Most previous research in this area has focused on retrieval based on low-level features. Managing
imprecise queries using semantic (high-level) content is no easier than queries based on low-level features due to the absence
of a proper continuous distance function. We provide a method to help users search for clips and videos of interest in video
databases. The video clips are classified as interesting and uninteresting based on user browsing. The attribute values of clips are classified by commonality, presence, and frequency within each
of the two groups to be used in computing the relevance of each clip to the user’s query. In this paper, we provide an intelligent
query structuring system, called I-Quest, to rank clips based on user browsing feedback, where a template generation from the set of interesting and uninteresting
sets is impossible or yields poor results.
相似文献
Ramazan Savaş Aygün (Corresponding author)Email: |
8.
An analysis of how training data complexity affects the nearest neighbor classifiers 总被引:3,自引:2,他引:1
The k-nearest neighbors (k-NN) classifier is one of the most popular supervised classification methods. It is very simple, intuitive and accurate in
a great variety of real-world domains. Nonetheless, despite its simplicity and effectiveness, practical use of this rule has
been historically limited due to its high storage requirements and the computational costs involved. On the other hand, the
performance of this classifier appears strongly sensitive to training data complexity. In this context, by means of several
problem difficulty measures, we try to characterize the behavior of the k-NN rule when working under certain situations. More specifically, the present analysis focuses on the use of some data complexity
measures to describe class overlapping, feature space dimensionality and class density, and discover their relation with the
practical accuracy of this classifier.
相似文献
J. S. SánchezEmail: |
9.
Mental image search by boolean composition of region categories 总被引:1,自引:0,他引:1
Existing content-based image retrieval paradigms almost never address the problem of starting the search, when the user has no starting example image but rather a mental image. We propose a new image retrieval system to allow the user to perform mental image search by formulating boolean composition of region categories. The query interface is a region photometric thesaurus which can be viewed as a visual summary of salient regions available in the database. It is generated from the unsupervised clustering of regions with similar visual content into categories. In this thesaurus, the user simply selects the types of regions which should and should not be present in the mental image (boolean composition). The natural use of inverted tables on the region category labels enables powerful boolean search and very fast retrieval in large image databases. The process of query and search of images relates to that of documents with Google. The indexing scheme is fully unsupervised and the query mode requires minimal user interaction (no example image to provide, no sketch to draw). We demonstrate the feasibility of such a framework to reach the user mental target image with two applications: a photo-agency scenario on Corel Photostock and a TV news scenario. Perspectives will be proposed for this simple and innovative framework, which should motivate further development in various research areas.
相似文献
Nozha BoujemaaEmail: URL: http://www-rocq.inria.fr/imedia/ |
10.
SPARQL graph pattern rewriting for OWL-DL inference queries 总被引:1,自引:1,他引:0
This paper focuses on the issue of OWL-DL ontology queries implemented in SPARQL. Currently, ontology repositories construct
inference ontology models, and match SPARQL queries to the models, to derive inference results. Because an inference model
uses much more storage space than the original model, and cannot be reused as inference requirements vary, this method is
not suitable for large-scale deployment. To solve this problem, this paper proposes a novel method that passes rewritten SPARQL
queries to the original ontology model, to retrieve inference results. We define OWL-DL inference rules and apply them to
rewriting Graph Patterns in queries. The paper classifies the inference rules and discusses how these rules affect query rewriting.
To illustrate the advantages of our proposal, we present a prototype system based on Jena, and address query optimization,
to eliminate the disadvantages of augmented query sentences. We perform a set of query tests and compare the results with
related works. The results show that the proposed method results in significantly improved query efficiency, without compromising
completeness or soundness.
相似文献
Doo-Kwon BaikEmail: |
11.
Anirban Mondal Sanjay Kumar Madria Masaru Kitsuregawa 《Peer-to-Peer Networking and Applications》2009,2(3):230-251
In mobile ad hoc peer-to-peer (M-P2P) networks, economic models become a necessity for enticing non-cooperative mobile peers
to provide service. M-P2P users may issue queries with varying constraints on query response time, data quality of results
and trustworthiness of the data source. Hence, we propose ConQuer, which is an economic incentive model for the efficient processing of constraint queries in M-P2P networks. ConQuer also
provides incentives for peer collaboration in order to improve data availability. The main contributions of ConQuer are three-fold. First, it uses a broker-based economic M-P2P model for processing constraint queries via a Vickrey auction mechanism. Second, it proposes the CR*-tree, a dynamic
multidimensional R-tree-based index for constraints of data quality, trust and price of data to determine target peers efficiently.
The CR*-tree is hosted by brokers, who can sell it to other peers, thereby encouraging the creation of multiple copies of the index for facilitating routing. Third, it provides
incentives for peers to form collaborative peer groups for maximizing data availability and revenues by mutually allocating
and deallocating data items using royalty-based revenue-sharing. Such reallocations facilitate better data quality, thereby further increasing peer revenues. Our performance
study shows that ConQuer is indeed effective in answering constraint queries with improved response time, success rate and
data quality, and querying hop-counts.
相似文献
Masaru KitsuregawaEmail: |
12.
13.
In this paper, we study the important issues in the design of an efficient wireless real-time visual surveillance system (WISES).
Two important considerations are to minimize: (1) the video workload on the wireless network; and (2) the processing workload
at the front-end video capturing unit. To achieve the first objective, we propose a cooperative framework for semantic filtering of video frames instead of forwarding every video frame to the back-end server for analysis and monitoring query evaluation.
To minimize the processing workload at the front-end unit, a hierarchical object model (HOM) is designed to model the status
of the objects, and their temporal and spatial properties in the video scene. With the information provided from the back-end
server, the front-end unit pre-analyses the current status of the objects in the HOM by comparing the selection conditions
in the submitted monitoring queries following the adaptive object-based evaluation (APOBE) scheme which is proposed to reduce
the processing workload at the front-end unit. In APOBE, a higher evaluation frequency is given to the object which is closer
to satisfy the condition in the monitoring queries. The performance of WISES has been studied to demonstrate the efficiency
of the proposed scheme.
相似文献
Calvin K. H. ChiuEmail: |
14.
Bayesian network based business information retrieval model 总被引:3,自引:3,他引:0
The quality of business information can significantly affect the operation level of enterprise. This paper analyses the problem
of business information retrieval (BIR). A Bayesian Network Based business information retrieval model (BN-BIRM) is proposed
by means of Bayesian network (BN) and information retrieval (IR) theory and a method for query adaptation is presented. In
this model the customized query requirement of enterprise (CQR) is expressed in terms of the predefined illustrative documents
related to business domain. The similarities between the documents and the query are evaluated with the conditional probabilities
among the nodes in the BN. In the experiments, BN-BIRM is compared with the Belief Network model based on vector space model
(VSM) ranking strategy and the Inference Network model based on TF-IDF ranking strategy. The experimental results show that
BN-BIRM is effective for collecting business information on a large scale.
相似文献
Zheng WangEmail: |
15.
Due to the increase of XML-based applications, XML schema design has become an important task. One approach is to consider
conceptual schemas as a basis for generating XML documents compliant to consensual information of specific domains. However,
the conversion of conceptual schemas to XML schemas is not a straightforward process and inconvenient design decisions can
lead to a poor query processing on XML documents generated. This paper presents a conversion approach which considers data
and query workload estimated for XML applications, in order to generate an XML schema from a conceptual schema. Load information
is used to produce XML schemas which can respond well to the main queries of an XML application. We evaluate our approach
through a case study carried out on a native XML database. The experimental results demonstrate that the XML schemas generated
by our methodology contribute to a better query performance than related approaches.
相似文献
Ronaldo dos Santos MelloEmail: |
16.
Enabling Location-based Services—Multi-Graph Representation of Transportation Networks 总被引:2,自引:2,他引:0
Advances in wireless communications, positioning technologies, and consumer electronics combine to enable a range of applications
that use a mobile user’s geo-spatial location to deliver on-line, location-enhanced services, often referred to as location-based
services. This paper assumes that the service users are constrained to a transportation network, and it delves into the modeling
of such networks, points of interest, and the service users with the objective of supporting location-based services. In particular,
the paper presents a framework that encompasses two interrelated models—a two-dimensional, spatial representation and a multi-graph
presentation. The former, high-fidelity model may be used for the positioning of content and users in the infrastructure (e.g.,
using map matching). The latter type of model is recognized as an ideal basis for a variety of query processing tasks, e.g.,
route and distance computations. Together, the two models capture central aspects of the problem domain needed in order to
support the different types of queries that underlie location-based services. Notably, the framework is capable of capturing
roads with lanes, lane shift and u-turn regulations, and turn restrictions. As part of the framework, the paper constructively
demonstrates how it is possible map instances of the semantically rich two-dimensional model to instances of the graph model
that preserve the topology of the two-dimensional model instances. In doing so, the paper demonstrates how a wealth of previously
proposed query processing techniques based on graphs are applicable even in the context of complex transportation networks.
The paper also presents means of compacting graphs while preserving aspects of the graphs that are important for the intended
applications.
相似文献
Christian S. JensenEmail: |
17.
Phivos Mylonas Thanos Athanasiadis Manolis Wallace Yannis Avrithis Stefanos Kollias 《Multimedia Tools and Applications》2008,39(3):293-327
In this paper we present a framework for unified, personalized access to heterogeneous multimedia content in distributed repositories.
Focusing on semantic analysis of multimedia documents, metadata, user queries and user profiles, it contributes to the bridging
of the gap between the semantic nature of user queries and raw multimedia documents. The proposed approach utilizes as input
visual content analysis results, as well as analyzes and exploits associated textual annotation, in order to extract the underlying
semantics, construct a semantic index and classify documents to topics, based on a unified knowledge and semantics representation
model. It may then accept user queries, and, carrying out semantic interpretation and expansion, retrieve documents from the
index and rank them according to user preferences, similarly to text retrieval. All processes are based on a novel semantic
processing methodology, employing fuzzy algebra and principles of taxonomic knowledge representation. The first part of this
work presented in this paper deals with data and knowledge models, manipulation of multimedia content annotations and semantic
indexing, while the second part will continue on the use of the extracted semantic information for personalized retrieval.
相似文献
Stefanos KolliasEmail: |
18.
With the rapid advancements in positioning technologies such as the Global Positioning System (GPS) and wireless communications,
the tracking of continuously moving objects has become more convenient. However, this development poses new challenges to
database technology since maintaining up-to-date information regarding the location of moving objects incurs an enormous amount
of updates. Existing indexes can no longer keep up with the high update rate while providing speedy retrieval at the same
time. This study aims to improve k nearest neighbor (kNN) query performance while reducing update costs. Our approach is based
on an important observation that queries usually occur around certain places or spatial landmarks of interest, called reference
points. We propose the Reference-Point-based tree (RP-tree), which is a two-layer index structure that indexes moving objects
according to reference points. Experimental results show that the RP-tree achieves significant improvement over the TPR-tree.
相似文献
Aoying ZhouEmail: |
19.
The technique of relevance feedback has been introduced to content-based 3D model retrieval, however, two essential issues
which affect the retrieval performance have not been addressed. In this paper, a novel relevance feedback mechanism is presented,
which effectively makes use of strengths of different feature vectors and perfectly solves the problem of small sample and
asymmetry. During the retrieval process, the proposed method takes the user’s feedback details as the relevant information
of query model, and then dynamically updates two important parameters of each feature vector, narrowing the gap between high-level
semantic knowledge and low-level object representation. The experiments, based on the publicly available 3D model database
Princeton Shape Benchmark (PSB), show that the proposed approach not only precisely captures the user’s semantic knowledge,
but also significantly improves the retrieval performance of 3D model retrieval. Compared with three state-of-the-art query
refinement schemes for 3D model retrieval, it provides superior retrieval effectiveness only with a few rounds of relevance
feedback based on several standard measures.
相似文献
Biao LengEmail: |
20.
We propose a SAO index to approximately answer arbitrary linear optimization queries in a sliding window of a data stream.
It uses limited memory to maintain the most “important” tuples. At any time, for any linear optimization query, we can retrieve
the approximate top-K tuples in the sliding window almost instantly. The larger the amount of available memory, the better the quality of the answers
is. More importantly, for a given amount of memory, the quality of the answers can be further improved by dynamically allocating
a larger portion of the memory to the outer layers of the SAO index.
相似文献
Philip S. YuEmail: |