首页 | 官方网站   微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 46 毫秒
1.
Similarity searching in metric spaces has a vast number of applications in several fields like multimedia databases, text retrieval, computational biology, and pattern recognition. In this context, one of the most important similarity queries is the k nearest neighbor (k-NN) search. The standard best-first k-NN algorithm uses a lower bound on the distance to prune objects during the search. Although optimal in several aspects, the disadvantage of this method is that its space requirements for the priority queue that stores unprocessed clusters can be linear in the database size. Most of the optimizations used in spatial access methods (for example, pruning using MinMaxDist) cannot be applied in metric spaces, due to the lack of geometric properties. We propose a new k-NN algorithm that uses distance estimators, aiming to reduce the storage requirements of the search algorithm. The method stays optimal, yet it can significantly prune the priority queue without altering the output of the query. Experimental results with synthetic and real datasets confirm the reduction in storage space of our proposed algorithm, showing savings of up to 80% of the original space requirement.
Gonzalo NavarroEmail:

Benjamin Bustos   is an assistant professor in the Department of Computer Science at the University of Chile. He is also a researcher at the Millennium Nucleus Center for Web Research. His research interests are similarity searching and multimedia information retrieval. He has a doctoral degree in natural sciences from the University of Konstanz, Germany. Contact him at bebustos@dcc.uchile.cl. Gonzalo Navarro   earned his PhD in Computer Science at the University of Chile in 1998, where he is now Full Professor. His research interests include similarity searching, text databases, compression, and algorithms and data structures in general. He has coauthored a book on string matching and around 200 international papers. He has (co)chaired international conferences SPIRE 2001, SCCC 2004, SPIRE 2005, SIGIR Posters 2005, IFIP TCS 2006, and ENC 2007 Scalable Pattern Recognition track; and belongs to the Editorial Board of Information Retrieval Journal. He is currently Head of the Department of Computer Science at University of Chile, and Head of the Millenium Nucleus Center for Web Research, the largest Chilean project in Computer Science research.   相似文献   

2.
Algorithms for Nearest Neighbor Search on Moving Object Trajectories   总被引:2,自引:1,他引:1  
Nearest Neighbor (NN) search has been in the core of spatial and spatiotemporal database research during the last decade. The literature on NN query processing algorithms so far deals with either stationary or moving query points over static datasets or future (predicted) locations over a set of continuously moving points. With the increasing number of Mobile Location Services (MLS), the need for effective k-NN query processing over historical trajectory data has become the vehicle for data analysis, thus improving existing or even proposing new services. In this paper, we investigate mechanisms to perform NN search on R-tree-like structures storing historical information about moving object trajectories. The proposed (depth-first and best-first) algorithms vary with respect to the type of the query object (stationary or moving point) as well as the type of the query result (historical continuous or not), thus resulting in four types of NN queries. We also propose novel metrics to support our search ordering and pruning strategies. Using the implementation of the proposed algorithms on two members of the R-tree family for trajectory data (namely, the TB-tree and the 3D-R-tree), we demonstrate their scalability and efficiency through an extensive experimental study using large synthetic and real datasets.
Yannis Theodoridis (Corresponding author)Email: URL: http://dke.cti.gr http://isl.cs.unipi.gr/db
  相似文献   

3.
Similarity search implemented via k-nearest neighbor— k-NN queries on multidimensional indices is an extremely useful paradigm for content-based image retrieval. As the dimensionality of feature vectors increases the curse of dimensionality sets in, i.e., the performance of k-NN search of disk-resident indices in the R-tree family degrades rapidly due to the overlap in index pages in high dimensions. This problem is dealt with in this study by utilizing the double filtering effect of clustering and indexing. The clustering algorithm ensures that the largest cluster fits into main memory and that only clusters closest to a query point need to be searched and hence loaded into main memory. We organize the data in each cluster according to the ordered-partition—OP-tree main memory resident index, which is not prone to the curse of dimensionality and highly efficient for processing k-NN queries. We serialize an OP-tree by writing its dynamically allocated nodes into contiguous memory locations, optimize its parameters, and make it persistent by writing it to disk. The time to read and write clusters constituting an OP-tree with a single sequential access to disk benefits from higher data transfer rates of modern disk drives. The performance of the index is further improved by applying the Karhunen–Loève transformation—KLT to the dataset, since this results in a more efficient computation of distances for k-NN queries. We compare OP-trees and sequential scans with and without a KL-transformation and with and without using a shortcut method in calculating Euclidean distances. A comparison against the OMNI-sequential scan is also reported. We finally compare a clustered and persistent version of the OP-tree against a clustered version of the SR-tree and the VA-file method. CPU time is measured and elapsed time is estimated in this study. It is observed that the OP-tree index outperforms the other two methods and that the improvement increases with the number of dimensions.
Lijuan ZhangEmail:
  相似文献   

4.
RRSi: indexing XML data for proximity twig queries   总被引:2,自引:2,他引:0  
Twig query pattern matching is a core operation in XML query processing. Indexing XML documents for twig query processing is of fundamental importance to supporting effective information retrieval. In practice, many XML documents on the web are heterogeneous and have their own formats; documents describing relevant information can possess different structures. Therefore some “user-interesting” documents having similar but non-exact structures against a user query are often missed out. In this paper, we propose the RRSi, a novel structural index designed for structure-based query lookup on heterogeneous sources of XML documents supporting proximate query answers. The index avoids the unnecessary processing of structurally irrelevant candidates that might show good content relevance. An optimized version of the index, oRRSi, is also developed to further reduce both space requirements and computational complexity. To our knowledge, these structural indexes are the first to support proximity twig queries on XML documents. The results of our preliminary experiments show that RRSi and oRRSi based query processing significantly outperform previously proposed techniques in XML repositories with structural heterogeneity.
Vincent T. Y. NgEmail:
  相似文献   

5.
Text search engines are inadequate for indexing and searching XML documents because they ignore metadata and aggregation structure implicit in the XML documents. On the other hand, the query languages supported by specialized XML search engines are very complex. In this paper, we present a simple yet flexible query language, and develop its semantics to enable intuitively appealing extraction of relevant fragments of information while simultaneously falling back on retrieval through plain text search if necessary. Our approach combines and generalizes several available techniques to obtain precise and coherent results.
Trivikram ImmaneniEmail: URL: http://www.cs.wright.edu/~tkprasad
  相似文献   

6.
Exploiting maximal redundancy to optimize SQL queries   总被引:1,自引:1,他引:0  
Detecting and dealing with redundancy is an ubiquitous problem in query optimization, which manifests itself in many areas of research such as materialized views, multi-query optimization, and query-containment algorithms. In this paper, we focus on the issue of intra-query redundancy, redundancy present within a query. We present a method to detect the maximal redundancy present between a main (outer) query block and a subquery block. We then use the method for query optimization, introducing query plans and a new operator that take full advantage of the redundancy discovered. Our approach can deal with redundancy in a wider spectrum of queries than existing techniques. We show experimental evidence that our approach works under certain conditions, and compares favorably to existing optimization techniques when applicable.
Antonio BadiaEmail:
  相似文献   

7.
In spite of significant improvements in video data retrieval, a system has not yet been developed that can adequately respond to a user’s query. Typically, the user has to refine the query many times and view query results until eventually the expected videos are retrieved from the database. The complexity of video data and questionable query structuring by the user aggravates the retrieval process. Most previous research in this area has focused on retrieval based on low-level features. Managing imprecise queries using semantic (high-level) content is no easier than queries based on low-level features due to the absence of a proper continuous distance function. We provide a method to help users search for clips and videos of interest in video databases. The video clips are classified as interesting and uninteresting based on user browsing. The attribute values of clips are classified by commonality, presence, and frequency within each of the two groups to be used in computing the relevance of each clip to the user’s query. In this paper, we provide an intelligent query structuring system, called I-Quest, to rank clips based on user browsing feedback, where a template generation from the set of interesting and uninteresting sets is impossible or yields poor results.
Ramazan Savaş Aygün (Corresponding author)Email:
  相似文献   

8.
The k-nearest neighbors (k-NN) classifier is one of the most popular supervised classification methods. It is very simple, intuitive and accurate in a great variety of real-world domains. Nonetheless, despite its simplicity and effectiveness, practical use of this rule has been historically limited due to its high storage requirements and the computational costs involved. On the other hand, the performance of this classifier appears strongly sensitive to training data complexity. In this context, by means of several problem difficulty measures, we try to characterize the behavior of the k-NN rule when working under certain situations. More specifically, the present analysis focuses on the use of some data complexity measures to describe class overlapping, feature space dimensionality and class density, and discover their relation with the practical accuracy of this classifier.
J. S. SánchezEmail:
  相似文献   

9.
Mental image search by boolean composition of region categories   总被引:1,自引:0,他引:1  
Existing content-based image retrieval paradigms almost never address the problem of starting the search, when the user has no starting example image but rather a mental image. We propose a new image retrieval system to allow the user to perform mental image search by formulating boolean composition of region categories. The query interface is a region photometric thesaurus which can be viewed as a visual summary of salient regions available in the database. It is generated from the unsupervised clustering of regions with similar visual content into categories. In this thesaurus, the user simply selects the types of regions which should and should not be present in the mental image (boolean composition). The natural use of inverted tables on the region category labels enables powerful boolean search and very fast retrieval in large image databases. The process of query and search of images relates to that of documents with Google. The indexing scheme is fully unsupervised and the query mode requires minimal user interaction (no example image to provide, no sketch to draw). We demonstrate the feasibility of such a framework to reach the user mental target image with two applications: a photo-agency scenario on Corel Photostock and a TV news scenario. Perspectives will be proposed for this simple and innovative framework, which should motivate further development in various research areas.
Nozha BoujemaaEmail: URL: http://www-rocq.inria.fr/imedia/
  相似文献   

10.
SPARQL graph pattern rewriting for OWL-DL inference queries   总被引:1,自引:1,他引:0  
This paper focuses on the issue of OWL-DL ontology queries implemented in SPARQL. Currently, ontology repositories construct inference ontology models, and match SPARQL queries to the models, to derive inference results. Because an inference model uses much more storage space than the original model, and cannot be reused as inference requirements vary, this method is not suitable for large-scale deployment. To solve this problem, this paper proposes a novel method that passes rewritten SPARQL queries to the original ontology model, to retrieve inference results. We define OWL-DL inference rules and apply them to rewriting Graph Patterns in queries. The paper classifies the inference rules and discusses how these rules affect query rewriting. To illustrate the advantages of our proposal, we present a prototype system based on Jena, and address query optimization, to eliminate the disadvantages of augmented query sentences. We perform a set of query tests and compare the results with related works. The results show that the proposed method results in significantly improved query efficiency, without compromising completeness or soundness.
Doo-Kwon BaikEmail:
  相似文献   

11.
In mobile ad hoc peer-to-peer (M-P2P) networks, economic models become a necessity for enticing non-cooperative mobile peers to provide service. M-P2P users may issue queries with varying constraints on query response time, data quality of results and trustworthiness of the data source. Hence, we propose ConQuer, which is an economic incentive model for the efficient processing of constraint queries in M-P2P networks. ConQuer also provides incentives for peer collaboration in order to improve data availability. The main contributions of ConQuer are three-fold. First, it uses a broker-based economic M-P2P model for processing constraint queries via a Vickrey auction mechanism. Second, it proposes the CR*-tree, a dynamic multidimensional R-tree-based index for constraints of data quality, trust and price of data to determine target peers efficiently. The CR*-tree is hosted by brokers, who can sell it to other peers, thereby encouraging the creation of multiple copies of the index for facilitating routing. Third, it provides incentives for peers to form collaborative peer groups for maximizing data availability and revenues by mutually allocating and deallocating data items using royalty-based revenue-sharing. Such reallocations facilitate better data quality, thereby further increasing peer revenues. Our performance study shows that ConQuer is indeed effective in answering constraint queries with improved response time, success rate and data quality, and querying hop-counts.
Masaru KitsuregawaEmail:
  相似文献   

12.
13.
In this paper, we study the important issues in the design of an efficient wireless real-time visual surveillance system (WISES). Two important considerations are to minimize: (1) the video workload on the wireless network; and (2) the processing workload at the front-end video capturing unit. To achieve the first objective, we propose a cooperative framework for semantic filtering of video frames instead of forwarding every video frame to the back-end server for analysis and monitoring query evaluation. To minimize the processing workload at the front-end unit, a hierarchical object model (HOM) is designed to model the status of the objects, and their temporal and spatial properties in the video scene. With the information provided from the back-end server, the front-end unit pre-analyses the current status of the objects in the HOM by comparing the selection conditions in the submitted monitoring queries following the adaptive object-based evaluation (APOBE) scheme which is proposed to reduce the processing workload at the front-end unit. In APOBE, a higher evaluation frequency is given to the object which is closer to satisfy the condition in the monitoring queries. The performance of WISES has been studied to demonstrate the efficiency of the proposed scheme.
Calvin K. H. ChiuEmail:
  相似文献   

14.
Bayesian network based business information retrieval model   总被引:3,自引:3,他引:0  
The quality of business information can significantly affect the operation level of enterprise. This paper analyses the problem of business information retrieval (BIR). A Bayesian Network Based business information retrieval model (BN-BIRM) is proposed by means of Bayesian network (BN) and information retrieval (IR) theory and a method for query adaptation is presented. In this model the customized query requirement of enterprise (CQR) is expressed in terms of the predefined illustrative documents related to business domain. The similarities between the documents and the query are evaluated with the conditional probabilities among the nodes in the BN. In the experiments, BN-BIRM is compared with the Belief Network model based on vector space model (VSM) ranking strategy and the Inference Network model based on TF-IDF ranking strategy. The experimental results show that BN-BIRM is effective for collecting business information on a large scale.
Zheng WangEmail:
  相似文献   

15.
Due to the increase of XML-based applications, XML schema design has become an important task. One approach is to consider conceptual schemas as a basis for generating XML documents compliant to consensual information of specific domains. However, the conversion of conceptual schemas to XML schemas is not a straightforward process and inconvenient design decisions can lead to a poor query processing on XML documents generated. This paper presents a conversion approach which considers data and query workload estimated for XML applications, in order to generate an XML schema from a conceptual schema. Load information is used to produce XML schemas which can respond well to the main queries of an XML application. We evaluate our approach through a case study carried out on a native XML database. The experimental results demonstrate that the XML schemas generated by our methodology contribute to a better query performance than related approaches.
Ronaldo dos Santos MelloEmail:
  相似文献   

16.
Advances in wireless communications, positioning technologies, and consumer electronics combine to enable a range of applications that use a mobile user’s geo-spatial location to deliver on-line, location-enhanced services, often referred to as location-based services. This paper assumes that the service users are constrained to a transportation network, and it delves into the modeling of such networks, points of interest, and the service users with the objective of supporting location-based services. In particular, the paper presents a framework that encompasses two interrelated models—a two-dimensional, spatial representation and a multi-graph presentation. The former, high-fidelity model may be used for the positioning of content and users in the infrastructure (e.g., using map matching). The latter type of model is recognized as an ideal basis for a variety of query processing tasks, e.g., route and distance computations. Together, the two models capture central aspects of the problem domain needed in order to support the different types of queries that underlie location-based services. Notably, the framework is capable of capturing roads with lanes, lane shift and u-turn regulations, and turn restrictions. As part of the framework, the paper constructively demonstrates how it is possible map instances of the semantically rich two-dimensional model to instances of the graph model that preserve the topology of the two-dimensional model instances. In doing so, the paper demonstrates how a wealth of previously proposed query processing techniques based on graphs are applicable even in the context of complex transportation networks. The paper also presents means of compacting graphs while preserving aspects of the graphs that are important for the intended applications.
Christian S. JensenEmail:
  相似文献   

17.
In this paper we present a framework for unified, personalized access to heterogeneous multimedia content in distributed repositories. Focusing on semantic analysis of multimedia documents, metadata, user queries and user profiles, it contributes to the bridging of the gap between the semantic nature of user queries and raw multimedia documents. The proposed approach utilizes as input visual content analysis results, as well as analyzes and exploits associated textual annotation, in order to extract the underlying semantics, construct a semantic index and classify documents to topics, based on a unified knowledge and semantics representation model. It may then accept user queries, and, carrying out semantic interpretation and expansion, retrieve documents from the index and rank them according to user preferences, similarly to text retrieval. All processes are based on a novel semantic processing methodology, employing fuzzy algebra and principles of taxonomic knowledge representation. The first part of this work presented in this paper deals with data and knowledge models, manipulation of multimedia content annotations and semantic indexing, while the second part will continue on the use of the extracted semantic information for personalized retrieval.
Stefanos KolliasEmail:
  相似文献   

18.
With the rapid advancements in positioning technologies such as the Global Positioning System (GPS) and wireless communications, the tracking of continuously moving objects has become more convenient. However, this development poses new challenges to database technology since maintaining up-to-date information regarding the location of moving objects incurs an enormous amount of updates. Existing indexes can no longer keep up with the high update rate while providing speedy retrieval at the same time. This study aims to improve k nearest neighbor (kNN) query performance while reducing update costs. Our approach is based on an important observation that queries usually occur around certain places or spatial landmarks of interest, called reference points. We propose the Reference-Point-based tree (RP-tree), which is a two-layer index structure that indexes moving objects according to reference points. Experimental results show that the RP-tree achieves significant improvement over the TPR-tree.
Aoying ZhouEmail:
  相似文献   

19.
The technique of relevance feedback has been introduced to content-based 3D model retrieval, however, two essential issues which affect the retrieval performance have not been addressed. In this paper, a novel relevance feedback mechanism is presented, which effectively makes use of strengths of different feature vectors and perfectly solves the problem of small sample and asymmetry. During the retrieval process, the proposed method takes the user’s feedback details as the relevant information of query model, and then dynamically updates two important parameters of each feature vector, narrowing the gap between high-level semantic knowledge and low-level object representation. The experiments, based on the publicly available 3D model database Princeton Shape Benchmark (PSB), show that the proposed approach not only precisely captures the user’s semantic knowledge, but also significantly improves the retrieval performance of 3D model retrieval. Compared with three state-of-the-art query refinement schemes for 3D model retrieval, it provides superior retrieval effectiveness only with a few rounds of relevance feedback based on several standard measures.
Biao LengEmail:
  相似文献   

20.
Answering linear optimization queries with an approximate stream index   总被引:2,自引:2,他引:0  
We propose a SAO index to approximately answer arbitrary linear optimization queries in a sliding window of a data stream. It uses limited memory to maintain the most “important” tuples. At any time, for any linear optimization query, we can retrieve the approximate top-K tuples in the sliding window almost instantly. The larger the amount of available memory, the better the quality of the answers is. More importantly, for a given amount of memory, the quality of the answers can be further improved by dynamically allocating a larger portion of the memory to the outer layers of the SAO index.
Philip S. YuEmail:
  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号