首页 | 官方网站   微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Similarity matching in video databases is of growing importance in many new applications such as video clustering and digital video libraries. In order to provide efficient access to relevant data in large databases, there have been many research efforts in video indexing with diverse spatial and temporal features. However, most of the previous works relied on sequential matching methods or memory-based inverted file techniques, thus making them unsuitable for a large volume of video databases. In order to resolve this problem, this paper proposes an effective and scalable indexing technique using a trie, originally proposed for string matching, as an index structure. For building an index, we convert each frame into a symbol sequence using a window order heuristic and build a disk-resident trie from a set of symbol sequences. For query processing, we perform a depth-first traversal on the trie and execute a temporal segmentation. To verify the superiority of our approach, we perform several experiments with real and synthetic data sets. The results reveal that our approach consistently outperforms the sequential scan method, and the performance gain is maintained even with a large volume of video databases.  相似文献   

2.
In genomic databases, approximate string matching with k errors is often applied when searching genomic sequences, where k errors can be caused by substitution, insertion, or deletion operations. In this paper, we propose a new method, the hash trie filter, to efficiently support approximate string matching in genomic databases. First, we build a hash trie for indexing the genomic sequence stored in a database in advance. Then, we utilize an efficient technique to find the ordered subpatterns in the sequence, which could reduce the number of candidates by pruning some unreasonable matching positions. Moreover, our method will dynamically decide the number of ordered matching grams, resulting in the increase of precision. The simulation results show that the hash trie filter outperforms the well-known (k+s) q-samples filter in terms of the response time, the number of verified candidates, and the precision, under different lengths of the query patterns and different error levels.  相似文献   

3.
基于有序二叉树的快速多模式字符串匹配算法   总被引:1,自引:0,他引:1       下载免费PDF全文
周燕  侯整风  何玲 《计算机工程》2010,36(17):42-44
将有序二叉树和QS算法相结合,提出一种快速多模式字符串匹配算法,实现在多模式匹配过程中不匹配字符的连续跳跃。为提高匹配速度,利用已匹配的字符串信息进行跳跃式的比较,避免文本扫描指针的回溯。实验结果表明,与SMA算法相比,该算法在预处理阶段构造速度和匹配速度更快,在模式串较长的情况下,性能更优越。  相似文献   

4.
As databases have expanded in scope to storing string data (XML documents, product catalogs), it has become increasingly important to search databases based on matching substrings, often on multiple, correlated dimensions. While string B-trees are I/O optimal in one dimension, no index structure with non-trivial query bounds is known for two-dimensional substring indexing. In this paper, we present a technique for two-dimensional substring indexing based on a reduction to the geometric problem of identifying common colors in two ranges containing colored points. We develop an I/O efficient algorithm for solving the common colors problem, and use it to obtain an I/O efficient (poly-logarithmic query time) algorithm for the two-dimensional substring indexing problem. Our techniques result in a family of secondary memory index structures that trade space for time, with no loss of accuracy.  相似文献   

5.
A real-time matching system for large fingerprint databases   总被引:11,自引:0,他引:11  
With the current rapid growth in multimedia technology, there is an imminent need for efficient techniques to search and query large image databases. Because of their unique and peculiar needs, image databases cannot be treated in a similar fashion to other types of digital libraries. The contextual dependencies present in images, and the complex nature of two-dimensional image data make the representation issues more difficult for image databases. An invariant representation of an image is still an open research issue. For these reasons, it is difficult to find a universal content-based retrieval technique. Current approaches based on shape, texture, and color for indexing image databases have met with limited success. Further, these techniques have not been adequately tested in the presence of noise and distortions. A given application domain offers stronger constraints for improving the retrieval performance. Fingerprint databases are characterized by their large size as well as noisy and distorted query images. Distortions are very common in fingerprint images due to elasticity of the skin. In this paper, a method of indexing large fingerprint image databases is presented. The approach integrates a number of domain-specific high-level features such as pattern class and ridge density at higher levels of the search. At the lowest level, it incorporates elastic structural feature-based matching for indexing the database. With a multilevel indexing approach, we have been able to reduce the search space. The search engine has also been implemented on Splash 2-a field programmable gate array (FPGA)-based array processor to obtain near-ASIC level speed of matching. Our approach has been tested on a locally collected test data and on NIST-9, a large fingerprint database available in the public domain  相似文献   

6.
We study 4 problems in string matching, namely, regular expression matching, approximate regular expression matching, string edit distance, and subsequence indexing, on a standard word RAM model of computation that allows logarithmic-sized words to be manipulated in constant time. We show how to improve the space and/or remove a dependency on the alphabet size for each problem using either an improved tabulation technique of an existing algorithm or by combining known algorithms in a new way.  相似文献   

7.
Providing built-in keyword search capabilities in RDBMS   总被引:2,自引:0,他引:2  
A common approach to performing keyword search over relational databases is to find the minimum Steiner trees in database graphs transformed from relational data. These methods, however, are rather expensive as the minimum Steiner tree problem is known to be NP-hard. Further, these methods are independent of the underlying relational database management system (RDBMS), thus cannot benefit from the capabilities of the RDBMS. As an alternative, in this paper we propose a new concept called Compact Steiner Tree (CSTree), which can be used to approximate the Steiner tree problem for answering top-k keyword queries efficiently. We propose a novel structure-aware index, together with an effective ranking mechanism for fast, progressive and accurate retrieval of top-k highest ranked CSTrees. The proposed techniques can be implemented using a standard relational RDBMS to benefit from its indexing and query-processing capability. We have implemented our techniques in MYSQL, which can provide built-in keyword-search capabilities using SQL. The experimental results show a significant improvement in both search efficiency and result quality comparing to existing state-of-the-art approaches.  相似文献   

8.
A novel online approach to exact string matching and filtering of large databases is presented. String matching/filtering is based on artificial neural networks and operates in two stages: initially, a self‐organizing map retrieves the cluster of database strings that are most similar to the query string; subsequently, a harmony theory network compares the retrieved strings with the query string and determines whether an exact match exists. The similarity measure is configured to the specific characteristics of the database so as to expose overall string similarity rather than character coincidence at homologous string locations. The experimental results demonstrate foolproof, fast, and practically database‐size independent operation that is especially robust to database modifications. The proposed approach is put forward for general‐purpose (directory, catalogue, glossary search) as well as Internet‐oriented (e‐mail blocking, URL, username classification) applications. © 2010 Wiley Periodicals, Inc.  相似文献   

9.
In retrieval from image databases, evaluation of similarity, based both on the appearance of spatial entities and on their mutual relationships, depends on content representation based on attributed relational graphs. This kind of modeling entails complex matching and indexing, which presently prevents its usage within comprehensive applications. In this paper, we provide a graph-theoretical formulation for the problem of retrieval based on the joint similarity of individual entities and of their mutual relationships and we expound its implications on indexing and matching. In particular, we propose the usage of metric indexing to organize large archives of graph models, and we propose an original look-ahead method which represents an efficient solution for the (sub)graph error correcting isomorphism problem needed to compute object distances. Analytic comparison and experimental results show that the proposed lookahead improves the state-of-the-art in state-space search methods and that the combined use of the proposed matching and indexing scheme permits for the management of the complexity of a typical application of retrieval by spatial arrangement  相似文献   

10.
Shape management is an important functionality in multimedia databases. Shape information can be used in both image acquisition and image retrieval. Several approaches have been proposed to deal with shape representation and matching. Among them, the data-driven approach supports searches for shapes based on indexing techniques. Unfortunately, efficient data-driven approaches are often defined only for specific types of shape. This is not sufficient in contexts in which arbitrary shapes should be represented. Constraint databases use mathematical theories to finitely represent infinite sets of relational tuples. They have been proved to be very useful in modeling spatial objects. In this paper, we apply constraint-based data models to the problem of shape management in multimedia databases. We first present the constraint model and some constraint languages. Then, we show how constraints can be used to model general shapes. The use of a constraint language as an internal specification and execution language for querying shapes is also discussed. Finally, we show how a constraint database system can be used to efficiently retrieve shapes, retaining the advantages of the already defined approaches.  相似文献   

11.
An experimental VLSI accelerator research prototype designed and fabricated for relational data filtering is described. It performs high-speed associative search and aggregation operations for formatted databases. For unformatted databases, an experimental fast string search VLSI accelerator that provides a few orders-of-magnitude improvement in text search speed has been developed. The VLSI accelerators have precise instruction sets and hardware interfaces that facilitate their integration into general-purpose computer systems and dedicated systems. Application of the accelerators to both general-purpose and dedicated computer systems is discussed. Related work is examined  相似文献   

12.
String similarity search and join are two important operations in data cleaning and integration, which extend traditional exact search and exact join operations in databases by tolerating the errors and inconsistencies in the data. They have many real-world applications, such as spell checking, duplicate detection, entity resolution, and webpage clustering. Although these two problems have been extensively studied in the recent decade, there is no thorough survey. In this paper, we present a comprehensive survey on string similarity search and join. We first give the problem definitions and introduce widely-used similarity functions to quantify the similarity. We then present an extensive set of algorithms for string similarity search and join. We also discuss their variants, including approximate entity extraction, type-ahead search, and approximate substring matching. Finally, we provide some open datasets and summarize some research challenges and open problems.  相似文献   

13.
We consider the problem of similarity search in databases with costly metric distance measures. Given limited main memory, our goal is to develop a reference-based index that reduces the number of comparisons in order to answer a query. The idea in reference-based indexing is to select a small set of reference objects that serve as a surrogate for the other objects in the database. We consider novel strategies for selection of references and assigning references to database objects. For dynamic databases with frequent updates, we propose two incremental versions of the selection algorithm. Our experimental results show that our selection and assignment methods far outperform competing methods. This work is partially supported by the National Science Foundation under Grant No. 0347408.  相似文献   

14.
Hypertext systems provide an appealing mechanism for informally browsing databases by traversing selectable links. However, in many fact finding situations string search is an effective complement to browsing. This paper describes the application of the signature file method to achieve rapid and convenient string search in small personal computer hypertext environments. The method has been implemented in a prototype, as well as in a commercial product. Performance data for search times and storage space are presented from a commercial hypertext database. User interface issues are then discussed. Experience with the string search interface indicates that it was used successfully by novice users.  相似文献   

15.
With ever growing databases containing multimedia data, indexing has become a necessity to avoid a linear search. We propose a novel technique for indexing multimedia databases in which entries can be represented as graph structures. In our method, the topological structure of a graph as well as that of its subgraphs are represented as vectors whose components correspond to the sorted laplacian eigenvalues of the graph or subgraphs. Given the laplacian spectrum of graph G, we draw from recently developed techniques in the field of spectral integral variation to generate the laplacian spectrum of graph G+e without computing its eigendecomposition, where G+e is a graph obtained by adding edge e to graph G. This process improves the performance of the system for generating the subgraph signatures for 1.8% and 6.5% in datasets of size 420 and 1400, respectively. By doing a nearest neighbor search around the query spectra, similar but not necessarily isomorphic graphs are retrieved. Given a query graph, a voting schema ranks database graphs into an indexing hypothesis to which a final matching process can be applied. The novelties of the proposed method come from the powerful representation of the graph topology and successfully adopting the concept of spectral integral variation in an indexing algorithm. To examine the fitness of the new indexing framework, we have performed a number of experiments using an extensive set of recognition trials in the domain of 2D and 3D object recognition. The experiments, including a comparison with a competing indexing method using two different graph-based object representations, demonstrate both the robustness and efficacy of the overall approach.  相似文献   

16.
As more information sources become available in multimedia systems, the development of abstract semantic models for video, audio, text, and image data is becoming very important. An abstract semantic model has two requirements: it should be rich enough to provide a friendly interface of multimedia presentation synchronization schedules to the users and it should be a good programming data structure for implementation in order to control multimedia playback. An abstract semantic model based on an augmented transition network (ATN) is presented. The inputs for ATNs are modeled by multimedia input strings. Multimedia input strings provide an efficient means for iconic indexing of the temporal/spatial relations of media streams and semantic objects. An ATN and its subnetworks are used to represent the appearing sequence of media streams and semantic objects. The arc label is a substring of a multimedia input string. In this design, a presentation is driven by a multimedia input string. Each subnetwork has its own multimedia input string. Database queries relative to text, image, and video can be answered via substring matching at subnetworks. Multimedia browsing allows users the flexibility to select any part of the presentation they prefer to see. This means that the ATN and its subnetworks can be included in multimedia database systems which are controlled by a database management system (DBMS). User interactions and loops are also provided in an ATN. Therefore, ATNs provide three major capabilities: multimedia presentations, temporal/spatial multimedia database searching, and multimedia browsing  相似文献   

17.
Musical scores are traditionally retrieved by title, composer or subject classification. Just as multimedia computer systems increase the range of opportunities available for presenting musical information, so they also offer new ways of posing musically-oriented queries. This paper shows how scores can be retrieved from a database on the basis of a few notes sung or hummed into a microphone. The design of such a facility raises several interesting issues pertaining to music retrieval. We first describe an interface that transcribes acoustic input into standard music notation. We then analyze string matching requirements for ranked retrieval of music and present the results of an experiment which tests how accurately people sing well known melodies. The performance of several string matching criteria are analyzed using two folk song databases. Finally, we describe a prototype system which has been developed for retrieval of tunes from acoustic input and evaluate its performance.  相似文献   

18.
An approach to designing very fast algorithms for approximate string matching in a dictionary is proposed. Multiple spelling errors corresponding to insert, delete, change, and transpose operations on character strings are considered in the fault model. The design of very fast approximate string matching algorithms through a four-step reduction procedure is described. The final and most effective step uses hashing techniques to avoid comparing the given word with words at large distances. The technique has been applied to a library book catalog textbase. The experiments show that performing approximate string matching for a large dictionary in real-time on an ordinary sequential computer under our multiple fault model is feasible  相似文献   

19.
面向对象数据库系统中有序集合的索引技术   总被引:2,自引:0,他引:2  
本文首先讨论了面向对象数据库系统中的索引技术,分析了传统的基于值的索引技术不适合于用来索引有序集合的原因,然后提出了一种新的适合于有序集合的索引机制-P+树,同时本文也设计了一个用于测试有序集合索引机制的评价基准,根据该测试基准对本文提出的索引机制进行了系统的分析与评价。  相似文献   

20.
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号