首页 | 官方网站   微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 312 毫秒
1.
Graph matching and graph edit distance have become important tools in structural pattern recognition. The graph edit distance concept allows us to measure the structural similarity of attributed graphs in an error-tolerant way. The key idea is to model graph variations by structural distortion operations. As one of its main constraints, however, the edit distance requires the adequate definition of edit cost functions, which eventually determine which graphs are considered similar. In the past, these cost functions were usually defined in a manual fashion, which is highly prone to errors. The present paper proposes a method to automatically learn cost functions from a labeled sample set of graphs. To this end, we formulate the graph edit process in a stochastic context and perform a maximum likelihood parameter estimation of the distribution of edit operations. The underlying distortion model is learned using an Expectation Maximization algorithm. From this model we finally derive the desired cost functions. In a series of experiments we demonstrate the learning effect of the proposed method and provide a performance comparison to other models.  相似文献   

2.
Although graph matching and graph edit distance computation have become areas of intensive research recently, the automatic inference of the cost of edit operations has remained an open problem. In the present paper, we address the issue of learning graph edit distance cost functions for numerically labeled graphs from a corpus of sample graphs. We propose a system of self-organizing maps (SOMs) that represent the distance measuring spaces of node and edge labels. Our learning process is based on the concept of self-organization. It adapts the edit costs in such a way that the similarity of graphs from the same class is increased, whereas the similarity of graphs from different classes decreases. The learning procedure is demonstrated on two different applications involving line drawing graphs and graphs representing diatoms, respectively.  相似文献   

3.
Graph edit distance from spectral seriation   总被引:3,自引:0,他引:3  
This paper is concerned with computing graph edit distance. One of the criticisms that can be leveled at existing methods for computing graph edit distance is that they lack some of the formality and rigor of the computation of string edit distance. Hence, our aim is to convert graphs to string sequences so that string matching techniques can be used. To do this, we use a graph spectral seriation method to convert the adjacency matrix into a string or sequence order. We show how the serial ordering can be established using the leading eigenvector of the graph adjacency matrix. We pose the problem of graph-matching as a maximum a posteriori probability (MAP) alignment of the seriation sequences for pairs of graphs. This treatment leads to an expression in which the edit cost is the negative logarithm of the a posteriori sequence alignment probability. We compute the edit distance by finding the sequence of string edit operations which minimizes the cost of the path traversing the edit lattice. The edit costs are determined by the components of the leading eigenvectors of the adjacency matrix and by the edge densities of the graphs being matched. We demonstrate the utility of the edit distance on a number of graph clustering problems.  相似文献   

4.
The spectrum of a graph has been widely used in graph theory to characterise the properties of a graph and extract information from its structure. It has also been employed as a graph representation for pattern matching since it is invariant to the labelling of the graph. There are, however, a number of potential drawbacks in using the spectrum as a representation of a graph. Firstly, more than one graph may share the same spectrum. It is well known, for example, that very few trees can be uniquely specified by their spectrum. Secondly, the spectrum may change dramatically with a small change structure.There are a wide variety of graph matrix representations from which the spectrum can be extracted. Among these are the adjacency matrix, combinatorial Laplacian, normalised Laplacian and unsigned Laplacian. Spectra can also be derived from the heat kernel matrix and path length distribution matrix. The choice of matrix representation clearly has a large effect on the suitability of spectrum in a number of pattern recognition tasks.In this paper we investigate the performance of the spectra as a graph representation in a variety of situations. Firstly, we investigate the cospectrality of the various matrix representations over large graph and tree sets, extending the work of previous authors. We then show that the Euclidean distance between spectra tracks the edit distance between graphs over a wide range of edit costs, and we analyse the accuracy of this relationship. We then use the spectra to both cluster and classify the graphs and demonstrate the effect of the graph matrix formulation on error rates. These results are produced using both synthetic graphs and trees and graphs derived from shape and image data.  相似文献   

5.
In this paper, we propose a novel framework, called Dinkelbach NCUT (DNCUT), which efficiently solves the normalized graph cut (NCUT) problem under general, convex constraints, as well as, under given priors on the nodes of the graph. Current NCUT methods use generalized eigen-decomposition, which poses computational issues especially for large graphs, and can only handle linear equality constraints. By using an augmented graph and the iterative Dinkelbach method for fractional programming (FP), we formulate the DNCUT framework to efficiently solve the NCUT problem under general convex constraints and given data priors. In this framework, the initial problem is converted into a sequence of simpler sub-problems (i.e. convex, quadratic programs (QP’s) subject to convex constraints). The complexity of finding a global solution for each sub-problem depends on the complexity of the constraints, the convexity of the cost function, and the chosen initialization. However, we derive an initialization, which guarantees that each sub-problem is a convex QP that can be solved by available convex programming techniques. We apply this framework to the special case of linear constraints, where the solution is obtained by solving a sequence of sparse linear systems using the conjugate gradient method. We validate DNCUT by performing binary segmentation on real images both with and without linear/nonlinear constraints, as well as, multi-class segmentation. When possible, we compare DNCUT to other NCUT methods, in terms of segmentation performance and computational efficiency. Even though the new formulation is applied to the problem of spectral graph-based, low-level image segmentation, it can be directly applied to other applications (e.g. clustering).  相似文献   

6.
Time Warp Edit Distance with Stiffness Adjustment for Time Series Matching   总被引:1,自引:0,他引:1  
In a way similar to the string-to-string correction problem, we address discrete time series similarity in light of a time-series-to-time-series-correction problem for which the similarity between two time series is measured as the minimum cost sequence of edit operations needed to transform one time series into another. To define the edit operations, we use the paradigm of a graphical editing process and end up with a dynamic programming algorithm that we call time warp edit distance (TWED). TWED is slightly different in form from dynamic time warping (DTW), longest common subsequence (LCSS), or edit distance with real penalty (ERP) algorithms. In particular, it highlights a parameter that controls a kind of stiffness of the elastic measure along the time axis. We show that the similarity provided by TWED is a potentially useful metric in time series retrieval applications since it could benefit from the triangular inequality property to speed up the retrieval process while tuning the parameters of the elastic measure. In that context, a lower bound is derived to link the matching of time series into down sampled representation spaces to the matching into the original space. The empiric quality of the TWED distance is evaluated on a simple classification task. Compared to edit distance, DTW, LCSS, and ERP, TWED has proved to be quite effective on the considered experimental task.  相似文献   

7.
Graph edit distance is a powerful and flexible method for error-tolerant graph matching. Yet it can only be calculated for small graphs in practice due to its exponential time complexity when considering unconstrained graphs. In this paper we propose a quadratic time approximation of graph edit distance based on Hausdorff matching. In a series of experiments we analyze the performance of the proposed Hausdorff edit distance in the context of graph classification and compare it with a cubic time algorithm based on the assignment problem. Investigated applications include nearest neighbor classification of graphs representing letter drawings, fingerprints, and molecular compounds as well as hidden Markov model classification of vector space embedded graphs representing handwriting. In many cases, a substantial speedup is achieved with only a minor loss in accuracy or, in one case, even with a gain in accuracy. Overall, the proposed Hausdorff edit distance shows a promising potential in terms of flexibility, efficiency, and accuracy.  相似文献   

8.
A general-purpose browser for directed graphs is described. The browser provides operations to examine and edit graphs and to generate a layout for a graph automatically that minimizes edge crossings. Two layout algorithms were implemented. A hierarchical graph layout algorithm was found to be best for directed graphs. The graph browser also has facilities that allow it to be integrated with other applications (e.g. a program browser). These facilities and our experiences building a program call-graph browser are described.  相似文献   

9.
Traditional normalized tree edit distances do not satisfy the triangle inequality. We present a metric normalization method for tree edit distance, which results in a new normalized tree edit distance fulfilling the triangle inequality, under the condition that the weight function is a metric over the set of elementary edit operations with all costs of insertions/deletions having the same weight. We prove that the new distance, in the range [0, 1], is a genuine metric as a simple function of the sizes of two ordered labeled trees and the tree edit distance between them, which can be directly computed through tree edit distance with the same complexity. Based on an efficient algorithm to represent digits as ordered labeled trees, we show that the normalized tree edit metric can provide slightly better results than other existing methods in handwritten digit recognition experiments using the approximating and eliminating search algorithm (AESA) algorithm.  相似文献   

10.
《Pattern recognition letters》2001,22(6-7):753-758
The relationship between two important problems in pattern recognition using attributed relational graphs, the maximum common subgraph and the minimum common supergraph of two graphs, is established by means of simple constructions, which allow to obtain the maximum common subgraph from the minimum common supergraph, and vice versa. On this basis, a new graph distance metric is proposed for measuring similarities between objects represented by attributed relational graphs. The proposed metric can be computed by a straightforward extension of any algorithm that implements error-correcting graph matching, when run under an appropriate cost function, and the extension only takes time linear in the size of the graphs.  相似文献   

11.
Graph similarity is an important notion with many applications. Graph edit distance is one of the most flexible graph similarity measures available. The main problem with this measure is that in practice it can only be computed for small graphs due to its exponential time complexity. This paper addresses the high complexity of graph edit distance computations. Specifically, we present CSI_GED, a novel edge-centric approach for computing graph edit distance through common sub-structure isomorphisms enumeration. CSI_GED utilizes backtracking search combined with a number of heuristics to reduce memory requirements and quickly prune away a large portion of the mapping search space. Experiments show that CSI_GED is highly efficient for computing graph edit distance; it outperforms the state-of-the-art methods by over three orders of magnitude. It also shows that CSI_GED scales the computation gracefully to larger and distant graphs on which current methods fail to run. Moreover, we evaluated CSI_GED as a stand-alone graph edit similarity search query method. The experiments show that CSI_GED is effective and scalable, and outperforms the state-of-the-art indexing-based methods by over two orders of magnitude.  相似文献   

12.
In this paper we consider the minimal doubly resolving sets problem (MDRSP) of graphs. We prove that the problem is NP-hard and give its integer linear programming formulation. The problem is solved by a genetic algorithm (GA) that uses binary encoding and standard genetic operators adapted to the problem. Experimental results include three sets of ORLIB test instances: crew scheduling, pseudo-boolean and graph coloring. GA is also tested on theoretically challenging large-scale instances of hypercubes and Hamming graphs. Optimality of GA solutions on smaller size instances has been verified by total enumeration. For several larger instances optimality follows from the existing theoretical results. The GA results for MDRSP of hypercubes are used by a dynamic programming approach to obtain upper bounds for the metric dimension of large hypercubes up to 290290 nodes, that cannot be directly handled by the computer.  相似文献   

13.
A survey of graph edit distance   总被引:1,自引:0,他引:1  
Inexact graph matching has been one of the significant research foci in the area of pattern analysis. As an important way to measure the similarity between pairwise graphs error-tolerantly, graph edit distance (GED) is the base of inexact graph matching. The research advance of GED is surveyed in order to provide a review of the existing literatures and offer some insights into the studies of GED. Since graphs may be attributed or non-attributed and the definition of costs for edit operations is various, the existing GED algorithms are categorized according to these two factors and described in detail. After these algorithms are analyzed and their limitations are identified, several promising directions for further research are proposed.  相似文献   

14.
A linear programming (LP) approach is proposed for the weighted graph matching problem. A linear program is obtained by formulating the graph matching problem in L1 norm and then transforming the resulting quadratic optimization problem to a linear one. The linear program is solved using a simplex-based algorithm. Then, approximate 0-1 integer solutions are obtained by applying the Hungarian method on the real solutions of the linear program. The complexity of the proposed algorithm is polynomial time, and it is O(n 6L) for matching graphs of size n. The developed algorithm is compared to two other algorithms. One is based on an eigendecomposition approach and the other on a symmetric polynomial transform. Experimental results showed that the LP approach is superior in matching graphs than both other methods  相似文献   

15.
We propose a convex-concave programming approach for the labeled weighted graph matching problem. The convex-concave programming formulation is obtained by rewriting the weighted graph matching problem as a least-square problem on the set of permutation matrices and relaxing it to two different optimization problems: a quadratic convex and a quadratic concave optimization problem on the set of doubly stochastic matrices. The concave relaxation has the same global minimum as the initial graph matching problem, but the search for its global minimum is also a hard combinatorial problem. We, therefore, construct an approximation of the concave problem solution by following a solution path of a convex-concave problem obtained by linear interpolation of the convex and concave formulations, starting from the convex relaxation. This method allows to easily integrate the information on graph label similarities into the optimization problem, and therefore, perform labeled weighted graph matching. The algorithm is compared with some of the best performing graph matching methods on four data sets: simulated graphs, QAPLib, retina vessel images, and handwritten Chinese characters. In all cases, the results are competitive with the state of the art.  相似文献   

16.
Graph-based representations have been proved powerful in computer vision. The challenge that arises with large amounts of graph data is that of computationally burdensome edit distance computation. Graph kernels can be used to formulate efficient algorithms to deal with high dimensional data, and have been proved an elegant way to overcome this computational bottleneck. In this paper, we investigate whether the Jensen-Shannon divergence can be used as a means of establishing a graph kernel. The Jensen-Shannon kernel is nonextensive information theoretic kernel, and is defined using the entropy and mutual information computed from probability distributions over the structures being compared. To establish a Jensen-Shannon graph kernel, we explore two different approaches. The first of these is based on the von Neumann entropy associated with a graph. The second approach uses the Shannon entropy associated with the probability state vector for a steady state random walk on a graph. We compare the two resulting graph kernels for the problem of graph clustering. We use kernel principle components analysis (kPCA) to embed graphs into a feature space. Experimental results reveal that the method gives good classification results on graphs extracted both from an object recognition database and from an application in bioinformation.  相似文献   

17.
Graphs have been widely used for complex data representation in many real applications, such as social network, bioinformatics, and computer vision. Therefore, graph similarity join has become imperative for integrating noisy and inconsistent data from multiple data sources. The edit distance is commonly used to measure the similarity between graphs. The graph similarity join problem studied in this paper is based on graph edit distance constraints. To accelerate the similarity join based on graph edit distance, in the paper, we make use of a preprocessing strategy to remove the mismatching graph pairs with significant differences. Then a novel method of building indexes for each graph is proposed by grouping the nodes which can be reached in k hops for each key node with structure conservation, which is the k-hop tree based indexing method. As for each candidate pair, we propose a similarity computation algorithm with boundary filtering, which can be applied with good efficiency and effectiveness. Experiments on real and synthetic graph databases also confirm that our method can achieve good join quality in graph similarity join. Besides, the join process can be finished in polynomial time.  相似文献   

18.
19.
Cover1     
Although a number of normalized edit distances presented so far may offer good performance in some applications, none of them can be regarded as a genuine metric between strings because they do not satisfy the triangle inequality. Given two strings X and Y over a finite alphabet, this paper defines a new normalized edit distance between X and Y as a simple function of their lengths (|X| and |Y|) and the generalized Levenshtein distance (GLD) between them. The new distance can be easily computed through GLD with a complexity of O(|X| middot |Y|) and it is a metric valued in [0,1] under the condition that the weight function is a metric over the set of elementary edit operations with all costs of insertions/deletions having the same weight. Experiments using the AESA algorithm in handwritten digit recognition show that the new distance can generally provide similar results to some other normalized edit distances and may perform slightly better if the triangle inequality is violated in a particular data set  相似文献   

20.
编辑距离是一种距离测量法,源于将一个字符串变换为另一个字符串所需要的编辑操作数,该方法能够自动将语言进行分类,最近这些年在西方很受关注,被证明测量语言或方言间距离是有效的。运用编辑距离算法对侗台语族语言做出计量分类以及亲缘关系程度的描述。结果表明编辑距离分类结果与历史语言学的分类结果是基本一致的,为计量法提供了新思路。编辑距离可以应用于东亚语言的研究中。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号