Similar Documents
20 similar documents found (search time: 296 ms)
1.
A family of graphs is a k-bounded-hole family if every graph in the family has no holes with more than k vertices. The problem of finding in a graph a maximum weight induced path has applications in large communication and neural networks when worst-case communication time needs to be evaluated; unfortunately, this problem is NP-hard even when restricted to bipartite graphs. We show that this problem has polynomial-time algorithms for k-bounded-hole families of graphs, for interval-filament graphs, and for graphs decomposable by clique cut-sets or by splits into prime subgraphs for which such algorithms exist.

2.
Finding dense subgraphs is an important problem in graph mining and has many practical applications. At the same time, while large real-world networks are known to have many communities that are not well separated, the majority of the existing work focuses on the problem of finding a single densest subgraph. Hence, it is natural to consider the question of finding the top-k densest subgraphs. One major challenge in addressing this question is how to handle overlaps: eliminating overlaps completely is one option, but this may lead to extracting subgraphs not as dense as would be possible by allowing a limited amount of overlap. Furthermore, overlaps are desirable, as in most real-world graphs there are vertices that belong to more than one community, and thus to more than one densest subgraph. In this paper we study the problem of finding top-k overlapping densest subgraphs, and we present a new approach that improves over the existing techniques, both in theory and in practice. First, we reformulate the problem definition in a way that lets us obtain an algorithm with a constant-factor approximation guarantee. Our approach relies on techniques for solving the max-sum diversification problem, which, however, we need to extend in order to make them applicable to our setting. Second, we evaluate our algorithm on a collection of benchmark datasets and show that it convincingly outperforms the previous methods, both in terms of quality and efficiency.
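
As context for the single-subgraph primitive this line of work builds on, the sketch below shows Charikar's classical greedy peeling heuristic for the densest subgraph problem (a 2-approximation for maximum average degree). It illustrates only the underlying building block, not the paper's top-k overlapping algorithm; the adjacency-dict representation is an assumption of the example.

```python
# Charikar's greedy peeling for the densest (maximum average-degree)
# subgraph: repeatedly remove a minimum-degree vertex and remember the
# densest intermediate vertex set. A classical 2-approximation.

def densest_subgraph(adj):
    """adj: dict mapping vertex -> set of neighbors (undirected)."""
    adj = {v: set(ns) for v, ns in adj.items()}
    best_set, best_density = set(adj), 0.0
    edges = sum(len(ns) for ns in adj.values()) // 2
    current = set(adj)
    while current:
        density = edges / len(current)
        if density > best_density:
            best_density, best_set = density, set(current)
        v = min(current, key=lambda u: len(adj[u]))  # peel min-degree vertex
        edges -= len(adj[v])
        for u in adj[v]:
            adj[u].discard(v)
        del adj[v]
        current.remove(v)
    return best_set, best_density

if __name__ == "__main__":
    # K4 on {1,2,3,4} plus a pendant vertex 5 attached to 4.
    g = {1: {2, 3, 4}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {1, 2, 3, 5}, 5: {4}}
    print(densest_subgraph(g))  # the K4 {1, 2, 3, 4}, density 1.5
```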

3.
In this paper, we present parallel multilevel algorithms for the hypergraph partitioning problem. In particular, we describe algorithms for parallel coarsening, parallel greedy k-way refinement, and parallel multi-phase refinement. Using an asymptotic theoretical performance model, we derive the isoefficiency function for our algorithms and hence show that they are technically scalable when the maximum vertex and hyperedge degrees are small. We conduct experiments on hypergraphs from six different application domains to investigate the empirical scalability of our algorithms, both in terms of runtime and partition quality. Our findings confirm that the quality of the partitions produced by our algorithms is stable as the number of processors is increased, while being competitive with those produced by a state-of-the-art serial multilevel partitioning tool. We also validate our theoretical performance model through an isoefficiency study. Finally, we evaluate the impact of introducing parallel multi-phase refinement into our parallel multilevel algorithm in terms of the trade-off between improved partition quality and higher runtime cost.

4.
In this paper, we present and study a class of graph partitioning algorithms that reduce the size of the graph by collapsing vertices and edges, find a k-way partitioning of the smaller graph, and then uncoarsen and refine it to construct a k-way partitioning of the original graph. These algorithms compute a k-way partitioning of a graph G = (V, E) in O(|E|) time, which is faster by a factor of O(log k) than previously proposed multilevel recursive bisection algorithms. A key contribution of our work is a high-quality and computationally inexpensive refinement algorithm that can improve upon an initial k-way partitioning. We also study the effectiveness of the overall scheme for a variety of coarsening schemes. We present experimental results on a large number of graphs arising in various domains, including finite element methods, linear programming, VLSI, and transportation. Our experiments show that this new scheme produces partitions that are of comparable or better quality than those produced by the multilevel bisection algorithm, and requires substantially less time. Graphs containing up to 450,000 vertices and 3,300,000 edges can be partitioned into 256 domains in less than 40 s on a workstation such as SGI's Challenge. Compared with the widely used multilevel spectral bisection algorithm, our new algorithm is usually two orders of magnitude faster and produces partitions with substantially smaller edge-cut.
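
The coarsening phase of such multilevel schemes is commonly implemented with a matching heuristic. The sketch below shows heavy-edge matching, one standard coarsening scheme, as a generic illustration (not necessarily the exact variant studied in the paper); the weighted adjacency-dict representation and comparable vertex names are assumptions of the example.

```python
# Minimal sketch of heavy-edge matching, a standard coarsening step in
# multilevel partitioning: match each unmatched vertex with its
# unmatched neighbor of maximum edge weight, then collapse matched
# pairs into coarse vertices, summing parallel edge weights.

def heavy_edge_matching(adj):
    """adj: dict v -> dict of neighbor -> edge weight (undirected)."""
    matched = {}
    for v in adj:
        if v in matched:
            continue
        candidates = [u for u in adj[v] if u not in matched and u != v]
        if candidates:
            u = max(candidates, key=lambda w: adj[v][w])  # heaviest edge
            matched[v] = u
            matched[u] = v
        else:
            matched[v] = v  # v stays unmatched (maps to itself)
    # Build the coarse graph: each pair (or singleton) becomes one vertex,
    # named by the smaller endpoint (vertex names assumed comparable).
    coarse_of = {v: min(v, u) for v, u in matched.items()}
    coarse = {}
    for v, nbrs in adj.items():
        cv = coarse_of[v]
        coarse.setdefault(cv, {})
        for u, w in nbrs.items():
            cu = coarse_of[u]
            if cu != cv:  # drop edges internal to a collapsed pair
                coarse[cv][cu] = coarse[cv].get(cu, 0) + w
    return coarse_of, coarse

g = {1: {2: 5, 3: 1}, 2: {1: 5, 4: 1}, 3: {1: 1, 4: 2}, 4: {2: 1, 3: 2}}
mapping, cg = heavy_edge_matching(g)
# 1 and 2 collapse via their weight-5 edge; 3 and 4 via the weight-2 edge.
```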

5.
We study the set-cover problem: given a collection C of subsets of a finite set U, find a minimum-size subset C′ ⊆ C such that every element in U belongs to at least one member of C′. An instance (C, U) of the set-cover problem is k-bounded if the number of occurrences in C of any element is bounded by a constant k ≥ 2. We present an approximation algorithm for the k-bounded set-cover problem whose ratio is expressed in terms of a density parameter ε. If ε is relatively high, we say that the problem is dense, and the ratio in this case is better than k, which is the best known constant ratio for this problem. In the case that the number of occurrences in C of any element is exactly k = 2, the problem is known as the vertex-cover problem. For dense graphs, our algorithm achieves an approximation ratio better than that of Nagamochi and Ibaraki (Japan J. Indust. Appl. Math. 16 (1999) 369), and the same approximation ratios as Karpinski and Zelikovsky (Proceedings of DIMACS Workshop on Network Design: Connectivity and Facilities Location, Vol. 40, Princeton, 1998, pp. 169-178). Our algorithm uses a combinatorial property of the set-cover problem based on the classical greedy algorithm: we use this property to define a "greedy sequence" over a given instance of the set-cover problem and its cover. In addition, we show evidence that the ratio we achieve for the ε-dense k-bounded set-cover problem is the best constant ratio one can expect. We do this by showing that finding a better constant ratio is as hard as finding a constant ratio better than k for the k-bounded set-cover problem in which the optimal cover is known to be at least a given size. (k is the best known constant ratio for this version of the k-bounded set-cover problem.) We show a similar lower bound on the approximation ratio for the vertex-cover problem in ε-dense graphs.
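
For concreteness, the classical greedy algorithm that the "greedy sequence" is built on is sketched below; this is the textbook baseline, not the paper's improved k-bounded algorithm.

```python
# Classical greedy set cover: repeatedly pick the set covering the most
# still-uncovered elements. This is the baseline the "greedy sequence"
# in the abstract is defined from.

def greedy_set_cover(universe, collection):
    """universe: set; collection: list of sets whose union covers universe."""
    uncovered = set(universe)
    cover = []
    while uncovered:
        best = max(collection, key=lambda s: len(s & uncovered))
        if not best & uncovered:
            raise ValueError("collection does not cover the universe")
        cover.append(best)
        uncovered -= best
    return cover

U = {1, 2, 3, 4, 5}
C = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]
print(greedy_set_cover(U, C))  # [{1, 2, 3}, {4, 5}]
```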

6.
In this paper, a new approach called 'instance variant nearest neighbor' approximates the regression surface of a function using the concept of k nearest neighbors. Instead of a fixed k for the entire dataset, our assumption is that there is an optimal k for each data instance that best approximates the original function by fitting the local region. This approach can be beneficial for noisy datasets where local regions form data characteristics that differ from the major data clusters. We formulate the problem of finding such a k for each data instance as a combinatorial optimization problem, which is solved by particle swarm optimization. The particle swarm optimization is extended with a rounding scheme that rounds continuous-valued candidate solutions up or down to integers, i.e., the number k of neighbors. We apply our new approach to five real-world regression datasets and compare its prediction performance with other function approximation algorithms, including the standard k nearest neighbor, multi-layer perceptron, and support vector regression. We observed that the instance variant nearest neighbor outperforms these algorithms on several datasets. In addition, our new approach provides consistent outputs across the five datasets, including those where the other algorithms perform poorly.
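
A minimal sketch of the instance-variant idea follows, with brute-force search over k standing in for the paper's particle swarm optimizer; the leave-one-out selection rule and the "borrow k from the nearest training instance" prediction rule are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch: pick a per-instance k by minimizing leave-one-out error
# locally, then predict with the k of the nearest training instance.
import numpy as np

def fit_instance_k(X, y, k_max=10):
    d = np.abs(X[:, None] - X[None, :]).sum(axis=2)  # pairwise L1 distances
    np.fill_diagonal(d, np.inf)                       # exclude self
    order = np.argsort(d, axis=1)
    best_k = np.empty(len(X), dtype=int)
    for i in range(len(X)):
        errs = [abs(y[order[i, :k]].mean() - y[i]) for k in range(1, k_max + 1)]
        best_k[i] = 1 + int(np.argmin(errs))
    return best_k

def predict(x, X, y, best_k):
    d = np.abs(X - x).sum(axis=1)
    k = best_k[int(np.argmin(d))]   # borrow k from the nearest instance
    idx = np.argsort(d)[:k]
    return y[idx].mean()

X = np.array([[0.0], [1.0], [2.0], [10.0]])
y = np.array([0.0, 1.0, 2.0, 10.0])
ks = fit_instance_k(X, y, k_max=3)
print(predict(np.array([1.5]), X, y, ks))
```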

7.
This paper is concerned with the design and analysis of improved algorithms for determining the optimal length resolution refutation (OLRR) of a system of difference constraints over an integral domain. The problem of finding short explanations for unsatisfiable Difference Constraint Systems (DCS) finds applications in a number of design domains, including program verification, proof theory, real-time scheduling, and operations research. These explanations have also been called "certificates" and "refutations" in the literature. This problem was first studied in Subramani (J Autom Reason 43(2):121-137, 2009), wherein the first polynomial-time algorithm was proposed. In this paper, we propose two new strongly polynomial algorithms which improve on the existing time bound. Our first algorithm, which we call the edge progression approach, runs in O(n² · k + m · n · k) time, while our second algorithm, which we call the edge relaxation approach, runs in O(m · n · k) time, where m is the number of constraints in the DCS, n is the number of program variables, and k denotes the length of the shortest refutation. We conducted an extensive empirical analysis of the three OLRR algorithms discussed in this paper. Our experiments indicate that in the case of sparse graphs, the new algorithms are superior to the algorithm of Subramani (2009). Likewise, in the case of dense graphs, the approach of Subramani (2009) is superior to the algorithms described in this paper. One surprising observation is the superiority of the edge relaxation algorithm over the edge progression algorithm in all cases, although both algorithms have the same asymptotic time complexity.
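
As background (not the paper's OLRR algorithms), the sketch below shows why refutations of a DCS correspond to negative cycles in its constraint graph, using plain Bellman-Ford; OLRR then asks for the shortest such cycle.

```python
# Background sketch: a difference constraint x_j - x_i <= c becomes an
# edge i -> j of weight c, and the DCS is unsatisfiable iff the
# constraint graph has a negative cycle. A negative cycle *is* a
# refutation (summing its constraints gives 0 <= negative).

def find_negative_cycle(n, edges):
    """edges: list of (i, j, c) for constraints x_j - x_i <= c."""
    dist = [0.0] * n              # virtual source with 0-weight edges to all
    pred = [None] * n
    last = None
    for _ in range(n):            # n Bellman-Ford passes
        last = None
        for i, j, c in edges:
            if dist[i] + c < dist[j]:
                dist[j] = dist[i] + c
                pred[j] = i
                last = j
    if last is None:
        return None               # satisfiable: dist is a solution
    for _ in range(n):            # step back to land on the cycle itself
        last = pred[last]
    cycle, v = [last], pred[last]
    while v != last:
        cycle.append(v)
        v = pred[v]
    return cycle[::-1]            # vertices of a negative (refuting) cycle

# x1 - x0 <= 1, x2 - x1 <= -2, x0 - x2 <= 0 sums to 0 <= -1: infeasible.
print(find_negative_cycle(3, [(0, 1, 1), (1, 2, -2), (2, 0, 0)]))
```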

8.
We study the power of four query models in the context of property testing in general graphs, where our main case study is the problem of testing k-colorability. Two query types, which have been studied extensively in the past, are pair queries and neighbor queries. The former corresponds to asking whether there is an edge between any particular pair of vertices, and the latter to asking for the i-th neighbor of a particular vertex. We show that while for pair queries testing k-colorability requires a number of queries that is a monotone decreasing function of the average degree d, the query complexity in the case of neighbor queries remains roughly the same for every density and for large values of k. We also consider a combined model that allows both types of queries, and we propose a new, stronger query model related to the field of Group Testing. We give upper and lower bounds on the query complexity for one-sided error in all the models, where the bounds are nearly tight for three of the models. In some of the cases, our lower bounds extend to two-sided-error algorithms. The problem of testing k-colorability was previously studied in the contexts of dense graphs and of sparse graphs; in our proofs we unify approaches from those cases and also provide some new tools and techniques that may be of independent interest.

9.
We introduce a new approach for finding overlapping clusters given pairwise similarities of objects. In particular, we relax the problem of correlation clustering by allowing an object to be assigned to more than one cluster. At the core of our approach is an optimization problem in which each data point is mapped to a small set of labels, representing membership in different clusters. The objective is to find a mapping so that the given similarities between objects agree as much as possible with similarities taken over their label sets. The number of labels can vary across objects. To define a similarity between label sets, we consider two measures: (i) a 0–1 function indicating whether the two label sets have non-zero intersection and (ii) the Jaccard coefficient between the two label sets. The algorithm we propose is an iterative local-search method. The definitions of label set similarity give rise to two non-trivial optimization problems, which, for the measures of set-intersection and Jaccard, we solve using a greedy strategy and non-negative least squares, respectively. We also develop a distributed version of our algorithm based on the BSP model and implement it using a Pregel framework. Our algorithm uses as input pairwise similarities of objects and can thus be applied when clustering structured objects for which feature vectors are not available. As a proof of concept, we apply our algorithms on three different and complex application domains: trajectories, amino-acid sequences, and textual documents.

10.
Although the problem of data server placement in parallel and distributed systems has been studied extensively, most of the existing work assumes there is no competition between servers, and hence aims only to minimize read, update, and storage cost. In this paper, we study the server placement problem in which a new server has to compete with existing servers for user requests. Therefore, in addition to minimizing cost, we also need to maximize the benefit of building a new server. Our major results include three parts. First, for tree-structured systems, we propose an O(|V|³k)-time dynamic programming algorithm to find the optimal placement of k extra servers that maximizes the benefit in a tree with |V| nodes. We also propose an O(|V|³)-time dynamic programming algorithm to find the optimal placement of extra servers that maximizes the benefit without any constraint on the number of extra servers. Second, for general connected graphs, we prove that the server placement problems are NP-complete, and present three greedy heuristic algorithms, called Greedy Add, Greedy Remove, and Greedy Add-Remove, to solve them. Third, we show that if the number of requests a server can handle (i.e., server capacity) is bounded, the server placement problem is NP-complete even for tree networks. We then derive a variation of the same set of greedy heuristic algorithms, with consideration of the server capacity constraint, to solve the problem. Our experimental results demonstrate that the greedy algorithms achieve good results when compared with the upper bounds found by a linear programming algorithm. Greedy Add performs best in the unconstrained model, yielding a benefit within 12% of the theoretical upper bound on average. For the constrained model, Greedy Remove performs best for smaller network sizes, while Greedy Add-Remove performs best for larger network sizes. On average, the heuristic algorithms yield a benefit within 13% of the theoretical upper bound in the constrained model.
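
A minimal sketch of the Greedy Add structure appears below; the capture rule (a user re-routes to the new server if it is strictly closer than its current nearest server) and the distance-matrix inputs are simplifying assumptions, not the paper's exact cost/benefit model.

```python
# Sketch of a Greedy Add heuristic: repeatedly place one new server at
# the candidate location that captures the most user request volume
# from the existing servers.

def greedy_add(dist_users_cand, dist_users_existing, requests, k):
    """dist_users_cand[u][c]: distance from user u to candidate c;
    dist_users_existing[u]: distance to u's nearest existing server;
    requests[u]: request volume of user u; k: servers to add."""
    current = list(dist_users_existing)
    chosen = []
    candidates = set(range(len(dist_users_cand[0])))
    for _ in range(k):
        def gain(c):
            return sum(requests[u] for u in range(len(current))
                       if dist_users_cand[u][c] < current[u])
        best = max(candidates, key=gain)
        if gain(best) == 0:
            break                       # no candidate captures any requests
        chosen.append(best)
        candidates.remove(best)
        for u in range(len(current)):   # users re-route to the closer server
            current[u] = min(current[u], dist_users_cand[u][best])
    return chosen
```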

11.
Choosing the best location for starting a business or expanding an existing enterprise is an important issue. A number of location selection problems have been discussed in the literature; they often apply the Reverse Nearest Neighbor as the criterion for finding suitable locations. In this paper, we apply the Average Distance as the criterion and propose the so-called k-most suitable locations (k-MSL) selection problem. Given a positive integer k and three datasets (a set of customers, a set of existing facilities, and a set of potential locations), the k-MSL selection problem outputs k locations from the potential location set such that the average distance between a customer and his nearest facility is minimized. In this paper, we formally define the k-MSL selection problem and show that it is NP-hard. We first propose a greedy algorithm which can quickly find an approximate result for users. Two exact algorithms are then proposed to find the optimal result. Several pruning rules are applied to increase computational efficiency. We evaluate the algorithms' performance using both synthetic and real datasets. The results show that our algorithms are able to deal with the k-MSL selection problem efficiently.
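
The greedy heuristic described above admits a short sketch; Euclidean distances and (x, y) point tuples are assumptions of the example, and the paper's pruning rules and exact algorithms are not reproduced.

```python
# Sketch of greedy k-MSL: start from the existing facilities and
# repeatedly open the potential location that most reduces the total
# (equivalently, average) customer-to-nearest-facility distance.
import math

def k_msl_greedy(customers, facilities, locations, k):
    def d(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])
    nearest = [min(d(c, f) for f in facilities) for c in customers]
    chosen = []
    remaining = list(locations)
    for _ in range(k):
        def total_after(loc):
            return sum(min(n, d(c, loc)) for c, n in zip(customers, nearest))
        best = min(remaining, key=total_after)
        chosen.append(best)
        remaining.remove(best)
        nearest = [min(n, d(c, best)) for c, n in zip(customers, nearest)]
    return chosen
```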

12.
In many applications, the properties of an object being modeled are stored as labels on the vertices or edges of a graph. In this paper, we consider succinct representations of labeled graphs. Our main results are succinct representations of labeled and multi-labeled graphs (we consider planar triangulations, planar graphs, and k-page graphs) that support various label queries efficiently. The additional space cost to store the labels is essentially the information-theoretic minimum. As far as we know, our representations are the first succinct representations of labeled graphs. We also present two preliminary results needed to achieve the main contribution. First, we design a succinct representation of unlabeled planar triangulations that supports the rank/select of edges in ccw (counterclockwise) order in addition to the operations supported in previous work. Second, we design a succinct representation of a k-page graph when k is large that supports various navigational operations more efficiently. In particular, we can test the adjacency of two vertices in O(lg k) time, while previous work uses O(k) time.

13.
The goal in multi-label classification is to tag a data point with the subset of relevant labels from a pre-specified set. Given a set of L labels, a data point can be tagged with any of the 2^L possible subsets. The main challenge therefore lies in optimising over this exponentially large label space subject to label correlations. Our objective, in this paper, is to design efficient algorithms for multi-label classification when the labels are densely correlated. In particular, we are interested in the zero-shot learning scenario, where the label correlations on the training set might be significantly different from those on the test set. We propose a max-margin formulation where we model prior label correlations but do not incorporate pairwise label interaction terms in the prediction function. We show that the problem complexity can be reduced from exponential to linear while modelling dense pairwise prior label correlations. By incorporating relevant correlation priors we can handle mismatches between the training and test set statistics. Our proposed formulation generalises the effective 1-vs-All method and we provide a principled interpretation of the 1-vs-All technique. We develop efficient optimisation algorithms for our proposed formulation. We adapt the Sequential Minimal Optimisation (SMO) algorithm to multi-label classification and show that, with some book-keeping, we can reduce the training time from super-quadratic to almost linear in the number of labels. Furthermore, by effectively re-utilising the kernel cache and jointly optimising over all variables, we can be orders of magnitude faster than the competing state-of-the-art algorithms. We also design a specialised algorithm for linear kernels based on dual coordinate ascent with shrinkage that lets us effortlessly train on a million points with a hundred labels.

14.
We consider the following clustering problem: we have a complete graph on n vertices (items), where each edge (u, v) is labeled either + or − depending on whether u and v have been deemed to be similar or different. The goal is to produce a partition of the vertices (a clustering) that agrees as much as possible with the edge labels. That is, we want a clustering that maximizes the number of + edges within clusters plus the number of − edges between clusters (equivalently, minimizes the number of disagreements: the number of − edges inside clusters plus the number of + edges between clusters). This formulation is motivated by a document clustering problem in which one has a pairwise similarity function f learned from past data, and the goal is to partition the current set of documents in a way that correlates with f as much as possible; it can also be viewed as a kind of "agnostic learning" problem. An interesting feature of this clustering formulation is that one does not need to specify the number of clusters k as a separate parameter, as in measures such as k-median or min-sum or min-max clustering. Instead, in our formulation, the optimal number of clusters could be any value between 1 and n, depending on the edge labels. We look at approximation algorithms both for minimizing disagreements and for maximizing agreements. For minimizing disagreements, we give a constant-factor approximation. For maximizing agreements we give a PTAS, building on ideas of Goldreich, Goldwasser, and Ron (1998) and de la Vega (1996). We also show how to extend some of these results to graphs with edge labels in [−1, +1], and give some results for the case of random noise.
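
To make the objective concrete, the sketch below counts the disagreements of a candidate clustering against the ± labels; it shows only the quantity being minimized, not the constant-factor algorithm or the PTAS.

```python
# The correlation-clustering objective: a "-" edge inside a cluster, or
# a "+" edge between clusters, counts as one disagreement.

def disagreements(labels, cluster_of):
    """labels: dict {(u, v): '+' or '-'} over unordered vertex pairs;
    cluster_of: dict vertex -> cluster id."""
    bad = 0
    for (u, v), sign in labels.items():
        same = cluster_of[u] == cluster_of[v]
        if (sign == '-' and same) or (sign == '+' and not same):
            bad += 1
    return bad

labels = {(1, 2): '+', (1, 3): '+', (2, 3): '+',
          (1, 4): '-', (2, 4): '-', (3, 4): '-'}
print(disagreements(labels, {1: 0, 2: 0, 3: 0, 4: 1}))  # 0: perfect clustering
print(disagreements(labels, {1: 0, 2: 0, 3: 1, 4: 1}))  # 3 disagreements
```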

15.
We present a concurrent face routing (CFR) algorithm. We formally prove that the worst-case latency of our algorithm is asymptotically optimal. Our simulation results demonstrate that, on average, the path stretch, i.e., the speed of message delivery, achieved by CFR is significantly better than that of other known geometric routing algorithms; in fact, it approaches the shortest possible path. CFR maintains its advantage over the other algorithms both in pure form and in combination with greedy routing. CFR displays this performance superiority on both planar and non-planar graphs.

16.
Maximal clique enumeration is a fundamental problem in graph theory and has been extensively studied. However, maximal clique enumeration is time-consuming in large graphs and typically returns an enormous number of cliques with large overlaps. Motivated by this, in this paper we study the diversified top-k clique search problem, which is to find top-k cliques that cover the largest number of nodes in the graph. Diversified top-k clique search can be widely used in many applications, including community search, motif discovery, and anomaly detection in large graphs. A naive solution for diversified top-k clique search is to keep all maximal cliques in memory and then find k of them that cover the most nodes in the graph using the approximate greedy max k-cover algorithm (sketched below). However, such a solution is impractical when the graph is large. In this paper, instead of keeping all maximal cliques in memory, we devise an algorithm that maintains k candidates during maximal clique enumeration. Our algorithm has a limited memory footprint and achieves a guaranteed approximation ratio. We also introduce a novel lightweight PNP-Index, based on which we design an optimal maximal clique maintenance algorithm. We further explore three optimization strategies to avoid enumerating all maximal cliques and thus largely reduce the computational cost. In addition, for massive input graphs, we develop an I/O-efficient algorithm to tackle the problem when the input graph cannot fit in main memory. We conduct extensive performance studies on real and synthetic graphs, one of which contains 1.02 billion edges. The results demonstrate the high efficiency and effectiveness of our approach.
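
Here is a minimal sketch of that naive baseline, using networkx's find_cliques for the enumeration; it is the memory-hungry approach the paper improves on, shown only to make the problem statement concrete.

```python
# Naive diversified top-k clique search: enumerate all maximal cliques,
# then greedily pick k of them to maximize vertex coverage (greedy max
# k-cover). Impractical for large graphs, as the abstract notes.
import networkx as nx

def diversified_top_k_cliques(G, k):
    cliques = [set(c) for c in nx.find_cliques(G)]  # all maximal cliques
    covered, chosen = set(), []
    for _ in range(min(k, len(cliques))):
        best = max(cliques, key=lambda c: len(c - covered))
        if not best - covered:
            break
        chosen.append(best)
        covered |= best
    return chosen

G = nx.Graph([(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (5, 6), (4, 6)])
print(diversified_top_k_cliques(G, 2))  # two triangles cover all 6 nodes
```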

17.
We establish a refined search tree technique for the parameterized DOMINATING SET problem on planar graphs. Here, we are given an undirected graph and ask for a set of at most k vertices such that every other vertex has at least one neighbor in this set. We describe algorithms with running times O(8^k n) and O(8^k k + n³), where n is the number of vertices in the graph, based on bounded search trees. We describe a set of polynomial-time data-reduction rules for a more general "annotated" problem on black/white graphs that asks for a set of k vertices (black or white) that dominate all the black vertices. An intricate argument based on the Euler formula then establishes an efficient branching strategy for reduced inputs to this problem. In addition, we give a family of examples showing that the bound of the branching theorem is optimal with respect to our reduction rules. Our final search tree algorithm is easy to implement; its analysis, however, is involved.

18.
We study in this paper the problem of finding in a graph a subset of k edges whose deletion causes the largest increase in the weight of a minimum spanning tree. We propose for this problem an explicit enumeration algorithm whose complexity, when compared to the current best algorithm, is better for general k but very slightly worse for fixed k. More interestingly, unlike the previous algorithms, ours can easily be adapted into an implicit enumeration algorithm based on a branch-and-bound scheme. We also propose a mixed integer programming formulation for this problem. Computational results show a clear superiority of the implicit enumeration algorithm over both the explicit enumeration algorithm and the mixed integer program.

19.
We consider a special subgraph of a weighted directed graph: one comprising only the k heaviest edges incoming to each vertex. We show that the maximum weight branching in this subgraph closely approximates the maximum weight branching in the original graph: specifically, it is within a factor of k/(k+1). Our interest in finding branchings in this subgraph is motivated by a data compression application in which calculating edge weights is expensive, but estimating which are the k heaviest incoming edges is easy. An additional benefit is that, since algorithms for finding branchings run in time linear in the number of edges, our results imply faster algorithms, although we sacrifice optimality by a small factor. We also extend our results to the case of edge-disjoint branchings of maximum weight and to maximum weight spanning forests.
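
A short sketch of the construction: prune each vertex's in-edges to the k heaviest, then hand the subgraph to any maximum-branching routine. networkx's Edmonds-based maximum_branching is used below purely for illustration; the paper's own analysis does not depend on this particular implementation.

```python
# Keep only the k heaviest edges incoming to each vertex, then find a
# maximum weight branching in the pruned subgraph; by the result above,
# its weight is within k/(k+1) of the optimum on the full graph.
import networkx as nx
from networkx.algorithms.tree.branchings import maximum_branching

def k_heaviest_subgraph(G, k):
    """G: weighted nx.DiGraph; keeps, per vertex, the k heaviest in-edges."""
    H = nx.DiGraph()
    H.add_nodes_from(G.nodes)
    for v in G.nodes:
        incoming = sorted(G.in_edges(v, data="weight"),
                          key=lambda e: e[2], reverse=True)[:k]
        H.add_weighted_edges_from(incoming)
    return H

G = nx.DiGraph()
G.add_weighted_edges_from([(1, 3, 5), (2, 3, 4), (4, 3, 1), (3, 2, 2)])
B = maximum_branching(k_heaviest_subgraph(G, 2))
print(sorted(B.edges(data="weight")))  # expected: [(1, 3, 5), (3, 2, 2)]
```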

20.
We say a vertex v in a graph G covers a vertex w if v = w or if v and w are adjacent. A subset of vertices of G is a dominating set if it collectively covers all vertices in the graph. The dominating set problem, which is NP-hard, consists of finding a smallest possible dominating set for a graph. The straightforward greedy strategy for finding a small dominating set consists of successively choosing vertices which cover the largest possible number of previously uncovered vertices. Several variations on this greedy heuristic are described, and the results of extensive testing of these variations are presented. A more sophisticated procedure for choosing vertices, which takes into account the number of ways in which an uncovered vertex may be covered, appears to be the most successful of the algorithms analyzed. For our experimental testing, we used both random graphs and graphs constructed by test case generators which produce graphs with a given density and a specified size for the smallest dominating set. We found that these generators were able to produce challenging graphs for the algorithms, thus helping to discriminate among them and allowing a greater variety of graphs to be used in the experiments. Received October 27, 1998; revised March 25, 2001.
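
The straightforward greedy strategy itself is easy to state in code; the sketch below implements it on an adjacency-dict graph. The paper's more sophisticated variants, which account for how many ways each uncovered vertex can still be covered, are not reproduced.

```python
# Straightforward greedy dominating set: repeatedly choose the vertex
# covering the largest number of still-uncovered vertices (a vertex
# covers itself and its neighbors).

def greedy_dominating_set(adj):
    """adj: dict vertex -> set of neighbors (undirected)."""
    uncovered = set(adj)
    dom = set()
    while uncovered:
        v = max(adj, key=lambda u: len(({u} | adj[u]) & uncovered))
        dom.add(v)
        uncovered -= {v} | adj[v]
    return dom

star_plus = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0, 4}, 4: {3}}
print(greedy_dominating_set(star_plus))  # e.g. {0, 3}
```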
