首页 | 官方网站   微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
The key issue in top-k retrieval, finding a set of k documents (from a large document collection) that can best answer a user’s query, is to strike the optimal balance between relevance and diversity. In this paper, we study the top-k retrieval problem in the framework of facility location analysis and prove the submodularity of that objective function which provides a theoretical approximation guarantee of factor 1?\(\frac{1}{e}\) for the (best-first) greedy search algorithm. Furthermore, we propose a two-stage hybrid search strategy which first obtains a high-quality initial set of top-k documents via greedy search, and then refines that result set iteratively via local search. Experiments on two large TREC benchmark datasets show that our two-stage hybrid search strategy approach can supersede the existing ones effectively and efficiently.  相似文献   

2.
The k-in-a-Path problem is to test whether a graph contains an induced path spanning k given vertices. This problem is NP-complete in general graphs, already when k=3. We show how to solve it in polynomial time on claw-free graphs, when k is an arbitrary fixed integer not part of the input. As a consequence, also the k-Induced Disjoint Paths and the k-in-a-Cycle problem are solvable in polynomial time on claw-free graphs for any fixed k. The first problem has as input a graph G and k pairs of specified vertices (s i ,t i ) for i=1,…,k and is to test whether G contain k mutually induced paths P i such that P i connects s i and t i for i=1,…,k. The second problem is to test whether a graph contains an induced cycle spanning k given vertices. When k is part of the input, we show that all three problems are NP-complete, even for the class of line graphs, which form a subclass of the class of claw-free graphs.  相似文献   

3.
A matrix M is said to be k-anonymous if for each row r in M there are at least k ? 1 other rows in M which are identical to r. The NP-hard k-Anonymity problem asks, given an n × m-matrix M over a fixed alphabet and an integer s > 0, whether M can be made k-anonymous by suppressing (blanking out) at most s entries. Complementing previous work, we introduce two new “data-driven” parameterizations for k-Anonymity—the number t in of different input rows and the number t out of different output rows—both modeling aspects of data homogeneity. We show that k-Anonymity is fixed-parameter tractable for the parameter t in , and that it is NP-hard even for t out = 2 and alphabet size four. Notably, our fixed-parameter tractability result implies that k-Anonymity can be solved in linear time when t in is a constant. Our computational hardness results also extend to the related privacy problems p-Sensitivity and ?-Diversity, while our fixed-parameter tractability results extend to p-Sensitivity and the usage of domain generalization hierarchies, where the entries are replaced by more general data instead of being completely suppressed.  相似文献   

4.
The top-k similarity joins have been extensively studied and used in a wide spectrum of applications such as information retrieval, decision making, spatial data analysis and data mining. Given two sets of objects $\mathcal U$ and $\mathcal V$ , a top-k similarity join returns k pairs of most similar objects from $\mathcal U \times \mathcal V$ . In the conventional model of top-k similarity join processing, an object is usually regarded as a point in a multi-dimensional space and the similarity is measured by some simple distance metrics like Euclidean distance. However, in many applications an object may be described by multiple values (instances) and the conventional model is not applicable since it does not address the distributions of object instances. In this paper, we study top-k similarity join over multi-valued objects. We apply two types of quantile based distance measures, ?-quantile distance and ?-quantile group-base distance, to explore the relative instance distribution among the multiple instances of objects. Efficient and effective techniques to process top-k similarity joins over multi-valued objects are developed following a filtering-refinement framework. Novel distance, statistic and weight based pruning techniques are proposed. Comprehensive experiments on both real and synthetic datasets demonstrate the efficiency and effectiveness of our techniques.  相似文献   

5.
The (undirected) Rooted Survivable Network Design (Rooted SND) problem is: given a complete graph on node set V with edge-costs, a root sV, and (node-)connectivity requirements , find a minimum cost subgraph G that contains r(t) internally-disjoint st-paths for all tT. For large values of k=maxtTr(t) Rooted SND is at least as hard to approximate as Directed Steiner Tree [Y. Lando, Z. Nutov, Inapproximability of survivable networks, Theoret. Comput. Sci. 410 (21–23) (2009) 2122–2125]. For Rooted SND, [J. Chuzhoy, S. Khanna, Algorithms for single-source vertex-connectivity, in: FOCS, 2008, pp. 105–114] gave recently an approximation algorithm with ratio O(k2logn). Independently, and using different techniques, we obtained at the same time a simpler primal–dual algorithm with the same ratio.  相似文献   

6.
Maximal clique enumeration is a fundamental problem in graph theory and has been extensively studied. However, maximal clique enumeration is time-consuming in large graphs and always returns enormous cliques with large overlaps. Motivated by this, in this paper, we study the diversified top-k clique search problem which is to find top-k cliques that can cover most number of nodes in the graph. Diversified top-k clique search can be widely used in a lot of applications including community search, motif discovery, and anomaly detection in large graphs. A naive solution for diversified top-k clique search is to keep all maximal cliques in memory and then find k of them that cover most nodes in the graph by using the approximate greedy max k-cover algorithm. However, such a solution is impractical when the graph is large. In this paper, instead of keeping all maximal cliques in memory, we devise an algorithm to maintain k candidates in the process of maximal clique enumeration. Our algorithm has limited memory footprint and can achieve a guaranteed approximation ratio. We also introduce a novel light-weight \(\mathsf {PNP}\)-\(\mathsf {Index}\), based on which we design an optimal maximal clique maintenance algorithm. We further explore three optimization strategies to avoid enumerating all maximal cliques and thus largely reduce the computational cost. Besides, for the massive input graph, we develop an I/O efficient algorithm to tackle the problem when the input graph cannot fit in main memory. We conduct extensive performance studies on real graphs and synthetic graphs. One of the real graphs contains 1.02 billion edges. The results demonstrate the high efficiency and effectiveness of our approach.  相似文献   

7.
8.
9.
How do the k-core structures of real-world graphs look like? What are the common patterns and the anomalies? How can we exploit them for applications? A k-core is the maximal subgraph in which all vertices have degree at least k. This concept has been applied to such diverse areas as hierarchical structure analysis, graph visualization, and graph clustering. Here, we explore pervasive patterns related to k-cores and emerging in graphs from diverse domains. Our discoveries are: (1) Mirror Pattern: coreness (i.e., maximum k such that each vertex belongs to the k-core) is strongly correlated with degree. (2) Core-Triangle Pattern: degeneracy (i.e., maximum k such that the k-core exists) obeys a 3-to-1 power-law with respect to the count of triangles. (3) Structured Core Pattern: degeneracy–cores are not cliques but have non-trivial structures such as core–periphery and communities. Our algorithmic contributions show the usefulness of these patterns. (1) Core-A, which measures the deviation from Mirror Pattern, successfully spots anomalies in real-world graphs, (2) Core-D, a single-pass streaming algorithm based on Core-Triangle Pattern, accurately estimates degeneracy up to 12 \(\times \) faster than its competitor. (3) Core-S, inspired by Structured Core Pattern, identifies influential spreaders up to 17 \(\times \) faster than its competitors with comparable accuracy.  相似文献   

10.
Approximation Algorithms for the Directed k-Tour and k-Stroll Problems   总被引:1,自引:0,他引:1  
We consider two natural generalizations of the Asymmetric Traveling Salesman problem: the k-Stroll and the k-Tour problems. The input to the k-Stroll problem is a directed n-vertex graph with nonnegative edge lengths, an integer k, as well as two special vertices s and t. The goal is to find a minimum-length s-t walk, containing at least k distinct vertices (including the endpoints s,t). The k-Tour problem can be viewed as a special case of k-Stroll, where s=t. That is, the walk is required to be a tour, containing some pre-specified vertex s. When k=n, the k-Stroll problem becomes equivalent to Asymmetric Traveling Salesman Path, and k-Tour to Asymmetric Traveling Salesman. Our main result is a polylogarithmic approximation algorithm for the k-Stroll problem. Prior to our work, only bicriteria (O(log2 k),3)-approximation algorithms have been known, producing walks whose length is bounded by 3OPT, while the number of vertices visited is Ω(k/log2 k). We also show a simple O(log2 n/loglogn)-approximation algorithm for the k-Tour problem. The best previously known approximation algorithms achieved min(O(log3 k),O(log2 n?logk/loglogn)) approximation in polynomial time, and O(log2 k) approximation in quasipolynomial time.  相似文献   

11.
12.
13.
14.
15.

We consider the numerical solution of a phase field model for polycrystallization in the solidification of binary mixtures in a domain \( \varOmega \subset \mathbb {R}^2\). The model is based on a free energy in terms of three order parameters: the local orientation \(\varTheta \) of the crystals, the local crystallinity \(\phi \), and the concentration c of one of the components of the binary mixture. The equations of motion are given by an initial-boundary value problem for a coupled system of partial differential equations consisting of a regularized second order total variation flow in \( \varTheta \), an \(L^2\) gradient flow in \(\phi \), and a \(W^{1,2}(\varOmega )^*\) gradient flow in c. Based on an implicit discretization in time by the backward Euler scheme, we suggest a splitting method such that the three semidiscretized equations can be solved separately and prove existence of a solution. As far as the discretization in space is concerned, the fourth order Cahn–Hilliard type equation in c is taken care of by a \(\hbox {C}^0\) Interior Penalty Discontinuous Galerkin approximation which has the advantage that the same finite element space can be used as well for the spatial discretization of the equations in \( \varTheta \) and \( \phi \). The fully discretized equations represent parameter dependent nonlinear algebraic systems with the discrete time as a parameter. They are solved by a predictor corrector continuation strategy featuring an adaptive choice of the time-step. Numerical results illustrate the performance of the suggested numerical method.

  相似文献   

16.
Uncertain graph has been widely used to represent graph data with inherent uncertainty in structures. Reliability search is a fundamental problem in uncertain graph analytics. This paper investigates on a new problem with broad real-world applications, the top-k reliability search problem on uncertain graphs, that is, finding the k vertices v with the highest reliabilities of connections from a source vertex s to v. Note that the existing algorithm for the threshold-based reliability search problem is inefficient for the top-k reliability search problem. We propose a new algorithm to efficiently solve the top-k reliability search problem. The algorithm adopts two important techniques, namely the BFS sharing technique and the offline sampling technique. The BFS sharing technique exploits overlaps among different sampled possible worlds of the input uncertain graph and performs a single BFS on all possible worlds simultaneously. The offline sampling technique samples possible worlds offline and stores them using a compact structure. The algorithm also takes advantages of bit vectors and bitwise operations to improve efficiency. In addition, we generalize the top-k reliability search problem from single-source case to the multi-source case and show that the multi-source case of the problem can be equivalently converted to the single-source case of the problem. Moreover, we define two types of the reverse top-k reliability search problems with different semantics on uncertain graphs. We propose appropriate solutions for both of them. Extensive experiments carried out on both real and synthetic datasets verify that the optimized algorithm outperforms the baselines by 1–2 orders of magnitude in execution time while achieving comparable accuracy. Meanwhile, the optimized algorithm exhibits linear scalability with respect to the size of the input uncertain graph.  相似文献   

17.
18.
Dai  Yibo  Wang  Chao  Dong  Jian  Sun  Changyin 《Multimedia Tools and Applications》2020,79(47-48):35297-35313

Visual relationship detection is a task aiming at mining the information of interactions between the paired objects in the image, describing the image in the form of (subject ? predicate ? object). Most of the previous works regard it as a pure classification problem by taking the integrated triplets as the label of the image; however, the numerous combinations of objects and the diversity of predicates are the tough challenges for these studies. Hence, we propose a deep model based on a modified bidirectional recurrent neural network (BRNN) to classify object and predict predicate simultaneously. By using the BRNN, the hidden information of the relationship in the image is extracted and a feature-infusion method is proposed. Additionally, we improve the existing works by introducing a paired non-maximum suppression method. The experiments show that our approach is competitive with the state-of-the-art works.

  相似文献   

19.
20.
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号