Similar Documents
Found 20 similar documents (search time: 15 ms)
1.
2.
Luo Le, Liu Yi, Yang Hailong, Qian Depei. The Journal of Supercomputing, 2022, 78(4): 5650-5680

Graph analytics plays an important role in many areas such as big data and artificial intelligence. The vertex-centric programming model provides friendly interfaces to programmers and is extensively used in graph processing frameworks. However, it is prone to generating many irregular memory accesses and high scheduling overhead, because the backend executes and schedules programs at the granularity of vertices. The matrix-based model takes a different approach, using high-performance matrix operations in the backend to improve the efficiency of graph processing. Unfortunately, current matrix-based frameworks support only the synchronous parallel model, which constrains their applicability to various graph algorithms. To address these problems, this paper proposes a graph processing framework that combines matrix operations with the asynchronous model while providing friendly programming interfaces similar to the vertex-centric model. First, we propose an approach to map vertex-based graph processing to matrix operations in the asynchronous model. Then, we propose two asynchronous scheduling policies, the Gauss–Seidel policy and the relaxed Gauss–Seidel policy, for different graph algorithms. Finally, our framework applies batch scheduling and an optimized in-memory data structure to reduce the scheduling overhead introduced by the asynchronous model. Experimental results show that our framework outperforms popular vertex-centric frameworks such as GraphLab and GRACE in both performance and speedup, and achieves performance similar to that of BSP-based matrix frameworks such as GraphMat.
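The difference between the synchronous (BSP-style) execution used by existing matrix-based frameworks and the Gauss–Seidel-style asynchronous updates described in this abstract can be illustrated with PageRank. This is a minimal hand-rolled sketch, not the paper's framework; all names (`in_edges`, `out_deg`, the damping constant) are illustrative.

```python
# Minimal sketch (not the paper's framework): synchronous (Jacobi/BSP)
# versus Gauss-Seidel-style asynchronous PageRank updates.
# `in_edges` maps each vertex to its in-neighbors; names are illustrative.

def pagerank_sync(in_edges, out_deg, n, damping=0.85, iters=20):
    rank = [1.0 / n] * n
    for _ in range(iters):
        new_rank = []
        for v in range(n):
            s = sum(rank[u] / out_deg[u] for u in in_edges[v])
            new_rank.append((1 - damping) / n + damping * s)
        rank = new_rank          # every vertex sees only last-iteration values
    return rank

def pagerank_gauss_seidel(in_edges, out_deg, n, damping=0.85, iters=20):
    rank = [1.0 / n] * n
    for _ in range(iters):
        for v in range(n):       # updated in place: later vertices reuse
            s = sum(rank[u] / out_deg[u] for u in in_edges[v])
            rank[v] = (1 - damping) / n + damping * s  # fresher values
    return rank

# Tiny example: edges 0 -> 1, 1 -> 2, 2 -> 0.
in_edges = {0: [2], 1: [0], 2: [1]}
out_deg = {0: 1, 1: 1, 2: 1}
print(pagerank_sync(in_edges, out_deg, 3))
print(pagerank_gauss_seidel(in_edges, out_deg, 3))
```

The Gauss–Seidel variant typically converges in fewer sweeps because each vertex reads neighbor values updated earlier in the same sweep, which is the property the asynchronous policies above exploit.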


3.
4.
This paper studies certain transformations of XML schemas that are widely used in algorithms for XML data management. Because the properties and functional characteristics of XML documents differ considerably from those of other kinds of data, solving a number of typical data management problems for them (such as XML data validation, schema inference, and data translation to and from other models) is more complicated. The general idea of our approach is to transform the original structure (i.e., the structural constraints of the schema) into another structure without losing information about those properties of the original data that matter to applications. The suggested technique has been used successfully in various algorithms for problems of this kind. In this paper, a systematic approach to solving these problems is discussed: methods for reducing XML schemas to several canonical forms are presented, and algorithms that solve the management problems for data satisfying schemas in these canonical forms are examined.

5.
Because graph models can accurately capture the key characteristics of data in science and engineering, graph mining has become a popular research topic in data mining, and graph classification is an important branch of it. This paper proposes CEP, a new graph classification method based on frequent closed emerging patterns. The basic idea is to first mine frequent closed graph patterns, then derive emerging patterns from these closed patterns, and finally construct a set of classification rules from the emerging patterns. Experimental results show that CEP outperforms the best existing graph classification methods when classifying chemical compound data. Moreover, domain experts can easily understand and use the classification rules that CEP produces.
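As a rough illustration of the rule-construction step described above (not the authors' CEP implementation), the sketch below assumes the closed patterns and their per-class supports have already been mined, and derives emerging patterns by thresholding the between-class support growth rate; all names are made up.

```python
# Illustrative sketch only: deriving emerging patterns (EPs) from
# closed-pattern supports and turning them into one-class rules.
# supports_pos / supports_neg map a pattern id to its support in each class.

def emerging_patterns(supports_pos, supports_neg, min_growth=5.0):
    rules = []
    for p, s_pos in supports_pos.items():
        s_neg = supports_neg.get(p, 0.0)
        growth = float("inf") if s_neg == 0 else s_pos / s_neg
        if growth >= min_growth:
            rules.append((p, growth))  # rule: "contains p => positive class"
    # Stronger growth rates first, as a simple rule ordering.
    return sorted(rules, key=lambda r: -r[1])

supports_pos = {"subgraph_A": 0.40, "subgraph_B": 0.10}
supports_neg = {"subgraph_A": 0.02, "subgraph_B": 0.09}
print(emerging_patterns(supports_pos, supports_neg))  # subgraph_A qualifies
```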

6.
In recent years, the MapReduce framework has become one of the most popular parallel computing platforms for processing big data; it is used by companies such as Facebook, IBM, and Google to process and analyze massive data sets. Since the approach is frequently used in industrial solutions, algorithms based on the MapReduce framework have gained significant attention in the scientific community. Subgraph isomorphism is a fundamental graph theory problem, and finding small patterns in large graphs is a core challenge in the analysis of applications with big data sets. This paper introduces two novel algorithms that can find matching patterns in arbitrarily large graphs and are designed to exploit the easy parallelization offered by the MapReduce framework. The approaches are evaluated with respect to their space and memory requirements. The paper also describes the applied data structures and presents a formal analysis of the algorithms.
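To make the division of work concrete, here is a toy in-memory imitation of a single MapReduce round (not one of the paper's two algorithms) that matches the 2-edge path pattern a → b → c by joining edges on their shared middle vertex; in a real deployment the shuffle phase is performed by the framework across machines.

```python
# Toy imitation of one MapReduce round (not the paper's algorithms):
# finding the pattern a -> b -> c by joining edges on their shared vertex.
from collections import defaultdict

edges = [(1, 2), (2, 3), (2, 4), (3, 1)]

# Map phase: key each edge by the vertex on which the join happens.
mapped = []
for u, v in edges:
    mapped.append((v, ("in", u)))   # edge arrives at v
    mapped.append((u, ("out", v)))  # edge leaves u

# Shuffle phase: group values by key (done by the framework in practice).
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce phase: join incoming and outgoing edges at each middle vertex.
matches = []
for b, values in groups.items():
    ins = [u for tag, u in values if tag == "in"]
    outs = [c for tag, c in values if tag == "out"]
    matches.extend((a, b, c) for a in ins for c in outs if a != c)

print(matches)  # [(1, 2, 3), (1, 2, 4), (3, 1, 2), (2, 3, 1)]
```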

7.
Exploiting locality for irregular scientific codes
Irregular scientific codes experience poor cache performance due to their irregular memory access patterns. In this paper, we present two new locality-improving techniques for irregular scientific codes that exploit geometric structures hidden in data access patterns and computation structures. Our new data reordering technique (GPART) finds the graph structure within data accesses and applies hierarchical clustering: quality partitions are constructed quickly by clustering multiple neighboring nodes, giving priority to nodes of high degree, and repeating for a few passes. Overhead is kept low by clustering multiple nodes in each pass and considering only edges between partitions. Our new computation reordering technique (Z-SORT) treats the values of index arrays as coordinates and reorders the corresponding computations in Z-curve order. Applied to dense inputs, Z-SORT achieves performance close to that of data reordering combined with other computation reorderings, but without the overhead of data reordering. Experiments on irregular scientific codes for a variety of meshes show that these locality optimization techniques are effective for both sequential and parallelized codes, improving performance by 60-87 percent. GPART comes within 1-2 percent of the performance of more sophisticated partitioning algorithms, but with one third of the overhead. Z-SORT also yields a performance improvement of 64 percent for dense inputs, comparable to data reordering combined with computation reordering.
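The Z-curve reordering idea behind Z-SORT can be sketched in a few lines: interleave the bits of each computation's coordinates into a Morton code and sort the computations by that code. This is an illustrative reconstruction, not the authors' code; `bits=16` and the sample coordinates are arbitrary.

```python
# Sketch of the Z-curve (Morton order) idea behind Z-SORT: treat index-array
# values as 2D coordinates, interleave their bits, and sort computations by
# the resulting code so nearby data are touched close together in time.

def morton2d(x, y, bits=16):
    code = 0
    for i in range(bits):                  # interleave bit i of x and y
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code

# Each "computation" reads the data element at (x, y); reorder by Z-code.
computations = [(5, 1), (0, 0), (5, 0), (1, 1), (4, 4)]
z_sorted = sorted(computations, key=lambda c: morton2d(*c))
print(z_sorted)  # [(0, 0), (1, 1), (5, 0), (5, 1), (4, 4)]
```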

8.
This paper surveys some of the recent theoretical work on data structures. Work employing abstract graph models is covered as well as work on specific structures such as arrays and string patterns. Also included is work treating algorithms on complex structures and proving properties of programs that manipulate such structures. The paper concludes with a section on results using program schemata models to compare the utility of various data structures.

9.
Web prefetching is a technique aimed at reducing user-perceived latency in the World Wide Web. The spatial locality shown by user accesses makes it possible to predict future accesses from previous ones, and a prefetching engine uses these predictions to fetch web objects before the user demands them. The existing prediction algorithms achieved acceptable performance when they were proposed, but the sharp increase in the number of embedded objects per page has reduced their effectiveness in the current web. In this paper, we show that most of the predictions made by the existing algorithms are not useful for reducing user-perceived latency, because these algorithms do not take into account the structure of current web pages, i.e., an HTML object with several embedded objects. They predict accesses to the embedded objects of an HTML only after reading the HTML itself; the prediction is therefore not made early enough to prefetch the objects, and there is no latency reduction. In this paper we present the double dependency graph (DDG) algorithm, which distinguishes between container objects (HTML) and embedded objects to create a new prediction model that matches the structure of the current web. Results show that, for the same number of extra requests to the server, DDG reduces the perceived latency, on average, 40% more than the existing algorithms. Moreover, DDG distributes latency reductions more homogeneously among users.
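A minimal sketch of the container/embedded distinction described above (an illustration, not the authors' DDG implementation): navigation arcs between containers and structural arcs from a container to its embedded objects are kept separately, so predicting a next page immediately predicts its embedded objects as well.

```python
# Illustrative sketch of the idea behind DDG: keep separate dependency arcs
# for container (HTML) objects and their embedded objects, so that as soon
# as a container is predicted, its embedded objects are predicted too.
from collections import defaultdict

follows = defaultdict(set)   # container -> containers seen next (navigation)
embeds = defaultdict(set)    # container -> embedded objects (page structure)

def observe(prev_html, html, embedded_objects):
    if prev_html is not None:
        follows[prev_html].add(html)
    embeds[html].update(embedded_objects)

def predict(current_html):
    predictions = set()
    for nxt in follows[current_html]:   # predicted next pages...
        predictions.add(nxt)
        predictions |= embeds[nxt]      # ...and their embedded objects now,
    return predictions                  # early enough to prefetch them

observe(None, "a.html", {"a1.png", "a2.css"})
observe("a.html", "b.html", {"b1.png"})
print(predict("a.html"))  # {'b.html', 'b1.png'}
```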

10.
Coined quantum walks (QWs) are being used in many contexts with the goal of understanding quantum systems and building quantum algorithms for quantum computers. Alternative models such as Szegedy's and continuous-time QWs were proposed, taking advantage of the fact that quantum theory seems to allow different quantized versions based on the same classical model, in this case the classical random walk. In this work, we show the conditions under which coined QWs are equivalent to Szegedy's QWs. These QW models have a large class of instances in common, in the sense that the evolution operators are equal when we convert the graph on which the coined QW takes place into a bipartite graph on which Szegedy's QW takes place, and vice versa. We also show that the abstract search algorithm using the coined QW model can be cast into Szegedy's searching framework using bipartite graphs with sinks.

11.
A Graph-Structure-Based Algorithm for Generating Candidate Sequences
郭平, 刘潭仁. 计算机科学, 2004, 31(1): 136-139
Generating candidate sequences first, then testing whether each candidate is frequent, and finally obtaining the frequent sequences is the general structure of candidate-based mining algorithms for sequence data, such as Apriori-style algorithms, GSP, and SPADE. Studying candidate sequence generation is therefore of general interest. This paper first studies the relationship between a sequence data set (sequence database) and a graph structure, and proves that a necessary condition for a sequence to be frequent is that the sequence corresponds to a complete subgraph. On this basis, a graph-structure-based algorithm for generating candidate sequences is proposed, together with a proof of its correctness. Mining experiments on the T25I10D10K and T25I20D100K data sets show that mining with the proposed candidate generation algorithm is more efficient than mining with the Apriori algorithm.
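The necessary condition proved in the paper can be illustrated as follows (a sketch, not the paper's generation algorithm): build a graph whose edges are item pairs that co-occur in at least `min_support` sequences, and discard any candidate whose item set is not a complete subgraph of it.

```python
# Sketch of the necessary condition described above: the items of a frequent
# sequence must form a complete subgraph (clique) in the graph whose edges
# are item pairs co-occurring in at least `min_support` sequences.
from itertools import combinations

def cooccurrence_graph(sequences, min_support):
    counts = {}
    for seq in sequences:
        for a, b in combinations(sorted(set(seq)), 2):
            counts[(a, b)] = counts.get((a, b), 0) + 1
    return {pair for pair, c in counts.items() if c >= min_support}

def may_be_frequent(candidate, edges):
    items = sorted(set(candidate))
    return all((a, b) in edges for a, b in combinations(items, 2))

db = [["a", "b", "c"], ["a", "b"], ["a", "b", "c"], ["b", "c"]]
edges = cooccurrence_graph(db, min_support=2)
print(may_be_frequent(["a", "b", "c"], edges))  # True: {a,b,c} is a clique
print(may_be_frequent(["a", "d"], edges))       # False: pair never frequent
```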

12.
Combining a semantic data stream processing engine with knowledge graph embedding can effectively improve the performance of real-time reasoning queries over data streams. However, existing knowledge representation learning models focus mostly on embedding static knowledge graphs and ignore their dynamic nature, which makes them hard to apply to real-time reasoning over dynamic semantic data streams. To let a knowledge representation model adapt to online updates of the knowledge graph and work inside a semantic stream engine, this paper builds PUKALE, a dynamic knowledge graph embedding model based on improved multiple embedding spaces. Three embedding-space generation algorithms are proposed for complex reasoning scenarios such as transitive closure, and two embedding-space selection algorithms are designed to choose the embedding space more sensibly during incremental updates. The PUKALE model is implemented on top of these algorithms and embedded into the CSPARQL-engine stream reasoning engine to support real-time reasoning queries over semantic data streams. Experimental results show that, compared with traditional CSPARQL and KALE reasoning, PUKALE reduces reasoning query time by about 85% and 93%, respectively, and improves the accuracy of real-time semantic stream reasoning while supporting dynamic graph embedding.

13.
The query space of a similarity query is usually narrowed down by pruning inactive query subspaces, which contain no query results, and keeping active query subspaces, which may contain objects corresponding to the request. However, some active query subspaces may contain no query results at all; these are called false active query subspaces. Obviously, the performance of query processing degrades in the presence of false active query subspaces. Our experiments show that this problem becomes serious when the data are high-dimensional, and that the number of accesses to false active subspaces increases with the dimensionality. To solve this problem, this paper proposes a space-mapping approach that reduces such unnecessary accesses: a given query space can be refined by filtering within its mapped space. To this end, a mapping strategy called maxgap is proposed to improve the efficiency of the refinement processing. Based on this mapping strategy, an index structure called MS-tree and its query processing algorithms are presented. Finally, the performance of the MS-tree is compared with that of its competitors on range queries over a real data set.

14.
Attributed directed graphs are directed graphs in which nodes are associated with sets of attributes. Many real-world data can be naturally represented by this type of structure, but few algorithms are able to handle these complex graphs directly. Mining attributed graphs is a difficult task because it requires combining the exploration of the graph structure with the identification of frequent itemsets. In addition, due to the combinatorics on itemsets, subgraph isomorphisms (which have a significant impact on performance) are much more numerous than in labeled graphs. In this paper, we present a new data mining method that can extract frequent patterns from one or more directed attributed graphs, and we show how to reduce the combinatorial explosion induced by subgraph isomorphisms through an appropriate processing of automorphic patterns.

15.
In recent years, embedding models for deterministic knowledge graphs have made considerable progress on tasks such as knowledge graph completion, but designing and training embedding models for uncertain knowledge graphs remains an important challenge. Unlike in a deterministic knowledge graph, every fact triple in an uncertain knowledge graph carries a confidence value, so an uncertain knowledge graph embedding model must accurately estimate the confidence of each triple. Existing uncertain knowledge graph embedding models are structurally simple, can only handle symmetric relations, and cope poorly with the false-negative sample problem. To address these issues, this paper first proposes a unified framework for training uncertain knowledge graph embedding models, which uses multi-model-based semi-supervised learning. To suppress the noise in the semi-supervised samples, Monte Carlo Dropout is used to compute the model's uncertainty about its outputs, and this uncertainty is used to filter noisy semi-supervised samples effectively. In addition, to better represent the uncertainty of entities and relations and to handle more complex relations, the paper proposes UBetaE, an uncertain knowledge graph embedding model based on the Beta distribution, which represents each entity and relation as a set of mutually independent Beta distributions. Experimental results on public data sets show that the combination of the proposed semi-supervised learning method and UBetaE not only greatly alleviates the false-negative sample problem but also clearly outperforms state-of-the-art uncertain knowledge graph embedding models such as UKGE on multiple tasks.
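The Monte Carlo Dropout filtering step described above might look roughly like the following numpy sketch, where `score_with_dropout` stands in for a real embedding model run with dropout active; the dropout rate, number of passes, and variance threshold are all made-up illustrative values.

```python
# Minimal numpy sketch of Monte-Carlo-Dropout filtering for semi-supervised
# training, as described above. `score_with_dropout` stands in for a real
# embedding model run in training mode (dropout active); names are made up.
import numpy as np

rng = np.random.default_rng(0)

def score_with_dropout(triple_embedding, drop_prob=0.3):
    mask = rng.random(triple_embedding.shape) >= drop_prob  # random dropout
    kept = triple_embedding * mask / (1.0 - drop_prob)      # inverted scaling
    return float(kept.mean())                               # toy confidence

def filter_pseudo_labels(embeddings, passes=50, max_std=0.05):
    accepted = []
    for i, emb in enumerate(embeddings):
        scores = [score_with_dropout(emb) for _ in range(passes)]
        mean, std = np.mean(scores), np.std(scores)
        if std <= max_std:          # low spread = low model uncertainty
            accepted.append((i, mean))
    return accepted

embeddings = [rng.normal(0.5, 0.1, size=64) for _ in range(5)]
print(filter_pseudo_labels(embeddings))
```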

16.
It is increasingly common to find graphs in which edges are of different types, indicating a variety of relationships. For such graphs we propose a class of reachability queries and a class of graph patterns, in which an edge is specified with a regular expression of a certain form, expressing the connectivity of a data graph via edges of various types. In addition, we define graph pattern matching based on a revised notion of graph simulation. On graphs in emerging applications such as social networks, we show that these queries are capable of finding more sensible information than their traditional counterparts. Better still, their increased expressive power does not come with extra complexity. Indeed, (1) we investigate their containment and minimization problems, and show that these fundamental problems are in quadratic time for reachability queries and are in cubic time for pattern queries. (2) We develop an algorithm for answering reachability queries, in quadratic time as for their traditional counterpart. (3) We provide two cubic-time algorithms for evaluating graph pattern queries, as opposed to the NP-completeness of graph pattern matching via subgraph isomorphism. (4) The effectiveness and efficiency of these algorithms are experimentally verified using real-life data and synthetic data.
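Graph-simulation matching, on which the revised semantics above is based, can be computed by a simple fixpoint refinement. The sketch below simplifies the paper's setting by using plain edge labels instead of regular expressions; all graphs and names are toy examples.

```python
# Sketch of graph-simulation matching (pattern edges simplified to plain
# labels rather than the paper's regular expressions): iteratively shrink
# candidate sets until every pattern edge can be simulated.

def simulation(pattern_nodes, pattern_edges, data_nodes, data_edges):
    # Initial candidates: data vertices with a matching label.
    sim = {u: {v for v in data_nodes if data_nodes[v] == pattern_nodes[u]}
           for u in pattern_nodes}
    changed = True
    while changed:
        changed = False
        for (u, u2, lbl) in pattern_edges:       # pattern edge u -lbl-> u2
            ok = {v for v in sim[u]
                  if any((v, w, lbl) in data_edges and w in sim[u2]
                         for w in data_nodes)}
            if ok != sim[u]:
                sim[u], changed = ok, True
    return sim

pattern_nodes = {"p": "person", "q": "org"}
pattern_edges = {("p", "q", "works_at")}
data_nodes = {1: "person", 2: "person", 3: "org"}
data_edges = {(1, 3, "works_at")}
print(simulation(pattern_nodes, pattern_edges, data_nodes, data_edges))
# {'p': {1}, 'q': {3}}  -- vertex 2 cannot simulate p
```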

17.
Maximal biclique (also known as complete bipartite) subgraphs can model many applications in Web mining, business, and bioinformatics. Enumerating maximal biclique subgraphs from a graph is a computationally challenging problem, as the size of the output can become exponentially large with respect to the vertex number when the graph grows. In this paper, we efficiently enumerate them through the use of closed patterns of the adjacency matrix of the graph. For an undirected graph G without self-loops, we prove that 1) the number of closed patterns in the adjacency matrix of G is even, 2) the number of the closed patterns is precisely double the number of maximal biclique subgraphs of G, and 3) for every maximal biclique subgraph, there always exists a unique pair of closed patterns that matches the two vertex sets of the subgraph. Therefore, the problem of enumerating maximal bicliques can be solved by using efficient algorithms for mining closed patterns, which have been studied extensively in the data mining field. However, this direct use of existing algorithms causes a duplicated enumeration. To achieve high efficiency, we propose an O(mn) time delay algorithm for a nonduplicated enumeration, in particular, for enumerating those maximal bicliques with a large size, where m and n are the numbers of edges and vertices of the graph, respectively. We evaluate the high efficiency of our algorithm by comparing it to state-of-the-art algorithms on three categories of graphs: randomly generated graphs, benchmarks, and a real-life protein interaction network. In this paper, we also prove that if self-loops are allowed in a graph, then the number of closed patterns in the adjacency matrix is not necessarily even, but the maximal bicliques are exactly the same as those of the graph after removing all the self-loops.
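The correspondence the abstract proves can be checked by brute force on a tiny graph (an illustration only, not the paper's O(mn)-delay algorithm, and exponential in the vertex count): a vertex set X is one side of a maximal biclique exactly when the common neighborhood of its common neighborhood is X itself.

```python
# Brute-force illustration (NOT the paper's efficient algorithm) of the
# closed-pattern view of maximal bicliques: (X, N(X)) is a maximal biclique
# exactly when X equals the common neighborhood of its own common neighborhood.
from itertools import combinations

adj = {1: {3, 4}, 2: {3, 4}, 3: {1, 2}, 4: {1, 2}}  # C4 is itself a biclique

def common_neighbors(vertices):
    sets = [adj[v] for v in vertices]
    return set.intersection(*sets) if sets else set()

bicliques = set()
verts = sorted(adj)
for r in range(1, len(verts) + 1):
    for X in combinations(verts, r):
        X = frozenset(X)
        Y = frozenset(common_neighbors(X))
        if Y and frozenset(common_neighbors(Y)) == X:
            bicliques.add(frozenset({X, Y}))         # unordered pair (X, Y)

print(bicliques)  # one maximal biclique: {1, 2} x {3, 4}
```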

18.
Secure Data Objects Replication in Data Grid
Secret sharing and erasure coding-based approaches have been used in distributed storage systems to ensure the confidentiality, integrity, and availability of critical information. To achieve performance goals in data accesses, these data fragmentation approaches can be combined with dynamic replication. In this paper, we consider data partitioning (both secret sharing and erasure coding) and dynamic replication in data grids, in which security and data access performance are critical issues. More specifically, we investigate the problem of optimal allocation of sensitive data objects that are partitioned by using secret sharing scheme or erasure coding scheme and/or replicated. The grid topology we consider consists of two layers. In the upper layer, multiple clusters form a network topology that can be represented by a general graph. The topology within each cluster is represented by a tree graph. We decompose the share replica allocation problem into two subproblems: the Optimal Intercluster Resident Set Problem (OIRSP) that determines which clusters need share replicas and the Optimal Intracluster Share Allocation Problem (OISAP) that determines the number of share replicas needed in a cluster and their placements. We develop two heuristic algorithms for the two subproblems. Experimental studies show that the heuristic algorithms achieve good performance in reducing communication cost and are close to optimal solutions.

19.
In an effort to achieve lower bandwidth requirements, video compression algorithms have become increasingly complex. Consequently, the deployment of these algorithms on field programmable gate arrays (FPGAs) is becoming increasingly desirable because of the computational parallelism of these platforms, as well as the measure of flexibility afforded to designers. Typically, video data are stored in large and slow external memory arrays, but the impact of the memory access bottleneck may be reduced by buffering frequently used data in fast on-chip memories. The order of the memory accesses resulting from many compression algorithms depends on the input data (Jain in Proceedings of the IEEE, pp. 349–389, 1981). These data-dependent memory accesses complicate the exploitation of data re-use and consequently limit the extent to which an application may be accelerated. In this paper, we present a hybrid memory sub-system that is able to capture data re-use effectively in spite of data-dependent memory accesses. This memory sub-system is made up of a custom parallel cache and a scratchpad memory. Furthermore, the framework is capable of exploiting 2D spatial locality, which is frequently exhibited in the access patterns of image processing applications. In a case study involving the quad-tree structured pulse code modulation (QSDPCM) application, the impact of data dependence on memory accesses is shown to be significant. In comparison with an implementation employing only a scratchpad memory, performance improvements of up to 1.7× and 1.4× are observed through actual implementation on two modern FPGA platforms. These performance improvements are more pronounced for image sequences exhibiting greater inter-frame movements. In addition, reductions of on-chip memory resources by up to 3.2× are achievable using this framework. These results indicate that, on custom hardware platforms, there is substantial scope for improvement in the capture of data re-use when memory accesses are data dependent.

20.
Zhang Fan, Zou Lei, Zeng Li, Gou Xiangyang. World Wide Web, 2020, 23(2): 873-903

A streaming graph is a graph formed by a sequence of incoming edges with time stamps. Unlike static graphs, a streaming graph is highly dynamic and time-related. Real-world streaming graphs, such as internet traffic data, social network communications, and financial transactions, arrive at high volume and high velocity, and are challenging for classic graph data structures: traditional storage models like the adjacency matrix and the adjacency list are no longer sufficient for such large amounts of data and high-frequency updates, and most existing streaming graph structures support only specific graph algorithms. Here a new data structure is presented to meet the challenge: a double orthogonal list in a hash table (Dolha), a high-speed and memory-efficient graph structure. Dolha has constant time cost for single-edge processing and near-linear space cost. Moreover, the time cost of neighborhood queries in Dolha is linear, which enables it to support most graph algorithms without extra cost. A persistent structure based on Dolha is also presented to handle sliding-window updates and time-related queries.
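A greatly simplified sketch of the access profile described above, constant-time single-edge inserts and linear-time neighborhood queries, is shown below using per-vertex in/out lists in hash maps; the real Dolha lays a double orthogonal list out inside a hash table, which this sketch does not reproduce.

```python
# Greatly simplified sketch of the access profile described above (O(1)
# single-edge insert, linear-time neighborhood queries). The real Dolha uses
# a double orthogonal list laid out inside a hash table; this is not it.
from collections import defaultdict

class StreamingGraph:
    def __init__(self):
        self.out_edges = defaultdict(list)   # u -> [(v, timestamp), ...]
        self.in_edges = defaultdict(list)    # v -> [(u, timestamp), ...]

    def add_edge(self, u, v, ts):            # amortized O(1) per edge
        self.out_edges[u].append((v, ts))
        self.in_edges[v].append((u, ts))

    def successors(self, u):                 # linear in the answer size
        return [v for v, _ in self.out_edges[u]]

    def predecessors(self, v):
        return [u for u, _ in self.in_edges[v]]

g = StreamingGraph()
for t, (u, v) in enumerate([("a", "b"), ("b", "c"), ("a", "c")]):
    g.add_edge(u, v, t)
print(g.successors("a"))    # ['b', 'c']
print(g.predecessors("c"))  # ['b', 'a']
```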

