Similar literature
1.
Compile-time techniques for storage allocation of scalar values into memory modules that limit run-time memory-access conflicts are presented. The allocation approach is applicable to those operands in instructions that can be predicted at compile-time, where an instruction is composed of the multiple operations and corresponding operands that execute in parallel. Algorithms to schedule data transfers among memory modules to avoid conflicts that cannot be eliminated by the distribution of values alone are developed. The techniques have been implemented as part of a compiler for a reconfigurable long instruction word architecture. Results of experiments are presented demonstrating that a very high percentage of memory access conflicts can be avoided by scheduling a very low number of data transfers.

2.
Data distribution has been one of the most important research topics in parallelizing compilers for distributed memory parallel computers. A good data distribution schema should consider both the computation load balance and the communication overhead. In this paper, we show that data redistribution is necessary for executing a sequence of Do-loops if the communication cost due to performing this sequence of Do-loops is larger than a threshold value. Based on this observation, we can prune the search space and derive efficient dynamic programming algorithms for determining effective data distribution schema to execute a sequence of Do-loops with a general structure. Experimental studies on a 32-node nCUBE-2 computer are also presented.
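The trade-off described in this abstract can be sketched as a small dynamic program. This is a generic illustration, not the paper's algorithm: the function name `plan_distributions`, the cost tables `exec_cost` and `redist_cost`, and the restriction to a simple linear sequence of loops are all assumptions made for the example, and the threshold-based pruning mentioned in the abstract is not modeled.

```python
def plan_distributions(exec_cost, redist_cost):
    """Pick one data distribution per loop so that total cost is minimal.

    exec_cost[i][d]   -- hypothetical cost of running loop i under distribution d
    redist_cost[p][d] -- hypothetical cost of redistributing from p to d
                         between two consecutive loops (0 when p == d)
    Returns (total_cost, plan), where plan[i] is the distribution for loop i.
    """
    n_dists = len(exec_cost[0])
    best = list(exec_cost[0])   # best[d]: cheapest cost so far, ending in d
    back = []                   # back[i-1][d]: predecessor chosen for loop i

    for i in range(1, len(exec_cost)):
        new_best, choices = [], []
        for d in range(n_dists):
            # Cheapest way to arrive at distribution d before loop i.
            p = min(range(n_dists), key=lambda q: best[q] + redist_cost[q][d])
            new_best.append(best[p] + redist_cost[p][d] + exec_cost[i][d])
            choices.append(p)
        best = new_best
        back.append(choices)

    # Walk the predecessor table backwards to recover the plan.
    d = min(range(n_dists), key=lambda q: best[q])
    total = best[d]
    plan = [d]
    for choices in reversed(back):
        d = choices[d]
        plan.append(d)
    plan.reverse()
    return total, plan
```

With cheap redistribution the plan switches layouts between loops; once redistribution becomes expensive enough, staying with a single layout wins, which is the kind of threshold behavior the abstract refers to.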

3.
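The join operation is a key factor in distributed query performance, and data placement is an important factor affecting join operations. To improve the query performance of distributed systems, this paper studies the relationships among data and proposes a correlated data distribution tree. The tree is used to construct a series of correlated tuple sets, which are then assigned to the relevant sites according to each site's load capacity. Experimental results show that when multiple relations are joined frequently, the correlated data distribution tree effectively improves the query performance of the whole distributed system.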

4.
This paper presents a new method that can be applied by a parallelizing compiler to find, without user intervention, the iteration and data decompositions that minimize communication and load imbalance overheads in parallel programs targeted at NUMA architectures. One of the key ingredients in our approach is the representation of locality as a locality-communication graph (LCG) and the formulation of the compiler technique as a mixed integer nonlinear programming (MINLP) optimization problem on this graph. The objective function and constraints of the optimization problem model communication costs and load imbalance. The solution to this optimization problem is a decomposition that minimizes the parallel execution overhead. This paper summarizes the process of how the compiler extracts the locality information from a nonannotated code and focuses on how this compiler can derive the optimization problem, solve it, and generate the parallel code with the automatically selected iteration and data distributions. In addition, we include a discussion about our model and the solutions (the decompositions) that it provides. The approach presented in the paper is evaluated using several benchmarks. The experimental results demonstrate that the MINLP formulation does not increase compilation time significantly and that our framework generates very efficient iteration/data distributions for a variety of NUMA machines.

5.
Parallel architectures with physically distributed memory provide a cost-effective scalability to solve many large scale scientific problems. However, these systems are very difficult to program and tune. In these systems, the choice of a good data mapping and parallelization strategy can dramatically improve the efficiency of the resulting program. In this paper, we present a framework for automatic data mapping in the context of distributed memory multiprocessor systems. The framework is based on a new approach that allows the alignment, distribution, and redistribution problems to be solved together using a single graph representation. The Communication Parallelism Graph (CPG) is the structure that holds symbolic information about the potential data movement and parallelism inherent to the whole program. The CPG is then particularized for a given problem size and target system and used to find a minimal cost path through the graph using a general purpose linear 0-1 integer programming solver. The data layout strategy generated is optimal according to our current cost and compilation models.

6.
7.
8.
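Among the many methods for improving the efficiency of data mining, parallel data mining is an effective approach that addresses the problem at its root. This paper first points out that, whether mining is sequential or parallel, the ultimate goal of data mining must be the premise, namely discovering as much of the useful knowledge contained in the data as possible, and that mining efficiency should be improved on that basis. Building on this idea, a data-feature-oriented data partitioning process is proposed, followed by a basic weighted parallel data mining method. In this mining process, knowledge relative to part of the data can be obtained, which largely improves the dynamic performance of data mining.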

9.
The paper describes a simple compiler analysis method for determining the “weight” of procedures in parallel logic programming languages. Using Flat Guarded Horn Clauses (FGHC) as an example, the analysis algorithm is described. Consideration of weights has been incorporated in the scheduler of a real-parallel FGHC emulator running on the Sequent Symmetry multiprocessor. Alternative demand-distribution methods are discussed, including oldest-first and heaviest-first distributions. Performance measurements, collected from a group of non-trivial benchmarks on eight processors, show that the new schemes do not perform significantly faster than conventional distribution methods. This result is attributed to a combination of factors overshadowing the benefits of the new method: high system overheads, the low cost of spawning a goal on a shared memory multiprocessor, and the increase in synchronization caused by the new methods. Directions of further research are discussed, indicating where further speedup can be attained.

10.
The continuous growth of biodiversity databases has led to a search for techniques that can assist researchers. This paper presents a method for the analysis of occurrences of pairs and groups of species that aims to identify patterns in co-occurrences through the application of association rules of data mining. We propose, implement and evaluate a tool to help ecologists formulate and validate hypotheses regarding co-occurrence between two or more species. To validate our approach, we analyzed the occurrence of species with a dataset from the 50-ha Forest Dynamics Project on Barro Colorado Island (BCI). Three case studies were developed based on this tropical forest to evaluate patterns of positive and negative correlation. Our tool can be used to point out co-occurrence at multiple scales and for multiple species simultaneously, accelerating the identification process for the Spatial Point Pattern Analysis. This paper demonstrates that data mining, which has been used successfully in applications such as business and consumer profile analysis, can be a useful resource in ecology.
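For readers unfamiliar with association rules, the standard measures (support, confidence, lift) can be sketched for species co-occurrence as follows. This is only an illustration under assumptions: the function name `rule_metrics`, the representation of each plot as a set of species names, and the toy data are hypothetical and are not taken from the paper or the BCI dataset.

```python
def rule_metrics(plots, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent.

    plots      -- list of sets; each set holds the species seen in one plot
    antecedent -- species on the left-hand side of the rule
    consequent -- species on the right-hand side of the rule
    """
    a, c = set(antecedent), set(consequent)
    n = len(plots)
    n_a = sum(1 for p in plots if a <= p)         # plots containing antecedent
    n_c = sum(1 for p in plots if c <= p)         # plots containing consequent
    n_ac = sum(1 for p in plots if (a | c) <= p)  # plots containing both

    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    # Lift compares the confidence with the consequent's baseline frequency.
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


# Toy example with made-up species labels.
plots = [{"A", "B"}, {"A", "B"}, {"A"}, {"C"}]
support, confidence, lift = rule_metrics(plots, {"A"}, {"B"})
```

A lift above 1 suggests the species co-occur more often than their individual frequencies would predict (positive association), while a lift below 1 suggests they tend to avoid each other.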

11.
The problem of entity resolution over probabilistic data (ERPD) arises in many applications that have to deal with probabilistic data. In many of these applications, probabilistic data is distributed among a number of nodes. The simple, centralized approach to the ERPD problem does not scale well as large amounts of data need to be sent to a central node. In this paper, we present FD (Fully Distributed), a decentralized algorithm for dealing with the ERPD problem over distributed data, with the goal of minimizing bandwidth usage and reducing processing time. FD is completely distributed and does not depend on the existence of certain nodes. We validated FD through implementation over a 75-node cluster and simulation using the PeerSim simulator. We used both synthetic and real-world data in our experiments. Our performance evaluation shows that FD can achieve major performance gains in terms of bandwidth usage and response time.

12.
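To improve the effect of data rebalancing and reduce the impact of data migration on system performance, a context-aware data rebalancing method is proposed. A migration-time prediction model is built to characterize the influence of the virtual machine environment context on data migration, and on this basis a context-aware data rebalancing algorithm, CADR, based on fine-grained resource monitoring is proposed. Experimental results show that the migration-time prediction model has a low error rate and that, compared with traditional data rebalancing algorithms, CADR provides better balance and shorter migration time.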

13.
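Distributed data mining middle layer   Total citations: 3, self-citations: 0, citations by others: 3
A complete solution is given for simplifying the development and maintenance of distributed data mining systems on cluster systems. The non-algorithmic parts of a data mining system are studied in depth, and three key optimization techniques are presented: distributed data storage, a data buffering mechanism, and a load balancing strategy. These have been implemented in practical applications.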

14.
The Parallel Programming Interface for Distributed Data (PPIDD) library provides an interface, suitable for use in parallel scientific applications, that delivers communications and global data management. The library can be built either using the Global Arrays (GA) toolkit, or a standard MPI-2 library. This abstraction allows the programmer to write portable parallel codes that can utilise the best, or only, communications library that is available on a particular computing platform.
Program summary
Program title: PPIDD
Catalogue identifier: AEEF_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEEF_1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
No. of lines in distributed program, including test data, etc.: 17 698
No. of bytes in distributed program, including test data, etc.: 166 173
Distribution format: tar.gz
Programming language: Fortran, C
Computer: Many parallel systems
Operating system: Various
Has the code been vectorised or parallelized?: Yes. 2–256 processors used
RAM: 50 Mbytes
Classification: 6.5
External routines: Global Arrays or MPI-2
Nature of problem: Many scientific applications require management and communication of data that is global, and the standard MPI-2 protocol provides only low-level methods for the required one-sided remote memory access.
Solution method: The Parallel Programming Interface for Distributed Data (PPIDD) library provides an interface, suitable for use in parallel scientific applications, that delivers communications and global data management. The library can be built either using the Global Arrays (GA) toolkit, or a standard MPI-2 library. This abstraction allows the programmer to write portable parallel codes that can utilise the best, or only, communications library that is available on a particular computing platform.
Running time: Problem dependent. The test provided with the distribution takes only a few seconds to run.

15.
16.
17.
Solving nonlinear constraints over real numbers is a complex problem. Hence constraint logic programming languages like CLPR or Prolog III solve only linear constraints and delay nonlinear constraints until they become linear. This efficient implementation method has the disadvantage that sometimes computed answers are unsatisfiable or infinite loops occur due to the unsatisfiability of delayed nonlinear constraints. These problems could be solved by using a more powerful constraint solver which can deal with nonlinear constraints like in RISC-CLP(Real). Since such powerful constraint solvers are not very efficient, we propose a compromise between these two extremes. We characterize a class of CLPR programs for which all delayed nonlinear constraints become linear at run time. Programs belonging to this class can be safely executed with the efficient CLPR method while the remaining programs need a more powerful constraint solver. This paper is an extended and revised version of Ref. 12). The research described in this paper was made during the author’s stay at the Max-Planck-Institut für Informatik in Saarbrücken, Germany. It was supported in part by the German Ministry for Research and Technology (BMFT) under grant ITS 9103 and by the ESPRIT Basic Research Working Group 6028 (Construction of Computational Logics). The responsibility for the contents of this publication lies with the author.

18.
An evaluation is made of a way to reduce the cost of program restructuring by having a compiler determine the program's packing in virtual address space from an analysis of its source code. Two features of this method are the duplication of code modules in virtual address space and the inline substitution of the code for a called procedure. This compile-time restructuring algorithm is evaluated using the instruction-only address traces from a collection of programs. In a simulation of a virtual memory system using disks as secondary storage devices, the method is not successful, since it leads to a higher optimum space-time execution cost than that of the unrestructured program. The algorithm did reduce program space-time execution cost for some arbitrarily chosen memory allocations smaller than the optimum. This could be useful in a multiuser, multiprogrammed environment.

19.
Scalable shared-memory multiprocessor systems are typically NUMA (nonuniform memory access) machines, where the exploitation of the memory hierarchy is critical to achieving high performance. Iterative data parallel loops with near-neighbor communication account for many important numerical applications. In such loops, the communication of partial results stresses the memory system performance. In this paper, we develop data placement schemes that minimize communication time where the near-neighbor interaction is determined by a stencil. Under a given loop partition, our compile-time algorithm partitions global data into four classes for each processor, with each class requiring specific consistency maintenance requirements. The ADAPT (Automatic Data Allocation and Partitioning Tool) system was implemented to automatically partition parallel code segments for the BBN TC2000, a scalable shared-memory multiprocessor. ADAPT caches global arrays and maintains data consistency in software through instructions that flush data from private caches. Restructuring of a fluid flow code segment by ADAPT improved performance by a factor of more than 3 on the BBN TC2000. Features in current generation pipelined processors with multiple functional units permit the overlap of memory accesses with computation. Our experiments on the BBN TC2000 show that the degree of overlap is limited by architectural parameters, such as the number of CPU registers.

20.
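In compiler-assisted garbage collection systems, features such as dynamic method binding force compile-time analysis to treat virtual method calls conservatively, which makes it difficult to identify and explicitly reclaim objects that are no longer live during a virtual method call. This paper proposes a lifetime analysis strategy that improves the handling of Java library method calls: it uses points-to escape graphs to abstractly describe how library methods change the liveness of heap objects, and applies these heap change patterns to object lifetime analysis of Java programs, thereby improving the precision with which existing object reclamation techniques analyze library calls. Applying the method to the analysis of library calls on String-related classes in the Jolden benchmark suite shows that, at an average compile-time overhead increase of 12%, the new method increases the size of explicitly reclaimed objects by 33%-37%.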
