Similar Documents
20 similar documents found (search time: 0 ms)
1.
Compile-time techniques for storage allocation of scalar values into memory modules that limit run-time memory-access conflicts are presented. The allocation approach is applicable to those operands in instructions that can be predicted at compile time, where an instruction is composed of multiple operations and corresponding operands that execute in parallel. Algorithms are developed to schedule data transfers among memory modules to avoid conflicts that cannot be eliminated by the distribution of values alone. The techniques have been implemented as part of a compiler for a reconfigurable long instruction word architecture. Experimental results demonstrate that a very high percentage of memory-access conflicts can be avoided by scheduling a very small number of data transfers.
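The allocation idea above can be pictured as graph coloring: two scalars that are operands of the same long instruction conflict and must live in different memory modules. The following is a minimal sketch under that assumption; the first-fit policy and the instruction/operand data are illustrative, not the paper's actual algorithm.

```python
# Toy first-fit coloring: assign each scalar to the lowest-numbered
# memory module not used by any scalar it co-occurs with in an
# instruction. Instruction/operand names are made-up examples.

def allocate_modules(instructions, nmodules):
    """instructions: list of operand-name tuples that execute in parallel.
    Returns {operand: module} or raises if nmodules is insufficient."""
    conflicts = {}
    for ops in instructions:
        for a in ops:
            conflicts.setdefault(a, set()).update(o for o in ops if o != a)
    module = {}
    for v in sorted(conflicts):  # deterministic visiting order
        used = {module[u] for u in conflicts[v] if u in module}
        free = [m for m in range(nmodules) if m not in used]
        if not free:
            raise ValueError(f"{v}: conflicts exceed {nmodules} modules")
        module[v] = free[0]
    return module
```

Conflicts that first-fit cannot avoid would, per the abstract, instead be handled by scheduling inter-module data transfers.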

2.
This paper outlines the design of an automatic source-to-source translator which can accept a program written in a subset of Fortran90 and determine a data distribution scheme (including, where beneficial, dynamic redistributions) for the arrays of the program. The translator generates a semantically equivalent Fortran90 program incorporating the distribution scheme in the form of language extensions. © 1998 John Wiley & Sons, Ltd.

3.
Data distribution has been one of the most important research topics in parallelizing compilers for distributed-memory parallel computers. A good data distribution scheme should consider both computation load balance and communication overhead. In this paper, we show that data redistribution is necessary for executing a sequence of Do-loops if the communication cost of performing the sequence is larger than a threshold value. Based on this observation, we can prune the search space and derive efficient dynamic programming algorithms for determining effective data distribution schemes for executing a sequence of Do-loops with a general structure. Experimental studies on a 32-node nCUBE-2 computer are also presented.
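The dynamic-programming idea can be sketched as follows: for each Do-loop, choose one distribution from a candidate set so that execution cost plus redistribution cost is minimized over the whole sequence. The cost tables below are hypothetical inputs, and the abstract's threshold-based pruning is omitted for brevity.

```python
# DP over a sequence of Do-loops. exec_cost[i][d] is the (hypothetical)
# cost of running loop i under distribution d; redist_cost[d1][d2] is
# the cost of redistributing data from d1 to d2 between loops.

def best_distribution_plan(exec_cost, redist_cost):
    """Returns (minimum total cost, chosen distribution per loop)."""
    n = len(exec_cost)
    dists = range(len(exec_cost[0]))
    # best[d]: cheapest way to finish the current loop with distribution d
    best = {d: exec_cost[0][d] for d in dists}
    back = []
    for i in range(1, n):
        new_best, choices = {}, {}
        for d in dists:
            prev = min(dists, key=lambda p: best[p] + redist_cost[p][d])
            new_best[d] = best[prev] + redist_cost[prev][d] + exec_cost[i][d]
            choices[d] = prev
        best = new_best
        back.append(choices)
    # Recover the optimal plan by walking the back-pointers.
    last = min(best, key=best.get)
    plan = [last]
    for choices in reversed(back):
        plan.append(choices[plan[-1]])
    plan.reverse()
    return best[last], plan
```

With two loops that each prefer opposite distributions, the plan pays one redistribution rather than running either loop under a bad layout.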

4.
Join operations are a key factor in distributed query performance, and data placement is an important factor in join performance. To improve the query performance of a distributed system, this paper studies the relationships among data and proposes an associated-data distribution tree. The tree is used to construct a series of associated tuple sets, which are then assigned to the relevant sites according to each site's load capacity. Experimental results show that when multiple relations are frequently joined, the associated-data distribution tree effectively improves the query performance of the whole distributed system.

5.
A large class of loop programs used in solving differential equations, Fourier transforms, image processing and neural processing can be translated or rewritten into a vector execution form with a π-block dependence graph. In this paper we propose a multithreading strategy to partition such vectorized loops into a multithreaded execution form. Each partitioned thread consists of instances of statements with locality in vector registers. The multithreading scheme is a novel combination of loop unrolling, statement-instance reordering, index shifting, vector-register reuse and multithreading. For several cases of loop programs with π-block dependence graphs, experimental results show that our scheme helps the vector compilers of the Convex C38 series reduce the number of memory accesses and synchronizations among CPUs.

6.
This paper presents a new method that a parallelizing compiler can apply to find, without user intervention, the iteration and data decompositions that minimize communication and load-imbalance overheads in parallel programs targeted at NUMA architectures. One of the key ingredients in our approach is the representation of locality as a locality-communication graph (LCG) and the formulation of the compiler technique as a mixed integer nonlinear programming (MINLP) optimization problem on this graph. The objective function and constraints of the optimization problem model communication costs and load imbalance. The solution to this optimization problem is a decomposition that minimizes the parallel execution overhead. This paper summarizes how the compiler extracts locality information from non-annotated code and focuses on how it derives the optimization problem, solves it, and generates parallel code with the automatically selected iteration and data distributions. In addition, we discuss our model and the solutions (the decompositions) it provides. The approach is evaluated using several benchmarks. The experimental results demonstrate that the MINLP formulation does not increase compilation time significantly and that our framework generates very efficient iteration/data distributions for a variety of NUMA machines.

7.

Blockchain technology has attracted wide attention in recent years and has been applied in many fields; data querying is an important technique in these applications, for example data provenance in logistics chains. With the continuous growth of transaction volume in blockchain systems, graph-structured blockchains that support highly concurrent transaction processing have become a research focus. Their highly concurrent blocks make data queries hard to serve by sequentially traversing the chain as in traditional linear structures; breadth-first or depth-first traversal over the graph structure is possible, but such queries suffer from low efficiency and hard-to-verify results. To address the efficiency and verifiability of queries on graph-structured blockchains, this paper proposes Lever, an efficient and verifiable query mechanism based on learned indexes. Lever learns the distribution of the time-series data in a graph-structured blockchain to optimize the indexing process, aiming to improve both query efficiency and verifiability. A learned index is a new indexing technique that learns the data distribution to reduce index storage space and query time; Lever applies it to the mapping between epoch heights and timestamps, locating queried data by evaluating a function, which improves query speed and efficiency. To speed up filtering over the multiple blocks within an epoch, a Bloom filter is added to each block header and an aggregated Bloom filter is generated for each epoch, accelerating data traversal within an epoch. Furthermore, to guarantee the correctness and completeness of query results, the mechanism combines Bloom filters with a sorted Merkle tree to generate verifiable objects, using partial Merkle branches to prove the non-existence of Bloom-filter false positives, which effectively reduces the size of verification objects and thus improves data-transfer efficiency during queries. Experimental results show that Lever effectively improves the query efficiency and verifiability of DAG-based graph-structured blockchains: compared with Conflux's basic query mechanism, query performance improves by up to 10x, and the size of verifiable objects is reduced by up to 90%.
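The query path described above can be sketched in miniature: a learned index (here just a least-squares linear fit of epoch height against timestamp) predicts which epoch holds a key, and a per-epoch Bloom filter cheaply rules out epochs that cannot contain it. Both structures below are hypothetical simplifications of Lever, not its actual implementation.

```python
# Minimal learned index + Bloom filter, illustrating the two pruning
# steps described in the abstract.
import hashlib

class Bloom:
    def __init__(self, bits=1024, hashes=3):
        self.bits, self.hashes, self.v = bits, hashes, 0
    def _idx(self, key):
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.bits
    def add(self, key):
        for i in self._idx(key):
            self.v |= 1 << i
    def may_contain(self, key):
        # No false negatives; false positives are possible.
        return all(self.v >> i & 1 for i in self._idx(key))

class LearnedEpochIndex:
    """Fits epoch_height ~ slope * timestamp + intercept by least squares."""
    def __init__(self, timestamps, heights):
        n = len(timestamps)
        mt, mh = sum(timestamps) / n, sum(heights) / n
        num = sum((t - mt) * (h - mh) for t, h in zip(timestamps, heights))
        den = sum((t - mt) ** 2 for t in timestamps) or 1
        self.slope = num / den
        self.intercept = mh - self.slope * mt
    def predict(self, ts):
        """Predicted epoch height for a query timestamp."""
        return round(self.slope * ts + self.intercept)
```

In Lever itself the false positives left by the Bloom filters are handled by non-existence proofs over a sorted Merkle tree, which this sketch omits.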


8.
Parallel architectures with physically distributed memory provide a cost-effective scalability to solve many large scale scientific problems. However, these systems are very difficult to program and tune. In these systems, the choice of a good data mapping and parallelization strategy can dramatically improve the efficiency of the resulting program. In this paper, we present a framework for automatic data mapping in the context of distributed memory multiprocessor systems. The framework is based on a new approach that allows the alignment, distribution, and redistribution problems to be solved together using a single graph representation. The Communication Parallelism Graph (CPG) is the structure that holds symbolic information about the potential data movement and parallelism inherent to the whole program. The CPG is then particularized for a given problem size and target system and used to find a minimal cost path through the graph using a general purpose linear 0-1 integer programming solver. The data layout strategy generated is optimal according to our current cost and compilation models.

9.
Targeting patterns specific to OpenCL (open computing language) compilation, a new compile-time optimization method for heterogeneous computing platforms is proposed. Based on the respective characteristics of the device side and the host side, the method hoists some redundant device-side operations to the host side or into a new device-side kernel in order to reduce memory reads and writes. The method fully exploits the characteristics of heterogeneous computing platforms and is more flexible than traditional optimizations. In most cases it effectively improves OpenCL execution speed; in our test cases, on top of the existing compiler optimizations, the best speedup reached 270%.

10.
The paper describes a simple compiler analysis method for determining the "weight" of procedures in parallel logic programming languages. Using Flat Guarded Horn Clauses (FGHC) as an example, the analysis algorithm is described. Consideration of weights has been incorporated into the scheduler of a real-parallel FGHC emulator running on the Sequent Symmetry multiprocessor. Alternative demand-distribution methods are discussed, including oldest-first and heaviest-first distributions. Performance measurements, collected from a group of non-trivial benchmarks on eight processors, show that the new schemes do not perform significantly faster than conventional distribution methods. This result is attributed to a combination of factors overshadowing the benefits of the new method: high system overheads, the low cost of spawning a goal on a shared-memory multiprocessor, and the increase in synchronization caused by the new methods. Directions for further research are discussed, indicating where further speedup can be attained.
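The two demand-distribution policies compared above can be sketched as small work queues: when an idle worker demands a goal, oldest-first hands out the longest-waiting goal, while heaviest-first hands out the goal with the largest compile-time weight estimate. The goal names and weights here are made-up examples.

```python
# Toy demand-distribution queues: oldest-first (FIFO) versus
# heaviest-first (priority by estimated weight).
import heapq
from collections import deque

class OldestFirst:
    def __init__(self):
        self.q = deque()
    def spawn(self, goal, weight):
        self.q.append(goal)  # weight is ignored: order is purely by age
    def demand(self):
        return self.q.popleft()

class HeaviestFirst:
    def __init__(self):
        self.heap = []
        self.count = 0  # tie-breaks equal weights by spawn order
    def spawn(self, goal, weight):
        # Negate the weight: heapq is a min-heap.
        heapq.heappush(self.heap, (-weight, self.count, goal))
        self.count += 1
    def demand(self):
        return heapq.heappop(self.heap)[2]
```

The paper's finding is that on a shared-memory machine the extra bookkeeping of the weighted queue roughly cancels its benefit, since spawning a goal is already cheap.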

11.
12.
This paper analyzes a problem that occurs in distributed database systems. The problem is how to minimize the communication cost incurred by inter-site data transfers that are associated with on-line retrieval requests. A mathematical model of the problem is developed and results in an NP-Complete integer linear program. A lower bounding procedure, based on a Lagrangean relaxation and sub-gradient optimization, is devised and heuristic procedures are proposed. The results of computational experiments reveal good performance of the heuristic procedures.

13.
The continuous growth of biodiversity databases has led to a search for techniques that can assist researchers. This paper presents a method for the analysis of occurrences of pairs and groups of species that aims to identify patterns in co-occurrences through the application of association rules of data mining. We propose, implement and evaluate a tool to help ecologists formulate and validate hypotheses regarding co-occurrence between two or more species. To validate our approach, we analyzed the occurrence of species with a dataset from the 50-ha Forest Dynamics Project on Barro Colorado Island (BCI). Three case studies were developed based on this tropical forest to evaluate patterns of positive and negative correlation. Our tool can be used to detect co-occurrence at multiple scales and for multiple species simultaneously, accelerating the identification process for spatial point pattern analysis. This paper demonstrates that data mining, which has been used successfully in applications such as business and consumer-profile analysis, can be a useful resource in ecology.
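The association-rule idea can be illustrated for species pairs: treat each survey plot (quadrat) as a "transaction" of the species observed in it, and score a rule A -> B by its support and confidence, exactly as in market-basket mining. The species names, thresholds, and quadrats below are hypothetical examples, not BCI data.

```python
# Pair-level association rules over quadrat "transactions":
# support(A,B) = fraction of quadrats containing both species,
# confidence(A -> B) = P(B observed | A observed).
from itertools import combinations

def pair_rules(quadrats, min_support=0.5, min_confidence=0.6):
    n = len(quadrats)
    count = {}
    for q in quadrats:
        species = set(q)
        for s in species:
            count[frozenset([s])] = count.get(frozenset([s]), 0) + 1
        for pair in combinations(sorted(species), 2):
            count[frozenset(pair)] = count.get(frozenset(pair), 0) + 1
    rules = []
    for key, c in count.items():
        if len(key) != 2 or c / n < min_support:
            continue
        a, b = sorted(key)
        for x, y in ((a, b), (b, a)):
            conf = c / count[frozenset([x])]
            if conf >= min_confidence:
                rules.append((x, y, c / n, conf))
    return rules
```

Extending the itemsets beyond pairs (and contrasting observed support with the product of individual frequencies) is how the tool's positive and negative correlation patterns would be detected.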

14.
15.
Among the many ways to improve data mining efficiency, parallel data mining is an effective approach that addresses the problem at its root. We first point out that, whether mining is performed sequentially or in parallel, it must serve the ultimate goal of data mining, namely discovering as much useful knowledge in the data as possible, and only on that basis should mining efficiency be improved. Building on this idea, we propose a data partitioning process oriented to data features and, further, a weighted parallel data mining method. In this mining process, knowledge relative to a part of the data can be obtained, which greatly improves the dynamic performance of data mining.

16.
Scalable shared-memory multiprocessor systems are typically NUMA (nonuniform memory access) machines, where the exploitation of the memory hierarchy is critical to achieving high performance. Iterative data parallel loops with near-neighbor communication account for many important numerical applications. In such loops, the communication of partial results stresses the memory system performance. In this paper, we develop data placement schemes that minimize communication time where the near-neighbor interaction is determined by a stencil. Under a given loop partition, our compile-time algorithm partitions global data into four classes for each processor, with each class requiring specific consistency maintenance requirements. The ADAPT (Automatic Data Allocation and Partitioning Tool) system was implemented to automatically partition parallel code segments for the BBN TC2000, a scalable shared-memory multiprocessor. ADAPT caches global arrays and maintains data consistency in software through instructions that flush data from private caches. Restructuring of a fluid flow code segment by ADAPT improved performance by a factor of more than 3 on the BBN TC2000. Features in current generation pipelined processors with multiple functional units permit the overlap of memory accesses with computation. Our experiments on the BBN TC2000 show that the degree of overlap is limited by architectural parameters, such as the number of CPU registers.
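A simplified 1-D version of this classification can make it concrete: under a block partition of an array and a stencil of radius r, each processor's indices split into an interior region (private, no consistency actions needed), owned boundary cells (written locally but read by a neighbor, so they must be flushed from the private cache), and ghost cells (owned by neighbors but read locally). The paper's four-class scheme is richer; this sketch is illustrative only.

```python
# Classify the indices a processor touches under a 1-D block partition
# with a stencil of radius r.

def classify(n, nprocs, p, r):
    """Block-partition an array of size n over nprocs processors and
    return (interior, flush, ghost) index sets for processor p."""
    size = n // nprocs
    lo = p * size
    hi = (p + 1) * size if p < nprocs - 1 else n
    owned = set(range(lo, hi))
    # Owned cells that a neighboring processor will read:
    flush = {i for i in owned if i - lo < r or hi - i <= r}
    # At the global array ends there is no neighbor to read them.
    if p == 0:
        flush -= set(range(lo, lo + r))
    if p == nprocs - 1:
        flush -= set(range(hi - r, hi))
    # Neighbor-owned cells this processor reads:
    ghost = {i for i in range(max(0, lo - r), min(n, hi + r)) if i not in owned}
    interior = owned - flush
    return interior, flush, ghost
```

Interior cells can then stay cached across iterations, while only the flush set needs software consistency actions, which is the source of the communication savings described above.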

17.
In compiler-assisted garbage collection systems, compile-time analyses must treat virtual method calls conservatively because of dynamic binding and related features, which makes it difficult to identify and explicitly reclaim objects that become dead during a virtual call. This paper proposes an improved lifetime analysis for Java library calls: based on points-to escape graphs, it abstractly describes how library methods change the liveness of heap objects as heap-change patterns, and applies these patterns to the object lifetime analysis of Java programs, thereby improving the precision with which existing reclamation techniques analyze library calls. Applying the method to the analysis of calls to String-related library classes in the Jolden benchmark suite increases the volume of explicitly reclaimed objects by 33%-37%, at an average cost of 12% additional compilation time.

18.
The problem of entity resolution over probabilistic data (ERPD) arises in many applications that have to deal with probabilistic data. In many of these applications, probabilistic data is distributed among a number of nodes. The simple, centralized approach to the ERPD problem does not scale well as large amounts of data need to be sent to a central node. In this paper, we present FD (Fully Distributed), a decentralized algorithm for dealing with the ERPD problem over distributed data, with the goal of minimizing bandwidth usage and reducing processing time. FD is completely distributed and does not depend on the existence of certain nodes. We validated FD through implementation over a 75-node cluster and simulation using the PeerSim simulator. We used both synthetic and real-world data in our experiments. Our performance evaluation shows that FD can achieve major performance gains in terms of bandwidth usage and response time.

19.
To improve the effect of data rebalancing and reduce the impact of data migration on system performance, a context-aware data rebalancing method is proposed. A migration-time prediction model is built to characterize the influence of the virtual-machine environment context on data migration, and on this basis a context-aware data rebalancing algorithm based on fine-grained resource monitoring, CADR, is proposed. Experimental results show that the migration-time prediction model has a low error rate, and that compared with traditional data rebalancing algorithms, CADR delivers better balance and shorter migration times.

20.
An evaluation is made of a way to reduce the cost of program restructuring by having a compiler determine the program's packing in virtual address space from an analysis of its source code. Two features of this method are the duplication of code modules in virtual address space and the inline substitution of the code for a called procedure. This compile-time restructuring algorithm is evaluated using instruction-only address traces from a collection of programs. In a simulation of a virtual memory system using disks as secondary storage devices, the method is not successful, since it leads to a higher optimum space-time execution cost than that of the unrestructured program. The algorithm did reduce program space-time execution cost for some arbitrarily chosen memory allocations smaller than the optimum. This could be useful in a multiuser, multiprogrammed environment.
