Similar Literature
19 similar documents found; search took 109 ms.
1.
Taking OpenMP-like parallel programs as the object of study, this work optimizes system power consumption under performance constraints by combining parallel loop scheduling on heterogeneous systems with dynamic voltage scaling of processors. A basic model of the power-aware parallel loop scheduling problem on heterogeneous systems is first established; an analytical lower bound on the energy consumption of heterogeneous parallel loop scheduling is then derived, which can be used to assess the practical efficiency of power-optimization methods; the scheduling problem is further formulated as an integer programming problem, and on this basis an intra-processor loop rescheduling method is proposed to reduce power consumption further. Finally, 10 typical kernel programs are evaluated on a CPU-GPU heterogeneous platform. Experimental results show that the method effectively reduces system power consumption and improves system efficiency.
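
The abstract above formulates power-aware loop scheduling as an integer program; that formulation is not reproduced here, but a minimal greedy sketch of the underlying trade-off, with hypothetical per-device time/power profiles and a deadline standing in for the performance constraint, might look like this:

    # Hypothetical sketch: assign loop chunks to (device, frequency) pairs so that
    # energy is minimized while each device's accumulated time stays under a deadline.
    # The paper formulates this as an integer program; this greedy pass only
    # illustrates the trade-off, and the time/power numbers are invented.

    profiles = {                       # (device, frequency) -> (seconds per chunk, watts)
        ("cpu", "low"):  (2.0, 20.0),
        ("cpu", "high"): (1.0, 45.0),
        ("gpu", "low"):  (0.8, 60.0),
        ("gpu", "high"): (0.5, 120.0),
    }

    def schedule(num_chunks, deadline):
        load = {"cpu": 0.0, "gpu": 0.0}           # accumulated busy time per device
        plan = []
        for chunk in range(num_chunks):
            best = None
            for (dev, freq), (t, p) in profiles.items():
                if load[dev] + t > deadline:      # would violate the performance constraint
                    continue
                energy = t * p                    # energy cost of this chunk at this setting
                if best is None or energy < best[0]:
                    best = (energy, dev, freq, t)
            if best is None:
                raise RuntimeError("deadline infeasible for the remaining chunks")
            _, dev, freq, t = best
            load[dev] += t
            plan.append((chunk, dev, freq))
        return plan, load

    plan, load = schedule(num_chunks=16, deadline=10.0)
    print(load)                                   # per-device busy time, both within the deadline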

2.
This paper studies the cyclic voltammetric behavior of electrodes modified with multilayer Z-type Langmuir-Blodgett (L-B) films of electroactive molecules by computer digital simulation, and calculates the effects of the interaction energies of the oxidized (reduced) states of the L-B film molecules, θ_o (θ_r), and of the anodic (cathodic) transfer coefficients of the heterogeneous charge-transfer reaction, α_a (α_c), on the peak potential difference, normalized peak area, peak current, and peak width at half height. Digital simulation can directly provide information about parameters such as θ_o (θ_r) and α_a (α_c) that are difficult to control and vary in experiments, which is a clear advantage of the method.

3.
Software pipelining is an optimization technique that improves loop execution efficiency by exploiting instruction-level parallelism among different parts of different iterations of a loop so that these instructions execute in parallel. While increasing instruction-level parallelism, however, it also increases register pressure, and register spilling is an effective way to relieve that pressure. Swing modulo scheduling is a software pipelining algorithm that strives to minimize register pressure while producing near-optimal schedules; the algorithm has appeared as a new optimization pass in recent versions of GCC. Taking GCC as the platform, this paper discusses the register spilling technique in swing modulo scheduling and its engineering implementation, further strengthening the algorithm's ability to handle register pressure.

4.
An interface between AutoCAD and FOXPRO 2.5 for DOS   Total citations: 1 (self-citations: 0, by others: 1)
This paper describes the principles and methods of using FOXPRO 2.5 for DOS programming to directly extract the geometric entity information generated by the AutoCAD graphics software and to display AutoCAD geometric drawings in the FOXPRO 2.5 for DOS environment. These methods are of great value for building engineering drawing-and-document databases and for CAD/CAM.

5.
This paper describes the principles and methods of using FOXPRO 2.5 for DOS programming to directly extract the geometric entity information generated by the AutoCAD graphics software and to display AutoCAD geometric drawings in the FOXPRO 2.5 for DOS environment. These methods are of great value for building engineering drawing-and-document databases and for CAD/CAM.

6.
To address problems such as load imbalance that existing distributed loop self-scheduling schemes exhibit on heterogeneous cloud platforms, a hierarchical distributed dynamic loop scheduling scheme based on a multi-layer architecture is proposed. First, the HPLS algorithm is used to estimate the computing speed of each Worker node in the computing environment. Then, node computing speed is folded into a traditional self-scheduling scheme to construct a scheme that can handle heterogeneous environments and improve load balancing. Finally, the computing system is organized as a multi-layer architecture consisting of SuperMaster, Master, and Worker nodes; this hierarchical approach removes the bottleneck of the single Master node in the traditional Master-Worker architecture and improves the efficiency of task distribution. Simulation results show that the proposed scheme effectively improves the computing efficiency of the cloud platform.
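
The core idea, scaling the chunk a Master hands to each Worker by that Worker's measured speed, can be sketched as follows; the speed values and the guided-self-scheduling baseline are illustrative assumptions, not the HPLS measurements or the exact rule used in the paper:

    # Illustrative sketch: the master hands out chunks whose size scales with each
    # worker's measured relative speed, layered on a simple guided-self-scheduling
    # rule. The speed values are assumptions standing in for HPLS measurements.

    speeds = {"worker1": 1.0, "worker2": 2.5, "worker3": 0.5}     # relative speeds

    def next_chunk(remaining, worker, min_chunk=4):
        """Guided-self-scheduling chunk, scaled by the worker's relative speed."""
        base = max(remaining // (2 * len(speeds)), min_chunk)
        weight = speeds[worker] * len(speeds) / sum(speeds.values())
        return max(1, min(remaining, int(base * weight)))

    remaining = 1000
    requests = ["worker2", "worker1", "worker3"]                  # order of chunk requests
    step = 0
    while remaining > 0:
        w = requests[step % len(requests)]
        chunk = next_chunk(remaining, w)
        remaining -= chunk
        print(f"step {step}: {w} gets {chunk} iterations, {remaining} left")
        step += 1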

7.
This paper analyzes how contour lines are represented in the universal-format output files of I_Deas, outlines how contour lines are constructed and the role they play in geometric feature modeling, and describes a method, used successfully in developing an I_Deas-based product definition system, for determining feature types from contour lines.

8.
The Oracle database uses the PRODUCT_USER_PROFILE table under the SYSTEM user to provide product-level security, supplementing the limitations of user-level security. A DBA can use PRODUCT_USER_PROFILE to restrict certain SQL and SQL*Plus commands in the SQL*Plus environment. When a user logs in to SQL*Plus, SQL*Plus reads the restrictions from PRODUCT_USER_PROFILE and enforces them for that session. This article introduces the Oracle database SYST…

9.
During VB program development it is sometimes necessary to open individual items of the Windows Control Panel. This can be done through Win32 API functions, but declaring the API functions is cumbersome. This article uses the Shell function instead, so that a single line of code opens the desired Control Panel item. All of the code below was tested under Windows 95/98 with VB 5.0 and VB 6.0. [Mouse] Shell "rundll32.exe shell32.dll,Control_RunDLL main.cpl @0", vbNormalFocus  [Keyboard] Shell "rundll32.exe shell32.dll,Control_RunDL…

10.
A Redundant Array of Inexpensive Disks (RAID) is a parallel system that exploits resource replication. We hold that its characteristics can be fully described from three aspects: reliability, capacity redundancy, and time redundancy. This paper presents a method for computing the parameters that characterize time redundancy and uses three composite metrics to analyze and compare several RAID organizations. The results show that RAID 2 is an array structure with relatively good overall performance indicators.

11.
Computing systems should be designed to exploit parallelism in order to improve performance. In general, a GPU (Graphics Processing Unit) can provide more parallelism than a CPU (Central Processing Unit), which has led to the wide use of heterogeneous computing systems that employ the CPU and the GPU together. In such systems, the efficiency of the scheduling scheme that selects whether the CPU or the GPU executes an application is one of the most critical factors determining performance. This paper proposes a dynamic scheduling scheme that selects the device, CPU or GPU, to execute an application based on estimated-execution-time information. The proposed scheme chooses the device that minimizes the completion time, resulting in better system performance, although it requires a training period to collect the execution history. According to our simulations, the proposed estimated-execution-time scheduling improves the utilization of the CPU and the GPU compared with existing scheduling schemes, reducing execution time and enhancing the energy efficiency of heterogeneous computing systems.
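
A minimal sketch of the estimated-execution-time idea described above: keep a per-(application, device) history and dispatch each job to the device expected to finish it first. The history values, default estimate, and job stream are made up for illustration:

    # Sketch of estimated-execution-time dispatch: keep a running history per
    # (application, device) and send each job to whichever device is expected to
    # finish it sooner. The default estimate acts as the training fallback, and
    # the observed runtimes here are stand-ins rather than real measurements.
    from collections import defaultdict

    history = defaultdict(list)                  # (app, device) -> observed runtimes
    busy_until = {"cpu": 0.0, "gpu": 0.0}        # time at which each device becomes free

    def estimate(app, device, default=1.0):
        runs = history[(app, device)]
        return sum(runs) / len(runs) if runs else default

    def dispatch(app, now):
        # expected completion = when the device frees up + estimated runtime on it
        device = min(busy_until,
                     key=lambda d: max(busy_until[d], now) + estimate(app, d))
        start = max(busy_until[device], now)
        runtime = estimate(app, device)          # stand-in for the actual execution time
        busy_until[device] = start + runtime
        history[(app, device)].append(runtime)
        return device, busy_until[device]

    for t, app in [(0.0, "fft"), (0.1, "fft"), (0.2, "sort")]:
        print(app, *dispatch(app, t))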

12.
赵姗  杨秋松  李明树 《软件学报》2019,30(4):1164-1190
To meet the diverse requirements of applications, heterogeneous multi-core processors have emerged and gradually entered the market. Their cores have different microarchitectures or instruction set architectures (ISAs) and provide applications with diverse capabilities, such as instruction-level parallelism (ILP) and memory-level parallelism (MLP); these cores work together to meet the optimization goals of the whole computing system, such as high performance, low power consumption, or good energy efficiency. However, mainstream scheduling techniques are mainly designed for traditional homogeneous processor architectures and do not account for differences in capability across heterogeneous hardware. In a heterogeneous multi-core environment, how the scheduler can perceive the heterogeneity of the hardware and provide different types of applications with better-matched hardware resources is a question worth exploring. This paper surveys recent results in this research area, analyzing and describing the main issues faced by heterogeneous scheduling techniques, particularly on performance-asymmetric multi-core architectures, including optimization objectives, analytical models, scheduling decisions, and algorithm evaluation; the related techniques are then systematically summarized, and future research directions are discussed from the perspective of hardware-software co-design.

13.
《Parallel Computing》2007,33(10-11):700-719
We explore runtime mechanisms and policies for scheduling dynamic multi-grain parallelism on heterogeneous multi-core processors. Heterogeneous multi-core processors integrate conventional cores that run legacy codes with specialized cores that serve as computational accelerators. The term multi-grain parallelism refers to the exposure of multiple dimensions of parallelism from within the runtime system, so as to best exploit a parallel architecture with heterogeneous computational capabilities between its cores and execution units. We investigate user-level schedulers that dynamically “rightsize” the dimensions and degrees of parallelism on the Cell Broadband Engine. The schedulers address the problem of mapping application-specific concurrency to an architecture with multiple hardware layers of parallelism, without requiring programmer intervention or sophisticated compiler support. We evaluate recently introduced schedulers for event-driven execution and utilization-driven dynamic multi-grain parallelization on Cell. We also present a new scheduling scheme for dynamic multi-grain parallelism, S-MGPS, which uses sampling of dominant execution phases to converge to the optimal scheduling algorithm. We evaluate S-MGPS on an IBM Cell BladeCenter with two realistic bioinformatics applications that infer large phylogenies. S-MGPS performs within 2–10% of the optimal scheduling algorithm in these applications, while exhibiting low overhead and little sensitivity to application-dependent parameters.
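
The sampling strategy attributed to S-MGPS above can be caricatured as "time a few candidate configurations on early work, then commit to the fastest"; the candidate list, the placeholder run_phase function, and the sample size below are assumptions, not the Cell runtime's actual interface:

    # Caricature of the sampling idea: time a few candidate (tasks, grain)
    # configurations on early work, then commit to the fastest for the remaining
    # work. run_phase is a placeholder, and the candidates are invented values,
    # not the Cell runtime's real knobs.
    import time

    def run_phase(work_items, config):
        # placeholder: pretend more tasks shorten the phase proportionally
        time.sleep(0.001 * len(work_items) / config["tasks"])

    def sample_then_commit(all_work, candidates, sample_size=32):
        timings, offset = [], 0
        for cfg in candidates:                       # sampling period
            chunk = all_work[offset:offset + sample_size]
            start = time.perf_counter()
            run_phase(chunk, cfg)
            timings.append((time.perf_counter() - start, cfg))
            offset += sample_size
        best = min(timings, key=lambda t: t[0])[1]   # converge to the best-observed config
        run_phase(all_work[offset:], best)           # steady state uses the winner
        return best

    candidates = [{"tasks": 2, "grain": 64},
                  {"tasks": 4, "grain": 32},
                  {"tasks": 8, "grain": 16}]
    print(sample_then_commit(list(range(1000)), candidates))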

14.
With the rapid advance of computing technologies, it has become more and more common to construct high-performance computing environments with heterogeneous commodity computers. Previous loop scheduling schemes were not designed for this kind of environment. Therefore, better loop scheduling schemes are needed to further increase the performance of the emerging heterogeneous PC cluster environments. In this paper, we propose a new heuristic for the performance-based approach to partition loop iterations according to the performance weighting of cluster/grid nodes. In particular, a new parameter is proposed to consider HPCC benchmark results as part of performance estimation. A heterogeneous cluster and grid were built to verify the proposed approach, and three kinds of application programs were implemented for execution on the cluster testbed. Experimental results show that the proposed approach performs better than the previous schemes on heterogeneous computing environments.
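
One plausible way to fold HPCC-style benchmark results into a single per-node performance weight and partition iterations proportionally is sketched below; the benchmark components, mixing coefficients, and node numbers are assumptions rather than the weighting actually used in the paper:

    # Illustrative only: fold HPCC-style benchmark results into one per-node
    # performance weight and split the loop iterations proportionally. The
    # component names, mixing coefficients, and sample numbers are assumptions.
    nodes = {
        # hypothetical results per node: (HPL Gflop/s, STREAM GB/s)
        "node1": (12.0, 5.0),
        "node2": (30.0, 9.0),
        "node3": (6.0, 3.0),
    }

    def weight(hpl, stream, alpha=0.7, beta=0.3):
        return alpha * hpl + beta * stream            # assumed linear mix of components

    def partition(n_iters):
        weights = {n: weight(*v) for n, v in nodes.items()}
        total = sum(weights.values())
        shares = {n: int(n_iters * w / total) for n, w in weights.items()}
        # give any rounding remainder to the fastest node
        shares[max(weights, key=weights.get)] += n_iters - sum(shares.values())
        return shares

    print(partition(1000))        # iterations per node, proportional to its weight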

15.
Wavefront parallelism, in which parallelism is limited to hyperplanes in an iteration space, can arise when compilers apply tiling to loop nests to enhance locality. Previous approaches for scheduling wavefront parallelism focused on maximizing parallelism, balancing workloads, and reducing synchronization. In this paper, we show that on large-scale shared-memory multiprocessors, locality is a crucial factor. We make the distinction between intratile and intertile locality and show that as the number of processors grows, intertile locality becomes more important. We consider and experimentally evaluate existing strategies for scheduling wavefront parallelism. We show that dynamic self-scheduling can be efficiently used on a small number of processors, but performs poorly at large scale because it does not enhance intertile locality. By contrast, static scheduling strategies enhance intertile locality for small tiles, maintaining parallelism and resulting in better performance at large scale. Results from a Convex SPP1000 multiprocessor demonstrate the importance of taking intertile locality into account. Static scheduling outperforms dynamic self-scheduling by a factor of up to 2.3 on 30 processors.
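
A small sketch of wavefront scheduling over a tiled 2-D iteration space: tiles on the same anti-diagonal are independent, and a static block assignment of tile rows keeps each processor reusing the same data across wavefronts, which is the intertile locality argued for above (tile counts and processor count are arbitrary):

    # Tiles on the same anti-diagonal (i + j == w) have no dependences on each
    # other, so each wavefront can run in parallel; a static block assignment of
    # tile rows keeps a processor on the same rows wavefront after wavefront
    # (intertile locality). Tile and processor counts are arbitrary.
    TILES_I, TILES_J, PROCS = 6, 6, 3

    def static_owner(i):
        return i * PROCS // TILES_I               # block-assign tile rows to processors

    for w in range(TILES_I + TILES_J - 1):        # one wavefront per value of i + j
        front = [(i, w - i) for i in range(TILES_I) if 0 <= w - i < TILES_J]
        assignment = {p: [t for t in front if static_owner(t[0]) == p]
                      for p in range(PROCS)}
        print(f"wavefront {w}: {assignment}")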

16.
Loops are the richest source of parallelism in scientific applications. A large number of loop scheduling schemes have therefore been devised for loops with and without data dependencies (modeled as dependence distance vectors) on heterogeneous clusters. The loops with data dependencies require synchronization via cross‐node communication. Synchronization requires fine‐tuning to overcome the communication overhead and to yield the best possible overall performance. In this paper, a theoretical model is presented to determine the granularity of synchronization that minimizes the parallel execution time of loops with data dependencies when these are parallelized on heterogeneous systems using dynamic self‐scheduling algorithms. New formulas are proposed for estimating the total number of scheduling steps when a threshold for the minimum work assigned to a processor is assumed. The proposed model uses these formulas to determine the synchronization granularity that minimizes the estimated parallel execution time. The accuracy of the proposed model is verified and validated via extensive experiments on a heterogeneous computing system. The results show that the theoretically optimal synchronization granularity, as determined by the proposed model, is very close to the experimentally observed optimal synchronization granularity, with no deviation in the best case, and within 38.4% in the worst case. Copyright © 2012 John Wiley & Sons, Ltd.
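
The paper's closed-form formulas are not reproduced here; as a stand-in, the toy model below counts the scheduling steps of a guided self-scheduling rule with a minimum-chunk threshold and picks the synchronization granularity that minimizes an assumed cost of computation, per-step synchronization, and end-of-loop imbalance:

    # Not the paper's formulas: a brute-force stand-in that counts the scheduling
    # steps of guided self-scheduling with a minimum-chunk threshold h, then picks
    # the h that minimizes an assumed cost of computation, per-step synchronization,
    # and the end-of-loop imbalance caused by a large final chunk.
    def gss_steps(n_iters, n_procs, min_chunk):
        steps, remaining = 0, n_iters
        while remaining > 0:
            chunk = min(remaining, max(remaining // n_procs, min_chunk))
            remaining -= chunk
            steps += 1
        return steps

    def estimated_time(n_iters, n_procs, h, iter_cost=1e-3, sync_cost=5e-2):
        steps = gss_steps(n_iters, n_procs, h)
        return (n_iters * iter_cost / n_procs      # ideal computation time
                + steps * sync_cost                # synchronization overhead
                + h * iter_cost)                   # imbalance from the last chunk

    best_h = min(range(1, 201), key=lambda h: estimated_time(10_000, 8, h))
    print("granularity minimizing the toy model:", best_h)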

17.
周静  曾国荪 《计算机工程》2007,33(20):15-17
The two main tasks of a parallelizing compiler are program code partitioning and scheduling. Many solutions already exist for the scheduling problem, but there has been very little research on extracting parallelism through code partitioning. This paper proposes a new algorithm that partitions a DAG by merging nodes. Case analysis shows that the algorithm is an effective, low-complexity, adaptive code-partitioning solution and is also applicable to task-graph partitioning for heterogeneous computing.
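
The abstract states only that the DAG is partitioned by merging nodes; as a stand-in for the paper's rule, the sketch below collapses linear chains (a node merged with its sole successor when that successor has no other predecessor), a common way to shrink a task graph before scheduling:

    # The merging rule below is a stand-in, not the paper's algorithm: a node is
    # merged with its only successor whenever that successor has no other
    # predecessor, collapsing linear chains to shrink the task graph before
    # scheduling. The example DAG is hypothetical.
    succs = {"a": ["b"], "b": ["c", "d"], "c": ["e"], "d": ["e"], "e": []}

    def collapse_chains(graph):
        preds = {n: 0 for n in graph}
        for outs in graph.values():
            for m in outs:
                preds[m] += 1
        groups = {n: [n] for n in graph}          # original nodes held by each partition
        for n in list(graph):
            if n in graph and len(graph[n]) == 1:
                m = graph[n][0]
                if preds[m] == 1 and m in graph:  # safe to absorb m into n
                    groups[n].extend(groups.pop(m))
                    graph[n] = graph.pop(m)
        return groups, graph

    groups, merged = collapse_chains(dict(succs))
    print(groups)                                 # 'a' and 'b' end up in one partition
    print(merged)                                 # edges of the reduced DAG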

18.
Parallel loop self‐scheduling on parallel and distributed systems has been a critical problem and it is becoming more difficult to deal with in the emerging heterogeneous cluster computing environments. In the past, some self‐scheduling schemes have been proposed as applicable to heterogeneous cluster computing environments. In recent years, multicore computers have been widely included in cluster systems. However, previous research into parallel loop self‐scheduling did not consider certain aspects of multicore computers; for example, it is more appropriate for shared‐memory multiprocessors to adopt Open Multi‐Processing (OpenMP) for parallel programming. In this paper, we propose a performance‐based approach using hybrid OpenMP and MPI parallel programming, which partitions loop iterations according to the performance weighting of multicore nodes in a cluster. Because iterations assigned to one MPI process are processed in parallel by OpenMP threads run by the processor cores in the same computational node, the number of loop iterations allocated to one computational node at each scheduling step depends on the number of processor cores in that node. Experimental results show that the proposed approach performs better than previous schemes. Copyright © 2010 John Wiley & Sons, Ltd.

19.
The effectiveness of loop self-scheduling schemes has been shown on traditional multiprocessors in the past and computing clusters in recent years. However, parallel loop scheduling has not been widely applied to computing grids, which are characterized by heterogeneous resources and dynamic environments. In this paper, a performance-based approach, taking the two characteristics above into consideration, is proposed to schedule parallel loop iterations on grid environments. Furthermore, we use a parameter, SWR, to estimate the proportion of the workload which can be scheduled statically, thus alleviating the effect of irregular workloads. Experimental results on a grid testbed show that the proposed approach can reduce the completion time for applications with regular or irregular workloads. Consequently, we claim that parallel loop scheduling can benefit applications on grid environments.
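
The two-phase idea, scheduling a SWR fraction of the iterations statically by node weight and leaving the rest for dynamic self-scheduling, can be sketched as follows; the SWR value, node weights, and chunk size are assumed for illustration:

    # Sketch of the two-phase split: a fraction SWR of the iterations is assigned
    # statically in proportion to node weight, and the remainder is kept for
    # dynamic self-scheduling to absorb irregular workloads. The SWR value,
    # weights, and chunk size are assumed for illustration.
    def split_iterations(n_iters, weights, swr=0.7, dyn_chunk=16):
        static_total = int(n_iters * swr)
        total_w = sum(weights.values())
        static_part = {node: static_total * w // total_w for node, w in weights.items()}
        dynamic_pool = n_iters - sum(static_part.values())   # rounding losses go here too
        chunks = [dyn_chunk] * (dynamic_pool // dyn_chunk)
        if dynamic_pool % dyn_chunk:
            chunks.append(dynamic_pool % dyn_chunk)
        return static_part, chunks

    static_part, dynamic_chunks = split_iterations(10_000, {"nodeA": 4, "nodeB": 2, "nodeC": 1})
    print(static_part)        # statically assigned iterations per node
    print(dynamic_chunks)     # chunks left for run-time self-scheduling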
