Similar Documents
1.
Speculative multithreading (SpMT) is an effective way to automatically parallelize irregular programs. However, how to effectively evaluate the various parallelization overheads caused by factors such as control and data dependences, and how to achieve an optimal thread partitioning, have long been the key problems limiting speedup. Traditional partitioning methods based on heuristic rules can achieve some speedup, but because heuristics can only assess these overheads qualitatively, they yield thread partitions that are only empirically good. To address these limitations, this paper proposes and implements, for the first time, a thread partitioning method based on fuzzy clustering. In this method, the authors first propose an evaluation model that quantifies the various parallelization overheads, then analyze these overheads in depth to determine the best search space of candidate thread partitions, and finally use clustering to search this space effectively for a better partitioning. Experiments on the Olden benchmark suite show that the proposed method partitions irregular programs effectively, with an average speedup of 1.85.
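The abstract leaves the clustering details to the paper; the following is only a minimal, hypothetical sketch of the general idea, assuming each candidate thread partition is summarized by a small vector of quantified overheads (the feature names and numbers below are made up) and that fuzzy c-means is used to group candidates before picking from the low-overhead cluster.

```python
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, iters=100, seed=0):
    """Cluster overhead vectors X (n_samples x n_features) into c fuzzy clusters."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)                      # memberships, rows sum to 1
    for _ in range(iters):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]     # weighted cluster centers
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U = 1.0 / dist ** (2 / (m - 1))                    # standard FCM membership update
        U /= U.sum(axis=1, keepdims=True)
    return centers, U

# Hypothetical candidates: [control-dependence cost, data-dependence cost, spawn cost].
candidates = np.array([
    [0.10, 0.20, 0.05],
    [0.80, 0.60, 0.30],
    [0.20, 0.10, 0.10],
    [0.70, 0.90, 0.40],
    [0.15, 0.25, 0.08],
])
centers, U = fuzzy_c_means(candidates)
best_cluster = int(np.argmin(centers.sum(axis=1)))         # lowest total overhead
ranked = np.argsort(-U[:, best_cluster])
print("candidates ranked by membership in the low-overhead cluster:", ranked)
```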

2.
Objective: Use an artificial immune algorithm to optimize the main factors affecting the thread partitioning process, so that each application obtains its optimal partitioning scheme. Novelty: Apply an intelligent algorithm to speculative multithreading to optimize its thread partitioning. Method: First, based on heuristic rules, five parameters that influence thread partitioning are extracted: DT, TSL, TSU, SDL, and SDU. The value range of each parameter is determined by the heuristic rules, and the step size varies randomly. With speedup as the objective, the variations of the five parameters form a solution space, and the goal is to find the optimal solution in this space (Fig. 6), i.e. the best partitioning strategy for each application. An artificial immune algorithm is used to search the space for the optimum (Table 4). Conclusion: Optimal partitioning parameter values are obtained for every benchmark in the Olden suite (Figs. 10-20); on a four-core platform, the benchmarks perform 3.00% better than with a machine-learning-based partitioning algorithm and 8.92% better than with heuristic-rule-based partitioning.
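A minimal sketch of the search idea, not the paper's algorithm: a clonal-selection-style immune algorithm over the five named parameters, where the speedup function and the parameter bounds below are placeholders (in the real system the fitness would be the measured speedup of the partitioned application).

```python
import random

# Placeholder ranges, not the paper's actual heuristic bounds.
BOUNDS = {"DT": (1, 20), "TSL": (10, 100), "TSU": (100, 500),
          "SDL": (1, 10), "SDU": (10, 50)}

def speedup(p):
    """Stand-in fitness; a real run would measure the application's speedup."""
    return -((p["DT"] - 8) ** 2 + (p["TSL"] - 40) ** 2 / 100 +
             (p["TSU"] - 300) ** 2 / 1000 + (p["SDL"] - 4) ** 2 +
             (p["SDU"] - 25) ** 2 / 10)

def random_antibody():
    return {k: random.uniform(lo, hi) for k, (lo, hi) in BOUNDS.items()}

def mutate(p, rate):
    return {k: min(hi, max(lo, p[k] + random.gauss(0, rate * (hi - lo))))
            for k, (lo, hi) in BOUNDS.items()}

def clonal_selection(pop_size=20, clones=5, generations=50):
    population = [random_antibody() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=speedup, reverse=True)
        offspring = []
        for rank, ab in enumerate(population[:pop_size // 2]):
            rate = 0.05 * (rank + 1) / (pop_size // 2)     # better antibodies mutate less
            offspring += [mutate(ab, rate) for _ in range(clones)]
        population = sorted(population + offspring, key=speedup,
                            reverse=True)[:pop_size - 3]
        population += [random_antibody() for _ in range(3)]  # fresh antibodies for diversity
    return max(population, key=speedup)

best = clonal_selection()
print({k: round(v, 2) for k, v in best.items()})
```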

3.
A Brief Analysis of Thread Partitioning Methods for Thread-Level Speculation
A correct and well-reasoned thread partitioning method is a prerequisite for extracting thread-level parallelism, and thread-level speculation is an important means of simplifying partitioning and improving system performance. This paper discusses several typical thread partitioning methods that support thread-level speculation, identifies the key problems that thread partitioning must solve on this basis, analyzes them concretely in combination with a typical automatic thread partitioning algorithm, and points out issues that require further research.

4.
Speculative multithreading exploits the thread-level parallelism of applications through speculative execution to improve performance. The technique generally relies on an execution model to detect possible mis-speculations at run time and on suitable mechanisms to restore correct execution. Prophet, described here, is a hardware-based speculative multithreading execution model. The paper focuses on how Prophet solves the key design problems of such an execution model, including Prophet's thread state control and its multi-version cache system; the multi-version cache buffers speculative data and uses a bus-snooping cache protocol to detect data dependence violations. Functional and performance results of evaluating Prophet with the Olden benchmarks are also reported, showing that the Prophet system can effectively exploit the thread-level parallelism of applications.
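Prophet's hardware design is not spelled out in the abstract; the toy sketch below only illustrates the general multi-version/snooping idea under assumed names: each speculative thread buffers its stores and records its loads, and when a less speculative (earlier) thread writes an address that a more speculative thread already read, the later reader is squashed.

```python
class SpeculativeCache:
    """Toy per-thread speculative buffer; lower thread id = less speculative."""
    def __init__(self, tid, memory):
        self.tid = tid
        self.memory = memory          # shared committed "main memory"
        self.writes = {}              # speculative stores, kept private
        self.reads = set()            # addresses read, checked by snooping
        self.violated = False

    def read(self, addr):
        self.reads.add(addr)
        return self.writes.get(addr, self.memory.get(addr, 0))

    def write(self, addr, value, all_caches):
        self.writes[addr] = value
        # Snoop: any more speculative thread that already read this address
        # consumed a stale value and must be squashed.
        for c in all_caches:
            if c.tid > self.tid and addr in c.reads:
                c.violated = True

    def commit(self):
        if self.violated:
            self.writes.clear()       # squash: discard speculative state
            return False
        self.memory.update(self.writes)  # drain speculative stores to memory
        return True

memory = {"x": 1}
caches = [SpeculativeCache(i, memory) for i in range(2)]
t0, t1 = caches
print(t1.read("x"))                   # speculative thread reads x early
t0.write("x", 42, caches)             # earlier thread writes x -> violation
print(t1.commit())                    # False: t1 must re-execute
print(t0.commit(), memory)            # True, memory updated
```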

5.
Li Qin, Zeng Qingkai, Yuan Zhixiang. Journal of Software, 2014, 25(6): 1143-1153
Existing verification work on information flow in threaded programs focuses mainly on timing channels. However, because thread control primitives in threaded programs have function side effects, improper calls to such primitives can also cause illegal information flows, intentionally or unintentionally violating the program's noninterference property. This paper therefore proposes a dependence logic aimed at verifying information flow in threaded programs; it can express a program's data flow, control flow, and the side effects of thread control functions, reason about dependences between program variables and thread identifiers, and thereby decide whether high-confidentiality variables interfere with low-confidentiality ones.

6.
As multi-core technology becomes more widespread, multithreaded programming is increasingly popular, but correctness problems in multithreaded programs have seriously affected software reliability, and existing testing techniques do not meet their needs well. This paper focuses on the most common kind of bug in multithreaded programs, the data race, and proposes a testing method based on controlling the thread scheduling order. The method combines static and dynamic analysis, effectively finds data races in multithreaded programs, and can distinguish which races are harmful and should be fixed first by the programmer. Experiments show that the method raises the average probability of triggering a data race from 0.53% to 79.2%, while introducing an average runtime overhead of only 80%, compared with the 370% overhead of related methods.
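The abstract does not describe the scheduling control in detail; the sketch below is only an illustration of the general idea under assumed names: once a pair of accesses is suspected to race, the tester holds one thread inside its access window until the other thread performs the conflicting access, so the racy interleaving is exercised deterministically instead of with tiny probability.

```python
import threading, time

counter = 0
at_suspect_access = threading.Event()   # inserted by a hypothetical instrumenter

def writer():
    global counter
    tmp = counter                        # suspected racy read
    at_suspect_access.set()              # signal: we are inside the race window
    time.sleep(0.01)                     # deliberately widen the window
    counter = tmp + 1                    # suspected racy write

def racer():
    global counter
    at_suspect_access.wait()             # scheduled to run exactly inside the window
    counter = counter + 1                # conflicting access now interleaves for sure

t1, t2 = threading.Thread(target=writer), threading.Thread(target=racer)
t1.start(); t2.start(); t1.join(); t2.join()
# The forced interleaving exposes the lost update deterministically:
print("counter =", counter, "(expected 2 without the race)")
```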

7.
Partitioning is an optimization technique that distributes a program's computations and data across the processors of a parallel system to make full use of its computing resources and speed up the program. The quality of the partitioning has a critical impact on execution efficiency, so the partitioning problem has long been a hot topic in parallel computing. However, certain program characteristics, such as non-perfectly nested loops, overlap between multiple references to a non-read-only array within one statement, and references to the same array with different strides in different statements, pose serious obstacles to solving the partitioning problem effectively. Existing partitioning algorithms cannot automatically partition programs with these features. Although some intuitive partitioning strategies are used when optimizing such programs by hand, these strategies cannot be embedded in a compiler to guide automatic partitioning. Based on the characteristics of such programs, this paper proposes a partitioning algorithm based on representative elements. It uses the array references that actually affect the partitioning computation as representative elements to construct the constraints of the various partitions, and so completes the partitioning of the program. It also reduces data-reorganization communication during partitioning by finding the direction of maximal consistent data partitioning. The algorithm has been implemented in AFT2004 and achieves good results on real applications.

8.
An Augmented Reality Registration Method with Parallel Tracking and Matching
This paper proposes a parallel registration method for augmented reality based on natural features. Using a dual-channel registration strategy on a dual-core CPU, one thread performs tracking-based registration with the KLT algorithm, while the other uses SURF feature matching for wide-baseline registration correction. The concept of a delayed homography is introduced to solve the thread synchronization problem between feature tracking and matching, and the idea of feature keyframes improves the accuracy and stability of the parallel registration. Experiments show that, in real time, the method effectively solves the error-accumulation problem of tracking-based registration and is an accurate, stable, and effective natural-feature registration method.

9.
A parallel whole-genome assembly algorithm based on graph partitioning is proposed. The algorithm turns the data partitioning problem into a graph partitioning problem, solving the node load-imbalance problem of traditional data partitioning algorithms. When building the relationship graph, it also exploits the read-length and mate-pair information provided by WGS sequencing, so that the read relationship graph reflects the relationships in the data more accurately and improves partitioning accuracy. Experiments show that the algorithm accurately partitions various simulated and real data sets, with clearly better partition quality than traditional data partitioning algorithms.

10.
Thread-level speculation (TLS) can exploit a program's latent parallelism and raise multi-core resource utilization, but the kernel benchmarks of TACLeBench have not yet been effectively analyzed for TLS parallelization. To address this, a profiling scheme and tool for loop-level speculative execution were designed. Seven representative TACLeBench kernel benchmarks were selected; first, the programs were analyzed and loop markers were inserted into hot code regions; next, these regions were cross-compiled, and data relating speculative threads to memory addresses were recorded to profile the maximum potential loop-level parallelism; finally, the runtime characteristics (thread granularity, parallelizable coverage, dependence behavior) and the influence of the source code on speedup were examined. The results show that: 1) these programs are suitable for TLS acceleration; compared with serial execution, most of them achieve speedups above 2 under loop-level speculative execution, with a maximum speedup of 20.79; 2) when accelerating the TACLeBench kernel programs with TLS, most applications can effectively use 4 to 16 cores.
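The actual tool works on cross-compiled binaries; the sketch below only illustrates the core profiling idea on a toy loop, under assumed names: record the addresses read and written by each iteration, count how many iterations read a value written by an earlier iteration, and use that as a rough bound on loop-level speculative parallelism.

```python
def profile_loop(iteration_accesses):
    """iteration_accesses: list of (reads, writes) address sets, one per iteration.
    Returns the fraction of iterations free of cross-iteration RAW dependences."""
    written_before = set()
    independent = 0
    for reads, writes in iteration_accesses:
        if not (reads & written_before):
            independent += 1          # could run speculatively in parallel
        # otherwise: RAW dependence on an earlier iteration, must serialize
        written_before |= writes
    return independent / len(iteration_accesses)

# Toy loop: a[i] = a[i - 4] + 1 over 12 iterations -> only the first 4 iterations
# are free of cross-iteration dependences.
accesses = [({f"a[{i-4}]"} if i >= 4 else set(), {f"a[{i}]"}) for i in range(12)]
print(f"{profile_loop(accesses):.0%} of iterations have no cross-iteration RAW")
```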

11.
Speculative multithreading (SpMT) promises to be an effective mechanism for parallelizing nonnumeric programs, which tend to have irregular and pointer-intensive data structures and complex flows of control. Proper thread formation is crucial for obtaining good speedup in an SpMT system. This paper presents a compiler framework for partitioning a sequential program into multiple threads for parallel execution in an SpMT system. This framework is very general and supports speculative threads, nonspeculative threads, loop-centric threads, and out-of-order thread spawning. It is therefore useful for compiling for a wide variety of SpMT architectures. For effective partitioning of programs, the compiler uses profiling, interprocedural pointer analysis, data dependence information, and control dependence information. The compiler is implemented on the SUIF-MachSUIF platform. A simulation-based evaluation of the generated threads shows that the use of nonspeculative threads and nonloop speculative threads provides a significant increase in speedup for nonnumeric programs.

12.
Lai Xin, Liu Cong, Wang Zhiying. Computer Engineering, 2012, 38(24): 228-234
When detecting data dependences in thread-level speculation, cache coherence protocols cannot tolerate the cache block replacement caused by thread switching, among other problems. To address this, by analyzing the speculative-thread data management model and exploiting the fact that speculative thread switches are infrequent, a distributed-shared recovery buffer structure is proposed. During cache coherence checking, the structure detects data dependences using an invalidation vector together with a version priority register, and uses the L2 cache to buffer and recover speculative data in order to support speculative thread switching. The SESC simulator was modified to validate and evaluate this memory architecture. Experimental results show that, while preserving the simulator's ideal speedup, the architecture supports speculative thread switching well.

13.
Weighted Shared Cache Partitioning for Multithreaded Multiprogrammed Workloads
When parallel applications run on multi-core processors with a shared cache, conflicting accesses to the shared cache cause performance degradation and unpredictable execution time. Shared cache partitioning, which allocates the shared cache exclusively among processes, is an effective way to solve this problem. Because of data sharing between threads, applications with different thread counts use the shared cache with different efficiency, but traditional partitioning algorithms that minimize miss rate (e.g. UCP) do not distinguish applications by thread count. This paper designs a weighted shared cache partitioning framework (Weighted Cache Partitioning, WCP) for multithreaded multiprogrammed workloads, consisting of a per-application miss-rate monitor and a weighted cache partitioning algorithm. The miss-rate monitor dynamically tracks, per process, each application's miss rate under different cache capacities; the weighted partitioning algorithm extends the traditional miss-rate-optimal partitioning algorithm by assigning applications different weights according to their thread counts, so that applications with more threads receive more of the shared cache and overall system performance improves. Experimental results show that although the weighted algorithm slightly increases the miss rate, it improves IPC throughput, weighted speedup, and fairness. On multiprogrammed workloads composed of scientific and engineering applications, WCP-1's IPC throughput is up to 10.8% higher, and 5.5% higher on average, than that of a shared cache partitioning algorithm whose objective is minimal miss rate.
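WCP's exact algorithm is not given in the abstract; the following is a minimal greedy sketch of the weighted idea: given per-application miss counts as a function of allocated ways (as a monitor would report) and a weight per application (here simply its thread count), repeatedly give the next way to the application with the largest weighted marginal miss reduction. The curves and numbers are invented for illustration.

```python
def weighted_partition(miss_curves, weights, total_ways):
    """miss_curves[app][w] = misses when app owns w ways (w = 0..total_ways)."""
    alloc = {app: 0 for app in miss_curves}
    for _ in range(total_ways):
        def gain(app):
            w = alloc[app]
            # Weighted marginal utility of one more way for this application.
            return weights[app] * (miss_curves[app][w] - miss_curves[app][w + 1])
        best = max(alloc, key=gain)
        alloc[best] += 1
    return alloc

# Hypothetical miss curves for an 8-way shared cache:
# "mt" has 4 threads, "st" is single-threaded.
miss_curves = {
    "mt": [100, 80, 64, 52, 44, 38, 34, 31, 30],
    "st": [90, 70, 60, 55, 52, 50, 49, 48, 48],
}
weights = {"mt": 4, "st": 1}          # weight by thread count, as WCP suggests
print(weighted_partition(miss_curves, weights, total_ways=8))
```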

14.
To address how to balance strategies and choose a suitable execution model for speculative parallelization in order to obtain the desired performance, this paper proposes a speculative parallelization method based on interaction information. Interaction information is used to determine the execution model for speculative parallelization, an evaluation model is built, and corresponding strategies and performance evaluations are presented, focusing on thread extraction and creation. Experiments show that "on-demand" parallelization based on interaction information can meet the required performance targets.

15.
Geometric acoustic modeling systems spatialize sounds according to reverberation paths from a sound source to a receiver to give an auditory impression of a virtual 3D environment. These systems are useful for concert hall design, teleconferencing, training and simulation, and interactive virtual environments. In many cases, such as in an interactive walkthrough program, the reverberation paths must be updated within strict timing constraints - e.g., as the sound receiver moves under interactive control by a user. In this paper, we describe a geometric acoustic modeling algorithm that uses a priority queue to trace polyhedral beams representing reverberation paths in best-first order up to some termination criteria (e.g., expired time-slice). The advantage of this algorithm is that it is more likely to find the highest priority reverberation paths within a fixed time-slice, avoiding many geometric computations for lower-priority beams. Yet, there is overhead in computing priorities and managing the priority queue. The focus of this paper is to study the trade-offs of the priority-driven beam tracing algorithm with different priority functions. During experiments computing reverberation paths between a source and a receiver in a 3D building environment, we find that priority functions incorporating more accurate estimates of source-to-receiver path length are more likely to find early reverberation paths useful for spatialization, especially in situations where the source and receiver cannot reach each other through trivial reverberation paths. However, when receivers are added to the environment such that it becomes more densely and evenly populated, this advantage diminishes.
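As a rough illustration of the priority-queue idea only (not the paper's actual polyhedral beam geometry), the sketch below runs a best-first search over a toy cell-adjacency graph, where the priority of a partial path is its length so far plus a straight-line estimate of the remaining distance to the receiver, and the search stops when a time slice expires.

```python
import heapq, math, time

def best_first_paths(graph, coords, source, receiver, time_slice=0.01):
    """graph: node -> neighbors; coords: node -> (x, y).
    Expands partial paths in order of (length so far + straight-line estimate
    to the receiver) until the time slice runs out; returns the paths found."""
    def dist(a, b):
        (ax, ay), (bx, by) = coords[a], coords[b]
        return math.hypot(ax - bx, ay - by)

    deadline = time.monotonic() + time_slice
    heap = [(dist(source, receiver), 0.0, [source])]
    found = []
    while heap and time.monotonic() < deadline:
        _, length, path = heapq.heappop(heap)
        node = path[-1]
        if node == receiver:
            found.append((length, path))
            continue
        for nxt in graph[node]:
            if nxt in path:                       # avoid trivial cycles
                continue
            new_len = length + dist(node, nxt)
            priority = new_len + dist(nxt, receiver)
            heapq.heappush(heap, (priority, new_len, path + [nxt]))
    return found

graph = {"S": ["A", "B"], "A": ["R"], "B": ["A", "R"], "R": []}
coords = {"S": (0, 0), "A": (1, 0), "B": (0, 1), "R": (2, 0)}
for length, path in best_first_paths(graph, coords, "S", "R"):
    print(f"{'->'.join(path)}  length={length:.2f}")
```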

16.
A new intermediate representation for software pipelined loops with conditions is proposed in the paper. The representation allows separation of operations from different paths and their conditional, as well as speculative scheduling, including speculative computation of conditions. An algorithm that transforms the representation into the executable code is presented. The algorithm uses the notion of finite automata to represent the execution of separate paths as threads of control that are canceled or approved by operations that actually compute the conditions. The approach may be used in conjunction with different scheduling techniques to reconstruct the control flow graph from the final schedule directly. It inherently solves the problems of overlapped predicate lifetimes and speculation. The approach also provides a novel formal model for loop execution.

17.
When designing multi-task real-time control systems, thread pools are an effective solution, but timeouts must first be avoided. To reduce the rate of thread completion timeouts, a thread pool for the real-time control system is built on the Half-Sync/Half-Async architecture, least-squares support vector regression (LSSVR) is used to predict and estimate thread execution time, and the thread allocation and scheduling priority algorithm of the pool is then designed on the basis of these estimates. Performance tests on wireless image sensor network nodes compare the timeout rates of the designed LSSVR thread pool and of other thread pools under different conditions; the results show that in most application scenarios the LSSVR thread pool is clearly superior at suppressing timeouts.
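The paper's LSSVR formulation and Half-Sync/Half-Async details are not reproduced here; the sketch below only shows the scheduling idea, with a simple ridge regression standing in for LSSVR: predict each task's execution time from a feature vector and dispatch tasks with shorter predicted times first, so they are less likely to miss their deadlines. All names and data are illustrative.

```python
import itertools, queue, threading, time
import numpy as np

class PredictivePool:
    """Worker threads pull tasks in order of predicted execution time."""
    def __init__(self, workers=2):
        self.tasks = queue.PriorityQueue()
        self._seq = itertools.count()           # tie-breaker for equal predictions
        self.w = None                           # regression weights
        for _ in range(workers):
            threading.Thread(target=self._worker, daemon=True).start()

    def fit(self, X, y, lam=1e-3):
        """Ridge regression as a simple stand-in for the paper's LSSVR."""
        X = np.c_[np.ones(len(X)), np.asarray(X, dtype=float)]   # add bias column
        self.w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]),
                                 X.T @ np.asarray(y, dtype=float))

    def predict(self, features):
        return float(np.r_[1.0, features] @ self.w)

    def submit(self, features, func):
        # Shorter predicted tasks get a smaller key, hence higher priority.
        self.tasks.put((self.predict(features), next(self._seq), func))

    def _worker(self):
        while True:
            _, _, func = self.tasks.get()
            func()
            self.tasks.task_done()

pool = PredictivePool()
# Hypothetical training data: feature = payload size, target = observed seconds.
pool.fit(X=[[1], [4], [8]], y=[0.01, 0.04, 0.08])
for size in [8, 1, 4]:
    pool.submit([size], lambda s=size: print(f"ran task of size {s}"))
pool.tasks.join()
```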

18.
With the trend of growing number of integrated processing cores on Chip Multiprocessors, researchers are working hard to increase the available parallelism of software programs so as to efficiently harness the growing computing power. One noticeable direction among these efforts is speculative multi-threading (SpMT), a.k.a. thread level speculation, which aims to extract thread level parallelism by splitting a sequential execution thread into several finer ones and executing them in parallel. A SpMT thread is in speculative status before it "knows" all its input data are correct. A speculative thread needs to write to the L1 cache, but its output might be discarded if the speculation eventually fails. However, another speculative thread may have already read in such speculative output. Therefore, some mechanism is needed to support speculative read and write. And because the SpMT threads are extracted from a single thread, they usually share lots of data, thus there might be intense data coherence traffic among the L1 caches. It would be very complicated to support data coherence and speculation together. This paper proposes a shared write buffer among the SpMT cores. We are able to confine the speculative read and write in the SWB, thus the speculation will not interfere with coherence, and the L1 cache design could be drastically simplified. Experiments show that the SWB can capture a big portion of inter-core data sharing, reduce cache coherence traffic, and drastically improve data access performance of SpMT threads.
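A toy sketch of the SWB idea under assumed names, not the paper's hardware design: speculative stores stay in the shared buffer tagged with the thread's speculation order, a speculative load is forwarded from the closest earlier writer in the buffer before falling back to committed memory, and on commit or squash the thread's entries are drained or discarded, so the L1 caches never see speculative data.

```python
class SharedWriteBuffer:
    """Toy SWB: speculative stores keyed by address and thread order (epoch)."""
    def __init__(self, memory):
        self.memory = memory                 # non-speculative backing store
        self.entries = {}                    # addr -> {epoch: value}

    def store(self, epoch, addr, value):
        self.entries.setdefault(addr, {})[epoch] = value

    def load(self, epoch, addr):
        # Forward from the closest earlier (or own) speculative writer,
        # otherwise fall back to committed memory.
        versions = self.entries.get(addr, {})
        earlier = [e for e in versions if e <= epoch]
        return versions[max(earlier)] if earlier else self.memory.get(addr, 0)

    def commit(self, epoch):
        for addr, versions in self.entries.items():
            if epoch in versions:
                self.memory[addr] = versions.pop(epoch)   # drain to memory

    def squash(self, epoch):
        for versions in self.entries.values():
            versions.pop(epoch, None)        # discard the failed thread's output

memory = {"x": 0}
swb = SharedWriteBuffer(memory)
swb.store(1, "x", 10)                        # speculative thread 1 writes x
print(swb.load(2, "x"))                      # thread 2 forwards 10 from the SWB
swb.squash(1)                                # thread 1 mis-speculated
print(swb.load(2, "x"), memory)              # back to the committed value 0
```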

19.
Data prefetching mechanisms are widely used for hiding memory latency in data intensive applications. They mask the speed gap between CPUs and their memory systems by preloading data into the CPU caches, where accessing them is at least an order of magnitude faster. Pre-execution is a combined prefetching method, which executes a slice of the original code, preloading the code and its data at the same time. Pre-execution is often mentioned in the literature, but according to our knowledge, it has not been formally defined yet. We fill this void by presenting the formal definition of speculative and non-speculative pre-execution, and derive a lightweight software-based strategy which accelerates the main working thread by introducing an adaptive, non-speculative pre-execution helper thread. This helper thread acts as a perfect predictor, calculates memory addresses, prefetches the data and consumes cache misses early. The adaptive automatic control allows the helper thread to configure itself in run-time for best performance. The method is directly applicable to any data intensive application without requiring hardware modifications. Our method was able to achieve an average speedup of 10–30% in a real-life application.
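Cache prefetching cannot be reproduced meaningfully at the Python level, so the sketch below is only a logical illustration of the helper-thread idea, with assumed names: a helper thread runs just the address-calculation slice of the main loop and pulls the corresponding records into a small shared cache, staying a bounded distance ahead of the worker so the worker's lookups hit.

```python
import threading

DATA = {i: i * i for i in range(10_000)}       # stands in for slow memory
cache = {}                                     # stands in for the CPU cache
ahead = threading.Semaphore(64)                # keep the helper ~64 items ahead

def index_stream(n):
    """The address calculation shared by both threads (the 'slice')."""
    i = 7
    for _ in range(n):
        yield i
        i = (i * 31 + 3) % 10_000

def helper(n):
    for i in index_stream(n):
        ahead.acquire()                        # don't run too far ahead
        cache[i] = DATA[i]                     # "prefetch": pull the record in early

def worker(n):
    total = 0
    for i in index_stream(n):
        value = cache.get(i)
        if value is None:                      # cache miss: pay the full cost
            value = DATA[i]
        total += value
        ahead.release()                        # free a slot for the helper
    return total

n = 5_000
threading.Thread(target=helper, args=(n,), daemon=True).start()
print(worker(n))
```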
