期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

任建安虹路放梁博《计算机科学》2006,33(3):239-243

同时多线程处理器（SMT）每个周期能够从多个线程中发射指令执行,从而大大地提高了超标量微处理器的指令吞吐量,但多个线程的同时执行也带来了许多硬件资源的共享冲突问题.其中,多个线程共享分支预测硬件的方案会对分支预测精度产生较大的影响.研究SMT处理器中分支处理方案对于处理器整体性能的影响,对于指导SMT处理器的设计是十分重要的.本文利用SMT处理器模拟器,针对各线程运行独立应用的SMT结构实验评估了几种著名的分支预测方案;给出了在单线程和多线程情况下,分支预测方案对分支预测精度和处理器整体性能的影响的分析;总结出在这样的SMT结构中,各线程拥有独立的预测器是一种较好的选择,并且由于各独立预测器可以采用小而简单的结构,所以不会带来太多的硬件开销. 相似文献

2.

多核同时多线程处理器的线程调度器设计

《电子技术应用》2016,(1):19-21

多核同时多线程处理器(SMT_PAAG)是用于图形、图像及数字信号处理的一种多核处理器。基于这种处理器提出了一种硬件线程调度器,该调度器采用同时多线程技术,最多可同时执行四个线程,支持八个线程阻塞模式下的快速上下文切换。这样避免了因阻塞带来的等待问题,能够有效提高处理器的工作效率和资源利用率。通过在处理器上运行图形处理算法进行性能评测。结果表明,SMT-PAAG处理器通过挖掘指令级并行和线程级并行,将处理器的性能提高了69.25%。相似文献

3.

基于硬件多线程网络处理器功耗可控无线局域网MAC协议实现

王磊张晓彤张艳丽王沁《计算机应用研究》2012,29(7):2624-2628

针对在如何在提高网络吞吐率并满足实时性需求的同时消耗更少的功耗的问题,以硬件多线程网络处理为平台,以IEEE 802.11MAC层协议为例,通过对MAC层数据流的模式、数据流上的操作行为以及时间约束进行建模并测试分析,提出一种多线程化网络协议的软件实现方法;配合动态功耗可控的多线程网络处理器能够根据流量和实时性自适应地调整系统的性能。实验结果证明,异构多线程结构程序在实时性任务时五个软件线程需四个硬件线程支持,而无实时性任务只需两个硬件线程支持。提出的多线程MAC层协议编程模型能够达到根据网络负载特征动态控制处理器性能的目的。相似文献

4.

同时多线程处理器共享资源的特性分析

下载免费PDF全文

黄彩霞《计算机工程与科学》2009,31(8)

同时多线程处理器中同时执行的线程共享处理器中的资源,而这些有限的共享资源在线程之间的分配状况将决定每个线程执行的性能和处理器的总体性能。如何根据不同类别共享资源的特性对它们进行合理有效分配成为同时多线程处理器研究的重要课题之一。本文对同时多线程处理器中各类共享资源的特性进行深入研究与分析,分析结果表明,队列类共享资源的分配方式对每个线程执行的性能和SMT处理器的总体性能具有至关重要的影响。因此,同时多线程处理器中共享资源分配的关键在于控制队列类共享资源的分配。相似文献

5.

时钟共享多线程处理器通信机制的设计与实现

《电子技术应用》2016,(3):42-46

多核多线程处理器~([1])是并行技术的一个发展方向,基于多核多线程处理器,提出了一种时钟共享多线程处理器。该处理器有近邻通信和线程间通信两种通信机制,近邻通信采用近邻共享FIFO来传递信息,线程间通信通过线程间共享存储来传递信息,这样可以提高处理器的资源利用率和并行执行能力。相似文献

6.

一种具有QoS特性的同时多线程处理器取指策略 总被引：4，自引：0，他引：4

何立强刘志勇《计算机研究与发展》2006,43(11):1980-1984

同时多线程处理器通过每时钟周期从多个运行的线程取指令执行,从而极大地提高了处理器的性能．建议了一种具有QoS特性的同时多线程处理器取指策略,并讨论了其在QoS管理方面的问题．该策略的核心思想是利用线程的优先级和流速来同时控制线程的取指过程,从而满足线程在执行速度上的QoS需求．与传统的基于纯优先级的取指策略相比,该策略不但具有QoS特性,同时还可以更加有效地分配取指带宽,从而能获得更高的处理器性能．该策略的物理实现非常简单．模拟实验的结果表明,该策略在提供Qos支持的基础上,可以在传统的基于优先级的取指策略ICOUNT的基础上提高15％的系统性能．相似文献

7.

一种有效的同时多线程处理器取指控制机制 总被引：1，自引：0，他引：1

何立强刘志勇《计算机学报》2006,29(4):535-543

同时多线程处理器通过每时钟周期从多个运行的线程取指令执行,极大地提高了处理器的性能.分支预测器的预测精度和取指策略的效率是影响同时多线程处理器性能的关键.通过将一个基于值的分支预测器和一个基于线程推进速度的取指策略相结合,提出一种新的取指控制机制.该结构的硬件开销较小,实现复杂度较低.实验结果表明,该取指控制机制有效地提高了处理器的性能,其相对于传统取指控制机制的性能加速比为28%且该加速比也高于目前基于流缓冲区和基于分支分类器的取指控制机制. 相似文献

8.

低功耗多线程编译优化技术 总被引：12，自引：1，他引：12

赵荣彩唐志敏张兆庆 Guang R. Gao 《软件学报》2002,13(6):1123-1129

提出了在多线程体系结构中通过降低执行频率有效减小功耗的理论模型和方法.首先研究识别可降频运行的线程的计算模型和降频因子的计算,然后给出在编译过程中基于对应用程序行为的分析,结合线程划分的低功耗编译优化算法和实现策略.该模型和方法可用于具有执行频率可动态调整的多处理器类多线程体系结构,既可开发TLP(thread level parallelism),又可有效减小功率消耗. 相似文献

9.

32位多线程包处理微引擎的设计 总被引：1，自引：0，他引：1

周昔平高德远樊晓桠张盛兵《小型微型计算机系统》2006,27(11):2072-2076

硬件多线程技术是网络处理器中的核心技术，本文介绍了一个专门面向网络协议处理的硬件多线程包处理微引擎NRS05的设计，详细介绍了其流水线的整体结构，提出了一种基于混合多线程的动态调度策略实现了长延时操作的隐藏，保证单线程性能能够满足应用需求的同时保证了各线程在执行核上运行的公平性，并将多线程技术和流水线技术进行了结合，解决了传统处理器中指令间因控制相关导致的流水线停顿问题，最后给出了设计的综合结果及包处理性能．相似文献

10.

龙芯2号处理器的同时多线程设计 总被引：1，自引：0，他引：1

李祖松许先超胡伟武唐志敏《计算机学报》2009,32(11)

提出了适合龙芯2号处理器的同时多线程处理器模型,并介绍了具体的微体系结构设计以及相应的Linux操作系统的实现方案.通过在设计的龙芯2号同时多线程处理器上启动Linux操作系统,并运行应用程序,例如SPEC CPU2000,进行性能评测.结果表明,龙芯2号同时多线程处理器通过挖掘线程级并行性,将龙芯2号处理器的性能提高了31.1%. 相似文献

11.

Thread-Sensitive Instruction Issue for SMT Processors

《Computer Architecture Letters》2004,3(1):5-5

Simultaneous Multi Threading (SMT) is a processor design method in which concurrent hardware threads share processor resources like functional units and memory. The scheduling complexity and performance of an SMT processor depend on the topology used in the fetch and issue stages. In this paper, we propose a thread sensitive issue policy for a partitioned SMT processor which is based on a thread metric. We propose the number of ready-to-issue instructions of each thread as priority metric. To evaluate our method, we have developed a reconfigurable SMT-simulator on top of the SimpleScalar Toolset. We simulated our modeled processor under several workloads composed of SPEC benchmarks. Experimental results show around 30% improvement compared to the conventional OLDEST_FIRST mixed topology issue policy. Additionally, the hardware implementation of our architecture with this metric in issue stage is quite simple. 相似文献

12.

Chip Multithreaded Consistency Model

下载免费PDF全文

Zu-Song Li Dan-Dan Huan Wei-Wu Hu and Zhi-Min Tang 《计算机科学技术学报》2008,23(2):298-ver

Multithreaded technique is the developing trend of high performance processor. Memory consistency model is essential to the correctness, performance and complexity of multithreaded processor. The chip multithreaded consistency model adapting to multithreaded processor is proposed in this paper. The restriction imposed on memory event ordering by chip multithreaded consistency is presented and formalized. With the idea of critical cycle built by Wei-Wu Hu, we prove that the proposed chip multithreaded consistency model satisfies the criterion of correct execution of sequential consistency model. Chip multithreaded consistency model provides a way of achieving high performance compared with sequential consistency model and easures the compatibility of software that the execution result in multithreaded processor is the same as the execution result in uniprocessor. The implementation strategy of chip multithreaded consistency model in Godson-2 SMT processor is also proposed. Godson-2 SMT processor supports chip multithreaded consistency model correctly by exception scheme based on the sequential memory access queue of each thread. 相似文献

13.

面向SMT体系结构的片上资源分配策略研究

张骏樊晓桠刘松鹤《计算机科学》2008,35(6):135-138

SMT处理器通过同时执行来自多个线程中的指令来提高性能,所有线程通过竞争共享的方式来最大化片上资源的利用率.然而,SMT处理器的集中控制结构所固有的线延迟约束和多个线程对片上资源持有的不均衡性使得设计者不得不考虑在线程间进行资源分配,来减少通信延迟和可能出现的线程饥饿.本文介绍了针对SMT体系结构片上资源分配的基本原理、研究内容;分析了片上资源分配对SMT体系结构造成的影响;从显式和隐式两个角度讨论了SMT体系结构片上资源分配策略的运行机制和设计方法;举例分析了POWER5处理器的动态资源平衡策略;最后,展望了SMT处理器片上资源分配的未来发展趋势. 相似文献

14.

Adaptive dynamic thread scheduling for simultaneous multithreaded architectures with a detector thread

《Journal of Parallel and Distributed Computing》2006,66(10):1304-1321

Simultaneous multithreading (SMT) is an architectural technique that improves resource utilization by allowing instructions from multiple threads to coexist in a processor and share resources. However, earlier studies have shown that the performance of an SMT architecture begins to saturate as the number of coexisting threads increases beyond four. We show that no single fetch policy can be the best solution during the entire execution time and that a significant performance improvement can be attained by dynamically switching the fetch policies. We propose an implementation method which includes an extremely lightweight thread to control fetch policies (a detector thread) and a processor architecture to run the detector thread without impact on the user application threads. We evaluate various heuristics for the detector thread to determine the best fetch policies. We show that, with eight threads running on our simulated SMT, the proposed approach can outperform fixed scheduling mechanisms by up to 30%. 相似文献

15.

MOSI:一种基于超长指令字处理器的同时多线程微体系结构

万江华陈书明《计算机学报》2006,29(3):378-383

描述了一种基于超长指令字处理器的同时多线程微体系结构——MOSI（MultiOp Splitting Issue,多操作①分离发射）．MOSI动态地发射同一多操作内的指令．并通过写回缓冲保证计算结果的写回顺序与编译器的视图一致,从而以较小的代价解决了SMT技术中的关键问题．文中详细描述了写回缓冲的结构及算法,给出了多个线程的硬件模型,最后对硬件支持线程的个数及Cache的组织结构进行了讨论．实验结果表明,基于MOSI结构的双线程处理器能够将吞吐率提高40％．相似文献

16.

Simultaneous multithreading: a platform for next-generationprocessors

Eggers S.J. Emer J.S. Leby H.M. Lo J.L. Stamm R.L. Tullsen D.M. 《Micro, IEEE》1997,17(5):12-19

Simultaneous multithreading is a processor design which consumes both thread-level and instruction-level parallelism. In SMT processors, thread-level parallelism can come from either multithreaded, parallel programs or individual, independent programs in a multiprogramming workload. Instruction-level parallelism comes from each single program or thread. Because it successfully (and simultaneously) exploits both types of parallelism, SMT processors use resources more efficiently, and both instruction throughput and speedups are greater 相似文献

17.

Software-directed register deallocation for simultaneousmultithreaded processors

Lo J.L. Parekh S.S. Eggers S.J. Levy H.M. Tullsen D.M. 《Parallel and Distributed Systems, IEEE Transactions on》1999,10(9):922-933

This paper proposes and evaluates software techniques that increase register file utilization for simultaneous multithreading (SMT) processors. SMT processors require large register files to hold multiple thread contexts that can issue instructions out of order every cycle. By supporting better interthread sharing and management of physical registers, an SMT processor can reduce the number of registers required and can improve performance for a given register file size. Our techniques specifically target register deal location. While out-of-order processors with register renaming are effective at knowing when a new physical register must be allocated, they have limited knowledge of when physical registers can be deallocated. We propose architectural extensions that permit the compiler and operating system to: 1) free registers immediately upon their last use, and 2) free registers allocated to idle thread contexts. Our results, based on detailed instruction-level simulations of an SMT processor, show that these techniques can increase performance significantly for register-intensive, multithreaded programs 相似文献