Similar literature
1.
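A cache architecture that trades off latency and capacity in chip multiprocessors   Total citations: 1, self-citations: 0, citations by others: 1
In chip multiprocessors (CMPs), the design of the level-2 cache faces a conflict: low latency and large capacity cannot both be achieved. A private organization has low hit latency but reduces the effective cache capacity, while a shared organization increases the effective capacity but has longer hit latency. This paper proposes a cache architecture for CMPs, the latency-capacity trade-off cache architecture (TCLC). TCLC is a hybrid of the private and shared designs. Its core idea is to dynamically identify the sharing type of each cache block and optimize each type separately: private blocks are migrated, shared read-only blocks are replicated, and shared read-write blocks are placed centrally. The aim is an access latency close to that of a private organization and an effective capacity close to that of a shared organization, which mitigates the effect of wire delay and reduces the average memory access latency. Full-system simulation results show that TCLC improves performance by 13.7% on average over the private organization and by 12% on average over the shared organization.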
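The mapping from sharing type to optimization named in this abstract can be illustrated with a short sketch. The classification rule below (a block touched by at most one core is private; a shared block with no writers is read-only) is an assumption made for illustration, as are all names; the abstract does not say how TCLC detects sharing types in hardware.

```python
from enum import Enum
from typing import Set


class SharingType(Enum):
    PRIVATE = "private"
    SHARED_READ_ONLY = "shared read-only"
    SHARED_READ_WRITE = "shared read-write"


def classify(readers: Set[int], writers: Set[int]) -> SharingType:
    """Classify a cache block from the core IDs seen reading and writing it.

    Assumed rule, not taken from the paper.
    """
    accessors = readers | writers
    if len(accessors) <= 1:
        return SharingType.PRIVATE
    if not writers:
        return SharingType.SHARED_READ_ONLY
    return SharingType.SHARED_READ_WRITE


# Optimization per sharing type, as listed in the abstract.
POLICY = {
    SharingType.PRIVATE: "migrate",
    SharingType.SHARED_READ_ONLY: "replicate",
    SharingType.SHARED_READ_WRITE: "place centrally",
}


def placement_policy(readers: Set[int], writers: Set[int]) -> str:
    """Return the optimization applied to a block with this access history."""
    return POLICY[classify(readers, writers)]
```

For example, `placement_policy({0, 3}, set())` selects replication, because two cores read the block and none writes it.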

2.
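Cao Fei, Liu Zhiyong. 《计算机科学》 (Computer Science), 2012, 39(8): 304-310
Chip multiprocessors (CMPs) usually adopt a private or a shared last-level cache (LLC) organization, and a shared LLC generally uses a static address-mapping mechanism. This mechanism maps data that a processor accesses as temporarily private onto LLC banks located at other processors, which increases each processor's access latency to its temporarily private data. To address this problem, a shared-LLC address-mapping method that combines dynamic and static mapping is proposed. The method dynamically maps temporarily private data, which would otherwise be statically mapped to another processor's LLC, into the local LLC of the accessing processor. This removes many of the long-latency non-local LLC accesses caused by static mapping, which effectively lowers the access latency of the whole shared LLC and, while improving performance, reduces power consumption and bandwidth use. Experimental results show that, when applied to a CMP with a ring interconnect and a snooping ordered-ring protocol, the combined dynamic-static address mapping yields an average performance improvement of 9% and a maximum improvement of 38%.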

3.
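Non-uniform cache architecture (NUCA) has almost become the direction of development for large on-chip caches. In a multicore NUCA design, contention among multiple cores for shared data can leave that data frequently resident in the central cache banks, which increases NUCA access latency. This paper proposes a Bank coherence technique that supports data replicas: by selectively creating separate replicas in the NUCA for the accessing cores, it alleviates contention for shared data among cores. The paper describes the design of the Bank coherence protocol in detail. Finally, eight NPB benchmark programs are evaluated in detail with a full-system simulator. Experimental results show that the Bank coherence technique effectively alleviates contention for shared data in multicore processors. Compared with a CMP-DNUCA structure without Bank coherence, the proposed method improves system IPC by 5.95% on average.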

4.
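A survey of latency optimization techniques for non-uniform cache architectures in multicore processors   Total citations: 1, self-citations: 0, citations by others: 1
Non-uniform cache architecture (NUCA) offers a new design approach to the "memory wall" problem of chip multiprocessors (CMPs). Focusing on NUCA latency optimization techniques for CMPs, this survey introduces several typical NUCA models, analyzes the latency-capacity trade-off in shared and private schemes for large caches, and discusses the advantages and disadvantages of data management mechanisms such as mapping, migration, replication, and search in a multicore setting. Finally, for scalable CMP architectures built on a network-on-chip (NoC) interconnect, it discusses and forecasts development trends and open challenges in CMP NUCA latency optimization from three aspects: NUCA model optimization, data management, and coherence maintenance.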

5.
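As multicore processors scale up, the average distance between the core requesting data and the data's home node grows accordingly, and the uneven distribution of data accesses across the distributed shared cache banks creates network hotspots. Both effects increase the L1 cache miss latency. To address this, every four cores are grouped together, and a neighboring-data probe is designed within each group. The probe determines whether a miss can be served from the L1 cache of a neighboring core, thereby exploiting the inter-core locality of data accesses when parallel programs run on a multicore processor. In addition, the cache coherence protocol is optimized for the new structure. Experiments show that this on-chip memory optimization improves system performance, reduces on-chip network traffic, and saves energy.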

6.
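For today's mainstream multicore processors, a shared-cache sensitive multithreaded sorting framework for databases (SCS-MSF) is proposed. The performance bottlenecks of multithreaded QuickSort on shared-cache multicore processors are analyzed first. On that basis, a multithreaded parallel execution mode is proposed for each processing stage of SCS-MSF according to that stage's data access characteristics, and various optimization strategies improve cache performance during thread execution, in particular by reducing access conflicts when multiple threads access the shared cache. In the experiments, SCS-MSF is implemented on the in-memory database EaseDB. The results show that SCS-MSF has good cache access performance, which raises the efficiency of multithreaded execution; its performance is stable, and database sorting performance improves considerably.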

7.
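Research on on-chip memory systems of multicore processors   Total citations: 1, self-citations: 1, citations by others: 0
The growing gap between the computing power of multicore processors and memory access speed limits multicore system performance. To address this, the paper analyzes the design features of the memory systems of several typical multicore processors and discusses key technologies in the development of on-chip memory systems, including non-uniform cache access caused by latency, the constraints that the core-to-cache interconnect imposes on memory access performance, and the growing complexity of on-chip cache design.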

8.
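For cluster systems whose nodes differ in computing, communication, and storage capability, and in which each node consists of several multicore processors (several chip multiprocessors) sharing an L3 cache, a divisible-load scheduling model is proposed in which computation overlaps with transmission and the master node sends data to the slave nodes concurrently using multiple processes. The model adapts to the nodes' differing computing, communication, and storage capabilities. It dynamically computes the number of scheduling rounds and the size of the load block assigned to each slave node in each round, so as to balance the computational load across nodes, reduce inter-node communication overhead, and shorten the task schedule length. Based on the available capacity of the L3, L2, and L1 caches in each node, a data allocation method is proposed that partitions the load blocks received in a node's main memory across the cache levels, to ensure load balance among the multicore processors and the cores within a node. Building on the inter-node divisible-load scheduling model and the intra-node multilevel storage allocation method, a communication-efficient and storage-efficient parallel k-selection algorithm is designed and implemented for heterogeneous clusters whose nodes have several multicore processors. On the 曙光 (Dawning) TC5000A multicore cluster, tests measure how the algorithm's performance is affected by whether the master sends data to the slaves in parallel or serially, by the utilization of each cache level, and by the number of threads each core executes. The results show that the parallel k-selection algorithm based on the concurrent-send scheduling model outperforms the one based on the serial-send model; that L3 and L2 cache utilization strongly affect the algorithm's performance; and that the running time is shortest when the L3, L2, and L1 cache utilizations take their optimized combination of values and each core runs 3 threads.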

9.
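For multicore processors with a shared cache, managing each core's use of the cache is key to realizing the processor's full performance. Current cache replacement policies allow performance interference between programs; static cache partitioning solves this by allocating separate space to concurrently running programs. To give each program a suitably sized cache allocation, the program must be profiled, that is, run multiple times in advance to collect performance data under various cache capacities. This profiling is very costly, which limits its practicality. To avoid the multiple runs, a profiling optimization that requires only a single run is proposed. It uses online phase analysis to identify the program's execution phases and avoids re-profiling identical phases. It also analyzes how each phase's performance trends with cache capacity and skips profiling for capacity changes to which performance is insensitive, which reduces overhead. After the program finishes, its overall performance at each capacity is estimated from the performance of its phases at each capacity, and this estimate guides static cache partitioning. Experiments show that the overhead of the technique is only 7%, that the cache partitioning it guides improves performance by 8% over no partitioning, and that it is only 1% slower than partitioning guided by multi-run profiling.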
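The final estimation step in this abstract, combining per-phase measurements into a whole-program figure for each cache capacity, might look like the sketch below. The abstract does not give the combination rule; weighting each phase's cycles per instruction (CPI) by its instruction count is an assumption, and the data layout and names are hypothetical.

```python
from typing import Dict, List, TypedDict


class Phase(TypedDict):
    instructions: int        # instructions executed in this phase
    cpi: Dict[int, float]    # cache capacity (e.g. in ways) -> measured CPI


def estimate_cycles(phases: List[Phase], capacity: int) -> float:
    """Estimate total cycles at one cache capacity from per-phase profiles."""
    return sum(p["instructions"] * p["cpi"][capacity] for p in phases)


def estimate_all(phases: List[Phase], capacities: List[int]) -> Dict[int, float]:
    """Estimate total cycles for every candidate capacity."""
    return {c: estimate_cycles(phases, c) for c in capacities}
```

A static partitioner could then compare these estimates across the co-running programs to pick each program's share of the cache.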

10.
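Shared-memory operating systems use carefully designed locks to protect shared data, and a lock must be acquired before the corresponding data is accessed. When several kernel control flows (system calls, kernel threads, interrupt handlers, and so on) try to acquire the same lock at once, they contend, and the more flows are involved, the fiercer the contention. As the number of processing units in a system grows, so does the number of such flows; lock contention then degrades overall system performance and can even become a bottleneck. In addition, the operating system and applications run alternately on the same core; because hardware cache capacity is limited, operating-system code and data frequently evict application code and data. When the application is scheduled again, it must fetch that code and data from a slower cache level or even from memory, which lowers performance. Tests on a 16-core AMD node quantify and confirm these problems, and a heterogeneous operating system model is proposed to address them. Under this model, applications and the operating system run on different cores, and experiments show that this arrangement effectively reduces lock contention and cache pollution.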

11.
European Community policy and the market   Total citations: 1, self-citations: 0, citations by others: 1
This paper starts with some reflections on the policy considerations and priorities which are shaping European Commission (EC) research programmes. Then it attempts to position the current projects which seek to capitalise on information and communications technologies for learning in relation to these priorities and the apparent realities of the marketplace. It concludes that while there are grounds to be optimistic about the contribution EC programmes can make to the efficiency and standard of education and training, they are still too technology driven.

12.
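Fusion-based ensemble methods are widely used in pattern recognition. However, some base classifiers have unstable real-time performance, which degrades the performance of multi-classifier fusion. To address this, the paper proposes a new sub-fusion ensemble classifier system based on multiple classifiers. Operating above the measurement-level fusion layer, the method dynamically selects among the base classifiers and takes the class with the most votes as the class that the fusion system assigns to a feature vector, forming a new adaptive sub-fusion ensemble classifier method. Experiments show that the method achieves clearly higher recognition accuracy than conventional classifiers and classifier fusion methods, and is more robust.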
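The two steps named in this abstract, dynamic selection of base classifiers followed by a vote, can be sketched as follows. The selection criterion (keep classifiers whose recent accuracy meets a threshold, falling back to the single best one) is an assumption chosen for illustration; the abstract does not state the paper's actual criterion, and all names are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List


def select_classifiers(recent_accuracy: Dict[str, float],
                       threshold: float = 0.6) -> List[str]:
    """Keep the classifiers whose recent accuracy meets the threshold."""
    chosen = [name for name, acc in recent_accuracy.items() if acc >= threshold]
    if not chosen:
        # Fall back to the single best classifier so a vote is always possible.
        chosen = [max(recent_accuracy, key=recent_accuracy.get)]
    return chosen


def majority_vote(predictions: Dict[str, Hashable],
                  chosen: List[str]) -> Hashable:
    """Return the class predicted by the most selected classifiers."""
    votes = Counter(predictions[name] for name in chosen)
    return votes.most_common(1)[0][0]
```

With accuracies `{"svm": 0.9, "knn": 0.8, "tree": 0.4, "nb": 0.7}` and predictions `cat, dog, dog, cat`, the unstable `tree` classifier is excluded, so the vote is two to one for `cat` rather than a tie.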

13.
Although there are many arguments that logic is an appropriate tool for artificial intelligence, there has been a perceived problem with the monotonicity of classical logic. This paper elaborates on the idea that reasoning should be viewed as theory formation where logic tells us the consequences of our assumptions. The two activities of predicting what is expected to be true and explaining observations are considered in a simple theory formation framework. Properties of each activity are discussed, along with a number of proposals as to what should be predicted or accepted as reasonable explanations. An architecture is proposed to combine explanation and prediction into one coherent framework. Algorithms used to implement the system as well as examples from a running implementation are given.

14.
This paper provides the author's personal views and perspectives on software process improvement. Starting with his first work on technology assessment in IBM over 20 years ago, Watts Humphrey describes the process improvement work he has been directly involved in. This includes the development of the early process assessment methods, the original design of the CMM, and the introduction of the Personal Software Process (PSP)℠ and Team Software Process (TSP)℠. In addition to describing the original motivation for this work, the author also reviews many of the problems he and his associates encountered and why they solved them the way they did. He also comments on the outstanding issues and likely directions for future work. Finally, this work has built on the experiences and contributions of many people. Mr. Humphrey only describes work that he was personally involved in and he names many of the key contributors. However, so many people have been involved in this work that a full list of the important participants would be impractical.

15.
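SAR image denoising based on significance correction of the noise variance in the complex wavelet domain   Total citations: 4, self-citations: 1, citations by others: 3
A speckle-noise filtering method for synthetic aperture radar (SAR) images is proposed that combines statistical modeling in the complex wavelet domain with a significance correction of the noise variance estimate. The method first applies a logarithmic transform to convert the multiplicative noise model into an additive one, then applies the dual-tree complex wavelet transform (DCWT) to the transformed image and models the statistical distribution of the complex wavelet coefficients. Using this prior distribution, Bayesian estimation recovers the original coefficients from the noisy ones, thereby removing the noise. Experimental results show that the method preserves image detail while removing noise and achieves good denoising performance.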
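The first step of this method, the logarithmic transform, works because the logarithm of a product is a sum: if an observed pixel is I = R · N (reflectivity times speckle), then log I = log R + log N. The sketch below illustrates only that step. The unit-mean gamma speckle model and the `looks` parameter are common assumptions made for illustration, not details taken from the abstract, and the wavelet and Bayesian stages are not shown.

```python
import math
import random
from typing import List


def speckle(reflectivity: List[float], looks: int,
            rng: random.Random) -> List[float]:
    """Multiply each clean pixel by unit-mean gamma noise (assumed model)."""
    return [r * rng.gammavariate(looks, 1.0 / looks) for r in reflectivity]


def to_log_domain(pixels: List[float]) -> List[float]:
    """Map positive intensities to the log domain, where speckle is additive."""
    return [math.log(p) for p in pixels]


def from_log_domain(values: List[float]) -> List[float]:
    """Invert the log transform after filtering."""
    return [math.exp(v) for v in values]


rng = random.Random(0)
clean = [10.0, 20.0, 40.0]
noisy = speckle(clean, looks=4, rng=rng)

# In the log domain the noise is the additive residual log(noisy) - log(clean),
# which no longer depends on the clean pixel value.
residual = [ln - lc for ln, lc in zip(to_log_domain(noisy), to_log_domain(clean))]
```

Any additive-noise filter can then operate on the log-domain values, with `from_log_domain` mapping the filtered result back to intensities.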

16.
This paper considers some results of a study designed to investigate the kinds of mathematical activity undertaken by children (aged between 8 and 11) as they learned to program in LOGO. A model of learning modes is proposed, which attempts to describe the ways in which children used and acquired understanding of the programming/mathematical concepts involved. The remainder of the paper is concerned with discussing the validity and limitations of the model, and its implications for further research and curriculum development.

17.
The demands of a rapidly advancing technology for faster and more accurate controllers have always had a strong influence on the progress of automatic control theory. In recent years control problems have been arising with increasing frequency in widely different areas, which cannot be addressed using conventional control techniques. The principal reason for this is the fact that a highly competitive economy is forcing systems to operate in regimes where

18.
Aim: The Journals of Zhejiang University-SCIENCE (A/B/C) are edited by the international board of distinguished Chinese and foreign scientists, and aim to present the latest developments and achievements in scientific research in China and overseas to the world's scientific circles, especially to stimulate and promote academic exchange between Chinese and foreign scientists everywhere.

19.
The relative concentrations of different pigments within a leaf have significant physiological and spectral consequences. Photosynthesis, light use efficiency, mass and energy exchange, and stress response are dependent on relationships among an ensemble of pigments. This ensemble also determines the visible characteristics of a leaf, which can be measured remotely and used to quantify leaf biochemistry and structure. But current remote sensing approaches are limited in their ability to resolve individual pigments. This paper focuses on the incorporation of three pigments (chlorophyll a, chlorophyll b, and total carotenoids) into the LIBERTY leaf radiative transfer model to better understand relationships between leaf biochemical, biophysical, and spectral properties.
Pinus ponderosa and Pinus jeffreyi needles were collected from three sites in the California Sierra Nevada. Hemispheric single-leaf visible reflectance and transmittance and concentrations of chlorophylls a and b and total carotenoids of fresh needles were measured. These data were input to the enhanced LIBERTY model to estimate optical and biochemical properties of pine needles. The enhanced model successfully estimated reflectance (RMSE = 0.0255, BIAS = 0.00477, RMS%E = 16.7%), had variable success estimating transmittance (RMSE = 0.0442, BIAS = 0.0294, RMS%E = 181%), and generated very good estimates of carotenoid concentrations (RMSE = 2.48 µg/cm², BIAS = 0.143 µg/cm², RMS%E = 20.4%), good estimates of chlorophyll a concentrations (RMSE = 10.7 µg/cm², BIAS = −0.992 µg/cm², RMS%E = 21.1%), and fair estimates of chlorophyll b concentrations (RMSE = 7.49 µg/cm², BIAS = −2.12 µg/cm², RMS%E = 43.7%). Overall root mean squared errors of reflectance, transmittance, and pigment concentration estimates were lower for the three-pigment model than for the single-pigment model.
The algorithm to estimate three in vivo specific absorption coefficients is robust, although estimated values are distorted by inconsistencies in model biophysics. The capacity to invert the model from single-leaf reflectance and transmittance was added to the model so it could be coupled with vegetation canopy models to estimate canopy biochemistry from remotely sensed data.

20.
This article discusses the history and design of the special versions of the bombe key-finding machines used by Britain’s Government Code & Cypher School (GC&CS) during World War II to attack the Enigma traffic of the Abwehr (the German military intelligence service). These special bombes were based on the design of their more numerous counterparts used against the traffic of the German armed services, but differed from them in important ways that highlight the adaptability of the British bombe design, and the power and flexibility of the diagonal board. Also discussed are the changes in the Abwehr indicating system that drove the development of these machines, the ingenious ways in which they were used, and some related developments involving the bombes used by the U.S. Navy’s cryptanalytic unit (OP-20-G).
