1.

陈利平 高金华 《微处理机》 2010,31(3):6-10

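Multiprocessor systems-on-chip (MPSoCs) integrate processors with several different instruction sets on a single chip and can implement complex, complete functions. The communication architecture is the bottleneck of an MPSoC, and an efficient arbiter can resolve the conflicts and contention that arise when multiple processors access shared resources simultaneously, thereby preventing system performance from degrading. The paper proposes a real-time dynamic adaptive arbiter that both takes real-time requirements into account and automatically adjusts the bus bandwidth occupied by each processor, avoiding starvation. Experimental results on a multiprocessor simulation platform show that it reduces latency by 49% compared with conventional arbiters and controls each processor's bus bandwidth better.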

2.

Design of a Multiprocessor System Based on I2C Bus Technology

史文奎 赵敏 孙棣华 《自动化技术与应用》 2007,26(8):86-89

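This paper studies the application of I2C bus technology to multiprocessor systems. It briefly describes the I2C bus and its data transmission and reception techniques, and focuses on solving the data transfer problem that arises when multiple processors share the bus, including communication arbitration on the bus and the software implementation of data exchange between processors. The I2C-based multiprocessor model is then applied to unit controllers.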

3.

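An On-Chip Bus Arbitration Algorithm Based on Accurate Transfer-Time Prediction (total citations: 3; self-citations: 0; citations by others: 3)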

孟海波 张志敏 《计算机辅助设计与图形学学报》 2008,20(7)

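The masters in a system-on-chip have different real-time and bandwidth requirements and compete for the on-chip system bus. Bus arbiters use various arbitration algorithms to try to meet these requirements, but existing algorithms struggle to satisfy both at once. The paper proposes an arbitration algorithm based on accurate transfer-time prediction: an arbiter using it can accurately predict when each request will complete under the current arbitration scheme. It can therefore determine which masters' real-time requirements may be violated and change the bus arbitration policy in advance so that every master's real-time requirement is met. In addition, the arbiter compares each master's actual transfer bandwidth with its required bandwidth in parallel and adjusts priorities promptly to allocate bandwidth precisely. Experimental results show that, compared with five common algorithms, the algorithm raises the percentage of real-time requirements met by 66.47% on average and satisfies the hard real-time requirements of every master in a variety of situations.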
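The idea in entry 3 of predicting completion times and switching policy before a deadline is missed can be illustrated with a small model. This is only a sketch under stated assumptions, not the algorithm from the paper: the `Request` fields, the non-preemptive back-to-back service model, and the fallback to earliest-deadline-first ordering are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    master: str     # name of the requesting bus master (hypothetical)
    priority: int   # static priority, lower value = more important
    cycles: int     # bus cycles the transfer needs
    deadline: int   # cycle by which the transfer must have finished


def predict_finish(order):
    """Predict each master's finish cycle if requests are served back to back."""
    finish, now = {}, 0
    for req in order:
        now += req.cycles
        finish[req.master] = now
    return finish


def arbitrate(requests):
    """Return a service order: static priority unless a miss is predicted."""
    by_priority = sorted(requests, key=lambda r: r.priority)
    finish = predict_finish(by_priority)
    if all(finish[r.master] <= r.deadline for r in requests):
        return [r.master for r in by_priority]
    # A deadline miss is predicted under the current policy, so switch early.
    # Assumption: earliest-deadline-first stands in for the paper's policy change.
    by_deadline = sorted(requests, key=lambda r: r.deadline)
    return [r.master for r in by_deadline]


pending = [
    Request("cpu", priority=0, cycles=8, deadline=20),
    Request("dma", priority=1, cycles=4, deadline=6),
    Request("video", priority=2, cycles=2, deadline=30),
]
order = arbitrate(pending)
```

Under static priority the `dma` transfer would finish at cycle 12, after its deadline of 6, so the sketch reorders the requests by deadline. The bandwidth-tracking half of the abstract is not modeled here.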

4.

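A Daisy-Chain Rotating-Priority Bus Arbiter for an i860 Multiprocessor System (total citations: 1; self-citations: 0; citations by others: 1)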

柳瑞恒 黄国勇 《小型微型计算机系统》 1994,15(5):1-7

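This paper presents the logic circuit of a bus arbiter. It features low arbitration overhead, good scalability, and fair bus access for all modules, which makes it well suited to shared-bus multiprocessor systems.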
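The fairness property named in entry 4 is what a rotating-priority scheme provides in general: the module that was just granted the bus drops to the lowest priority. The following is a minimal software model of that general scheme, not the logic circuit from the paper; the class name and interface are hypothetical.

```python
class RotatingPriorityArbiter:
    """Software model of rotating (round-robin) priority among n modules."""

    def __init__(self, n_modules):
        self.n = n_modules
        # Start as if the last module was just served, so module 0 goes first.
        self.last = n_modules - 1

    def grant(self, requests):
        """Grant the bus to one requester; `requests` is a set of module indices."""
        # Scan in circular order, starting just after the last granted module.
        for offset in range(1, self.n + 1):
            candidate = (self.last + offset) % self.n
            if candidate in requests:
                self.last = candidate
                return candidate
        return None  # nobody is requesting the bus


arbiter = RotatingPriorityArbiter(4)
all_busy = [arbiter.grant({0, 1, 2, 3}) for _ in range(5)]
```

When every module requests continuously, each one is served once per round, so no module can be starved by a higher-numbered or lower-numbered neighbor.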

5.

An External Conflict Resolution Scheme for R10000 Multiprocessor Clusters

易佳望 《计算机时代》 2012,(6):1-3,6

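While building a cluster-bus multiprocessor system around the MIPS R10000 processor, the author found that the external conflict resolution scheme given in the R10000 user manual applies only to uniprocessor or multiprocessor systems that use a dedicated EA. The paper therefore introduces the R10000's system configurations and the coherency of its system interface, analyzes the limitations of the scheme in the user manual, and, building on that scheme, studies external conflicts in cluster-bus multiprocessor systems and gives an external conflict resolution scheme that a cluster coordinator can adopt.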

6.

Bus Arbitration Mechanisms for Multiprocessor Systems

屈玉贵 《小型微型计算机系统》 1991,12(4):33-40,46

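The design and use of the bus arbitration mechanism in a multiprocessor system directly affect system efficiency. This paper describes the principles of bus arbitration in multiprocessor systems and presents both serial and parallel bus arbiters. It analyzes the erroneous actions that a bus arbitration mechanism can produce and concludes with a design example of a system bus interface.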

7.

Design of a Multiprocessor Intelligent Node Based on the LONWORKS Network (total citations: 7; self-citations: 0; citations by others: 7)

梁阿磊 赵玉源 钟凯 白英彩 韩江洪 《计算机研究与发展》 2000,37(4):453-457

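In LONWORKS-based fieldbuses, the Neuron chip is the core of a node, but its processing power is not sufficient for complex computing tasks. To increase the node's computing power, the paper proposes and implements a control-node design with an asymmetric multiprocessor (AMP) architecture: the processors are connected by a shared bus, the Neuron chip serves as the master processor, and three slave processors perform high-speed signal sampling and computation in parallel. The implementation introduces techniques such as single buffering, a dual-channel bus, a two-level tree network, and fine-grained communication threads. The bus axle counter implemented according to this design ...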

8.

Bus Arbitration for a Bus-NoC-Based 3D MPSoC

姚放吾 高明姬 《计算机技术与发展》 2009,19(7)

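Compared with conventional two-dimensional chips, 3D integrated chips offer better performance and packaging density. At the same time, integrating multiple processors on a single chip (MPSoC) to raise overall chip performance has become the trend in next-generation integrated circuit design. The two MPSoC communication architectures, buses and networks-on-chip, each have advantages and drawbacks. To combine 3D chip design with the MPSoC architecture, the paper studies a hybrid Bus-NoC 3D MPSoC structure and proposes an improved bus arbitration algorithm, dTDMA+. The original dTDMA has very good bandwidth utilization but falls short on real-time requirements; experimental results show that dTDMA+ meets the system's hard real-time requirements to a certain extent.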

9.

Hardware Implementation of Cache Coherence for Shared-Bus Multiprocessors

李均晓 张盛兵 沈绪榜 《计算机应用研究》 2008,25(6):1890-1893

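The 龙腾 R2 (Longteng R2) is a RISC microprocessor with independent intellectual property rights, designed by the Aviation Microelectronics Center of Northwestern Polytechnical University on the PowerPC architecture. To extend it for multiprocessor use, bus snooping is used to maintain cache coherence in a multiprocessor environment. The paper first introduces shared-bus snooping and snooping protocols, then describes in detail the implementation of the R2's bus snooping unit, and evaluates several cache coherence implementation schemes and their performance. FPGA experiments show that the bus snooping unit maintains cache coherence in a multiprocessor system efficiently and correctly.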

11.

Structural Model and Simulation Analysis of Clock Networks for Multi-Core Systems-on-Chip

余乐 王瑶 陈岩 吴超 李洋洋 李阳光 《测控技术》 2017,36(8):94-98

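For a multi-core system-on-chip (MPSoC), the structure of the clock network becomes increasingly important as integration density and performance rise. The paper studies structural modeling and analysis of multi-channel global/local clock networks. By building a multi-level cascade, it models the clock network structure at three layers: trunk, branch, and access. Clock network performance is evaluated with respect to the number of arithmetic units attached, the number of ribs per row, the number of clock inputs per arithmetic unit, and the number of clock regions. A comprehensive analysis of the clock network, taking the Stratix V E FPGA as an example, shows that a symmetric four-quadrant structure balances multiple performance metrics, is the optimal clock network structure, and can serve as a general structure for current mainstream MPSoCs.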

12.

A star network approach in heterogeneous multiprocessors system on chip

Chao Wang Xi Li Junneng Zhang Xuehai Zhou Aili Wang 《The Journal of supercomputing》2012,62(3):1404-1424

Multiprocessor System on Chip (MPSoC) platform plays a vital role in parallel processor architecture design. However, with the growing number of processors, interconnect on chip is becoming one of the major bottlenecks of MPSoC architecture. In this paper, we propose a star network based on peer to peer links on FPGA. The star network utilizes fast simplex links (FSL) as basic structure to connect the scheduler with heterogeneous processing elements, including processors and hardware IP cores. Blocking and nonblocking application interfaces are provided for high level programming. We built a prototype system on FPGA to evaluate the transfer time and hardware cost of the proposed star network architecture. Experiment results demonstrated that the average transfer time for each word could be reduced to 7 cycles, which achieves 14× speedup against state-of-the-art shared-memory approaches in the literature. Moreover, the star network cost only 1.2% Flip Flops and 2.45% LUTs of a single FPGA.

13.

Silicon-aware distributed switch architecture for on-chip networks

《Journal of Systems Architecture》2013,59(7):505-515

It is well-known that current Chip MultiProcessor (CMP) and high-end MultiProcessor System-on-Chip (MPSoC) designs are growing in their number of components. Networks-on-Chip (NoC) provide the required connectivity for such CMP and MPSoC designs at reasonable costs. As technology advances, links become the critical component in the NoC due to their long delay and power consumption, becoming unacceptable for long global interconnects. In this paper we present a new switch architecture that reduces the negative impact of links on the NoC. We call our proposal distributed switch. The distributed switch spreads the circuitry of the switch onto the links. Thus, packets are buffered, routed, and forwarded at the same time they are crossing the link. Distributing a modular switch onto the link improves the trade off between the power consumption and the operating frequency of the entire network. On the contrary, area resources are increased. Additionally, the distributed switch presents better fault tolerance and process variation behavior with respect to a non-distributed switch.

14.

System-level performance analysis of multiprocessor system-on-chips by combining analytical model and execution time variation

Sungchan Kim Soonhoi Ha 《Microprocessors and Microsystems》2014

As the impact of the communication architecture on performance grows in a Multiprocessor System-on-Chip (MPSoC) design, the need for performance analysis in the early stage in order to consider various communication architectures is also increasing. While a simulation is commonly performed for performance evaluation of an MPSoC, it often suffers from a lengthy run time as well as poor performance coverage due to limited input stimuli or their ad hoc applications. In this paper, we propose a novel system-level performance analysis method to estimate the performance distribution of an MPSoC. Our approach consists of two techniques: (1) analytical model of on-chip crossbar-based communication architectures and (2) enumeration of task-level execution time variations for a target application. The execution time variation of tasks is efficiently captured by a memory access workload model. Thus, the proposed approach leads to better performance coverage for an MPSoC application in a reasonable computation time than the simulation-based approach. The experimental results validate the accuracy, efficiency, and practical usage of the proposed approach.

15.

Parallel photon-mapping rendering on a mesh-NoC-based MPSoC platform

Mehrdad Fallahpour Ming-Bo Lin Chang-Hong Lin 《Journal of Parallel and Distributed Computing》2014

High demand 3-D scenes on embedded systems draw the developers’ attention to use the whole resources of current low-power processors and add dedicated hardware as a graphic accelerator unit to deal with real-time realistic scene rendering. Photon mapping, as one of the most powerful techniques to render highly realistic 3-D images by high amounts of floating-point operations, is very time-consuming. To use the advantages of multiprocessor systems to make 3-D scenes, parallel photon-mapping rendering on a homogeneous multiprocessor SoC (MPSoC) platform along with a mesh NoC by an adaptive wormhole routing method to communicate packets among cores is proposed in this paper. To make efficient use of the MPSoC platform to carry out photon-mapping rendering, many methods concerning the increase of load balancing, the efficient use of memory, and the decrease of communication cost to achieve a scalable application are explored in this paper. The resulting MPSoC platform is verified and evaluated by cycle-accurate simulations for different sizes of the mesh NoC. As expected, the proposed methods can obtain excellent load balancing and achieve a maximum of 44.3 times faster on an 8-by-8 MPSoC platform than on a single-core MPSoC platform.

16.

A compiler infrastructure for embedded heterogeneous MPSoCs

Weihua Sheng Stefan Schürmans Maximilian Odendahl Mark Bertsch Vitaliy Volevach Rainer Leupers Gerd Ascheid 《Parallel Computing》2014

Programming heterogeneous MPSoCs (Multi-Processor Systems on Chip) is a grand challenge for embedded SoC providers and users today. In this paper, we argue the need for and significance of positioning the language and tool design from the perspective of practicality to address this challenge. We motivate, describe and justify such a practical design of a compilation framework for heterogeneous MPSoCs targeting the domain of streaming applications, named MAPS (MPSoC Application Programming Studio). MAPS defines a clean, light-weight C language extension to capture streaming programming models. A retargetable source-to-source compiler is developed to provide key capabilities to construct practical compilation frameworks for real-world, complex MPSoC platforms. Our results have shown that MAPS is a promising compiler infrastructure that enables programming of heterogeneous MPSoCs and increases productivity of MPSoC software developers.

17.

Joint task assignment and cache partitioning with cache locking for WCET minimization on MPSoC

Tiantian Liu Yingchao Zhao Minming Li Chun Jason Xue 《Journal of Parallel and Distributed Computing》2011,71(11):1473-1483

Cache locking technique is often utilized to guarantee a tighter prediction of Worst-Case Execution Time (WCET) which is one of the most important performance metrics for embedded systems. However, in Multi-Processor Systems-on-Chip (MPSoC) systems with multi-tasks, Level 2 (L2) cache is often shared among different tasks and cores, which leads to extended unpredictability of cache. Task assignment has inherent relevancy for cache behavior, while cache behavior also affects the efficiency of task assignment. Task assignment and cache behavior have dramatic influences on the overall WCET of MPSoC. This paper proposes joint task assignment and cache partitioning techniques to minimize the overall WCET for MPSoC systems. Cache locking is applied to each task to guarantee a precise WCET. We prove that the joint problem is NP-hard and propose several efficient algorithms. Experimental results show that the proposed algorithms can consistently reduce the overall WCET compared to previous techniques.

18.

Arbiter synthesis approach for SoC multi-processor systems

Abdelkrim Zitouni Rached Tourki 《Computers & Electrical Engineering》2008,34(1):63-77

The increasing complexity of Multi-Processor System on Chip (MPSoC) is requiring communication infrastructures that will efficiently accommodate the communication needs of the integrated computation resources. Exploring the arbitration space is crucial for achieving low latency communication. This paper illustrates an arbiter synthesis approach that allows a high performance MPSoC communication for multi-bus and Network on Chip (NoC) architectures. A cost function has been formulated in order to assign a priority order to each component or each set of components in a manner that minimizes the communication latency and generates a multi-level arbiter. The performance of the proposed approach has been analyzed in a design of an 8 × 8 ATM switch subsystem and an MPEG4 decoder mapped onto a 2-D mesh NoC. The results demonstrate that the MPSoC arbiter is well suited to provide high priority communication traffic with low latencies by allowing a preemption of lower priority transport. The sum of the mean waiting time at the eight ports of the ATM switch is minimum under the MPSoC arbitration scheme (4.30 cycles per word) while it is 3.00 times larger under the poorer performance arbitration scheme. In the case of the MPEG4 decoder, the average packet latency of the MPSoC is about 480 cycles while it is 640 cycles in the poorer performance arbitration scheme under a 0.4 flits/cycle injection rate.

19.

Hardware/software support for adaptive work-stealing in on-chip multiprocessor

Quentin Meunier Frédéric Pétrot Jean-Louis Roch 《Journal of Systems Architecture》2010,56(8):392-406

During the past few years, embedded digital systems have been requested to provide a huge amount of processing power and functionality. A very likely foreseeable step to pursue this computational and flexibility trend is the generalization of on-chip multiprocessor platforms (MPSoC). In that context, choosing a programming model and providing optimized hardware support to it on these platforms is a challenging task. To deal in a portable way with MPSoCs having a different number of processors running possibly at different frequencies, work-stealing (WS) based parallelization is a current research trend. The contribution of this paper is to evaluate the impact of some simple MPSoCs’ architecture characteristics on the performance of WS in the MPSoC context. The previous evaluations of WS, either theoretical or experimental, were done on fixed multicore architectures. This work extends these studies by exploring the use of WS for the codesign of embedded applications on MPSoC platforms with different hardware capabilities, thanks to cycle-accurate measures. We firstly study the architectural choices suited to WS algorithms and measure the benefit of these architectural modifications. To assert whether WS is suited to the MPSoC context, we experimentally measure its intrinsic implementation overhead on the most efficient architectural designs. Finally, we validate the performances of the approach on two real applications: a regular multimedia application (temporal noise reduction) and an irregular computation intensive application (frames of the Mandelbrot set). Our results show that enhancing MPSoC platforms having up to 16 processors with widespread hardware support mechanisms can lead to important performance improvements at acceptable hardware cost for the considered applications.

20.

A NoC-Based Multi-Core Distributed Operating System

胡新安 付方发 孙俊 喻明艳 《计算机工程》 2012,38(5):259-261

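Using an asymmetric design that combines master-slave control with message-passing communication, the paper designs a multi-core distributed operating system based on a network-on-chip (NoC). In this system, the master node collects global resource information through a resource pool and dispatches tasks through run-time task scheduling. Slave nodes report resource information back in an asynchronous statistics mode and use virtual memory to create, load, and execute the child processes of parallel applications. Test results show that the system effectively supports the scheduling, loading, and execution of parallel programs based on the Message Passing Interface.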