
1.

Xu Tong, Zhang Shijian, Lü Tao 《Computer Engineering》2010,36(20):19-21

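To improve the efficiency of processor-core simulation models, a behavioral virtual processor model of the Godson-1 (Loongson-1) processor is built on the SimpleScalar framework; its average IPC error is 2.3%, and it simulates 1,000,000 instructions per second. A bus functional model built on a controllable random-event mechanism provides active stimulus generation and on-chip interconnect verification for system-on-chip (SoC) design. Experimental results show that the method is broadly applicable to simulation modeling of processor IP and can be integrated seamlessly into the SoC design flow.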

2.

Implementation of Perceptron-Based Branch Prediction in SimpleScalar

Ye Xindong, Tang Zhiqiang, Tu Shiliang 《Computer Systems & Applications》2010,19(1):51-54

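SimpleScalar is a performance simulator for superscalar processors that is widely used internationally. The paper first analyzes the internal architecture of the SimpleScalar simulator and, on that basis, examines in depth how its branch-prediction unit is implemented. Because this unit supports only counter-based predictors, the authors study the mechanism of perceptron-based branch predictors and propose and design a scheme for implementing a perceptron-based predictor in SimpleScalar. The work is of practical value for performance simulation and research of superscalar processors.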
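To make the predictor concrete, here is a minimal Python sketch of a perceptron branch predictor in the general style of Jiménez and Lin. It is independent of SimpleScalar's C code, and the table size, 16-bit global history, 8-bit saturating weights, PC-modulo indexing, and the threshold θ = ⌊1.93h + 14⌋ are assumptions, not details taken from the paper above.

```python
class PerceptronPredictor:
    """Perceptron branch predictor sketch (assumed parameters, not the paper's)."""

    def __init__(self, num_perceptrons=256, history_len=16, weight_bits=8):
        self.n = num_perceptrons
        self.theta = int(1.93 * history_len + 14)   # training threshold
        self.wmax = (1 << (weight_bits - 1)) - 1    # saturating weight range
        self.wmin = -(1 << (weight_bits - 1))
        # weights[i][0] is the bias; weights[i][1..h] pair with history bits
        self.weights = [[0] * (history_len + 1) for _ in range(num_perceptrons)]
        self.history = [1] * history_len            # +1 = taken, -1 = not taken

    def _output(self, pc):
        w = self.weights[pc % self.n]               # hypothetical simple indexing
        return w[0] + sum(wi * xi for wi, xi in zip(w[1:], self.history))

    def _clamp(self, v):
        return max(self.wmin, min(self.wmax, v))

    def predict(self, pc):
        return self._output(pc) >= 0                # True means "taken"

    def update(self, pc, taken):
        y = self._output(pc)
        t = 1 if taken else -1
        # Train on a misprediction or when the confidence |y| is below theta
        if (y >= 0) != taken or abs(y) <= self.theta:
            w = self.weights[pc % self.n]
            w[0] = self._clamp(w[0] + t)
            for i, xi in enumerate(self.history, start=1):
                w[i] = self._clamp(w[i] + t * xi)
        # Shift the actual outcome into the global history register
        self.history = [t] + self.history[:-1]
```

A strictly alternating branch, which a saturating 2-bit counter mispredicts, is learned quickly because every history bit correlates with the outcome.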

3.

Implementation of a NAT Device Based on the NP-2 Network Processor

Ren Donghua 《Fujian Computer》2010,26(6):108-109,112

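This paper introduces the NAT mechanism and the architecture of the NP-2 network processor and, based on the structural characteristics of the NP-2, proposes a scheme for implementing NAT on it.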

4.

A Fast Video Detection Scheme Based on Many-Core Processors and GPUs

Yang Juan, Zeng Miaoxiang, Xu Jing, Xu Wei 《Computer Science》2015,42(3):266-270, 295

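Video detection on conventional architectures is slow and cannot meet the requirements of real-time monitoring of online video. This paper therefore proposes a video detection scheme based on a many-core processor and a graphics processing unit (GPU): video decoding runs on the many-core processor, and image detection algorithms based on SURF (Speeded Up Robust Features) and SVM (Support Vector Machine) run on the GPU. Compared with a video detection scheme on a conventional PC architecture, the proposed scheme improves detection performance by more than a factor of 10.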

5.

Design and Implementation of a Temporal Isolation Protection Mechanism for a Safety-Critical Real-Time Operating System

Yang Shiping, Sang Nan, Chen Hui, Xiong Guangze 《Journal of Computer Research and Development》2004,41(7):1306-1314

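To design CRTOS 2.0, a highly trustworthy safety-critical real-time operating system, the authors analyze the trustworthiness mechanisms of existing operating systems and propose building safety-critical real-time operating systems on spatial and temporal isolation protection. Spatial isolation prevents programs in different address spaces from reading or writing outside their bounds, whether accidentally or maliciously; temporal isolation prevents a program from monopolizing the processor, or using it beyond its allotted time, and thereby blocking or delaying other programs. To implement temporal isolation, the paper improves on the traditional processor capacity reserve mechanism and proposes a new implementation based on two-level scheduling. Spatial and temporal isolation fundamentally strengthen the trustworthiness of safety-critical real-time operating systems.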

6.

Design of an Ethernet-Based TMS320C6713 Parallel System

Wang Gang, Wang Yueke, Qiao Chunjie 《Measurement & Control Technology》2007,26(12):48-50

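In the design of a large test system, an Ethernet-based multiprocessor parallel system is proposed to enable flexible data exchange and parallel processing among multiple processors. An FPGA converts between the media-independent interface (MII) of the Ethernet switch and the synchronous serial ports of the processors, allowing the processors to send and receive network data; on this basis, parallel data processing across the processors is implemented. For efficient development and debugging of the multiprocessor system, an Ethernet-based multiprocessor network debugging scheme is proposed. Finally, the scalability of the system is analyzed.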

7.

Design of a General-Purpose Encoder/Decoder Based on Nios

Nie Wei, Yang Xiaoqing 《Application of Electronic Technique》2006,32(8):86-89

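A general-purpose encoder/decoder (codec) based on Nios is proposed, in which a Nios processor embedded in an FPGA controls a variety of encoding and decoding modules. The design and implementation of the main modules and the boot mechanism of the whole system are described in detail. The codec runs well in a teaching laboratory system for communication principles, demonstrating its stability and extensibility.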

8.

A Control-Flow Attestation Scheme for RISC Processors

Li Yang, Dai Zibin, Li Junwei 《Computer Engineering》2019,45(12)

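To enable RISC processor platforms to detect code-reuse attacks, control-flow integrity is combined with the dynamic remote attestation protocol of trusted computing to form a hardware-assisted control-flow attestation scheme for RISC processors. Building on an open-source RISC processor, the scheme adds a hardware monitoring unit tightly coupled to the processor, specifies the attestation protocol, and designs a hardware encoding method that tracks the execution path while compressing the recorded information. Experimental results show that, compared with C-FLAT, the scheme has lower transmission latency and lower resource consumption, and that it can ensure the trustworthiness and security of control flow on RISC processors.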
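As a rough illustration of what "measuring the execution path" means in control-flow attestation, the sketch below folds each (source, destination) branch event into a running SHA-256 digest seeded with a verifier-chosen nonce, in the spirit of C-FLAT's cumulative hash. This is a software stand-in under stated assumptions: the paper uses its own hardware encoding, and the 8-byte address encoding, the function names, and the verifier's list of known-good paths are all hypothetical.

```python
import hashlib


def measure_path(events, nonce):
    """Hypothetical cumulative hash: H_i = SHA-256(H_{i-1} || src || dst)."""
    digest = hashlib.sha256(nonce).digest()          # nonce binds the report to one challenge
    for src, dst in events:
        digest = hashlib.sha256(
            digest + src.to_bytes(8, "big") + dst.to_bytes(8, "big")
        ).digest()
    return digest


def verify(report, nonce, known_good_paths):
    """Verifier side: accept only if the report matches some legitimate path."""
    return any(measure_path(path, nonce) == report for path in known_good_paths)
```

A diverted branch target (for example, a return redirected by a code-reuse attack) yields a different digest, and a report computed for an old nonce is rejected as a replay.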

9.

Research on On-Chip Synchronization Mechanisms and Evaluation Methods for Many-Core Processors

Xu Weizhi, Song Fenglong, Liu Zhiyong, Fan Dongrui, Yu Lei, Zhang Shuai 《Chinese Journal of Computers》2010,33(10)

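Synchronization mechanisms are key to correct execution and cooperative communication on chip multi-core and many-core processors, and their efficiency strongly affects processor performance. For on-chip many-core architectures, the paper proposes and implements two coarse-grained synchronization mechanisms and one fine-grained mechanism: synchronization supported by dedicated on-chip hardware, on-chip mutual-exclusion synchronization based on primitives, and fine-grained synchronization based on full/empty bits. It also proposes evaluation criteria and methods for coarse-grained synchronization and designs quantitative evaluation programs. Using a simulator of Godson-T, a homogeneous on-chip many-core processor, and a commercial AMD Opteron chip multiprocessor as platforms, the performance of the proposed hardware-supported synchronization is compared with that of primitive-based synchronization. The results show that hardware support clearly improves the synchronization performance of on-chip many-core processors, and that in traditional primitive-based synchronization most of the performance loss is waiting time caused by load imbalance and by serialization at synchronization points.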
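The full/empty-bit semantics can be sketched in software: each word carries a tag bit, a synchronizing write waits until the word is empty and marks it full, and a synchronizing read waits until it is full and marks it empty. The Python class below emulates one such word with a condition variable; it only illustrates the semantics, whereas the paper's mechanism is provided on chip, and the class and method names are hypothetical.

```python
import threading


class FullEmptyCell:
    """Software emulation of one memory word with a full/empty tag bit."""

    def __init__(self):
        self._cond = threading.Condition()
        self._full = False
        self._value = None

    def write(self, value):
        # Block until the word is empty, store the value, then mark it full
        with self._cond:
            self._cond.wait_for(lambda: not self._full)
            self._value = value
            self._full = True
            self._cond.notify_all()

    def read(self):
        # Block until the word is full, consume the value, then mark it empty
        with self._cond:
            self._cond.wait_for(lambda: self._full)
            value, self._value = self._value, None
            self._full = False
            self._cond.notify_all()
            return value
```

Because every value is handed over through the tag bit, a producer and a consumer synchronize per word rather than at a global barrier, which is what makes the mechanism fine-grained.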

10.

Implementation of QoS on the NP-1c Network Processor

Mao Fang, Lai Xuejia 《Computer Applications and Software》2008,25(4):160-162

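This paper introduces QoS mechanisms and the architecture of the NP-1c network processor and, following the DiffServ model and the structural characteristics of the NP-1c, proposes a scheme for implementing QoS on it.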

11.

Continuous-time decentralized smoothers based on two-filter form: identical local and global models

KEIGO WATANABE 《International journal of systems science》2013,44(7):1015-1028

This paper proposes some decentralized smoothing algorithms for a continuous-time linear estimation structure consisting of a central processor and two local processors, in which the local models are assumed to be identical to the global model. The philosophy of the paper is to solve the problem in terms of the local forward and backward information (or Kalman) filters. The resulting algorithms are somewhat different from those based on the local smoothing estimates, which have been studied by other authors. Smoothing update and real-time smoothing algorithms are also presented. It is shown that the present algorithms have some advantages: the global filtered estimates can be obtained in the course of computing the decentralized smoothing estimates, and the central and local processors can be derived in a completely parallel fashion.
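For orientation, the standard two-filter (Fraser-Potter type) combination that such smoothers build on is shown below. It assumes a forward Kalman filter with estimate $\hat{x}_f(t)$ and covariance $P_f(t)$, and a backward information filter with information matrix $Y_b(t)$ and information state $\hat{y}_b(t)$, started from $Y_b(T) = 0$ so that the prior is counted only once. This is the generic centralized formula with our own symbols, not the paper's decentralized algorithm.

```latex
P_s^{-1}(t) = P_f^{-1}(t) + Y_b(t), \qquad
\hat{x}_s(t) = P_s(t)\left[ P_f^{-1}(t)\,\hat{x}_f(t) + \hat{y}_b(t) \right]
```

Working with the backward filter in information form avoids inverting the backward covariance at $t = T$, where it is unbounded.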

12.

DEUCON: Decentralized End-to-End Utilization Control for Distributed Real-Time Systems

Wang X. Jia D. Lu C. Koutsoukos X. 《Parallel and Distributed Systems, IEEE Transactions on》2007,18(7):996-1009

Many real-time systems must control their CPU utilizations in order to meet end-to-end deadlines and prevent overload. Utilization control is particularly challenging in distributed real-time systems with highly unpredictable workloads and a large number of end-to-end tasks and processors. This paper presents the decentralized end-to-end utilization control (DEUCON) algorithm, which can dynamically enforce the desired utilizations on multiple processors in such systems. In contrast to centralized control schemes adopted in earlier works, DEUCON features a novel decentralized control structure that requires only localized coordination among neighbor processors. DEUCON is systematically designed based on advances in distributed model predictive control theory. Both control-theoretic analysis and simulations show that DEUCON can provide robust utilization guarantees and maintain global system stability despite severe variations in task execution times. Furthermore, DEUCON can effectively distribute the computation and communication cost to different processors and tolerate considerable communication delay between local controllers. Our results indicate that DEUCON can provide a scalable and robust utilization control for large-scale distributed real-time systems executing in unpredictable environments.

13.

An Empirical Evaluation of the Convex SPP-1000 Hierarchical Shared Memory System

Thomas Sterling Daniel Savarese Phillip Merkey Kevin Olson 《International journal of parallel programming》1996,24(4):377-396

Cache coherency in a scalable parallel computer architecture requires mechanisms beyond the conventional common-bus-based snooping approaches, which are limited to about 16 processors. The new Convex SPP-1000 achieves cache coherency across 128 processors through a two-level shared memory NUMA structure employing directory-based and SCI protocol mechanisms. While hardware support for managing a common global name space minimizes overhead costs and simplifies programming, latency considerations for remote accesses may still dominate and can, under unfavorable conditions, constrain scalability. This paper provides the first published evaluation of the SPP-1000 hierarchical cache coherency mechanisms from the perspective of measured latency and its impact on basic global flow control mechanisms, scaling of a parallel science code, and sensitivity of cache miss rates to system scale. It is shown that global remote access latency is only a factor of seven greater than the local cache miss penalty and that scaling of a challenging scientific application is not severely degraded by the hierarchical structure for achieving consistency across the system processor caches.

14.

Computing global combine operations in the multiport postal model

Bar-Noy A. Bruck J. Ching-Tien Ho Kipnis S. Schieber B. 《Parallel and Distributed Systems, IEEE Transactions on》1995,6(8):896-900

Consider a message-passing system of n processors, in which each processor initially holds one piece of data. The goal is to compute an associative and commutative reduction function on the n pieces of data and to make the result known to all n processors. This operation is frequently used in message-passing systems and is typically referred to as global combine, census computation, or gossiping. This paper explores the problem of global combine in the multiport postal model. This model is characterized by three parameters: n, the number of processors; k, the number of ports per processor; and λ, the communication latency. In this model, in every round r, each processor can send k distinct messages to k other processors, and it can receive k messages that were sent from k other processors λ-1 rounds earlier. This paper provides an optimal algorithm for the global combine problem that requires the fewest communication rounds and minimizes the time spent by any processor in sending and receiving messages.
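For intuition about global combine, the sketch below simulates the classic recursive-doubling (butterfly) exchange for the special case k = 1 and λ = 1 with n a power of two: in round r, processor i combines its partial result with that of processor i XOR 2^r, so every processor holds the full reduction after log2(n) rounds. This is a well-known baseline swapped in for illustration, not the optimal multiport-postal algorithm of the paper, and the function name is hypothetical.

```python
from operator import add


def recursive_doubling_allreduce(values, op=add):
    """Simulate global combine by recursive doubling (k = 1, lambda = 1 assumed).

    Returns the per-processor results and the number of rounds used.
    op must be associative and commutative, as in the problem statement.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("number of processors must be a power of two")
    data = list(values)
    rounds, step = 0, 1
    while step < n:
        # Every processor i exchanges partial results with partner i XOR step
        data = [op(data[i], data[i ^ step]) for i in range(n)]
        step <<= 1
        rounds += 1
    return data, rounds
```

For example, `recursive_doubling_allreduce(list(range(1, 9)))` leaves 36 on all eight simulated processors after 3 rounds. With more ports or larger latency, the paper's model admits schedules that finish in fewer rounds than this baseline.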

15.

A Parallel Interval Computation Model for Global Optimization with Automatic Load Balancing


Wu Yong, Arun Kumar 《Journal of Computer Science and Technology》2012,27(4):744-753

In this paper, we propose a decentralized parallel computation model for global optimization using interval analysis. The model is adaptive to any number of processors, and the workload is automatically and evenly distributed among all processors by alternative message passing. The problems received by each processor are processed based on their local dominance properties, which avoids unnecessary interval evaluations. Further, the problem is treated as a whole at the beginning of computation, so no initial decomposition scheme is required. Numerical experiments indicate that the model works well and is stable with different numbers of parallel processors, distributes the load evenly among the processors, and provides an impressive speedup, especially when the problem is time-consuming to solve.
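The kernel that each processor in such a model repeats is interval branch and bound. The sequential Python sketch below shows it for a one-dimensional example: a box is split, its interval enclosure gives a lower bound, point evaluations give an incumbent upper bound, and boxes whose lower bound exceeds the incumbent are discarded. The test function, tolerance, and names are assumptions; the parallel message passing and load balancing of the paper are not shown, and plain floating point without outward rounding makes the bounds approximate rather than rigorous.

```python
import heapq


class Interval:
    """Closed interval [lo, hi]; plain float arithmetic, no outward rounding."""

    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def __add__(self, other):
        o = as_interval(other)
        return Interval(self.lo + o.lo, self.hi + o.hi)

    def __sub__(self, other):
        o = as_interval(other)
        return Interval(self.lo - o.hi, self.hi - o.lo)

    def __mul__(self, other):
        o = as_interval(other)
        p = (self.lo * o.lo, self.lo * o.hi, self.hi * o.lo, self.hi * o.hi)
        return Interval(min(p), max(p))

    def sqr(self):
        # Tight enclosure of x**2, tighter than self * self when 0 is inside
        if self.lo >= 0:
            return Interval(self.lo ** 2, self.hi ** 2)
        if self.hi <= 0:
            return Interval(self.hi ** 2, self.lo ** 2)
        return Interval(0.0, max(self.lo ** 2, self.hi ** 2))


def as_interval(v):
    return v if isinstance(v, Interval) else Interval(v, v)


def f_point(x):                      # hypothetical example objective
    return x ** 4 - 4 * x ** 2 + x


def f_box(X):
    # Natural interval extension: encloses f over X, possibly overestimating
    return X.sqr().sqr() - X.sqr() * 4 + X


def interval_minimize(lo, hi, tol=1e-5):
    """Best-first interval branch and bound; returns (lower, upper) bounds on min f."""
    best = f_point((lo + hi) / 2)                    # incumbent upper bound
    heap = [(f_box(Interval(lo, hi)).lo, lo, hi)]
    while heap:
        lb, a, b = heapq.heappop(heap)               # box with the smallest lower bound
        if best - lb <= tol:                         # global minimum lies in [lb, best]
            return lb, best
        m = (a + b) / 2
        best = min(best, f_point(m))
        for c, d in ((a, m), (m, b)):
            child_lb = f_box(Interval(c, d)).lo
            if child_lb <= best:                     # prune boxes that cannot beat the incumbent
                heapq.heappush(heap, (child_lb, c, d))
    return best, best
```

Here `interval_minimize(-3.0, 4.0)` brackets the global minimum near x ≈ -1.47 while the local minimum near x ≈ 1.35 is pruned.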

16.

Design and analysis of a scalable cache coherence scheme based on clocks and timestamps

Min S.L. Baer J.-L. 《Parallel and Distributed Systems, IEEE Transactions on》1992,3(1):25-44

A timestamp-based software-assisted cache coherence scheme that does not require any global communication to enforce the coherence of multiple private caches is proposed. It is intended for shared memory multiprocessors. The scheme is based on a compile-time marking of references and a hardware-based local incoherence detection scheme. The possible incoherence of a cache entry is detected, and the associated entry is implicitly invalidated, by comparing a clock (related to program flow) and a timestamp (related to the time of update in the cache). Results of a performance comparison based on a trace-driven simulation using actual traces, between the proposed timestamp-based scheme and other software-assisted schemes, indicate that the proposed scheme performs significantly better than previous software-assisted schemes, especially when the processors are carefully scheduled so as to maximize the reuse of cache contents. This scheme requires neither a shared resource nor global communication and is, therefore, scalable up to a large number of processors.
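The clock/timestamp comparison can be illustrated with a simplified model: every shared variable has a clock that compiler-inserted code advances at the end of each epoch in which the variable may have been written, and each cached copy carries a timestamp. A cached copy is used only if its timestamp is at least the current clock; otherwise it is treated as invalid and refetched, with no invalidation messages. The Python sketch below is hypothetical: write-through to memory, one clock per variable, and the writer's timestamp of clock + 1 (so its own copy survives the epoch boundary) are simplifying assumptions, and the paper's exact rules may differ.

```python
class TimestampCache:
    """Per-processor cache with clock/timestamp-based implicit invalidation (sketch)."""

    def __init__(self, memory):
        self.memory = memory    # shared backing store: name -> value
        self.clock = {}         # local copy of each variable's epoch clock
        self.lines = {}         # name -> (value, timestamp)
        self.misses = 0

    def _clock(self, name):
        return self.clock.get(name, 0)

    def read(self, name):
        entry = self.lines.get(name)
        if entry is not None and entry[1] >= self._clock(name):
            return entry[0]                      # hit: copy cannot be stale
        self.misses += 1                         # miss, or implicitly invalidated copy
        value = self.memory[name]
        self.lines[name] = (value, self._clock(name))
        return value

    def write(self, name, value):
        self.memory[name] = value                # write-through (assumption)
        # The writer's own copy stays valid into the next epoch
        self.lines[name] = (value, self._clock(name) + 1)

    def end_epoch(self, possibly_modified):
        # Compiler-marked epoch boundary: advance clocks of variables that may have changed
        for name in possibly_modified:
            self.clock[name] = self._clock(name) + 1
```

In a two-processor run where P0 writes x during an epoch, P1's old copy fails the timestamp check in the next epoch and is refetched, while P0 still hits, which is the local, communication-free detection the abstract describes.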

17.

Efficient termination detection for loosely synchronous applications in multicomputers

Chengzhong Xu Lau F.C.M. 《Parallel and Distributed Systems, IEEE Transactions on》1996,7(5):537-544

We propose a simple algorithm, based on edge-coloring of system graphs, for termination detection of loosely synchronous computations. The proposed algorithm is fully symmetric in that all processors run syntactically identical code and can detect global termination at the same time. Under the 1-port communication model, the algorithm is optimal in terms of termination delay (the difference between the time when a global termination occurs and the time it is detected) in a number of structures, namely the chain, the ring with an even number of nodes, and the k-ary n-cube and k-ary n-mesh of low degree, where k is even; it is near-optimal in other cases. The optimality analysis is based on results from a related problem, periodic gossiping in edge-colored graphs. The algorithm has been applied to some practical cases, in which the overhead due to its execution is found to be insignificant.

18.

In search of numerical consistency in parallel programming

Robert W. Robey Jonathan M. Robey Rob Aulwes 《Parallel Computing》2011,37(4-5):217-229

We present methods that can dramatically improve numerical consistency for parallel calculations across varying numbers of processors. By calculating global sums with enhanced precision techniques based on Kahan or Knuth summations, the consistency of the numerical results can be greatly improved with minimal memory and computational cost. This study assesses the value of the enhanced numerical consistency in the context of general finite difference or finite volume calculations.
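The two enhanced-precision building blocks the abstract names can be sketched in a few lines of Python: Kahan summation carries the low-order bits lost at each addition in a compensation term, and Knuth's TwoSum returns both the rounded sum and its exact rounding error. These are textbook versions written for illustration; how the paper applies them to global sums across processors is not reproduced here.

```python
def kahan_sum(values):
    """Compensated (Kahan) summation of an iterable of floats."""
    total = 0.0
    comp = 0.0                      # running compensation for lost low-order bits
    for v in values:
        y = v - comp                # re-inject the error from the previous step
        t = total + y
        comp = (t - total) - y      # what was actually added, minus what should have been
        total = t
    return total


def two_sum(a, b):
    """Knuth's TwoSum: s = fl(a + b) and a + b == s + e exactly."""
    s = a + b
    bp = s - a
    e = (a - (s - bp)) + (b - bp)
    return s, e
```

For `[1.0] + [1e-16] * 10000`, a plain left-to-right `+=` loop returns exactly 1.0 because each 1e-16 is below half an ulp of 1.0, whereas `kahan_sum` returns approximately 1 + 1e-12. Carrying such error terms is what makes a reduction less sensitive to how the data are split across processors.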

19.

Verification of concurrent control flow in distributed computer systems

Yau S.S. Hong W. 《IEEE Transactions on Software Engineering》1988,14(4):405-417

An approach to verifying control flow in distributed computer systems (DCS) is presented. The approach is based on control flow checking among software components that are distributed over processors and cooperate with one another. In this approach, the control-flow behavior of DCS software is modeled and contained in special software components called verifiers. The verifiers are distributed over the processors and consulted to check the correctness of the control flow in DCS software during its execution. Algorithms for deriving the verifiers are presented. This technique can detect global errors, including synchronization errors, as well as local errors. It can be used for sequential or concurrent software at various levels of detail. Experiments show that using this technique requires no significant overhead.
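A minimal sketch of run-time verification in this spirit: a local verifier per component checks every reported transfer of control against an allowed control-flow graph, and a global checker flags synchronization errors, here a receive on a channel with no earlier matching send. The node naming (`send:c`, `recv:c`), the reporting interface, and both classes are hypothetical illustrations, not Yau and Hong's verifier-derivation algorithm.

```python
from collections import Counter


class LocalVerifier:
    """Checks one component's control transfers against its allowed control-flow graph."""

    def __init__(self, cfg, entry="start"):
        self.cfg = cfg              # node -> set of legal successor nodes
        self.current = entry
        self.errors = []

    def step(self, node):
        legal = node in self.cfg.get(self.current, ())
        if not legal:
            self.errors.append((self.current, node))   # local control-flow error
        self.current = node
        return legal


class SyncChecker:
    """Global check: every 'recv:<chan>' must follow an unmatched 'send:<chan>'."""

    def __init__(self):
        self.pending = Counter()    # channel -> sends not yet received
        self.errors = []

    def observe(self, process, node):
        kind, _, chan = node.partition(":")
        if kind == "send":
            self.pending[chan] += 1
        elif kind == "recv":
            if self.pending[chan] == 0:
                self.errors.append((process, node))    # synchronization error
            else:
                self.pending[chan] -= 1
```

Each processor would feed its own `LocalVerifier`, while the `SyncChecker` sees events from all of them, which is how a global error such as a premature receive can be caught even when every component's local path is legal.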

20.

SCMP: A Single-Chip Message-Passing Parallel Computer

Baker James M. Gold Brian Bucciero Mark Bennett Sidney Mahajan Rajneesh Ramachandran Priyadarshini Shah Jignesh 《The Journal of supercomputing》2004,30(2):133-149

As technology improves and transistor feature sizes continue to shrink, the effects of on-chip interconnect wire latencies on processor clock speeds will become more important. In addition, as we reach the limits of instruction-level parallelism that can be extracted from application programs, there will be an increased emphasis on thread-level parallelism. To continue to improve performance, computer architects will need to focus on architectures that can efficiently support thread-level parallelism while minimizing the length of on-chip interconnect wires. The SCMP (Single-Chip Message-Passing) parallel computer system is one such architecture. The SCMP system includes up to 64 processors on a single chip, connected in a 2-D mesh with nearest neighbor connections. Memory is included on-chip with the processors and the architecture includes hardware support for communication and the execution of parallel threads. Since there are no global signals or shared resources between the processors, the length of the interconnect wires will be determined by the size of the individual processors, not the size of the entire chip. Avoiding long interconnect wires will allow the use of very high clock frequencies, which, when coupled with the use of multiple processors, will offer tremendous computational power.