Similar Literature
 Found 20 similar documents (search time: 515 ms)
1.
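Xu Tong, Zhang Shijian, Lü Tao. Computer Engineering (《计算机工程》), 2010, 36(20): 19-21
To improve the efficiency of processor-core simulation models, this paper proposes behavioral modeling of a virtual processor model for the Godson-1 (龙芯1号) processor based on the SimpleScalar architecture. The average IPC error is 2.3%, and the simulation speed reaches 1,000,000 instructions per second. A bus functional model built on a controllable random-event mechanism provides an active stimulus-generation scheme and on-chip interconnect verification for system-on-chip (SoC) design. Experimental results show that the method is generally applicable to processor IP simulation modeling and can be integrated seamlessly into the SoC flow.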

2.
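SimpleScalar is a performance simulator for superscalar processors that is widely used internationally. This paper first analyzes the internal architecture of the SimpleScalar simulator and, on that basis, examines in depth how its branch prediction unit is implemented. To address the limitation that SimpleScalar's branch prediction unit supports only counter-based predictors, the paper studies the implementation of perceptron-based branch predictors and proposes and designs a scheme for implementing one in the SimpleScalar simulator. The work has practical significance for performance simulation of and research on superscalar processors.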
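
To make the contrast with counter-based predictors concrete, the following is a minimal standalone sketch of a perceptron-based predictor in the commonly described style: one weight vector per table entry, a global history of +1/-1 outcomes, and training on a misprediction or a low-magnitude output. It is not the scheme from the paper and does not use SimpleScalar's interfaces. The class name, table size, history length, unbounded weights, and the threshold formula `1.93 * h + 14` are assumptions chosen for illustration.

```python
class PerceptronPredictor:
    """Illustrative perceptron branch predictor (hypothetical, not SimpleScalar code)."""

    def __init__(self, history_len=8, table_size=64):
        self.table_size = table_size
        # Training threshold; the constant is an assumption taken from common practice.
        self.theta = int(1.93 * history_len + 14)
        # One weight vector per entry: index 0 is the bias, the rest pair with history bits.
        self.weights = [[0] * (history_len + 1) for _ in range(table_size)]
        # Global history, oldest first: +1 = taken, -1 = not taken.
        self.history = [-1] * history_len

    def _output(self, pc):
        w = self.weights[pc % self.table_size]
        return w[0] + sum(wi * xi for wi, xi in zip(w[1:], self.history))

    def predict(self, pc):
        """Predict taken when the perceptron output is non-negative."""
        return self._output(pc) >= 0

    def update(self, pc, taken):
        """Train on the actual outcome, then shift it into the global history."""
        y = self._output(pc)
        t = 1 if taken else -1
        # Train only on a misprediction or when the output is not confident enough.
        if (y >= 0) != taken or abs(y) <= self.theta:
            w = self.weights[pc % self.table_size]
            w[0] += t
            for i, xi in enumerate(self.history, start=1):
                w[i] += t * xi
        self.history = self.history[1:] + [t]
```

A caller would invoke `predict(pc)` at fetch and `update(pc, taken)` once the branch resolves. A hardware-oriented model would also saturate the weights to a fixed bit width, which this sketch omits.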

3.
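Ren Donghua. Fujian Computer (《福建电脑》), 2010, 26(6): 108-109, 112
This paper introduces the NAT mechanism and the architecture of the NP-2 network processor and, based on the structural characteristics of the NP-2, proposes a scheme for implementing NAT on the NP-2.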

4.
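Yang Juan, Zeng Miaoxiang, Xu Jing, Xu Wei. Computer Science (《计算机科学》), 2015, 42(3): 266-270, 295
Video detection on conventional architectures is currently slow and cannot meet the requirements of real-time monitoring of online video. This paper therefore proposes a video detection scheme based on a many-core processor and a graphics processing unit (GPU). The scheme decodes video on the many-core processor and runs the SURF (Speeded Up Robust Features) and SVM (Support Vector Machine) image detection algorithms on the GPU. Compared with a video detection scheme on a conventional PC architecture, it improves video detection performance by more than 10 times.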

5.
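To design CRTOS2.0, a highly dependable safety-critical real-time operating system, this paper analyzes the dependability assurance mechanisms of existing operating systems and proposes building safety-critical real-time operating systems on a time and space isolation protection mechanism. Space isolation protection prevents programs in different address spaces from performing illegal reads or writes across their boundaries, whether accidentally or maliciously. Time isolation protection prevents a program from monopolizing the processor for long periods, or overrunning its allotted time, and thereby blocking or delaying other programs. To implement time isolation, the paper improves on the traditional processor capacity reserve mechanism and proposes a new implementation based on two-level scheduling. The time and space isolation protection mechanism fundamentally strengthens the dependability of safety-critical real-time operating systems.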

6.
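Wang Gang, Wang Yueke, Qiao Chunjie. Measurement & Control Technology (《测控技术》), 2007, 26(12): 48-50
In the design of a large test system, an Ethernet-based multiprocessor parallel system is proposed to provide flexible data exchange and parallel processing among multiple processors. An FPGA converts between the media-independent interface of the Ethernet switch and the processors' synchronous serial ports, allowing the processors to receive and send network data. On this basis, parallel data processing across the processors is implemented. For efficient development and debugging of the multiprocessor system, an Ethernet-based multiprocessor network debugging scheme is proposed. Finally, the scalability of the system is analyzed.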

7.
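This paper presents the design of a Nios-based general-purpose encoder/decoder in which a Nios processor embedded in the FPGA controls multiple encoding and decoding modules. The design and implementation of the main modules and the startup mechanism of the whole system are described in detail. The encoder/decoder runs well in an experimental teaching system for communication principles, which demonstrates its stability and extensibility.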

8.
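To give RISC processor platforms the ability to detect code-reuse attacks, this work combines a control-flow integrity mechanism with the dynamic remote attestation protocol from trusted computing and proposes a hardware-assisted control-flow attestation scheme for RISC processors. Starting from an open-source RISC processor, it adds a hardware monitoring unit tightly coupled to the processor, specifies the attestation protocol of the scheme, and designs a hardware encoding method for tracking execution paths that compresses the recorded information. Experimental results show that, compared with the C-FLAT scheme, the proposed scheme has lower transmission latency and lower resource consumption, and can ensure the trustworthiness and security of control flow on RISC processors.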

9.
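Research on On-Chip Synchronization Mechanisms and Evaluation Methods for Many-Core Processors   Total citations: 1, self-citations: 0, citations by others: 1
Synchronization mechanisms are key to correct execution and cooperative communication in on-chip multi-core and many-core processors, and their efficiency is critical to processor performance. For on-chip many-core architectures, this paper proposes and implements two coarse-grained synchronization mechanisms and one fine-grained mechanism: a synchronization mechanism supported by dedicated on-chip hardware, an on-chip mutual-exclusion synchronization mechanism based on primitives, and a fine-grained synchronization mechanism based on full/empty bits. It also proposes evaluation criteria and methods for coarse-grained synchronization mechanisms and designs quantitative evaluation programs. Using the simulator of the homogeneous on-chip many-core processor Godson-T and a commercial AMD Opteron on-chip multi-core processor as platforms, it compares the performance of the proposed hardware-supported mechanism with that of the primitive-based mechanism. The results show that hardware support markedly improves synchronization performance on on-chip many-core processors, and that in traditional primitive-based mechanisms most of the performance loss is waiting time caused by load imbalance and by serialized operations at synchronization points.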

10.
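This paper introduces the QoS mechanism and the architecture of the NP-1c network processor and, with reference to the DiffServ model and the structural characteristics of the NP-1c, proposes a scheme for implementing QoS on the NP-1c.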

11.
This paper proposes some decentralized smoothing algorithms for a continuous-time linear estimation structure consisting of a central processor and two local processors, in which the local models are assumed to be identical to the global model. The philosophy of the paper is to solve the problem in terms of the local forward and backward information (or Kalman) filters. The resulting algorithms are somewhat different from those based on the local smoothing estimates, which have been studied by other authors. Smoothing update and real-time smoothing algorithms are also presented. It is shown that the present algorithms have some advantages: the global filtered estimates can be obtained in the course of computing the decentralized smoothing estimates, and the central and local processors can be derived in a completely parallel fashion.

12.
Many real-time systems must control their CPU utilizations in order to meet end-to-end deadlines and prevent overload. Utilization control is particularly challenging in distributed real-time systems with highly unpredictable workloads and a large number of end-to-end tasks and processors. This paper presents the decentralized end-to-end utilization control (DEUCON) algorithm, which can dynamically enforce the desired utilizations on multiple processors in such systems. In contrast to centralized control schemes adopted in earlier works, DEUCON features a novel decentralized control structure that requires only localized coordination among neighbor processors. DEUCON is systematically designed based on advances in distributed model predictive control theory. Both control-theoretic analysis and simulations show that DEUCON can provide robust utilization guarantees and maintain global system stability despite severe variations in task execution times. Furthermore, DEUCON can effectively distribute the computation and communication cost to different processors and tolerate considerable communication delay between local controllers. Our results indicate that DEUCON can provide scalable and robust utilization control for large-scale distributed real-time systems executing in unpredictable environments.

13.
Cache coherency in a scalable parallel computer architecture requires mechanisms beyond the conventional common-bus-based snooping approaches, which are limited to about 16 processors. The new Convex SPP-1000 achieves cache coherency across 128 processors through a two-level shared-memory NUMA structure employing directory-based and SCI protocol mechanisms. While hardware support for managing a common global name space minimizes overhead costs and simplifies programming, latency considerations for remote accesses may still dominate and can, under unfavorable conditions, constrain scalability. This paper provides the first published evaluation of the SPP-1000 hierarchical cache coherency mechanisms from the perspective of measured latency and its impact on basic global flow control mechanisms, scaling of a parallel science code, and sensitivity of cache miss rates to system scale. It is shown that global remote access latency is only a factor of seven greater than the local cache miss penalty and that scaling of a challenging scientific application is not severely degraded by the hierarchical structure for achieving consistency across the system processor caches.

14.
Consider a message-passing system of n processors, in which each processor initially holds one piece of data. The goal is to compute an associative and commutative reduction function on the n pieces of data and to make the result known to all n processors. This operation is frequently used in many message-passing systems and is typically referred to as global combine, census computation, or gossiping. This paper explores the problem of global combine in the multiport postal model. This model is characterized by three parameters: n, the number of processors; k, the number of ports per processor; and λ, the communication latency. In this model, in every round r, each processor can send k distinct messages to k other processors, and it can receive k messages that were sent from k other processors λ-1 rounds earlier. This paper provides an optimal algorithm for the global combine problem that requires the least number of communication rounds and minimizes the time spent by any processor in sending and receiving messages.
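
As a point of reference for the global combine operation described above, here is a small simulation of the familiar recursive-doubling exchange. It is not the optimal multiport postal-model algorithm from the paper. It covers only the special case assumed here: k = 1 port, latency λ = 1 (a message is received in the round it is sent), and n a power of two. The function name and the list-based "processors" are hypothetical.

```python
import operator


def global_combine(values, op=operator.add):
    """Simulate recursive doubling: every processor ends up with the full reduction.

    Assumptions (illustrative only): one port per processor, latency of one round,
    len(values) is a power of two, and op is associative and commutative.
    Returns (final value held by each processor, number of rounds used).
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch requires n to be a positive power of two")
    state = list(values)
    rounds = 0
    step = 1
    while step < n:
        # In each round, processor i exchanges its partial result with partner i XOR step,
        # so every processor sends exactly one message and receives exactly one.
        state = [op(state[i], state[i ^ step]) for i in range(n)]
        step <<= 1
        rounds += 1
    return state, rounds
```

For example, `global_combine([1, 2, 3, 4, 5, 6, 7, 8])` leaves every simulated processor holding 36 after 3 rounds, that is, log2(n) rounds under these assumptions.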

15.
In this paper, we propose a decentralized parallel computation model for global optimization using interval analysis. The model is adaptive to any number of processors, and the workload is automatically and evenly distributed among all processors by alternative message passing. The problems received by each processor are processed based on their local dominance properties, which avoids unnecessary interval evaluations. Further, the problem is treated as a whole at the beginning of computation so that no initial decomposition scheme is required. Numerical experiments indicate that the model works well and is stable with different numbers of parallel processors, distributes the load evenly among the processors, and provides an impressive speedup, especially when the problem is time-consuming to solve.

16.
A timestamp-based software-assisted cache coherence scheme that does not require any global communication to enforce the coherence of multiple private caches is proposed. It is intended for shared-memory multiprocessors. The scheme is based on a compile-time marking of references and a hardware-based local incoherence detection scheme. The possible incoherence of a cache entry is detected, and the associated entry is implicitly invalidated, by comparing a clock (related to program flow) and a timestamp (related to the time of update in the cache). Results of a performance comparison, based on a trace-driven simulation using actual traces, between the proposed timestamp-based scheme and other software-assisted schemes indicate that the proposed scheme performs significantly better than previous software-assisted schemes, especially when the processors are carefully scheduled so as to maximize the reuse of cache contents. This scheme requires neither a shared resource nor global communication and is, therefore, scalable up to a large number of processors.

17.
We propose a simple algorithm, based on edge-coloring of system graphs, for termination detection of loosely synchronous computations. The proposed algorithm is fully symmetric in that all processors run syntactically identical code and can detect global termination at the same time. Under the 1-port communication model, the algorithm is optimal in terms of termination delay (the difference between the time when a global termination occurs and the time it is detected) in a number of structures: chain, ring with an even number of nodes, and low-degree k-ary n-cube and k-ary n-mesh where k is even. It is near-optimal in other cases. The optimality analysis is based on results from a related problem, periodic gossiping in edge-colored graphs. This algorithm has been applied to some practical cases in which the overhead due to its execution is found to be insignificant.

18.
We present methods that can dramatically improve numerical consistency for parallel calculations across varying numbers of processors. By calculating global sums with enhanced precision techniques based on Kahan or Knuth summations, the consistency of the numerical results can be greatly improved with minimal memory and computational cost. This study assesses the value of the enhanced numerical consistency in the context of general finite difference or finite volume calculations.
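
The Kahan summation mentioned above can be sketched in a few lines. This is a generic textbook version, not the implementation evaluated in the study; the function names are illustrative, and `math.fsum` is used only as an accurate reference for comparison.

```python
import math


def naive_sum(values):
    """Plain left-to-right floating-point accumulation."""
    total = 0.0
    for x in values:
        total += x
    return total


def kahan_sum(values):
    """Compensated (Kahan) summation: carry the rounding error of each addition forward."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for x in values:
        y = x - compensation            # apply the correction from the previous step
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # recover what was lost (algebraically zero)
        total = t
    return total


def error_vs_reference(values, summer):
    """Absolute difference between a summation routine and math.fsum."""
    return abs(summer(values) - math.fsum(values))
```

With inputs such as one large value followed by many tiny ones, `naive_sum` can drop the tiny terms entirely while `kahan_sum` stays close to the reference. That difference is why the result of a compensated global sum depends far less on how the data are partitioned across processors.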

19.
An approach to verifying control flow in distributed computer systems (DCS) is presented. The approach is based on control flow checking among software components that are distributed over processors and cooperate with one another. In this approach, the control-flow behavior of DCS software is modeled and contained in special software components called verifiers. The verifiers are distributed over the processors and consulted to check the correctness of the control flow in DCS software during its execution. Algorithms for deriving the verifiers are presented. This technique can detect global errors, including synchronization errors, as well as local errors. It can be used for sequential or concurrent software at various levels of detail. Experiments show that using this technique requires no significant overhead.

20.
As technology improves and transistor feature sizes continue to shrink, the effects of on-chip interconnect wire latencies on processor clock speeds will become more important. In addition, as we reach the limits of instruction-level parallelism that can be extracted from application programs, there will be an increased emphasis on thread-level parallelism. To continue to improve performance, computer architects will need to focus on architectures that can efficiently support thread-level parallelism while minimizing the length of on-chip interconnect wires. The SCMP (Single-Chip Message-Passing) parallel computer system is one such architecture. The SCMP system includes up to 64 processors on a single chip, connected in a 2-D mesh with nearest neighbor connections. Memory is included on-chip with the processors and the architecture includes hardware support for communication and the execution of parallel threads. Since there are no global signals or shared resources between the processors, the length of the interconnect wires will be determined by the size of the individual processors, not the size of the entire chip. Avoiding long interconnect wires will allow the use of very high clock frequencies, which, when coupled with the use of multiple processors, will offer tremendous computational power.
