Similar Documents
20 similar documents found (search time: 953 ms)
1.
To meet the demand for massive data recording in high-speed optical fiber communication, a recorder capable of capturing 2.5 Gbps optical fiber signals was designed and built on the latest advances in PCI Express bus technology and disk-array storage. The system consists of an optical fiber data acquisition card, a host computer, a disk array, and the corresponding application software and drivers. A carefully designed acquisition and storage mechanism effectively resolves the processing-bandwidth bottleneck the recorder faces. Tests show that the recorder can record a 2.5 Gbps optical fiber signal continuously for up to 40 minutes.

2.
An evolvable network is a dynamically reconfigurable network architecture. To address the large packet-forwarding latency caused by its software-based implementation, an evolvable router based on a network processor is designed. By fully exploiting the flexibility of the embedded processor and the fast, efficient packet processing of the microengines in the network processor hardware, the packet processing rate approaches line speed.

3.
A Network-Processor-Based Intrusion Detection Method   (Cited by: 1; self-citations: 0; others: 1)
Intrusion detection is a core technology of network security. As network speeds keep rising, the detection speed of existing NIDSs can no longer keep up with gigabit-and-faster networks, and their miss and false-alarm rates keep growing. Network processors deliver high-speed packet processing through advanced techniques such as massive parallelism, hardware multithreading, multi-level memory hierarchies, and flexible programmability. This paper presents a viewpoint, methods, and strategies for using network processors to overcome the speed bottleneck of intrusion detection, and designs and implements a prototype high-speed network processor oriented toward intrusion detection.

4.
Fibre Channel (FC) networks offer low latency, high bandwidth, and high reliability, making them well suited to integrated avionics systems with demanding requirements on data transfer bandwidth and rate. To solve the problem of real-time acquisition and fast storage of multi-channel, high-bandwidth, high-volume data in FC network test environments for avionics systems, a dual-channel FC data acquisition card is proposed, based on an in-depth study of the FC protocol. An embedded data acquisition system is built around an FPGA containing a high-performance processor: FC protocol parsing and processing are implemented in hardware, data storage-format control in software, and the two FC input channels are captured in parallel. The processed data are written over a SATA interface to mass storage under a round-robin priority scheme, completing storage of the acquired data and laying the groundwork for applying FC network test technology to avionics systems.

5.
To address the failure of traditional teaching platforms to achieve the expected results in practice, a Web-based distance-teaching platform is designed, using a course on residential building design principles as an example. On the hardware side, the processor and memory chips are selected and designed, with an AMW chip storing runtime information. On the software side, hierarchical network coding is used to compute the mean coding delay of the communication network; this mean delay is used to connect the node network to the main platform network for distance-teaching communication, and the interactive teaching interface is developed with Web technology. Experimental results show that the platform outperforms traditional platforms in data transfer rate and frame rate, demonstrating practical value.

6.
In large networks, large rule sets make the bit vectors of the Bit Vector (BV) algorithm long and sparse; implementing BV on a network processor then requires substantial memory, and the repeated memory reads also reduce matching efficiency. To address this, the tuple space partitioning idea is combined with the BV algorithm to shorten the bit vectors, and an improved algorithm suited to network processors, Tuple-BV, is proposed, fully exploiting the network processor's parallel processing mechanisms and hardware acceleration units. Tuple partitioning shortens the bit vectors, reducing both their storage footprint and the number of memory reads. Packet-processing latency experiments show that with larger rule sets, Tuple-BV outperforms BV in both maximum and average latency.
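The grouping idea behind Tuple-BV can be sketched in a few lines (an illustrative Python sketch under an assumed rule format of (src_prefix, src_len, dst_prefix, dst_len) integers; the paper's microengine data layout is not reproduced here): rules are grouped by their prefix-length tuple, so each group keeps its own short per-field bit vectors, which are ANDed at classification time.

```python
from collections import defaultdict

def build(rules):
    # rules: list of (src_prefix, src_len, dst_prefix, dst_len); prefixes are
    # the top bits of a 32-bit address stored as integers
    groups = defaultdict(lambda: {"rules": [], "src": {}, "dst": {}})
    for idx, (sp, sl, dp, dl) in enumerate(rules):
        g = groups[(sl, dl)]
        pos = len(g["rules"])          # bit position inside this group only
        g["rules"].append(idx)
        g["src"][sp] = g["src"].get(sp, 0) | (1 << pos)
        g["dst"][dp] = g["dst"].get(dp, 0) | (1 << pos)
    return groups

def classify(groups, src, dst):
    matches = []
    for (sl, dl), g in groups.items():
        # mask the addresses down to this tuple's prefix lengths, then AND
        # two short bit vectors; each set bit is one matching rule
        bv = g["src"].get(src >> (32 - sl), 0) & g["dst"].get(dst >> (32 - dl), 0)
        while bv:
            pos = (bv & -bv).bit_length() - 1
            matches.append(g["rules"][pos])
            bv &= bv - 1               # clear the lowest set bit
    return matches
```

Because each group's vectors are only as long as the group's rule count, a single wide memory read per group can fetch a whole vector, which is the property the paper exploits on the network processor.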

7.
Under high-speed conditions, memory access is the bottleneck of flow management: the traditional "read-process-write" pattern for table-entry operations achieves only 36% efficiency, with read/write turnaround and read/write latency the main limiting factors. To address this, a continuous read/write method for table entries is proposed, which merges read/write latencies and reuses read/write turnarounds to raise table-entry operation efficiency above 90%. On this basis, a unidirectional parallel multi-linked-list method is proposed: by operating multiple linked lists alternately, the continuous read/write method can be applied to handling inactive timed-out flows. Analysis and experiments show that the unidirectional parallel multi-linked-list method can manage tens of millions of table entries on an OC-768 (40 Gbps) link, clearly exceeding the auxiliary-storage and doubly-linked-list methods, which manage millions of entries at OC-192 (10 Gbps).

8.
With continual advances in integrated-circuit technology and ever-growing application demands on processor performance, verification has become the main technical bottleneck for future chip multiprocessors. This paper analyzes the key problems in chip multiprocessor verification, including huge state spaces, insufficient completeness, the complexity of verifying memory hierarchies and interconnect networks, and the difficulty of post-silicon validation. It systematically surveys progress in simulation-based verification, hardware emulation, formal verification, and post-silicon validation of chip multiprocessors, and discusses future directions for the field.

9.
Chip multiprocessors (CMPs) have become the direction of processor development, shifting the focus of processor design to the interconnect network and the memory hierarchy. A key problem is maintaining coherence among the caches of the individual processors. In traditional shared-memory multiprocessors this is solved with cache coherence protocols; a CMP, however, offers higher on-chip interconnect bandwidth and speed than traditional multiprocessor architectures, which places new demands on coherence protocols but also opens new opportunities for improvement. Traditional bus snooping protocols suffer from poor scalability and excessive unnecessary broadcasts and snoops, while directory protocols suffer from large indirect miss latency, high complexity, and difficult verification. A ring interconnect scales better than a bus, and its implementation complexity is far lower than the packet-switched point-to-point networks typically used by directory protocols. This work applies a ring-based snooping protocol to a CMP. It exploits the ring's ordering to eliminate the retry operations that conflicts cause in the original protocol, removing possible starvation, deadlock, and livelock, improving protocol stability while reducing message traffic and power. Exploiting the short on-chip interconnect latency, snoop results are propagated together with snoop requests, so a processor can snoop selectively based on the snoop result, reducing unnecessary snoop operations and lowering power consumption.

10.
A networked video surveillance system is designed around a high-performance DSP and an ARM9 processor (ARM940T core). The characteristics of the H.264 algorithm and how to process it are analyzed in detail, and the system's audio/video capture and compression card is designed. Embedded Linux ties the whole system together into a complete networked surveillance solution that can ultimately be miniaturized. Tests and comparisons of network data transfer, data storage, and video quality show that the design requirements are met. Using the DSP for H.264 processing yields good image quality, and the compressed bitstream is easy for the ARM processor to handle, improving data-processing throughput.

11.
Ethernet speeds are now growing much faster than memory and CPU speeds, so memory access and CPU protocol processing have become the bottleneck of TCP performance. Ever-increasing network bandwidth places a heavy burden on the CPU: roughly 1 GHz of CPU resources is needed to perform protocol processing for 1 Gbps of network traffic. To address this, a multi-core NPU is used as the NIC to implement checksum computation and out-of-order reassembly on the TCP receive path; the merged large packets are then handed to the protocol stack through the Linux NIC driver, reducing the number of packets the stack must process and the number of interrupts the NIC generates, thereby improving end-system TCP performance. On a 10 Gbps Ethernet, experiments achieved a TCP receive throughput of 4.9 Gbps.
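The checksum work offloaded to the NPU here is the standard Internet checksum (RFC 1071, the 16-bit one's-complement sum used by TCP). A plain-Python reference version, for clarity only (a sketch, not the NPU microcode):

```python
def internet_checksum(data: bytes) -> int:
    """RFC 1071 Internet checksum: one's-complement sum of 16-bit words."""
    if len(data) % 2:                  # pad odd-length data with a zero byte
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                 # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF
```

Appending the computed checksum to the data makes the function return 0, which is how a receiver (or the offloading NIC) verifies a segment.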

12.
Recent simulation-based studies suggest that while superpipelines and superscalars are equally capable of exploiting fine-grained concurrency, multiprocessors are better at exploiting coarse-grained parallelism. An analytical model that is more flexible and less costly in terms of run time than simulation is proposed as a tool for analyzing the tradeoff between superpipelined processors, superscalar processors, and multiprocessors. The duality of superpipelines and superscalars is examined in detail. The performance limit for these systems has been derived, and it supports the fetch bottleneck observation of previous researchers. Common characteristics of utilization curves for such systems are examined. Combined systems, such as superpipelined multiprocessors and superscalar multiprocessors, are also analyzed. The model shows that the number of pipelines (or processors) at which the maximum throughput is obtained is, as memory access time increases, increasingly sensitive to the ratio of memory access time to network access delay. Further, as a function of inter-iteration dependence distance, optimum throughput is shown to vary nonlinearly, whereas the corresponding optimum number of processors varies linearly. The predictions from the analytical model agree with similar results published using simulation-based techniques.

13.
As processors continue to exploit more instruction level parallelism, greater demands are placed on the performance of the memory system. In this paper, we introduce a novel modification of the processor pipeline called memory renaming. Memory renaming applies register access techniques to load and store instructions to speed the processing of memory traffic. The approach works by accurately predicting memory communication early in the pipeline and then re-mapping the communication to fast physical registers. This work extends previous studies of data value and dependence speculation. When memory renaming is added to the processor pipeline, renaming can be applied to 30-50% of all memory references, translating to an overall improvement in execution time of up to 14% for current pipeline configurations. As store forward delay times grow larger, renaming support can lead to performance improvements of as much as 42%. Furthermore, this improvement is seen across all memory segments, including the heap segment, which has often been difficult to manage efficiently.

14.
Realizing the performance of multi-core processors depends on program parallelism. The shared-memory parallel programming model is adopted by most multi-core processors, and efficiently synchronizing multiple threads' accesses to shared variables is both its key problem and its hard problem. Borrowing the notion of transactions from databases, transactional memory has been proposed, aiming to provide a synchronization mechanism that is easy to program with and makes reasoning about program correctness easy. This paper briefly reviews the origin of transactional memory and explains the concepts behind transactional memory systems; describes their programming interfaces and execution models; discusses the main issues in transactional memory systems, comparing the various approaches and strategies; examines open problems; and finally introduces several open-source transactional memory research platforms.

15.
The effectiveness of parallel processing of relational join operations is examined. The skew in the distribution of join attribute values and the stochastic nature of the task processing times are identified as the major factors that can affect the effective exploitation of parallelism. Expressions for the execution time of parallel hash join and semijoin are derived and their effectiveness analyzed. When many small processors are used in the parallel architecture, the skew can result in some processors becoming sources of bottleneck while other processors are being underutilized. Even in the absence of skew, the variations in the processing times of the parallel tasks belonging to a query can lead to high task synchronization delay and impact the maximum speedup achievable through parallel execution. For example, when the task processing time on each processor is exponential with the same mean, the speedup is proportional to P/ln(P), where P is the number of processors. Other factors such as memory size, communication bandwidth, etc., can lead to even lower speedup. These are quantified using analytical models.
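The P/ln(P) figure follows from a standard result: with P i.i.d. exponential task times of mean 1, the expected finishing time of the slowest processor is the harmonic number H_P = 1 + 1/2 + ... + 1/P ≈ ln P + γ, so speedup = P/H_P ≈ P/ln(P). A quick numeric check (illustrative only, not code from the paper):

```python
import math

def harmonic(P):
    # H_P = sum_{k=1..P} 1/k, the expected max of P unit-mean exponentials
    return sum(1.0 / k for k in range(1, P + 1))

def speedup(P):
    # sequential time: P tasks of mean 1; parallel time: E[max] = H_P
    return P / harmonic(P)
```

For P = 1024 this gives a speedup near 1024/ln(1024) ≈ 136 rather than 1024, illustrating how synchronization delay alone caps parallel join performance.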

16.
Parallel Learning of Belief Networks in Large and Difficult Domains   (Cited by: 1; self-citations: 0; others: 1)
Learning belief networks from large domains can be expensive even with single-link lookahead search (SLLS). Since an SLLS cannot learn correctly in a class of problem domains, multi-link lookahead search (MLLS) is needed, which further increases the computational complexity. In our experiment, learning in some difficult domains over more than a dozen variables took days. In this paper, we study how to use parallelism to speed up SLLS for learning in large domains and to tackle the increased complexity of MLLS for learning in difficult domains. We propose a natural decomposition of the learning task for parallel processing. We investigate two strategies for job allocation among processors to further improve load balancing and efficiency of the parallel system. For learning from very large datasets, we present a regrouping of the available processors such that slow data access through the file system can be replaced by fast memory access. Experimental results on a distributed-memory MIMD computer demonstrate the effectiveness of the proposed algorithms.

17.
The disparity between processing speed and data access rates presents a serious bottleneck in pipelined/vector processors. Memory bank conflicts in interleaved systems can be alleviated by skewing, for scientific computations that operate on a variety of submatrices. So far, uniskewing schemes involving periodic and linear functions have been studied. Such schemes face several difficulties: they may require a prime number of memory modules, waste memory space, or make the addressing functions and the alignment network complex. We present a new technique, termed multiskewing, which applies multiple functions on different sections of the array. Each of these functions may be as simple as a linear shift. We show that multiskewing does not require a prime number of memory modules, achieves 100% memory utilization, preserves the logical structure of the array, and allows optimal memory access for a large class of submatrices.
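A one-function linear skew illustrates the baseline that multiskewing generalizes (an illustrative sketch; in multiskewing, each section of the array would get its own such function, e.g. a different shift): rotating row i by i modules spreads both rows and columns across all M modules, with M a power of two rather than a prime.

```python
def skew(i, j, M, shift=1):
    # linear skewing: element (i, j) is stored in module (j + i*shift) mod M
    return (j + i * shift) % M

M = 8  # power-of-two module count; no prime number of modules required
# a column's M elements land in M distinct modules: conflict-free access
column_modules = [skew(i, 3, M) for i in range(M)]
# rows are trivially conflict-free as well
row_modules = [skew(2, j, M) for j in range(M)]
```

Conflict-freedom for columns holds whenever the shift is coprime to M; the diagonals and other submatrix shapes are where a single linear function breaks down and per-section functions become useful.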

18.
A High-Speed IP Route Lookup Algorithm Based on LSOT   (Cited by: 9; self-citations: 0; others: 9)
As Internet speeds keep rising, traffic keeps growing, and routing tables keep expanding, IP route lookup has become a major constraint on router performance and has therefore received wide attention. Several algorithms have been proposed for the IP route lookup problem, but none fully meets the requirements of core routers. This paper proposes an LSOT-based IP route lookup method that uses variable-size segment tables and offset tables and can adapt to the memory capacities of SRAM and on-chip FPGA memory. It offers fast lookup, fast updates, low storage cost, and easy implementation: an FPGA design meets the requirements of core routers with 10 Gbps ports, and an ASIC design meets those of core routers with 40 Gbps ports.
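The segment-plus-offset organization can be sketched roughly as follows (a hypothetical Python sketch: a first-level table indexed by the top 16 address bits, with small per-segment offset tables for longer prefixes; the paper's variable-size table encoding and FPGA/ASIC mapping are more involved, and this sketch ignores insertion-order issues for prefixes of 16 bits or less):

```python
class Lsot:
    def __init__(self):
        self.seg = {}   # top-16-bit segment -> (next_hop, offset_table)

    def insert(self, prefix, plen, next_hop):
        if plen <= 16:
            # leaf-push the short prefix over every 16-bit segment it covers
            base = prefix << (16 - plen)
            for s in range(base, base + (1 << (16 - plen))):
                _, tbl = self.seg.get(s, (None, {}))
                self.seg[s] = (next_hop, tbl)
        else:
            s = prefix >> (plen - 16)
            hop, tbl = self.seg.get(s, (None, {}))
            tbl[(prefix & ((1 << (plen - 16)) - 1), plen - 16)] = next_hop
            self.seg[s] = (hop, tbl)

    def lookup(self, addr):
        # one segment-table read, then a search of that segment's offset table
        hop, tbl = self.seg.get(addr >> 16, (None, {}))
        rest, best_len, best = addr & 0xFFFF, -1, hop
        for (bits, blen), nh in tbl.items():
            if blen > best_len and (rest >> (16 - blen)) == bits:
                best_len, best = blen, nh
        return best
```

A hardware realization replaces the linear offset-table search with a direct index on the remaining address bits, which is what bounds the lookup to a fixed small number of memory reads.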

19.
In the last decade, the volume of unstructured data that Internet and enterprise applications create and consume has been growing at impressive rates. The tools we use to process these data are search engines, business analytics suites, natural-language processors and XML processors. These tools rely on tokenization, a form of regular expression matching aimed at extracting words and keywords in a character stream. The further growth of unstructured data-processing paradigms depends critically on the availability of high-performance tokenizers. Despite the impressive amount of parallelism that the multi-core revolution has made available (in terms of multiple threads and wider SIMD units), most applications employ tokenizers that do not exploit this parallelism. I present a technique to design tokenizers that exploit multiple threads and wide SIMD units to process multiple independent streams of data at a high throughput. The technique benefits indefinitely from any future scaling in the number of threads or SIMD width. I show the approach's viability by presenting a family of tokenizer kernels optimized for the Cell/B.E. processor that deliver a performance seen, so far, only on dedicated hardware. These kernels deliver a peak throughput of 14.30 Gbps per chip, and a typical throughput of 9.76 Gbps on Wikipedia input. Also, they achieve almost-ideal resource utilization (99.2%). The approach is applicable to any SIMD-enabled processor and matches well the trend toward wider SIMD units in contemporary architecture design.

20.
Research on On-Chip Memory Systems for Multi-Core Processors   (Cited by: 1; self-citations: 1; others: 0)
The ever-widening gap between the computing power of multi-core processors and their memory access speed constrains system performance gains. This paper analyzes the memory-system design features of several typical multi-core processors and discusses the key technologies in the development of on-chip memory systems for multi-core processors, including non-uniform cache access caused by latency, the constraints that the core-to-cache interconnect topology places on memory access performance, and the growing complexity of on-chip cache design.


Copyright © Beijing Qinyun Technology Development Co., Ltd.    京ICP备09084417号-23
