期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Hierarchical Binary Set Partitioning in Cache Memories

Hamid Reza?Zarandi Email author Hamid?Sarbazi-Azad Email author 《The Journal of supercomputing》2005,31(2):185-202

In this paper, a new cache placement scheme is proposed to achieve higher hit ratios with respect to the two conventional schemes namely set-associative and direct mapping. Similar to set-associative, in this scheme, cache space is divided into sets of different sizes. Hence, the length of tag fields associated to each set is also variable and depends on the partition it is in. The proposed mapping function has been simulated with some standard trace files and statistics are gathered and analyzed for different cache configurations. The results reveal that the proposed scheme exhibits a higher hit ratio compared to the two well-known mapping schemes, namely set-associative and direct mapping, using LRU replacement policy. 相似文献

2.

General method of testing the compatibility of interacting automata with finite memory

A. N. Chebotarev 《Cybernetics and Systems Analysis》1999,35(6):867-874

A method is proposed for compatibility analysis of two interacting partial nondeterministic automata A and B specified in a first-order language with monadic predicates. In contrast to a method proposed earlier, no restrictions are imposed on the form of specification of the automaton B. These investigations were partially supported by INTAS grant 96-0760. Translated from Kibernetika i Sistemnyi Analiz, No. 6, pp. 25–37, November–December, 1999. 相似文献

3.

Qing Yang Sridar Adina T. Sun 《Journal of Parallel and Distributed Computing》1998,48(2):143-164

On-chip caches to reduce average memory access latency are commonplace in today's commercial microprocessors. These on-chip caches generally have low associativity and small cache sizes. Cache line conflicts are the main source of cache misses, which are critical for overall system performance. This paper introduces an innovative design for on-chip data caches of microprocessors, called one's complement cache. While binary complement numbers have been successfully used in designing arithmetic units, to the best of our knowledge, no one has ever considered using such complement numbers in cache memory designs. This paper will show that such complement numbers help greatly in reducing cache misses in a data cache, thereby improving data cache performance. By parallel computation of cache addresses and memory addresses, the new design does not increase the critical hit time of cache accesses. Cache misses caused by line interference are reduced by evenly distributing data items referenced by program loops across all sets in a cache. Even distribution of data in the cache is achieved by making the number of sets in the cache a prime or an odd number, so that the chance of related data being mapped to a same set is small. Trace-driven simulations are used to evaluate the performance of the new design. Performance results on benchmarks show that the new design improves cache performance significantly with negligible additional hardware cost. 相似文献

4.

Bruce W. Watson 《Software》2004,34(3):239-248

New applications of finite automata, such as computational linguistics and asynchronous circuit simulation, can require automata of millions or even billions of states. All known construction methods (in particular, the most effective reachability‐based ones that save memory, such as the subset construction, and simultaneously minimizing constructions, such as Brzozowski's) have intermediate memory usage much larger than the final automaton, thereby restricting the maximum size of the automata which can be built. In this paper, I present a reachability‐based optimization which can be used in most of these construction algorithms to reduce the intermediate memory requirements. The optimization is presented in conjunction with an easily understood (and implemented) canonical automaton construction algorithm. Copyright © 2003 John Wiley & Sons, Ltd. 相似文献

5.

Performance Metrics and Models for Shared Cache

下载免费PDF全文

丁晨 ;向晓娅 ;包斌 ;罗昊 ;罗英伟 ;汪小林《计算机科学技术学报》2014,29(4):692-712

Performance metrics and models are prerequisites for scientific understanding and optimization. This paper introduces a new footprint-based theory and reviews the research in the past four decades leading to the new theory. The review groups the past work into metrics and their models in particular those of the reuse distance, metrics conversion, models of shared cache, performance and optimization, and other related techniques. 相似文献

6.

拟（h,k）存贮有限自动机的可逆性

蒙春凤邓培民易忠《计算机工程与应用》2009,45(4):59-63

主要研究拟（h,k）阶存贮有限自动机的延迟k步与k＋1步弱可逆性,以及它的弱逆,得到了拟（h,k）阶存贮有限自动机的延迟k步与k＋1步弱可逆的充分必要条件,并且通过所得结果可以比较简便地构造出延迟k步与k＋1步弱可逆拟（h,k）阶存贮有限自动机的延迟k步与k＋1步弱逆。相似文献

7.

关于二元延迟3步前馈逆有限自动机的结构 总被引：1，自引：0，他引：1

下载免费PDF全文

王鸿吉姚刚《软件学报》2007,18(1):40-49

前馈逆有限自动机的结构是有限自动机可逆性理论中的基本问题.对延迟步数≥3的前馈逆结构的刻划,则是一个长期的未解决问题.研究了二元延迟3步前馈逆有限自动机的结构.对于自治有限自动机M_a的状态图为圈的二元延迟3步弱可逆半输入存储有限自动机C(M_af ),给出了其长3极小输出权分别为1,2,8三种情形下结构的一种刻画.由于C(M_af )延迟3步弱可逆当且仅当它是延迟3步弱逆,因此,得到了二元延迟3步前馈逆有限自动机结构的一种部分刻画. 相似文献

8.

拟(h,k)存贮有限自动机的可逆性

下载免费PDF全文

蒙春凤邓培民易忠《计算机工程与应用》2009,45(4)

主要研究拟(h,k)阶存贮有限自动机的延迟k步与k+1步弱可逆性,以及它的弱逆,得到了拟(h,k)阶存贮有限自动机的延迟k步与k+1步弱可逆的充分必要条件,并且通过所得结果可以比较简便地构造出延迟k步与k+1步弱可逆拟(h,k)阶存贮有限自动机的延迟k步与k+1步弱逆. 相似文献

9.

PCI协议对Cache的支持

刘光明《计算机工程与科学》1996,18(3):21-27

论述了Ｃａｃｈｅ在高性能计算机系统中的作用和访问Ｃａｃｈｅ的过程,以及Ｃａｃｈｅ数据一致性问题和解决的方法,介绍和分析了ＰＣＩ协议对Ｃａｃｈｅ的支持。相似文献

10.

解决中央处理器运算速度与内存系统性能矛盾的一种方案

续敏汪小涓《计算机工程与应用》2002,38(6):86-89

中央处理器-存储器集成是解决当前处理器运算速度与传统的存储器系统性能滞后的一种新思路。文章运用一个简单的评价模型和模拟运行,分析了几种处理器-存储器集成的方案。相似文献

11.

私有Cache的选择对紧耦合多处理机系统访存冲突的影响

邢二保周兴铭《计算机工程》1993,19(3):32-37

相似文献

12.

一类NFA 到DFA 的直接转化方法

下载免费PDF全文

程元斌《计算机系统应用》2012,21(10):109-113

NFA的确定化具有重要的理论和实际意义.迄今为止,普遍采用子集构造法将一个NFA(非确定性自动机)转化为DFA(确定性自动机),但这种方法需要引入空输入ε及状态子集I的ε-闭包,其计算过程相对繁琐.而且在确定化过程中对于NFA状态集存在ε-closure重复计算和由于对非ε转换的判断而引起的重复计算等问题.本文描述了一种将一类NFA直接转化为DFA的方法.在本方法中,不需要引入空输入ε,可根据原始的NFA状态图或状态转移表直接得出等价的DFA状态图或状态转移表,而且所有状态都是单一的状态而非集合状态,便于软硬件实现与测试. 相似文献

13.

概率有限自动机的积和分解

吴宗显邓培民易忠《计算机工程与应用》2009,45(15):47-50

给出几种概率有限自动机的积;讨论了他们之间的相互关系;并在文献[1]的基础上利用这些积给出匀概率有限自动机的分解;证明了一个匀概率有限自动机可以分解为一个随机编码源、一个伯努利过程和一些确定有限自动机的串联积。相似文献

14.

弱可逆拟(r,r)阶存贮线性有限自动机的分解

吴成来邓培民易忠《计算机工程与应用》2007,43(23):43-47

通过对延迟r步弱可逆拟(r,r)阶存贮线性有限自动机输出权的研究,得出对延迟r步弱可逆拟(r,r)阶存贮线性有限自动机的任意一个状态,它的长r的输出权都是1;任何一个n元拟(r,r)阶存贮线性有限自动机M延迟r步弱可逆的充分必要条件是M都可以分解为一个延迟0步弱可逆有限自动机M0和一个延迟r步弱可逆拟(0,r)阶存贮线性有限自动机M1。相似文献

15.

Chenggang Clarence YAN Hui YU Weizhi XU Yingping ZHANG Bochuan CHEN Zhu TIAN Yuxuan WANG Jian YIN 《Frontiers of Computer Science》2015,9(3):431

It is an important task to improve performance for sparse matrix vector multiplication (SpMV), and it is a difficult task because of its irregular memory access. General purpose GPU (GPGPU) provides high computing ability and substantial bandwidth that cannot be fully exploited by SpMV due to its irregularity. In this paper, we propose two novel methods to optimize the memory bandwidth for SpMV on GPGPU. First, a new storage format is proposed to exploit memory bandwidth of GPU architecture more efficiently. The new storage format can ensure that there are as many non-zeros as possible in the format which is suitable to exploit the memory bandwidth of the GPU. Second, we propose a cache blocking method to improve the performance of SpMV on GPU architecture. The sparse matrix is partitioned into sub-blocks that are stored in CSR format. With the blocking method, the corresponding part of vector x can be reused in the GPU cache, so the time to access the global memory for vector x is reduced heavily. Experiments are carried out on three GPU platforms, GeForce 9800 GX2, GeForce GTX 480, and Tesla K40. Experimental results show that both new methods can efficiently improve the utilization of GPU memory bandwidth and the performance of the GPU. 相似文献

16.

一种将NFA到最小化DFA的方法 总被引：3，自引：0，他引：3

毛红梅聂承启《计算机与现代化》2004,(10):6-7,22

词法分析是编译程序重要阶段,有效的词法分析可提高编译程序的效率。本文提出用子集方法完成NFA到DFA并使用树型分割法实现DFA到最小化DFA的化简。相似文献

17.

下载免费PDF全文

Xian-He Sun Surendra Byna and Yong Chen 《计算机科学技术学报》2007,22(5):641-652

Data access delay is a major bottleneck in utilizing current high-end computing(HEC)machines.Prefetch- ing,where data is fetched before CPU demands for it,has been considered as an effective solution to masking data access delay.However,current client-initiated prefetching strategies,where a computing processor initiates prefetching instructions,have many limitations.They do not work well for applications with complex,non-contiguous data access patterns.While technology advances continue to increase the gap between computing and data access performance, trading computing power for reducing data access delay has become a natural choice.In this paper,we present a server- based data-push approach and discuss its associated implementation mechanisms.In the server-push architecture,a dedicated server called Data Push Server(DPS)initiates and proactively pushes data closer to the client in time.Issues, such as what data to fetch,when to fetch,and how to push are studied.The SimpleScalar simulator is modified with a dedicated prefetching engine that pushes data for another processor to test DPS based prefetching.Simulation results show that L1 Cache miss rate can be reduced by up to 97%(71% on average)over a superscalar processor for SPEC CPU2000 benchmarks that have high cache miss rates. 相似文献

18.

Enhancing MC68030 performance using the SN74ACT2155 cache

Texas Instruments 《Microprocessors and Microsystems》1990,14(10):653-663

Texas Instruments highlights the important questions facing the system designer who needs to interface a cache to a fast microprocessor, giving an example solution using its cache chip 相似文献

19.

On the design of on-chip instruction caches

Carl McCrosky Brian ven der Buhs 《Microprocessors and Microsystems》1988,12(10):563-572

Cache memories reduce memory latency and traffic in computing systems. Most existing caches are implemented as board-based systems. Advancing VLSI technology will soon permit significant caches to be integrated on chip with the processors they support. In designing on-chip caches, the constraints of VLSI become significant. The primary constraints are economic limitations on circuit area and off-chip communications. The paper explores the design of on-chip instruction-only caches in terms of these constraints. The primary contribution of this work is the development of a unified economic model of on-chip instruction-only cache design which integrates the points of view of the cache designer and of the floorplan architect. With suitable data, this model permits the rational allocation of constrained resources to the achievement of a desired cache performance. Specific conclusions are that random line replacement is superior to LRU replacement, due to an increased flexibility in VLSI floorplan design; that variable set associativity can be an effective tool in regulating a chip's floorplan; and that sectoring permits area efficient caches while avoiding high transfer widths. Results are reported on economic functionality, from chip area and transfer width to miss ratio. These results, or the underlying analysis, can be used by microprocessor architects to make intelligent decisions regarding appropriate cache organizations and resource allocations. 相似文献

20.

存储系统模拟器SiMemSy的设计与实现

黄震春李三立马群生《小型微型计算机系统》2002,23(1):9-13

由于存储器间距日益扩大 ,存储系统对计算机系统整体性能的影响越来越严重 ,存储系统模拟器的研究与开发也日益重要 .传统的模拟器更多地将注意力集中于对 Cache的模拟 ,而对存储系统整体的模拟不够 .为了模拟并分析存储系统各部分的性能与其对存储系统整体性能的影响 ,本文设计并实现存储系统模拟器 Si Mem Sy(SImulator ofMEMory SYstem) .实验表明 ,Si Mem Sy可以准确、高效地对存储系统进行模拟并得到可信的结果相似文献