Similar Documents
20 similar documents found (search time: 46 ms)
1.
A disk cache is typically used in file systems to reduce average access time for data storage and retrieval. The "periodic update" write policy, widely used in existing computer systems, is one in which dirty cache blocks are written to a disk on a periodic basis. The average response time for disk read requests when the periodic update write policy is used is determined. Read and write load, cache-hit ratio, and the disk scheduler's ability to reduce service time under load are incorporated in the analysis, leading to design criteria that can be used to decide among competing cache write policies. The main conclusion is that the bulk arrivals generated by the periodic update policy cause a traffic-jam effect which results in severely degraded service. Effective use of the disk cache and disk scheduling can alleviate this problem, but only under a narrow range of operating conditions. Based on this conclusion, alternative write policies that retain the periodic update policy's advantages and provide uniformly better service are proposed.
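As a concrete illustration, here is a minimal Python sketch of a periodic update write policy over a toy in-memory block cache. The `disk` interface, the 30-second default interval, and triggering the sync check from `write` rather than from a timer are simplifications of mine, not details from the paper:

```python
import time

class PeriodicUpdateCache:
    """Toy disk cache using the 'periodic update' write policy:
    dirty blocks accumulate and are flushed to disk in bulk
    every sync_interval seconds (illustrative sketch)."""

    def __init__(self, disk, sync_interval=30.0):
        self.disk = disk                  # assumed: object with write(block_id, data)
        self.sync_interval = sync_interval
        self.blocks = {}                  # block_id -> data
        self.dirty = set()                # block ids awaiting write-back
        self.last_sync = time.monotonic()

    def write(self, block_id, data):
        # Writes go to the cache only; the disk sees them later, in bulk.
        self.blocks[block_id] = data
        self.dirty.add(block_id)
        self.maybe_sync()

    def maybe_sync(self):
        # The bulk arrival the abstract warns about: all dirty blocks
        # hit the disk queue at once when the period expires.
        now = time.monotonic()
        if now - self.last_sync >= self.sync_interval:
            for block_id in sorted(self.dirty):  # sorted ~ scheduler-friendly order
                self.disk.write(block_id, self.blocks[block_id])
            self.dirty.clear()
            self.last_sync = now
```

The burst inside `maybe_sync` is precisely the bulk arrival that causes the traffic-jam effect the analysis identifies; the alternative policies the authors propose spread these writes out instead.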

2.
A query result cache can hold either the set of document identifiers making up a query's result or the actual result pages returned to the user, speeding up responses to user queries; the two forms are called identifier caching and page caching, respectively. For a fixed amount of memory, an identifier cache achieves a higher hit ratio, while a page cache delivers faster responses. Exploiting the temporal and spatial locality of user queries, this paper proposes a novel hierarchical result-caching mechanism. First, the fixed-size result cache is divided into two levels: a page cache and an identifier cache. A user query is first answered from the first-level page cache; on a miss, the second-level identifier cache is tried next. Experiments show that, compared with traditional mechanisms relying on a single cache form, this hierarchical mechanism achieves a considerable improvement in average query response time: for example, 9% on average over a pure page cache, and up to 11% in the best case. Second, on top of the identifier cache, a heuristic prefetching policy is designed to mine the spatial locality of user queries. Experiments show that integrating this prefetching policy further improves retrieval performance, ultimately yielding an effective result-caching mechanism that covers both temporal and spatial locality.
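A minimal sketch of the two-level lookup path, assuming plain LRU at both levels; the `render` and `search` callables stand in for the (progressively more expensive) page-construction and index-retrieval steps, and the paper's tuning and prefetching heuristic are not reproduced:

```python
from collections import OrderedDict

class TwoLevelResultCache:
    """Hierarchical result cache: level 1 holds rendered result pages,
    level 2 holds docid lists (cheaper per entry, higher hit ratio)."""

    def __init__(self, page_slots, docid_slots, render, search):
        self.pages = OrderedDict()    # query -> rendered page
        self.docids = OrderedDict()   # query -> list of doc identifiers
        self.page_slots, self.docid_slots = page_slots, docid_slots
        self.render = render          # docid list -> page (expensive)
        self.search = search          # query -> docid list (most expensive)

    def get(self, query):
        if query in self.pages:               # level 1 hit: fastest path
            self.pages.move_to_end(query)
            return self.pages[query]
        if query in self.docids:              # level 2 hit: re-render only
            self.docids.move_to_end(query)
            ids = self.docids[query]
        else:                                 # full miss: run the query
            ids = self.search(query)
            self.docids[query] = ids
            if len(self.docids) > self.docid_slots:
                self.docids.popitem(last=False)
        page = self.render(ids)
        self.pages[query] = page
        if len(self.pages) > self.page_slots:
            self.pages.popitem(last=False)
        return page
```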

3.
With the rapid progress of integrated-circuit fabrication technology, large on-chip caches have become feasible, which greatly reduces cache miss rates; at the same time, the penalty of a miss in a large cache has become more pronounced. By analyzing cache miss behavior, this paper designs a new stride-adaptive prefetching mechanism for the L2 cache. The mechanism exploits the fact that instruction addresses are not visible to the L2 cache and uses miss addresses to index the prefetch table. Based on an analysis of the test results, appropriate structural parameters are selected, effectively improving cache performance.
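A sketch of a stride prefetch table indexed by miss address rather than by instruction address, matching the constraint that the L2 cache cannot see PCs. The region size and confidence threshold are illustrative, not the parameters selected in the paper:

```python
class StridePrefetchTable:
    """Stride prefetcher for an L2 cache that cannot see instruction
    addresses: the table is indexed by the memory region of the miss
    address instead of by PC."""

    def __init__(self, region_bits=12, threshold=2):
        self.table = {}            # region -> (last_addr, stride, confidence)
        self.region_bits = region_bits
        self.threshold = threshold

    def on_miss(self, addr):
        """Returns a list of addresses to prefetch (possibly empty)."""
        region = addr >> self.region_bits
        last, stride, conf = self.table.get(region, (addr, 0, 0))
        new_stride = addr - last
        if new_stride != 0 and new_stride == stride:
            conf += 1              # stride confirmed: grow confidence
        else:
            conf, stride = 0, new_stride
        self.table[region] = (addr, stride, conf)
        if conf >= self.threshold and stride != 0:
            return [addr + stride]  # adaptive: only prefetch once stable
        return []
```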

4.
To address network latency in data sharing among intelligent terminals, this paper proposes a two-phase data prefetching and caching method that combines active and passive prefetching, reducing network latency and improving user experience. The method prefetches data during network idle time to cut user waiting time; a two-phase prefetching strategy reduces network bandwidth consumption; a prefetching algorithm coordinating active and passive modes improves prefetch accuracy and efficiency; and a weight-update function manages the client-side cache to limit consumption of the terminal's storage space. Experiments show that the method reduces user waiting time by 58.2%, achieves a prefetch hit ratio of 92%, and incurs a bandwidth overhead of less than 5%.

5.
This paper presents a helper thread prefetching scheme that is designed to work on loosely coupled processors, such as in a standard chip multiprocessor (CMP) system or an intelligent memory system. Loosely coupled processors have an advantage in that resources such as processor and L1 cache resources are not contended by the application and helper threads, hence preserving the speed of the application. However, interprocessor communication is expensive in such a system. We present techniques to alleviate this. Our approach exploits large loop-based code regions and is based on a new synchronization mechanism between the application and helper threads. This mechanism precisely controls how far ahead the execution of the helper thread can be with respect to the application thread. We found that this is important in ensuring prefetching timeliness and avoiding cache pollution. To demonstrate that prefetching in a loosely coupled system can be done effectively, we evaluate our prefetching by simulating a standard unmodified CMP system and an intelligent memory system where a simple processor in memory executes the helper thread. Evaluating our scheme with nine memory-intensive applications with the memory processor in DRAM achieves an average speedup of 1.25. Moreover, our scheme works well in combination with a conventional processor-side sequential L1 prefetcher, resulting in an average speedup of 1.31. In a standard CMP, the scheme achieves an average speedup of 1.33. Using a real CMP system with a shared L2 cache between two cores, our helper thread prefetching plus hardware L2 prefetching achieves an average speedup of 1.15 over the hardware L2 prefetching for the subset of applications with high L2 cache misses per cycle.
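A sketch of the run-ahead throttling idea, assuming both threads count loop iterations; `max_ahead` and the condition-variable implementation are my choices, not the paper's mechanism:

```python
import threading

class DistanceThrottle:
    """The helper thread may run at most max_ahead loop iterations
    ahead of the application thread, which keeps prefetches timely
    without letting them pollute the cache (illustrative sketch)."""

    def __init__(self, max_ahead=64):
        self.max_ahead = max_ahead
        self.app_iter = 0
        self.helper_iter = 0
        self.cv = threading.Condition()

    def app_advance(self):
        with self.cv:
            self.app_iter += 1
            self.cv.notify_all()      # may unblock a stalled helper

    def helper_advance(self):
        with self.cv:
            # Block while the helper is too far ahead of the application.
            while self.helper_iter - self.app_iter >= self.max_ahead:
                self.cv.wait()
            self.helper_iter += 1
```

The helper issues the prefetches for its current iteration between calls to `helper_advance`, so it can never run more than `max_ahead` iterations past the application.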

6.
Although Internet service providers and communications companies are continuously offering higher and higher bandwidths, users still complain about the high latency they perceive when downloading pages from the web. Therefore, latency can be considered as the main web performance metric from the user's point of view. Many studies have demonstrated that web prefetching can be an interesting technique to reduce such latency at the expense of slightly increasing the network traffic. In this context, this paper presents an empirical study to investigate the maximum benefits that web users can expect from prefetching techniques in the current web. Unlike previous theoretical studies, this work considers a realistic prefetching architecture using real traces. In this way, the influence of real implementation constraints are considered and analyzed. The results obtained show that web prefetching could improve page latency up to 52% in the studied traces.

7.
This paper considers the use of data prefetching and an alternative mechanism, data forwarding, for reducing memory latency caused by interprocessor communication in cache coherent, shared memory multiprocessors. Data prefetching is accomplished by using a multiprocessor software pipelined algorithm. Data forwarding is used to target interprocessor data communication, rather than synchronization, and is applied to communication-related accesses between successive parallel loops. Prefetching and forwarding are each shown to be more effective for certain types of architectural and application characteristics. Given this result, a new hybrid prefetching and forwarding approach is proposed and evaluated that allows the relative amounts of prefetching and forwarding used to be adapted to these characteristics. When compared to prefetching or forwarding alone, the new hybrid scheme is shown to increase performance stability over varying application characteristics, to reduce processor instruction overheads, cache miss ratios, and memory system bandwidth requirements, and to reduce performance sensitivity to architectural parameters such as cache size. Algorithms for data prefetching, data forwarding, and hybrid prefetching and forwarding are described. These algorithms are applied by using a parallelizing compiler and are evaluated via execution-driven simulations of large, optimized, numerical application codes with loop-level and vector parallelism.

8.
NAND flash-based storage devices (NFSDs) are widely employed owing to their superior characteristics when compared to hard disk drives. However, NAND flash memory (NFM) still exhibits drawbacks, such as a limited lifetime and an erase-before-write requirement. Along with effective software management, the implementation of a cache buffer is one of the most common solutions to overcome these limitations. However, the read/write performance becomes saturated primarily because the eviction overhead caused by limited DRAM capacity significantly impacts overall NFSD performance. This paper therefore proposes a method that hides the eviction overhead and overcomes the saturation of the read/write performance. The proposed method exploits the new intra-request idle time (IRIT) in NFSD and employs a new data management scheme. In addition, the new pre-store eviction scheme stores dirty page data in the cache to NFMs in advance. This reduces the eviction overhead by maintaining a sufficient number of clean pages in the cache. Further, the new pre-load insertion scheme improves the read performance by frequently loading data that needs to be read into the cache in advance. Unlike previous methods with large migration overhead, our scheme does not cause any eviction/insertion overhead because it actually exploits the IRIT to its advantage. We verified the effectiveness of our method, by integrating it into two cache management strategies which were then compared. Our proposed method reduced read latency by 43% in read-intensive traces, reduced write latency by 40% in write-intensive traces, and reduced read/write latency by 21% and 20%, respectively, on average compared to NFSD with a conventional write cache buffer.
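A sketch of the pre-store idea, assuming a page-level write cache with a `flash.program` hook; the clean-page reserve threshold and victim selection are illustrative simplifications:

```python
class PreStoreCache:
    """During intra-request idle time (IRIT) the cache writes dirty
    pages back to flash in advance, so that when an eviction is needed
    a clean page can be dropped for free (illustrative sketch)."""

    def __init__(self, flash, capacity, min_clean=8):
        self.flash = flash            # assumed: object with program(page_id, data)
        self.capacity = capacity
        self.min_clean = min_clean
        self.pages = {}               # page_id -> data
        self.dirty = set()

    def on_idle(self):
        """Called when the device detects idle time between requests."""
        clean = len(self.pages) - len(self.dirty)
        # Pre-store: flush just enough dirty pages to keep a clean reserve.
        while self.dirty and clean < self.min_clean:
            pid = self.dirty.pop()
            self.flash.program(pid, self.pages[pid])  # page stays cached, now clean
            clean += 1

    def evict_one(self):
        # Thanks to pre-storing, eviction usually finds a clean victim
        # and costs no flash write on the critical path.
        for pid in self.pages:
            if pid not in self.dirty:
                return self.pages.pop(pid)
        pid, data = self.pages.popitem()   # fallback: forced dirty eviction
        self.dirty.discard(pid)
        self.flash.program(pid, data)
        return data
```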

9.
Proxy caches are essential to improve the performance of the World Wide Web and to reduce user-perceived latency. Appropriate cache management strategies are crucial to achieve these goals. In our previous work, we have introduced Web object-based caching policies. A Web object consists of the main HTML page and all of its constituent embedded files. Our studies have shown that these policies improve proxy cache performance substantially. In this paper, we propose a new Web object-based policy to manage the storage system of a proxy cache. We propose two techniques to improve the storage system performance. The first technique is concerned with prefetching the related files belonging to a Web object, from the disk to main memory. This prefetching improves performance as most of the files can be provided from the main memory rather than from the proxy disk. The second technique stores the Web object members in contiguous disk blocks in order to reduce the disk access time. We used trace-driven simulations to study the performance improvements one can obtain with these two techniques. Our results show that the first technique by itself provides up to 50% reduction in hit latency, which is the delay involved in providing a hit document by the proxy. An additional 5% improvement can be obtained by incorporating the second technique.
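A sketch of the first technique, assuming an index (`members`) that maps a main HTML page to its embedded member files; a real proxy would prefetch asynchronously rather than inline as here:

```python
class WebObjectStore:
    """When the main HTML page of a Web object is served, its embedded
    member files are prefetched from disk into main memory so that
    follow-up requests hit RAM (illustrative sketch)."""

    def __init__(self, read_from_disk, members):
        self.read_from_disk = read_from_disk   # url -> bytes (slow, disk)
        self.members = members                 # html url -> [embedded urls]
        self.memory = {}                       # url -> bytes (fast)

    def get(self, url):
        if url in self.memory:
            return self.memory[url]            # served from main memory
        data = self.read_from_disk(url)
        self.memory[url] = data
        # Prefetch the rest of the Web object: embedded images, CSS, etc.
        for member in self.members.get(url, []):
            if member not in self.memory:
                self.memory[member] = self.read_from_disk(member)
        return data
```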

10.
The sharing of caches among proxies is an important technique to reduce Web traffic, alleviate network bottlenecks, and improve response time of document requests. Most existing work on cooperative caching has been focused on serving misses collaboratively. Very few have studied the effect of cooperation on document placement schemes and its potential enhancements on cache hit ratio and latency reduction. We propose a new document placement scheme which takes into account the contentions at individual caches in order to limit the replication of documents within a cache group and increase document hit ratio. The main idea of this new scheme is to view the aggregate disk space of the cache group as a global resource of the group and uses the concept of cache expiration age to measure the contention of individual caches. The decision of whether to cache a document at a proxy is made collectively among the caches that already have a copy of this document. We refer to this new document placement scheme as the Expiration Age-based scheme (EA scheme). The EA scheme effectively reduces the replication of documents across the cache group, while ensuring that a copy of the document always resides in a cache where it is likely to stay for the longest time. We report our study on the potentials and limits of the EA scheme using both analytic modeling and trace-based simulation. The analytical model compares and contrasts the existing (ad hoc) placement scheme of cooperative proxy caches with our new EA scheme and indicates that the EA scheme improves the effectiveness of aggregate disk usage, thereby increasing the average time duration for which documents stay in the cache. The trace-based simulations show that the EA scheme yields higher hit rates and better response times compared to the existing document placement schemes used in most of the caching proxies.
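A rough sketch of how the placement test might look, assuming each proxy exposes an `expiration_age()` estimator (for example, the age at which objects are currently being evicted); the actual collective protocol in the paper is more involved:

```python
def should_cache(candidate_proxy, holders):
    """Cache a new copy at candidate_proxy only if its cache expiration
    age (expected time an object survives before eviction, a local
    measure of contention) exceeds that of every proxy already holding
    a copy. expiration_age() is an assumed per-proxy estimator."""
    candidate_age = candidate_proxy.expiration_age()
    # A low expiration age means high contention: a new copy would be
    # evicted quickly, so replicating there wastes the group's disk space.
    return all(candidate_age > h.expiration_age() for h in holders)
```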

11.
Different cache prefetching policies suit different access patterns. This paper surveys the state of the art in cache prefetching for storage systems, constructs a triple-based model of access patterns, and evaluates on a disk array an adaptive cache prefetching policy suited to complex environments. The results show that the adaptive policy achieves optimal disk-array performance across different environments.

12.
Data deduplication has been widely utilized in large-scale storage systems, particularly backup systems. Data deduplication systems typically divide data streams into chunks and identify redundant chunks by comparing chunk fingerprints. Maintaining all fingerprints in memory is not cost-effective because fingerprint indexes are typically very large. Many data deduplication systems maintain a fingerprint cache in memory and exploit fingerprint prefetching to accelerate the deduplication process. Although fingerprint prefetching can improve the performance of data deduplication systems by leveraging the locality of workloads, inaccurately prefetched fingerprints may pollute the cache by evicting useful fingerprints. We observed that most of the prefetched fingerprints in a wide variety of applications are never used or used only once, which severely limits the performance of data deduplication systems. We introduce a prefetch-aware fingerprint cache management scheme for data deduplication systems (PreCache) to alleviate prefetch-related cache pollution. We propose three prefetch-aware fingerprint cache replacement policies (PreCache-UNU, PreCache-UOO, and PreCache-MIX) to handle different types of cache pollution. Additionally, we propose an adaptive policy selector to select suitable policies for prefetch requests. We implement PreCache on two representative data deduplication systems (Block Locality Caching and SiLo) and evaluate its performance utilizing three real-world workloads (Kernel, MacOS, and Homes). The experimental results reveal that PreCache improves deduplication throughput by up to 32.22% based on a reduction of on-disk fingerprint index lookups and improvement of the deduplication ratio by mitigating prefetch-related fingerprint cache pollution.
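A simplified sketch of prefetch-aware replacement in this spirit: prefetched fingerprints are tracked until first use, and eviction prefers never-used prefetched entries. This is a single illustrative policy of mine, not a reproduction of PreCache-UNU, PreCache-UOO, or PreCache-MIX:

```python
from collections import OrderedDict

class PrefetchAwareCache:
    """Fingerprint cache whose eviction prefers prefetched-but-never-used
    entries, so inaccurate prefetches cannot evict proven fingerprints."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()    # fingerprint -> chunk location
        self.unused_prefetch = set()    # prefetched, never hit yet

    def insert(self, fp, loc, prefetched=False):
        if len(self.entries) >= self.capacity:
            self._evict()
        self.entries[fp] = loc
        if prefetched:
            self.unused_prefetch.add(fp)

    def lookup(self, fp):
        if fp not in self.entries:
            return None
        self.entries.move_to_end(fp)        # LRU bookkeeping
        self.unused_prefetch.discard(fp)    # first use: now a proven entry
        return self.entries[fp]

    def _evict(self):
        # Prefer evicting a prefetched-but-never-used fingerprint (the
        # likely cache polluter); fall back to plain LRU otherwise.
        for fp in self.entries:
            if fp in self.unused_prefetch:
                self.unused_prefetch.discard(fp)
                del self.entries[fp]
                return
        fp, _ = self.entries.popitem(last=False)
        self.unused_prefetch.discard(fp)
```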

13.
Sequential prefetching schemes are widely employed in storage servers to mask disk latency and improve system throughput. However, existing schemes cannot benefit parallel disk systems as expected due to the fact that they ignore the distinct internal characteristics of the parallel disk system, in particular, data striping. Moreover, their aggressive prefetching pattern suffers from premature evictions and prolonged request latencies. In this paper, we propose a strip-oriented asynchronous prefetching (SoAP) technique, which is dedicated to the parallel disk system. It settles the above-mentioned problems by providing multiple novel features, e.g., enhanced prediction accuracy, adaptive prefetching strength, physical data layout awareness, and timely prefetching. To validate SoAP, we implement a prototype by modifying the software redundant arrays of inexpensive disks (RAID) under Linux. Experimental results demonstrate that SoAP can consistently offer improved average response time and throughput to the parallel disk system under non-random workloads compared with STEP, SP, ASP, and Linux-like SEQPs.

14.
To offset the effect of read miss penalties on processor utilization in shared-memory multiprocessors, several software- and hardware-based data prefetching schemes have been proposed. A major advantage of hardware techniques is that they need no support from the programmer or compiler. Sequential prefetching is a simple hardware-controlled prefetching technique which relies on the automatic prefetch of consecutive blocks following the block that misses in the cache, thus exploiting spatial locality. In its simplest form, the number of prefetched blocks on each miss is fixed throughout the execution. However, since the prefetching efficiency varies during the execution of a program, we propose to adapt the number of prefetched blocks according to a dynamic measure of prefetching effectiveness. Simulations of this adaptive scheme show reductions of the number of read misses, the read penalty, and of the execution time by up to 78%, 58%, and 25% respectively.
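A sketch of the adaptive degree control, assuming usefulness is measured as the fraction of prefetched blocks later referenced; the window and thresholds are illustrative values, not the paper's:

```python
class AdaptiveSequentialPrefetcher:
    """On a miss for block b, prefetch blocks b+1 .. b+degree; the
    degree is periodically raised or lowered according to the measured
    fraction of prefetched blocks that were actually used."""

    def __init__(self, window=64, lo=0.2, hi=0.7, max_degree=8):
        self.degree = 1
        self.window, self.lo, self.hi = window, lo, hi
        self.max_degree = max_degree
        self.issued = self.useful = 0
        self.pending = set()            # prefetched blocks not yet referenced

    def on_access(self, block):
        if block in self.pending:       # a prefetch proved useful
            self.pending.discard(block)
            self.useful += 1

    def on_miss(self, block):
        prefetches = [block + i for i in range(1, self.degree + 1)]
        self.pending.update(prefetches)
        self.issued += len(prefetches)
        if self.issued >= self.window:  # time to re-evaluate effectiveness
            ratio = self.useful / self.issued
            if ratio > self.hi and self.degree < self.max_degree:
                self.degree += 1        # prefetches are being used: be bolder
            elif ratio < self.lo and self.degree > 1:
                self.degree -= 1        # mostly wasted: back off
                # (the paper's scheme can disable prefetching entirely;
                #  floored at 1 here to keep the sketch simple)
            self.issued = self.useful = 0
        return prefetches
```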

15.
Routing table lookup is an important operation in packet forwarding. This operation has a significant influence on the overall performance of the network processors. Routing tables are usually stored in main memory which has a large access time. Consequently, small fast cache memories are used to improve access time. In this paper, we propose a novel routing table compaction scheme to reduce the number of entries in the routing table. The proposed scheme has three versions. This scheme takes advantage of ternary content addressable memory (TCAM) features. Two or more routing entries are compacted into one using don’t care elements in TCAM. A small compacted routing table helps to increase cache hit rate; this in turn provides fast address lookups. We have evaluated this compaction scheme through extensive simulations involving IPv4 and IPv6 routing tables and routing traces. The original routing tables have been compacted over 60% of their original sizes. The average cache hit rate has improved by up to 15% over the original tables. We have also analyzed port errors caused by caching, and developed a new sampling technique to alleviate this problem. The simulations show that sampling is an effective scheme in port error-control without degrading cache performance.
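A sketch of the merging step, assuming fixed-width ternary strings over {'0', '1', '*'} paired with next-hop ports; the paper's three versions and TCAM-specific details are not reproduced:

```python
def compact(entries):
    """Merge pairs of routing entries that share a next-hop port and
    differ in exactly one specified bit, replacing that bit with a
    don't-care ('*'). Repeats until no further merge applies.
    entries: dict mapping fixed-width ternary string -> port."""

    def merge(a, b):
        diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
        if len(diff) == 1 and a[diff[0]] != '*' and b[diff[0]] != '*':
            i = diff[0]
            return a[:i] + '*' + a[i + 1:]
        return None

    rules = list(entries.items())
    changed = True
    while changed:
        changed = False
        for i in range(len(rules)):
            for j in range(i + 1, len(rules)):
                if rules[i][1] == rules[j][1]:         # same next hop
                    m = merge(rules[i][0], rules[j][0])
                    if m is not None:
                        rules[i] = (m, rules[i][1])
                        del rules[j]
                        changed = True
                        break
            if changed:
                break
    return dict(rules)
```

For example, `compact({'1010': 1, '1011': 1})` yields `{'101*': 1}`, halving the entries for that pair.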

16.
This paper proposes using a user-level memory thread (ULMT) for correlation prefetching. In this approach, a user thread runs on a general-purpose processor in main memory, either in the memory controller chip or in a DRAM chip. The thread performs correlation prefetching in software, sending the prefetched data into the L2 cache of the main processor. This approach requires minimal hardware beyond the memory processor: The correlation table is a software data structure that resides in main memory, while the main processor only needs a few modifications to its L2 cache so that it can accept incoming prefetches. In addition, the approach has wide applicability, as it can effectively prefetch even for irregular applications. Finally, it is very flexible, as the prefetching algorithm can be customized by the user on an application basis. Our simulation results show that, through a new design of the correlation table and prefetching algorithm, our scheme delivers good results. Specifically, nine mostly-irregular applications show an average speedup of 1.32. Furthermore, our scheme works well in combination with a conventional processor-side sequential prefetcher, in which case the average speedup increases to 1.46. Finally, by exploiting the customization of the prefetching algorithm, we increase the average speedup to 1.53.
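A sketch of the kind of software correlation table such a memory thread could maintain: each miss address maps to its most recent successor misses, which become prefetch candidates the next time that address misses. The table geometry is illustrative, not the paper's design:

```python
from collections import deque, OrderedDict

class CorrelationTable:
    """Software correlation table: for each miss address, remember the
    addresses of the misses that followed it, and on a repeat miss
    prefetch those likely successors."""

    def __init__(self, max_entries=4096, succ_per_entry=4):
        self.table = OrderedDict()      # miss addr -> deque of successors
        self.max_entries = max_entries
        self.succ_per_entry = succ_per_entry
        self.prev_miss = None

    def on_miss(self, addr):
        """Record the (prev miss -> this miss) correlation, then return
        predicted successors of addr as prefetch candidates."""
        if self.prev_miss is not None:
            succ = self.table.setdefault(
                self.prev_miss, deque(maxlen=self.succ_per_entry))
            if addr in succ:
                succ.remove(addr)
            succ.appendleft(addr)       # most recent successor first
            if len(self.table) > self.max_entries:
                self.table.popitem(last=False)   # evict oldest entry
        self.prev_miss = addr
        return list(self.table.get(addr, ()))
```

Because the table lives in ordinary main memory and the policy is plain code, the prefetching algorithm can be swapped per application, which is the flexibility the abstract highlights.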

17.
Trace-driven simulations of numerical Fortran programs are used to study the impact of the parallel loop scheduling strategy on data prefetching in a shared memory multiprocessor with private data caches. The simulations indicate that to maximize memory performance, it is important to schedule blocks of consecutive iterations to execute on each processor, and then to adaptively prefetch single-word cache blocks to match the number of iterations scheduled. Prefetching multiple single-word cache blocks on a miss reduces the miss ratio by approximately 5% to 30% compared to a system with no prefetching. In addition, the proposed adaptive prefetching scheme further reduces the miss ratio while significantly reducing the false sharing among cache blocks compared to nonadaptive prefetching strategies. Reducing the false sharing causes fewer coherence invalidations to be generated, and thereby reduces the total network traffic. The impact of the prefetching and scheduling strategies on the temporal distribution of coherence invalidations also is examined. It is found that invalidations tend to be evenly distributed throughout the execution of parallel loops, but tend to be clustered when executing sequential program sections. The distribution of invalidations in both types of program sections is relatively insensitive to the prefetching and scheduling strategy.

18.
孙耀, 刘杰, 叶丹, 钟华. 《软件学报》 (Journal of Software), 2016, 27(12): 3192-3207
Request load balancing is a core problem in distributed file system metadata management. Aiming to maximize the throughput of the metadata server cluster, this paper designs and implements a distributed caching framework on top of the existing metadata management layer that is dedicated to managing hot metadata and balancing a continuously changing load. Compared with existing metadata load-balancing architectures, this two-level architecture is more flexible and more responsive to load changes, and it avoids damaging the metadata namespace structure through the redistribution or migration of hot metadata. Observation and analysis show that metadata items are small and numerous, so the penalty for prefetching the wrong metadata is far smaller than that for prefetching the wrong data. Based on these distinctive characteristics of metadata, a metadata prefetching policy and a prefetch-based metadata cache replacement algorithm are proposed to strengthen the distributed cache layer; the two-level load-balancing framework also takes cache consistency into account. Finally, the effectiveness of the framework and methods is verified on a real distributed file system.

19.
Instruction prefetching techniques designed for general-purpose computer systems cannot meet the requirements of real-time systems. One important reason is that instruction cache pollution caused by useless prefetches makes the WCET (worst-case execution time) estimates of real-time tasks imprecise, which reduces system schedulability and seriously harms system efficiency. Aiming to simplify WCET analysis and lower WCET estimates, this paper proposes an instruction prefetching method based on program basic blocks. The method performs prefetching at basic-block granularity, avoiding the useless prefetches introduced by conventional instruction prefetching; by simplifying the worst-case classification of instruction accesses as hits or misses, it simplifies the WCET analysis process and tightens the WCET estimate. Evaluation on real-time benchmark programs shows that, compared with no prefetching, the method lowers the WCET estimates of real-time tasks by about 20% and improves average-case instruction cache performance by about 10%.
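A sketch of basic-block-granular instruction prefetch, assuming a `block` descriptor with a start address and size and an `icache.fetch` hook; the 64-byte line size is an assumption:

```python
def prefetch_basic_block(icache, block, line_size=64):
    """When control transfers to a basic block, fetch every cache line
    the block spans before it executes. No line of the block can then
    miss mid-execution, so WCET analysis can treat the whole block as
    a single, easily classified hit/miss event."""
    first = block.start_addr // line_size
    last = (block.start_addr + block.size - 1) // line_size
    for line in range(first, last + 1):
        icache.fetch(line * line_size)   # issued before the block runs
```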

20.
A Prefetching Policy Combining the State of the Memory-Access Miss Queue
As the gap between memory access speed and processor speed grows ever wider, memory performance has become the bottleneck of overall computer system performance. Based on an analysis of instruction cache and data cache miss behavior, this paper proposes a prefetching policy that incorporates the state of the memory-access miss queue. The policy preserves the order of instruction and data accesses, which helps extract prefetch streams, and it separates instruction-stream prefetching from data-stream prefetching to avoid mutual replacement. When choosing the moment to issue a prefetch, it considers not only whether the bus is currently idle but also the state of the miss queue, reducing interference with the processor's normal memory requests. A stream-filtering mechanism improves prefetch accuracy and lowers the memory bandwidth demanded by prefetching. Results show that with this policy, average memory access latency falls by 30% and the IPC of the SPEC CPU2000 programs improves by 8.3% on average.
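A sketch of the issue policy, with the stream filter reduced to a single stable-stride stream; the thresholds and single-stream tracking are simplifications of mine, not the paper's design:

```python
class MissQueueAwarePrefetcher:
    """Issue a prefetch only when the bus is idle AND the memory-access
    miss queue is shallow, so prefetches never delay demand misses."""

    def __init__(self, queue_limit=2, stream_confidence=2):
        self.queue_limit = queue_limit
        self.stream_confidence = stream_confidence
        self.last_addr = None
        self.stride = 0
        self.repeats = 0
        self.next_prefetch = None

    def on_demand_miss(self, addr):
        # Stream filter: only a stream with a repeating stride is allowed
        # to generate prefetch traffic, which keeps accuracy high.
        if self.last_addr is not None and addr - self.last_addr == self.stride != 0:
            self.repeats += 1
        else:
            self.stride = (addr - self.last_addr) if self.last_addr is not None else 0
            self.repeats = 0
            self.next_prefetch = None
        self.last_addr = addr
        if self.repeats >= self.stream_confidence:
            self.next_prefetch = addr + self.stride

    def maybe_prefetch(self, bus_idle, miss_queue_depth):
        """Called each cycle with the current bus and miss-queue state;
        returns an address to prefetch, or None."""
        if not bus_idle or miss_queue_depth >= self.queue_limit:
            return None                  # never compete with demand misses
        addr, self.next_prefetch = self.next_prefetch, None
        return addr
```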
