Similar literature
 20 similar articles found (search time: 0 ms)
1.
The traditional dynamic random-access memory (DRAM) storage medium can be integrated on chips via emerging 3D-stacking technology to architect a DRAM shared cache in multicore systems. Compared with static random-access memory (SRAM), DRAM is larger but slower. Much existing research has been devoted to improving workload performance by using SRAM and stacked DRAM together in shared cache systems, ranging from SRAM structure improvement to optimizing cache tags and data access. However, little attention has been paid to designing a shared cache scheduling scheme for multiprogrammed workloads with different memory footprints in multicore systems. Motivated by this, we propose a hybrid shared cache scheduling scheme that allows a multicore system to utilize SRAM and 3D-stacked DRAM efficiently, thus achieving better workload performance. This scheduling scheme employs (1) a cache monitor, which collects cache statistics; (2) a cache evaluator, which evaluates the cache information while programs are executing; and (3) a cache switcher, which self-adaptively chooses between the SRAM and DRAM shared cache modules. A cache data migration policy is developed to guarantee that the scheduling scheme works correctly. Extensive experiments are conducted to evaluate the workload performance of our proposed scheme. The experimental results show that our method can improve multiprogrammed workload performance by up to 25% compared with state-of-the-art methods (including conventional and DRAM cache systems).
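The monitor, evaluator, and switcher roles named in this abstract can be illustrated with a minimal Python sketch. The abstract does not give the actual decision rule, so everything below is an assumption made for illustration: the names `CacheMonitor`, `evaluate`, and `choose_module`, the use of miss rate and memory footprint as inputs, and the 0.2 miss-rate threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CacheMonitor:
    """Monitor role: collects hit/miss counts for the active cache module."""
    accesses: int = 0
    misses: int = 0

    def record(self, hit: bool) -> None:
        self.accesses += 1
        if not hit:
            self.misses += 1


def evaluate(monitor: CacheMonitor) -> float:
    """Evaluator role: return the observed miss rate (0.0 if nothing recorded)."""
    if monitor.accesses == 0:
        return 0.0
    return monitor.misses / monitor.accesses


def choose_module(miss_rate: float, footprint_bytes: int, sram_bytes: int,
                  miss_threshold: float = 0.2) -> str:
    """Switcher role (hypothetical rule, not taken from the paper).

    Prefer the larger but slower DRAM cache only when the workload's
    footprint does not fit in SRAM and the miss rate is high.
    """
    if footprint_bytes > sram_bytes and miss_rate > miss_threshold:
        return "DRAM"
    return "SRAM"


# Example: 10 accesses with 4 misses, a 64 MiB footprint, an 8 MiB SRAM cache.
monitor = CacheMonitor()
for hit in [True, False, True, False, True, True, False, True, False, True]:
    monitor.record(hit)
decision = choose_module(evaluate(monitor), 64 * 2**20, 8 * 2**20)
```

A real scheme would also need the data migration policy the abstract mentions, so that lines already cached in one module remain reachable after a switch; the sketch omits it.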

2.
Web caching is a widely deployed technique to reduce the load on web servers and to reduce the latency for web browsers. Peer-to-Peer (P2P) web caching has been a hot research topic in recent years as it can create scalable and robust designs for decentralized internet-scale applications. However, many P2P web caching systems suffer from expensive overheads such as lookup and publish messages, and lack locality awareness. In this paper, we present the development of a locality-aware cache diffusion system that makes use of routing table locality, aggregation, and soft state to overcome these limitations. The analysis and experiments show that our cache diffusion system reduces the amount of information processed by nodes, reduces the number of index messages sent by nodes, and improves the locality of cache pointers.

3.
The job-scheduling function in a multiprogramming computer system plays a key role in the achievement of the performance goals for the system. It is possible and convenient to partition this scheduling function into a priority assignment function and resource assignment function. Implementation of a general purpose resource assignment module permits a large family of scheduling strategies to be implemented via different priority calculation schemes. The implications of this partitioning of the scheduling function are studied by use of a simulation model. A system embodying this approach to job scheduling is discussed as an application of the approach to other types of systems.

4.
Summary: An earlier paper on an adaptive workload balancing strategy to achieve efficient resource utilization is extended to include memory cost. The scheme corrects for the imbalance in resource utilization by bringing into play the "Invisible Hand" of classical economics. In this framework, the memory price and the bid, W_ij, of each user program U_j for resource R_i are calculated adaptively. These prices are not apparent to the users; rather, they are shadow prices determined from the characteristics of user programs, the current resource congestion, and the budget constraints of users. There emerges a set of effective priorities for user programs based on the bid prices, W_ij, and these determine the scheduling of programs. An imbalance due to high congestion of some resource will result in a relative decrease of priority for heavy users of that resource and a relative increase for light users. Some ideas on an actual implementation scheme are described based on an approximation to the above abstract scheme. It is shown that this approximate scheme will also tend to balance resource utilization.

5.
Parallel Computing, 2014, 40(5-6): 59-69
We present a cache-aware method for accelerating texture-based volume rendering on a graphics processing unit (GPU). Because a GPU has hierarchical architecture in terms of processing and memory units, cache optimization is important to maximize performance for memory-intensive applications. Our method localizes texture memory reference according to the location of the viewpoint and dynamically selects the width and height of thread blocks (TBs) so that each warp, which is a series of 32 threads processed simultaneously, can minimize memory access strides. We also incorporate transposed indexing of threads to perform TB-level cache optimization for specific viewpoints. Furthermore, we maximize TB size to exploit spatial locality with fewer resident TBs. For viewpoints with relatively large strides, we synchronize threads of the same TB at regular intervals to realize synchronous ray propagation. Experimental results indicate that our cache-aware method doubles the worst rendering performance compared to those provided by the CUDA and OpenCL software development kits.

6.
This paper proposes a locality correlation preserving based support vector machine (LCPSVM) by combining the idea of margin maximization between classes and local correlation preservation of class data. It is a Support Vector Machine (SVM) like algorithm, which explicitly considers the locality correlation within each class in the margin and the penalty term of the optimization function. Canonical correlation analysis (CCA) is used to reveal the hidden correlations between two datasets, and a variant of the correlation analysis model that implements locality preserving has been proposed by integrating local information into the objective function of CCA. Inspired by the idea used in canonical correlation analysis, we propose a locality correlation preserving within-class scatter matrix to replace the within-class scatter matrix in the minimum class variance support vector machine (MCVSVM). This substitution keeps the locality correlation of the data and inherits the properties of SVM and other similar modified support vector machines. LCPSVM is discussed under linearly separable, small sample size and nonlinearly separable conditions, and experimental results on benchmark datasets demonstrate its effectiveness.

7.
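Several methods for discovering pattern locality by recording cache victims are studied. One benefit of recording victim decisions is that it reduces the duplication of related information and preserves the current location of the victim line. A typical cache can exploit temporal locality and spatial locality. The question considered here is how to discover pattern locality, that is, the property that lines accessed adjacently last time will be accessed together again next time. Several new cache structures that record pattern locality are described, along with the miss rates and communication traffic performance obtained from several trace-driven simulations. The results show that pattern locality information based on victim statistics greatly helps improve the quality of prefetch decisions.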

8.
Parallel applications typically do not perform well in a multiprogrammed environment that uses time-sharing to allocate processor resources to the applications' parallel threads. Co-scheduling related parallel threads, or statically partitioning the system, often can reduce the applications' execution times, but at the expense of reducing the overall system utilization. To address this problem, there has been increasing interest in dynamically allocating processors to applications based on their resource demands and the dynamically varying system load. The Loop-Level Process Control (LLPC) policy (Yue K, Lilja D. Efficient execution of parallel applications in multiprogrammed multiprocessor systems. 10th International Parallel Processing Symposium, 1996; 448–456) dynamically adjusts the number of threads an application is allowed to execute based on the application's available parallelism and the overall system load. This study demonstrates the feasibility of incorporating the LLPC strategy into an existing commercial operating system and parallelizing compiler and provides further evidence of the performance improvement that is possible using this dynamic allocation strategy. In this implementation, applications are automatically parallelized and enhanced with the appropriate LLPC hooks so that each application interacts with the modified version of the Solaris operating system. The parallelism of the applications is then dynamically adjusted automatically when they are executed in a multiprogrammed environment so that all applications obtain a fair share of the total processing resources. Copyright © 2001 John Wiley & Sons, Ltd.

9.
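A facial expression recognition method based on embedded locality preserving projections, called ELPP, is proposed. Through dimensionality reduction, ELPP preserves the local structure information of the neighborhood space and, through sample-based image embedding, also preserves the data information of the embedding space, so that more, more effective, and more discriminative expression features can be extracted from the original expression data. Recognition results on two expression databases, JAFFE and CED-WYU, show that ELPP-based feature extraction effectively improves recognition performance.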

10.
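Studies show that holistic face recognition methods, because they ignore local image information, recognize less accurately than sub-module-based algorithms, which preserve local features better. Because manifold techniques preserve the local manifold structure of data in nonlinear submanifolds well after image dimensionality reduction, an improved sub-pattern locality preserving projection face recognition algorithm is proposed. Its main idea is to partition different images of the same class into subsets together, form sub-modules from the sub-images at the same position, and apply the LPP algorithm to each sub-module to learn its manifold structure; this differs from methods that partition images of different classes together to learn the manifold. Experiments show that the algorithm better preserves the local manifold structure and information features of face images and improves the recognition rate.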

11.
In face recognition, when the number of images in the training set is much smaller than the number of pixels in each image, Locality Preserving Projections (LPP) often suffers from the singularity problem. To overcome the singularity problem, principal component analysis is applied as a preprocessing step, but this preprocessing may discard some important discriminative information. In this paper, a novel algorithm called Optimal Locality Preserving Projections (O-LPP) is proposed. The algorithm transforms the singular eigensystem computation into eigenvalue decomposition problems without losing any discriminative information, which can reduce the computational complexity. A theoretical analysis of the algorithm is also given. Extensive experiments on face databases demonstrate that the proposed algorithm is superior to the traditional LPP algorithm.

12.
In this paper, we consider scheduling of a multi-item single stage production-inventory system in the presence of uncertainty regarding demand patterns, production times and switchover times. For a given specification of base-stock levels of individual items and under (S − 1, S) requests for replenishment policy, a mathematical program to minimize long-run average system wide costs is formulated. We derive approximations for the first two moments of demand over lead time using residual service analysis of vacation queue models. Subsequently, we develop an approximate convex program for the original cost model and determine optimal production frequencies for individual types. Based on these relative frequencies, we determine a table size and devise an efficient heuristic to construct a tabular sequence in which individual items appear according to their respective absolute frequencies and items are positioned such that variance of their inter-visit times is minimized. A numerical study that demonstrates effectiveness of the proposed policy against cyclic policies is given.
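The quantity the tabular heuristic in this abstract tries to keep small, the variance of an item's inter-visit times, can be measured for any candidate sequence. The Python sketch below only evaluates a given sequence; it does not reproduce the paper's construction heuristic. The function names are hypothetical, and treating the table as repeating cyclically (so the gap from the last visit wraps to the first) is an assumption.

```python
from statistics import pvariance


def inter_visit_times(sequence, item):
    """Gaps (in table slots) between consecutive visits to `item`.

    The table is assumed to repeat cyclically, so the last gap wraps
    around from the final occurrence to the first one.
    """
    positions = [i for i, x in enumerate(sequence) if x == item]
    if not positions:
        raise ValueError(f"{item!r} does not appear in the sequence")
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    gaps.append(positions[0] + len(sequence) - positions[-1])
    return gaps


def inter_visit_variance(sequence, item):
    """Population variance of the inter-visit times of `item`."""
    return pvariance(inter_visit_times(sequence, item))


# Item A has absolute frequency 2 in a table of size 4.
even = inter_visit_variance("ABAC", "A")     # gaps [2, 2]
clumped = inter_visit_variance("AABC", "A")  # gaps [1, 3]
```

Both tables give A the same frequency, but spacing its visits evenly drives the variance to zero, which is the property the abstract's positioning step aims for.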

13.
A new cache architecture based on temporal and spatial locality   Total citations: 5 (self-citations: 0, citations by others: 5)
A data cache system is designed as a low power/high performance cache structure for embedded processors. Direct-mapped cache is a favorite choice for short cycle time, but suffers from a high miss rate. Hence the proposed dual data cache is an approach to improve the miss ratio of direct-mapped cache without affecting its access time. The proposed cache system can exploit temporal and spatial locality effectively by maximizing the effective cache memory space for any given cache size. The proposed cache system consists of two caches, i.e., a direct-mapped cache with small block size and a fully associative spatial buffer with large block size. Temporal locality is utilized by caching candidate small blocks selectively into the direct-mapped cache. Spatial locality can also be utilized aggressively by fetching multiple neighboring small blocks whenever a cache miss occurs. According to the results of comparison and analysis, similar performance can be achieved with a cache four times smaller than the conventional direct-mapped cache. It is also shown that the power consumption of the proposed cache can be reduced by around 4% compared with the victim cache configuration.
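A rough Python model can make the two-cache organization in this abstract concrete: a direct-mapped cache of small blocks next to a fully associative buffer of large blocks. It is a sketch under stated assumptions, not the paper's design. The class name `DualCache`, the default sizes, the LRU replacement in the buffer, and the rule that small blocks touched while in the buffer are promoted to the direct-mapped cache on eviction are all hypothetical choices made for illustration.

```python
from collections import OrderedDict


class DualCache:
    """Direct-mapped cache (small blocks) plus spatial buffer (large blocks)."""

    def __init__(self, dm_lines=8, small=8, large=32, sb_entries=2):
        self.small = small            # small block size in bytes
        self.large = large            # large block size in bytes
        self.dm_lines = dm_lines
        self.dm = [None] * dm_lines   # each slot holds one small-block number
        self.sb = OrderedDict()       # large-block number -> touched small blocks
        self.sb_entries = sb_entries
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        """Return True on a hit in either structure, False on a miss."""
        s = addr // self.small        # small-block number
        l = addr // self.large        # large-block number
        if self.dm[s % self.dm_lines] == s:
            self.hits += 1
            return True
        if l in self.sb:
            self.sb.move_to_end(l)    # refresh LRU position
            self.sb[l].add(s)         # remember which small block was used
            self.hits += 1
            return True
        self.misses += 1
        if len(self.sb) >= self.sb_entries:
            # Evict the least recently used large block and keep only the
            # small blocks that were actually referenced (assumed policy).
            _, touched = self.sb.popitem(last=False)
            for t in touched:
                self.dm[t % self.dm_lines] = t
        self.sb[l] = {s}              # fetch the whole large block on a miss
        return False


cache = DualCache()
for a in range(32):                   # sequential walk through one large block
    cache.access(a)
```

With these defaults, the sequential walk misses only once, because the first miss brings in the four neighboring small blocks together; that is the spatial-locality effect the abstract describes.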

14.
For face recognition, graph embedding techniques attempt to produce a high data locality projection for better recognition performance. However, estimation of population data locality could be severely biased due to the small number of training samples. The biased estimation triggers an overfitting problem and hence poor generalization. In this paper, we propose a new linear graph embedding technique based upon an adaptive locality preserving regulation model (ALPRM), known as Regularized Locality Preserving Discriminant Embedding (RLPDE). In RLPDE, the projection features are regulated based on ALPRM to approach population data locality, which can directly enhance the locality preserving capability of the projection features. This paper also presents the relation between locality preserving capability and class discrimination. Specifically, we show that the optimization of the locality preserving function minimizes the within-class variability. Experiments on three face datasets (PIE, FRGC and FERET) show the promising performance of the proposed technique.

15.
Multimedia Tools and Applications - In this paper, aiming at the drawback of the popular dimensionality reduction method Discriminant Sparse Neighborhood Preserving Embedding (DSNPE), i.e. the...

16.
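The difference in view angle between known samples and samples to be recognized is the main factor affecting gait recognition accuracy. Subspace methods project gaits from different views into a common subspace and can effectively avoid the effect of view differences. However, most existing methods project samples linearly by learning a projection matrix, which makes it hard to preserve the original nonlinear structure of multi-view gait data. To address this, this paper proposes multi-nonlinear multi-view locality preserving projections. A family of nonlinear functions first applies multiple nonlinear projections to the samples; samples from different views are then projected into a common subspace under the principle of local structure preservation; finally, nearest-neighbor classification is performed in the common subspace. Gait recognition experiments on the multi-view gait database CASIA(B) show that the proposed method outperforms other projection methods under many view combinations.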

17.
Regularized locality preserving discriminant analysis for face recognition   Total citations: 1 (self-citations: 0, citations by others: 1)
This paper proposes a regularized locality preserving discriminant analysis (RLPDA) approach for facial feature extraction and recognition. The RLPDA approach decomposes the eigenspace of the locality preserving within-class scatter matrix into three subspaces, i.e., the face space, the noise space and the null space, and then regularizes the three subspaces differently according to their predicted eigenvalues. As a result, the proposed approach integrates discriminative information in all of the three subspaces, de-emphasizes the effect of the eigenvectors corresponding to the small eigenvalues, and meanwhile suppresses the small sample size problem. Extensive experiments on ORL face database, FERET face subset and UMIST face database illustrate the effectiveness of the proposed approach.

18.
19.
A new supervised locality preserving canonical correlation analysis algorithm   Total citations: 2 (self-citations: 0, citations by others: 2)
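From the perspective of pattern recognition, a supervised locality preserving canonical correlation analysis algorithm (SALPCCA) is proposed on the basis of locality preserving canonical correlation analysis. When constructing the nearest-neighbor graph of the samples, the method takes class information into account, determines weights from the distance measure between samples, and establishes multiple weighted correlations between samples. By maximizing the weighted correlation between paired samples of the same class and their neighbors, it uses the class information of the samples while also preserving the local structure of the data. In addition, to better extract nonlinear information from the samples, the feature sets are mapped into a kernel feature space, yielding a kernelized SALPCCA (KSALPCCA) algorithm. Experimental results on face databases including ORL, Yale, and AR show that the method achieves better recognition performance than other traditional canonical correlation analysis methods.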

20.
Exploiting cache locality of parallel programs at runtime is a complementary approach to compiler optimization. This is particularly important for those applications with dynamic memory access patterns. We propose a memory-layout oriented technique to exploit cache locality of parallel loops at runtime on Symmetric Multiprocessor (SMP) systems. Guided by application-dependent and targeted architecture-dependent hints, our system, called Cacheminer, reorganizes and partitions a parallel loop using the memory-access space of its execution. Through effective runtime transformations, our system maximizes the data reuse in each partitioned data region assigned to a cache, and minimizes the data sharing among the partitioned data regions assigned to all caches. The executions of tasks in the partitions are scheduled in an adaptive and locality-preserved way to minimize the execution time of programs by trading off load balance and locality. We have implemented the Cacheminer runtime library on two commercial SMP servers and a SimCS simulated SMP. Our simulation and measurement results show that our runtime approach can achieve performance comparable to compiler optimizations for programs with regular computation and memory-access patterns, whose load balance and cache locality can be well optimized by tiling and other program transformations. However, our experimental results show that our approach is able to significantly improve the memory performance of applications with irregular computation and dynamic memory access patterns. These types of programs are usually hard to optimize with static compiler optimizations.
