Similar documents
 Found 20 similar documents (search time: 15 ms)
1.
Typical request processing systems, such as web servers and database servers, try to serve all requests as fast as possible, which can be described as a best-effort approach. However, different application items may have different quality-of-service (QoS) requirements, and this can be viewed as a concern orthogonal to the basic system functionality. In this paper we propose the QoS-Broker, a middleware for delivering QoS over servers and applications. We show how its architecture supports contracts over varied targets, including queries, transactions, services, or sessions, and allows expressions on variables to be specified in those targets. We also discuss how the QoS-Broker implements basic strategies for QoS over workloads. Our experimental results illustrate the middleware by applying priority-based and weighted-fair-queuing-based differentiation over clients and over transactions, as well as admission control, using a benchmark as a case study.
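To make the weighted-fair-queuing differentiation mentioned above concrete, here is a minimal Python sketch. It is not the QoS-Broker's implementation, which the abstract does not describe; it is a simplified, self-clocked approximation of weighted fair queuing, and the class name `WeightedFairQueue`, the client classes, and the `cost` parameter are all hypothetical.

```python
import heapq
from itertools import count


class WeightedFairQueue:
    """Simplified (self-clocked) weighted fair queuing over client classes.

    Hypothetical illustration only: each request gets a virtual finish
    time, and requests are served in increasing order of that time.
    """

    def __init__(self, weights):
        self.weights = dict(weights)                  # client class -> weight
        self.last_finish = {c: 0.0 for c in self.weights}
        self.virtual_time = 0.0                       # finish tag of last served request
        self._heap = []
        self._seq = count()                           # tie-breaker for equal finish times

    def enqueue(self, client_class, request, cost=1.0):
        # A request starts when the class's previous request finishes,
        # or now (in virtual time) if the class was idle.
        start = max(self.virtual_time, self.last_finish[client_class])
        finish = start + cost / self.weights[client_class]
        self.last_finish[client_class] = finish
        heapq.heappush(self._heap, (finish, next(self._seq), client_class, request))

    def dequeue(self):
        finish, _, client_class, request = heapq.heappop(self._heap)
        self.virtual_time = finish
        return client_class, request

    def __len__(self):
        return len(self._heap)
```

With weights of 3 and 1 for two backlogged classes and equal request costs, about three of every four served requests come from the higher-weight class, while the lower-weight class is never starved. Strict priority, by contrast, would serve the lower class only when the higher one is idle.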

2.
Traditional dynamic random-access memory (DRAM) can be integrated on chip via emerging 3D-stacking technology to build a DRAM shared cache in multicore systems. Compared with static random-access memory (SRAM), DRAM is larger but slower. Much existing work has been devoted to improving workload performance by using SRAM and stacked DRAM together in shared cache systems, ranging from SRAM structure improvements to optimized cache tag and data access. However, little attention has been paid to designing a shared cache scheduling scheme for multiprogrammed workloads with different memory footprints in multicore systems. Motivated by this, we propose a hybrid shared cache scheduling scheme that allows a multicore system to utilize SRAM and 3D-stacked DRAM efficiently, thus achieving better workload performance. This scheduling scheme employs (1) a cache monitor, which collects cache statistics; (2) a cache evaluator, which evaluates the cache information while programs are executing; and (3) a cache switcher, which self-adaptively chooses the SRAM or DRAM shared cache module. A cache data migration policy is developed to guarantee that the scheduling scheme works correctly. Extensive experiments are conducted to evaluate the workload performance of our proposed scheme. The experimental results show that our method can improve multiprogrammed workload performance by up to 25% compared with state-of-the-art methods (including conventional and DRAM cache systems).

3.
We present approaches to the generation of synthetic workloads for benchmarking multiplayer online gaming infrastructures. Existing techniques, such as mobility or traffic models, are often either too simple to be representative for this purpose or too specific for a particular network structure. Desirable properties of a workload are reproducibility, representativeness, and scalability to any number of players. We analyze different mobility models and AI-based workload generators. Real gaming sessions with human players using the prototype game Planet PI4 serve as a reference workload. Novel metrics are used to measure the similarity between real and synthetic traces with respect to neighborhood characteristics. We found that, although more complicated to handle, AI players reproduce real workload characteristics more accurately than mobility models.

4.
The flash-based SSD is used as a tiered cache between RAM and HDD. Conventional schemes do not utilize the nonvolatile nature of the SSD and cannot cache write requests, even though writes are a significant, and often dominant, fraction of storage workloads. To cache write requests, the SSD cache must manage its data and metadata persistently and consistently, and guarantee no data loss even after a crash. Persistent cache management may require frequent metadata changes and cause high overhead. Some researchers insist that a nonvolatile persistent cache requires additional primitives that are not supported by general SSDs on the market. We propose a fully persistent read/write cache that improves both read and write performance, does not require any special primitive, has low overhead, guarantees the integrity of the cache metadata and the consistency of the cached data even during a crash or power failure, and is able to recover the flash cache quickly without any data loss. We implemented the persistent read/write cache as a block device driver in Linux. Our scheme targets virtual desktop infrastructure servers, so the evaluation was performed with massive, real desktop traces of five users over ten days. The evaluation shows that our scheme outperforms an LRU version of the SSD cache by 50% and the read-only version of our scheme by 37%, on average, across all experiments. This paper describes most parts of our scheme in detail. Detailed pseudocode is included in the Appendix.

5.
This paper focuses on energy consumption, which is a major problem in the dark silicon era. As energy consumption becomes a key issue for the operation and maintenance of cloud data centers, cloud computing providers are becoming significantly concerned. Here, we show how spin-transfer torque random access memory (STT-RAM) can be used as an on-chip L2 cache to obtain lower energy than conventional L2 caches, such as SRAM. High density, fast read access, and non-volatility make STT-RAM a significant technology for on-chip memories. Previous studies have mainly examined specific schemes based on common applications and do not provide a thorough analysis of emerging scale-out applications with multiple design options. Here, we discuss different outlooks on performance and energy efficiency in cloud processors by running emerging scale-out workloads. Experimental results on the CloudSuite benchmarks show that the proposed method reduces energy by 51% (on average) and improves energy-delay product by 37% (on average), while instructions-per-cycle (IPC) degradation is only 22% (on average) compared to the SRAM method.
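As a rough consistency check on the three averages quoted above, one can relate them through the energy-delay product (EDP). This is a sketch under assumptions the abstract does not state: execution time is taken to scale inversely with IPC at a fixed instruction count and clock, and the per-benchmark averages are treated as if they composed multiplicatively. The symbols \(E\), \(T\), and the primed STT-RAM quantities are introduced here for illustration.

\[
\frac{E'}{E} = 1 - 0.51 = 0.49, \qquad
\frac{T'}{T} = \frac{\mathrm{IPC}}{\mathrm{IPC}'} = \frac{1}{1 - 0.22} \approx 1.28,
\]
\[
\frac{\mathrm{EDP}'}{\mathrm{EDP}} = \frac{E'}{E} \cdot \frac{T'}{T} \approx 0.49 \times 1.28 \approx 0.63 .
\]

Under these assumptions the EDP drops by roughly 37%, which matches the reported improvement.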

6.
A data stream is a massive unbounded sequence of data elements continuously generated at a rapid rate. Consequently, the knowledge embedded in a data stream is likely to change as time goes by. Identifying recent changes in a data stream, especially an online data stream, can provide valuable information for its analysis. However, most mining algorithms and frequency approximation algorithms over a data stream do not differentiate the information of recently generated data elements from the obsolete information of old data elements, which may no longer be useful or may be invalid at present. Therefore, they are not able to extract recent changes of information in a data stream adaptively. This paper proposes a data mining method for finding recently frequent itemsets adaptively over an online transactional data stream. The effect of old transactions on the current mining result of a data stream is diminished by decaying the old occurrences of each itemset as time goes by. Furthermore, several optimization techniques are devised to minimize processing time as well as memory usage. Finally, the performance of the proposed method is analyzed by a series of experiments to identify its various characteristics.
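The decay idea described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's algorithm: the exponential decay rule, the `decay` and `max_size` parameters, and the class name `DecayedItemsetCounter` are all hypothetical, and the sketch omits the optimization techniques the abstract mentions.

```python
from itertools import combinations


class DecayedItemsetCounter:
    """Counts itemsets over a transaction stream with exponential decay.

    Hypothetical sketch: every new transaction multiplies the weight of
    all earlier occurrences by `decay`, so old transactions fade out.
    """

    def __init__(self, decay=0.9, max_size=2):
        self.decay = decay
        self.max_size = max_size     # largest itemset size that is tracked
        self.tick = 0                # number of transactions seen so far
        self.total = 0.0             # decayed count of all transactions
        self.counts = {}             # itemset -> (decayed count, tick of last update)

    def _decayed(self, itemset):
        count, last = self.counts.get(itemset, (0.0, self.tick))
        # Apply lazily the decay accumulated since the last update.
        return count * self.decay ** (self.tick - last)

    def add(self, transaction):
        self.tick += 1
        self.total = self.total * self.decay + 1.0
        items = sorted(set(transaction))
        for k in range(1, self.max_size + 1):
            for itemset in combinations(items, k):
                self.counts[itemset] = (self._decayed(itemset) + 1.0, self.tick)

    def support(self, itemset):
        if self.total == 0.0:
            return 0.0
        return self._decayed(tuple(sorted(set(itemset)))) / self.total

    def frequent(self, min_support):
        return {s for s in self.counts if self.support(s) >= min_support}
```

Because decay is applied lazily, using the tick stored with each count, an update touches only the itemsets in the incoming transaction. An itemset that was common early in the stream but has stopped appearing falls below the support threshold after enough new transactions arrive.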

7.
With the emergence of 3D-stacking technology, dynamic random-access memory (DRAM) can be stacked on chips to build a DRAM last-level cache (LLC). Compared with static random-access memory (SRAM), DRAM is larger but slower. Much existing research has been devoted to improving workload performance by using SRAM and stacked DRAM together, ranging from SRAM structure improvements to optimized cache tag and data access. However, little attention has been paid to designing an LLC scheduling scheme for multi-programmed workloads with different memory footprints. Motivated by this, we propose a self-adaptive LLC scheduling scheme that allows us to utilize SRAM and 3D-stacked DRAM efficiently, achieving better workload performance. This scheduling scheme employs (1) an evaluation unit, which probes and evaluates the cache information while programs are executing; and (2) an implementation unit, which self-adaptively chooses SRAM or DRAM. To make the scheduling scheme work correctly, we develop a data migration policy. We conduct extensive experiments to evaluate the performance of our proposed scheme. Experimental results show that our method can improve multi-programmed workload performance by up to 30% compared with state-of-the-art methods.

8.
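This paper discusses how to expand cache data capacity in systems built on a Redis cache and proposes a concrete solution, so that the expansion has less impact on the system and the system continues to provide service throughout.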

9.
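Design of an FPGA online loading function based on a Flash controller   Total citations: 2 (self-citations: 1, citations by others: 1)
The traditional way to update an FPGA program is to use development tools to burn the program into NOR Flash storage over JTAG. When several FPGAs in a complex system need to be updated, the JTAG approach can update only one FPGA at a time, takes a long time, and requires a cable connection, so remote updates are impossible. This paper therefore proposes a scheme for updating FPGA programs online. The scheme can update multiple FPGAs within a system, maximizes update speed, and supports remote updates over a network, which simplifies debugging and remote upgrades.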

10.
The transactional approach to contention management guarantees atomicity by aborting transactions that may violate consistency. A major challenge in this approach is to schedule transactions in a manner that reduces the total time to perform all transactions (the makespan), since transactions are often aborted and restarted. The performance of a transactional scheduler can be evaluated by the ratio between its makespan and the makespan of an optimal, clairvoyant scheduler that knows the list of resource accesses that will be performed by each transaction, as well as its release time and duration.
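The ratio described above can be written compactly. The notation is an assumption introduced here, not taken from the abstract: \(A\) is the scheduler under evaluation, \(\mathrm{OPT}\) is the optimal clairvoyant scheduler, \(I\) ranges over problem instances (sets of transactions with their resource accesses, release times, and durations), and the supremum reflects the usual worst-case reading of such a ratio.

\[
\rho(A) \;=\; \sup_{I} \frac{\mathrm{makespan}_{A}(I)}{\mathrm{makespan}_{\mathrm{OPT}}(I)}
\]

A scheduler with \(\rho(A) = c\) never takes more than \(c\) times as long as the clairvoyant optimum on any instance; for example, finishing in 12 time units where the optimum needs 4 gives a ratio of 3 on that instance.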

11.
Kunkel S., Armstrong B., Vitale P. 《Micro, IEEE》 1999, 19(3): 56-64
Major performance enhancements in large commercial systems are best achieved when advances in hardware technology are matched with advances in software technology. This article connects recent AS/400 hardware advances with the corresponding approaches used to tune system performance for large online transaction processing (OLTP) workloads. We particularly emphasize those tuning efforts that affect the memory system. OLTP workloads are large and complex, stressing many parts of both the software and hardware. These workloads quickly expose software bottlenecks caused by contention on software locks. They also have large working sets with hard-to-predict access patterns that make cache miss rates high. This causes the processor to spend a significant part of its execution time waiting for memory accesses. In multiprocessor systems, compilers alone have minimal effect on cycles spent in storage latency. Other optimizations are needed to affect this portion of the execution time, and many of those require direct involvement of the system software.

12.
Miniatures are an alternative to icons for the representation of a large graphical object such as a window in a reduced format. A front end user interface to an existing videotex system was implemented using icons as well as miniatures to represent previously seen frames in a visual cache, and an empirical comparison showed that users had the same performance with the two representations but subjectively preferred icons.

13.
Analytical workloads in data warehouses often include heavy joins where queries involve multiple fact tables in addition to the typical star patterns, dimensional grouping, and selections. In this paper we propose a new processing and storage framework called bitwise dimensional co-clustering (BDCC) that avoids replication and thus keeps updates fast, yet is able to accelerate all these foreign key joins, efficiently support grouping, and push down most dimensional selections. The core idea of BDCC is to cluster each table on a mix of dimensions, each possibly derived from attributes imported over an incoming foreign key, thereby creating foreign-key-connected tables with partially shared clusterings. These are later used to accelerate any join between two tables that have some dimension in common; they also make it possible to push down and propagate selections (reducing I/O) and to accelerate aggregation and ordering operations. Besides the general framework, we describe an algorithm to derive such a physical co-clustering database automatically and describe query processing and query optimization techniques that can easily be fitted into existing relational engines. We present an experimental evaluation on the TPC-H benchmark in the Vectorwise system, showing that co-clustering can significantly enhance its already high performance and at the same time significantly reduce the memory consumption of the system.

14.
Transactional memory is an alternative to locks for handling concurrency in multi-threaded environments. Instead of providing critical regions that only one thread can enter at a time, transactional memory records sufficient information to detect and correct for conflicts if they occur. This paper surveys the range of options for implementing software transactional memory in Scala. Where possible, we provide references to implementations that instantiate each technique. As part of this survey, we document for the first time several techniques developed in the implementation of Manchester University Transactions for Scala. We order the implementation techniques on a scale moving from the least to the most invasive in terms of modifications to the compilation and runtime environment. This shows that, while the less invasive options are easier to implement and more common, they are more verbose and invasive in the code that uses them, often requiring changes to the syntax and program structure throughout the code.

15.
To evaluate the performance of database applications and database management systems (DBMSs), we usually execute workloads of queries on generated databases of different sizes and then benchmark various measures such as response time and throughput. This paper introduces MyBenchmark, a parallel data generation tool that takes a set of queries as input and generates database instances. Users of MyBenchmark can control the characteristics of the generated data as well as the characteristics of the resulting workload. Applications of MyBenchmark include DBMS testing, database application testing, and application-driven benchmarking. In this paper, we present the architecture and the implementation algorithms of MyBenchmark. Experimental results show that MyBenchmark is able to generate workload-aware databases for a variety of workloads, including query workloads extracted from the TPC-C, TPC-E, TPC-H, and TPC-W benchmarks.

16.
We consider the problem of implementing transactional memory in large-scale distributed networked systems. We present Spiral, a novel distributed directory-based protocol for transactional memory, and we analyze it theoretically and evaluate it experimentally to establish the performance boundaries of this approach from the worst-case perspective. Spiral is designed for the data-flow distributed implementation of software transactional memory, which supports three basic operations: publish, allowing a shared object to be inserted in the directory so that other nodes can find it; lookup, providing a read-only copy of the object to the requesting node; and move, allowing the requesting node to write the object locally after the node gets it. The protocol runs on a hierarchical directory construction based on sparse covers, where clusters at each level are ordered to avoid race conditions while serving concurrent requests. Given a shared object, the protocol maintains a directory path pointing to the object. The basic idea is to use “spiral” paths that grow outward to search for the directory path of the object in a bottom-up fashion. For general networks, this protocol guarantees an \(\mathcal{O}(\log ^2 n\cdot \log D)\) approximation in sequential and one-shot concurrent executions of a finite set of move requests, where \(n\) is the number of nodes and \(D\) is the diameter of the network. It also guarantees a poly-log approximation for any single lookup request. Our bounds are deterministic and hold in the worst case. Moreover, this protocol requires only polylogarithmic bits of memory per node. Experimental evaluations in real networks also confirm our theoretical findings. To the best of our knowledge, this is the first deterministic consistency protocol for distributed transactional memory that achieves poly-log approximation in general networks.
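The three operations named above (publish, lookup, move) can be illustrated with a toy, single-process Python model. This is emphatically not the Spiral protocol: it has no hierarchy, no sparse covers, no spiral paths, and no network; it only shows the ownership semantics the abstract describes. The class `ToyDirectory` and its `write` and `owner` helpers are hypothetical names introduced for this sketch.

```python
import copy


class ToyDirectory:
    """Centralized stand-in for a directory of shared objects.

    Hypothetical illustration of publish / lookup / move semantics only.
    """

    def __init__(self):
        self._owner = {}   # object id -> node currently holding the writable copy
        self._value = {}   # object id -> current value of the object

    def publish(self, node, obj_id, value):
        # Insert a new shared object so that other nodes can find it.
        if obj_id in self._owner:
            raise KeyError(f"{obj_id!r} is already published")
        self._owner[obj_id] = node
        self._value[obj_id] = value

    def lookup(self, node, obj_id):
        # Hand out a read-only snapshot; changes to it never reach the object.
        return copy.deepcopy(self._value[obj_id])

    def move(self, node, obj_id):
        # Transfer the writable copy to the requesting node.
        self._owner[obj_id] = node

    def write(self, node, obj_id, value):
        # Only the node that currently holds the object may write it.
        if self._owner[obj_id] != node:
            raise PermissionError(f"{node!r} does not hold {obj_id!r}")
        self._value[obj_id] = value

    def owner(self, obj_id):
        return self._owner[obj_id]
```

In a real distributed directory the cost of each operation is the communication needed to locate the current owner, which is what the approximation bounds in the abstract measure; this toy model hides that cost behind a dictionary lookup.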

17.
Transactional Memory is a concurrent programming API in which concurrent threads synchronize via transactions (instead of locks). Although this model has mostly been studied in the context of multiprocessors, it has attractive features for distributed systems as well. In this paper, we consider the problem of implementing transactional memory in a network of nodes where communication costs form a metric. The heart of our design is a new cache-coherence protocol, called the Ballistic protocol, for tracking and moving up-to-date copies of cached objects. For constant-doubling metrics, a broad class encompassing both Euclidean spaces and growth-restricted networks, this protocol has stretch logarithmic in the diameter of the network. Supported by NSF grant 0410042 and by grants from Intel Corporation and Sun Microsystems.

18.
《Ergonomics》2012,55(9):1013-1031
A series of psychophysical lifting studies was conducted to establish maximum acceptable weights of lift (MAWL) for three supply items commonly handled in underground coal mines (rock dust bags, ventilation stopping blocks, and crib blocks). Each study utilized 12 subjects, all of whom had considerable experience working in underground coal mines. Effects of lifting in four postures (standing, stooping under a 1·5 m ceiling, stooping under a 1·2 m ceiling, and kneeling) were investigated together with four lifting conditions (combinations of lifting symmetry and lifting height). The frequency of lifting was set at four per min, and the task duration was 15 min. Posture significantly affected the MAWL for the rock dust bag (standing MAWL was 7% greater than restricted postures and kneeling MAWL was 6·4% less than stooped); however, posture interacted with lifting conditions for both of the other materials. Physiological costs were found to be significantly greater in the stooped postures compared with kneeling for all materials. Other contrasts (standing versus restricted postures, stooping under 1·5 m ceiling versus stooping under 1·2 m ceiling) did not exhibit significantly different levels of energy expenditure. Energy expenditure was significantly affected by vertical lifting height; however, the plane of lifting had little influence on metabolic cost. Recommended acceptable workloads for the three materials are 20·0 kg for the rock dust bag, 16·5 kg for the ventilation stopping block, and 14·7 kg for the crib block. These results suggest that miners are often required to lift supplies that are substantially heavier than psychophysically acceptable lifting limits.

19.
This study measured how student interactions (as captured by Transactional Distance dialogue (Moore, 1993)) in online and blended learning environments impacted student learning outcomes, as measured by student satisfaction and student grades. Dialogue was measured as student interactions with other students (student–student interaction), the technologies used (student–technology interaction), the instructors (student–teacher interaction), and the course contents (student–content interaction). In addition, moderating effects of media and modality of interactions and of individual differences on student learning outcomes were also measured. Data were obtained from 342 online and blended students between 2010 and 2013. Findings indicate that student–content interaction had a larger effect on student learning outcomes than other forms of dialogue. Implications for educational policies that require teacher presence (student–teacher) and student–student interactions in distance learning environments are also discussed.

20.
In-Memory Databases (IMDBs), such as SAP HANA, enable new levels of database performance by removing the disk bottleneck and by compressing data in memory. As a consequence of this improved performance, reports and analytic queries can now be processed on demand. The goal is therefore to provide near real-time responses to compute- and data-intensive analytic queries. To facilitate this, much work has investigated the use of acceleration technologies within the database context. While current research into the application of these technologies has yielded positive results, it has tended to focus on single database tasks or on isolated single-user requests. This paper uses SHEPARD, a framework for managing accelerated tasks across shared heterogeneous resources, to introduce acceleration into an IMDB. Results show how, using SHEPARD, multiple simultaneous user queries all receive speed-up by using a shared pool of accelerators. Results also show that offloading analytic tasks onto accelerators can have indirect benefits for other database workloads by reducing contention for CPU resources.
