
1.

Pedro Furtado 《Service Oriented Computing and Applications》2009,3(3):159-169

Typical request processing systems, such as web servers and database servers, try to accommodate all requests as fast as possible, which can be described as a Best-Effort approach. However, different application items may have different quality-of-service (QoS) requirements, and this can be viewed as an orthogonal concern to the basic system functionality. In this paper we propose the QoS-Broker, a middleware for delivering QoS over servers and applications. We show its architecture to support contracts over varied targets including queries, transactions, services or sessions, also allowing expressions on variables to be specified in those targets. We also discuss how the QoS-Broker implements basic strategies for QoS over workloads. Our experimental results illustrate the middleware by applying priority and weighted-fair-queuing based differentiation over clients and over transactions, and also admission control, using a benchmark as a case-study.
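As an illustration of the two mechanisms the abstract names, weighted-fair-queuing differentiation and admission control, here is a minimal sketch. It is not the QoS-Broker's implementation, which the abstract does not describe. The class name `WeightedFairQueue`, the `max_pending` limit, the per-request `cost`, and the simplification of advancing virtual time to the finish tag of the dispatched request are all assumptions made for the example.

```python
import heapq
from itertools import count


class WeightedFairQueue:
    """Simplified weighted fair queuing with a pending-request admission limit.

    Hypothetical sketch: names and parameters are illustrative only.
    """

    def __init__(self, weights, max_pending=None):
        self.weights = weights            # client class -> positive weight
        self.max_pending = max_pending    # admission limit; None means unlimited
        self.last_finish = {c: 0.0 for c in weights}
        self.virtual_time = 0.0
        self.heap = []                    # entries: (finish_tag, seq, client, request)
        self.seq = count()                # tie-breaker so requests are never compared

    def submit(self, client, request, cost=1.0):
        """Queue a request; return False if admission control rejects it."""
        if self.max_pending is not None and len(self.heap) >= self.max_pending:
            return False
        start = max(self.virtual_time, self.last_finish[client])
        finish = start + cost / self.weights[client]  # heavier weight -> earlier tag
        self.last_finish[client] = finish
        heapq.heappush(self.heap, (finish, next(self.seq), client, request))
        return True

    def next_request(self):
        """Dispatch the pending request with the smallest finish tag."""
        if not self.heap:
            return None
        finish, _, client, request = heapq.heappop(self.heap)
        self.virtual_time = finish        # simplification: clock follows dispatches
        return client, request


if __name__ == "__main__":
    q = WeightedFairQueue({"gold": 3, "bronze": 1})
    for i in range(8):
        q.submit("gold", f"g{i}")
        q.submit("bronze", f"b{i}")
    print([q.next_request()[0] for _ in range(8)])
```

With weights 3 and 1 and both classes backlogged, the class with weight 3 receives about three of every four dispatches; a plain priority scheme would instead starve the lower class until the higher one is empty.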

2.

On synthetic workloads for multiplayer online games: a methodology for generating representative shooter game workloads

Max Lehn Tonio Triebel Robert Rehner Benjamin Guthier Stephan Kopf Alejandro Buchmann Wolfgang Effelsberg 《Multimedia Systems》2014,20(5):609-620

We present approaches to the generation of synthetic workloads for benchmarking multiplayer online gaming infrastructures. Existing techniques, such as mobility or traffic models, are often either too simple to be representative for this purpose or too specific for a particular network structure. Desirable properties of a workload are reproducibility, representativeness, and scalability to any number of players. We analyze different mobility models and AI-based workload generators. Real gaming sessions with human players using the prototype game Planet PI4 serve as a reference workload. Novel metrics are used to measure the similarity between real and synthetic traces with respect to neighborhood characteristics. We found that, although more complicated to handle, AI players reproduce real workload characteristics more accurately than mobility models.

3.

An energy-efficient 3D-stacked STT-RAM cache architecture for cloud processors: the effect on emerging scale-out workloads

Adnan Nasri Mahmood Fathy Ali Broumandnia 《The Journal of supercomputing》2018,74(4):1547-1561

This paper focuses on energy consumption, which is a major problem in the dark silicon era. As energy consumption becomes a key issue for operation and maintenance of cloud data centers, cloud computing providers are becoming significantly concerned. Here, we show how spin-transfer torque random access memory (STT-RAM) can be used as an on-chip L2 cache to obtain lower energy compared to conventional L2 caches, like SRAM. High density, fast read access and non-volatility make STT-RAM a significant technology for on-chip memories. Previous studies have mainly examined specific schemes based on common applications and do not provide a thorough analysis of emerging scale-out applications with multiple design options. Here, we discuss performance and energy efficiency in cloud processors running emerging scale-out workloads. Experimental results on the CloudSuite benchmarks show that the proposed method reduces energy by 51% (on average) and improves energy delay product by 37% (on average), while instructions-per-cycle (IPC) degradation is only 22% (on average) compared to the SRAM method.
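The three averages quoted above can be checked against each other with a short calculation. This is a back-of-the-envelope sketch, not the paper's methodology: it assumes energy-delay product is energy multiplied by execution time, that execution time scales as 1/IPC at a fixed clock rate and instruction count, and that the reported averages can be combined directly. The function name `relative_edp` is hypothetical.

```python
def relative_edp(energy_reduction, ipc_degradation):
    """Energy-delay product relative to the baseline (baseline = 1.0).

    Assumes delay is inversely proportional to IPC (illustrative assumption).
    """
    relative_energy = 1.0 - energy_reduction
    relative_delay = 1.0 / (1.0 - ipc_degradation)
    return relative_energy * relative_delay


edp = relative_edp(0.51, 0.22)   # 51% less energy, 22% lower IPC
improvement = 1.0 - edp
print(f"relative EDP: {edp:.3f}, improvement: {improvement:.1%}")
# prints "relative EDP: 0.628, improvement: 37.2%"
```

Under these assumptions the result is close to the 37% figure reported in the abstract, so the three numbers are mutually consistent.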

4.

Finding recently frequent itemsets adaptively over online transactional data streams (Cited by: 1; self-citations: 0; citations by others: 1)

Joong Hyuk Chang Won Suk Lee 《Information Systems》2006,31(8):849-869

A data stream is a massive unbounded sequence of data elements continuously generated at a rapid rate. Consequently, the knowledge embedded in a data stream is likely to change as time goes by. Identifying the recent change of a data stream, especially for an online data stream, can provide valuable information for the analysis of the data stream. However, most mining algorithms or frequency approximation algorithms over a data stream do not differentiate the information of recently generated data elements from the obsolete information of old data elements, which may no longer be useful or may even be invalid at present. Therefore, they are not able to extract the recent change of information in a data stream adaptively. This paper proposes a data mining method for finding recently frequent itemsets adaptively over an online transactional data stream. The effect of old transactions on the current mining result of a data stream is diminished by decaying the old occurrences of each itemset as time goes by. Furthermore, several optimization techniques are devised to minimize processing time as well as memory usage. Finally, the performance of the proposed method is analyzed by a series of experiments to identify its various characteristics.
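The decaying idea in this abstract can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the exponential per-transaction decay factor `decay`, the `max_size` cap on itemset size, and the class name `DecayedItemsetCounter` are assumptions, and none of the paper's optimization techniques are reproduced.

```python
from collections import defaultdict
from itertools import combinations


class DecayedItemsetCounter:
    """Exponentially decayed support counts over a transaction stream.

    Hypothetical sketch: older occurrences weigh less than recent ones.
    """

    def __init__(self, decay=0.9, max_size=2):
        self.decay = decay              # assumed decay factor, 0 < decay <= 1
        self.max_size = max_size        # largest itemset size tracked (illustrative)
        self.counts = defaultdict(float)
        self.last_seen = {}             # itemset -> tick of its last update
        self.total = 0.0                # decayed number of transactions
        self.tick = 0

    def add(self, transaction):
        """Process one transaction (an iterable of hashable, sortable items)."""
        self.tick += 1
        self.total = self.total * self.decay + 1.0
        items = sorted(set(transaction))
        for size in range(1, self.max_size + 1):
            for itemset in combinations(items, size):
                # Decay lazily: apply all the decay missed since the last update.
                age = self.tick - self.last_seen.get(itemset, self.tick)
                self.counts[itemset] = self.counts[itemset] * self.decay ** age + 1.0
                self.last_seen[itemset] = self.tick

    def support(self, itemset):
        """Decayed support of an itemset as of the latest transaction."""
        key = tuple(sorted(itemset))
        if key not in self.last_seen:
            return 0.0
        age = self.tick - self.last_seen[key]
        return self.counts[key] * self.decay ** age / self.total

    def frequent(self, min_support):
        """Itemsets whose decayed support is at least min_support."""
        return {k for k in self.counts if self.support(k) >= min_support}


if __name__ == "__main__":
    counter = DecayedItemsetCounter(decay=0.5)
    for _ in range(3):
        counter.add(["a", "b"])
    for _ in range(3):
        counter.add(["c"])
    print(counter.frequent(0.5))
```

In this run the itemset containing only "c" is recently frequent, while "a" and "b", which appeared just as often but earlier, have decayed below the threshold.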

5.

Toward multi-programmed workloads with different memory footprints: a self-adaptive last level cache scheduling scheme

Jingyu Zhang Minyi Guo Chentao Wu Yuanyi Chen 《中国科学:信息科学(英文版)》2018,61(1):012105

With the emergence of 3D-stacking technology, dynamic random-access memory (DRAM) can be stacked on chips to build a DRAM last level cache (LLC). Compared with static random-access memory (SRAM), DRAM is larger but slower. In existing research, a lot of work has been devoted to improving workload performance using SRAM and stacked DRAM together, ranging from SRAM structure improvement to optimizing cache tag and data access. However, little attention has been paid to designing an LLC scheduling scheme for multi-programmed workloads with different memory footprints. Motivated by this, we propose a self-adaptive LLC scheduling scheme, which allows us to utilize SRAM and 3D-stacked DRAM efficiently, achieving better workload performance. This scheduling scheme employs (1) an evaluation unit, which is used to probe and evaluate the cache information while programs are being executed; and (2) an implementation unit, which is used to self-adaptively choose SRAM or DRAM. To make the scheduling scheme work correctly, we develop a data migration policy. We conduct extensive experiments to evaluate the performance of our proposed scheme. Experimental results show that our method can improve the multi-programmed workload performance by up to 30% compared with the state-of-the-art methods.

6.

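A smooth cache capacity expansion scheme for Redis-based systems

赖歆 《网络安全技术与应用》2014,(10):78-79

This paper discusses how to expand cache data capacity in systems built on a Redis cache and proposes a concrete solution, so that the expansion has less impact on the system and the system keeps providing service while it is carried out.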

7.

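Design of an FPGA online loading function based on a Flash controller (Cited by: 1; self-citations: 1; citations by others: 1)

林天静 阮翔 刘春 《电子技术应用》2019,45(1):88-91

The traditional way to update an FPGA program is to use development tools to write the program into a NOR Flash storage device via JTAG. When several FPGAs in a complex system need to be updated, the JTAG approach can update only one FPGA at a time, takes a long time, requires a cable connection, and cannot support remote updates. This paper therefore proposes a scheme for updating FPGA programs online. The scheme can update multiple FPGAs within a system, maximizes update speed, and supports remote updates over a network, which facilitates debugging and remote upgrades.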

8.

Miniatures versus icons as a visual cache for videotex browsing

Jakob Nielsen 《Behaviour & Information Technology》1990,9(6):441-449

Miniatures are an alternative to icons for the representation of a large graphical object such as a window in a reduced format. A front end user interface to an existing videotex system was implemented using icons as well as miniatures to represent previously seen frames in a visual cache, and an empirical comparison showed that users had the same performance with the two representations but subjectively preferred icons.

9.

Transactional scheduling for read-dominated workloads

Hagit Attiya Alessia Milani 《Journal of Parallel and Distributed Computing》2012

The transactional approach to contention management guarantees atomicity by aborting transactions that may violate consistency. A major challenge in this approach is to schedule transactions in a manner that reduces the total time to perform all transactions (the makespan), since transactions are often aborted and restarted. The performance of a transactional scheduler can be evaluated by the ratio between its makespan and the makespan of an optimal, clairvoyant scheduler that knows the list of resource accesses that will be performed by each transaction, as well as its release time and duration.
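The evaluation measure described in this abstract can be written out as a formula. The symbols are assumptions introduced here, not taken from the paper: \(A\) is the scheduler under evaluation, \(\mathrm{OPT}\) the optimal clairvoyant scheduler, and \(I\) a set of transactions. Taking the worst case over all inputs is the usual convention for a competitive ratio and is likewise assumed rather than stated in the abstract.

```latex
\[
  \rho_A(I) = \frac{\mathrm{makespan}_A(I)}{\mathrm{makespan}_{\mathrm{OPT}}(I)},
  \qquad
  \rho_A = \sup_{I} \rho_A(I) \ge 1
\]
```

A ratio close to 1 means the scheduler loses little by not knowing each transaction's resource accesses, release time, and duration in advance.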

10.

System optimization for OLTP workloads

Kunkel S. Armstrong B. Vitale P. 《Micro, IEEE》1999,19(3):56-64

Major performance enhancements in large commercial systems are best achieved when advances in hardware technology are matched with advances in software technology. This article connects recent AS/400 hardware advances with the corresponding approaches used to tune the system performance for large online transaction processing (OLTP) workloads. We particularly emphasize those tuning efforts that affect the memory system. OLTP workloads are large and complex, stressing many parts of both the software and hardware. These workloads quickly expose software bottlenecks caused by contention on software locks. They also have large working sets, populated with hard-to-predict access patterns that make cache miss rates high. This causes the processor to spend a significant part of its execution time waiting for memory accesses. In multiprocessor systems, compilers alone have minimal effect on cycles spent in storage latency. Other optimizations are needed to affect this portion of the execution time, and many of those require direct involvement of the system software.

11.

Software transactional memories for Scala

Daniel Goodman Behram Khan Salman Khan Mikel Luján Ian Watson 《Journal of Parallel and Distributed Computing》2013

Transactional memory is an alternative to locks for handling concurrency in multi-threaded environments. Instead of providing critical regions that only one thread can enter at a time, transactional memory records sufficient information to detect and correct for conflicts if they occur. This paper surveys the range of options for implementing software transactional memory in Scala. Where possible, we provide references to implementations that instantiate each technique. As part of this survey, we document for the first time several techniques developed in the implementation of Manchester University Transactions for Scala. We order the implementation techniques on a scale moving from the least to the most invasive in terms of modifications to the compilation and runtime environment. This shows that, while the less invasive options are easier to implement and more common, they are more verbose and invasive in the code that uses them, often requiring changes to the syntax and program structure throughout the code.

12.

Bitwise dimensional co-clustering for analytical workloads

Stephan Baumann Peter Boncz Kai-Uwe Sattler 《The VLDB Journal The International Journal on Very Large Data Bases》2016,25(3):291-316

Analytical workloads in data warehouses often include heavy joins where queries involve multiple fact tables in addition to the typical star-patterns, dimensional grouping and selections. In this paper we propose a new processing and storage framework called bitwise dimensional co-clustering (BDCC) that avoids replication and thus keeps updates fast, yet is able to accelerate all these foreign key joins, efficiently support grouping and push down most dimensional selections. The core idea of BDCC is to cluster each table on a mix of dimensions, each possibly derived from attributes imported over an incoming foreign key, thereby creating foreign key connected tables with partially shared clusterings. These are later used to accelerate any join between two tables that have some dimension in common, and they additionally make it possible to push down and propagate selections (reducing I/O) and to accelerate aggregation and ordering operations. Besides the general framework, we describe an algorithm to derive such a physical co-clustering database automatically and describe query processing and query optimization techniques that can easily be fitted into existing relational engines. We present an experimental evaluation on the TPC-H benchmark in the Vectorwise system, showing that co-clustering can significantly enhance its already high performance and at the same time significantly reduce the memory consumption of the system.

13.

Distributed transactional memory for general networks

Gokarna Sharma Costas Busch 《Distributed Computing》2014,27(5):329-362

We consider the problem of implementing transactional memory in large-scale distributed networked systems. We present Spiral, a novel distributed directory-based protocol for transactional memory, and theoretically analyze and experimentally evaluate it for the performance boundaries of this approach from the worst-case perspective. Spiral is designed for the data-flow distributed implementation of software transactional memory which supports three basic operations: publish, allowing a shared object to be inserted in the directory so that other nodes can find it; lookup, providing a read-only copy of the object to the requesting node; move, allowing the requesting node to write the object locally after the node gets it. The protocol runs on a hierarchical directory construction based on sparse covers, where clusters at each level are ordered to avoid race conditions while serving concurrent requests. Given a shared object, the protocol maintains a directory path pointing to the object. The basic idea is to use “spiral” paths that grow outward to search for the directory path of the object in a bottom-up fashion. For general networks, this protocol guarantees an \(\mathcal{O}(\log ^2 n\cdot \log D)\) approximation in sequential and one-shot concurrent executions of a finite set of move requests, where \(n\) is the number of nodes and \(D\) is the diameter of the network. It also guarantees poly-log approximation for any single lookup request. Our bounds are deterministic and hold in the worst case. Moreover, this protocol requires only polylogarithmic bits of memory per node. Experimental evaluations in real networks also confirm our theoretical findings. To the best of our knowledge, this is the first deterministic consistency protocol for distributed transactional memory that achieves poly-log approximation in general networks.

14.

MyBenchmark: generating databases for query workloads

Eric Lo Nick Cheng Wilfred W. K. Lin Wing-Kai Hon Byron Choi 《The VLDB Journal The International Journal on Very Large Data Bases》2014,23(6):895-913

To evaluate the performance of database applications and database management systems (DBMSs), we usually execute workloads of queries on generated databases of different sizes and then benchmark various measures such as response time and throughput. This paper introduces MyBenchmark, a parallel data generation tool that takes a set of queries as input and generates database instances. Users of MyBenchmark can control the characteristics of the generated data as well as the characteristics of the resulting workload. Applications of MyBenchmark include DBMS testing, database application testing, and application-driven benchmarking. In this paper, we present the architecture and the implementation algorithms of MyBenchmark. Experimental results show that MyBenchmark is able to generate workload-aware databases for a variety of workloads including query workloads extracted from TPC-C, TPC-E, TPC-H, and TPC-W benchmarks.

15.

Distributed transactional memory for metric-space networks

Maurice Herlihy Ye Sun 《Distributed Computing》2007,20(3):195-208

Transactional Memory is a concurrent programming API in which concurrent threads synchronize via transactions (instead of locks). Although this model has mostly been studied in the context of multiprocessors, it has attractive features for distributed systems as well. In this paper, we consider the problem of implementing transactional memory in a network of nodes where communication costs form a metric. The heart of our design is a new cache-coherence protocol, called the Ballistic protocol, for tracking and moving up-to-date copies of cached objects. For constant-doubling metrics, a broad class encompassing both Euclidean spaces and growth-restricted networks, this protocol has stretch logarithmic in the diameter of the network. Supported by NSF grant 0410042 and by grants from Intel Corporation and Sun Microsystems.

16.

Acceptable workloads for three common mining materials

《Ergonomics》2012,55(9):1013-1031

A series of psychophysical lifting studies was conducted to establish maximum acceptable weights of lift (MAWL) for three supply items commonly handled in underground coal mines (rock dust bags, ventilation stopping blocks, and crib blocks). Each study utilized 12 subjects, all of whom had considerable experience working in underground coal mines. Effects of lifting in four postures (standing, stooping under a 1·5 m ceiling, stooping under a 1·2 m ceiling, and kneeling) were investigated together with four lifting conditions (combinations of lifting symmetry and lifting height). The frequency of lifting was set at four per min, and the task duration was 15 min. Posture significantly affected the MAWL for the rock dust bag (standing MAWL was 7% greater than restricted postures and kneeling MAWL was 6·4% less than stooped); however, posture interacted with lifting conditions for both of the other materials. Physiological costs were found to be significantly greater in the stooped postures compared with kneeling for all materials. Other contrasts (standing versus restricted postures, stooping under 1·5 m ceiling versus stooping under 1·2 m ceiling) did not exhibit significantly different levels of energy expenditure. Energy expenditure was significantly affected by vertical lifting height; however, the plane of lifting had little influence on metabolic cost. Recommended acceptable workloads for the three materials are 20·0 kg for the rock dust bag, 16·5 kg for the ventilation stopping block, and 14·7 kg for the crib block. These results suggest that miners are often required to lift supplies that are substantially heavier than psychophysically acceptable lifting limits.

17.

Managed acceleration for In-Memory database analytic workloads

Eoghan O’Neill John McGlone Peter Kilpatrick Dimitrios Nikolopoulos 《International Journal of Parallel, Emergent and Distributed Systems》2017,32(4):406-427

In-Memory Databases (IMDBs), such as SAP HANA, enable new levels of database performance by removing the disk bottleneck and by compressing data in memory. As a consequence of this improved performance, reports and analytic queries can now be processed on demand. Therefore, the goal is now to provide near real-time responses to compute and data intensive analytic queries. To facilitate this, much work has investigated the use of acceleration technologies within the database context. While current research into the application of these technologies has yielded positive results, it has tended to focus on single database tasks or on isolated single user requests. This paper uses SHEPARD, a framework for managing accelerated tasks across shared heterogeneous resources, to introduce acceleration into an IMDB. Results show how, using SHEPARD, multiple simultaneous user queries all receive speed-up by using a shared pool of accelerators. Results also show that offloading analytic tasks onto accelerators can have indirect benefits for other database workloads by reducing contention for CPU resources.

18.

Exponential family tensor factorization: an online extension and applications

Kohei Hayashi Takashi Takenouchi Tomohiro Shibata Yuki Kamiya Daishi Kato Kazuo Kunieda Keiji Yamada Kazushi Ikeda 《Knowledge and Information Systems》2011,33(1):57-88

In this paper, we propose a new probabilistic model of heterogeneously attributed multi-dimensional arrays. The model can manage heterogeneity by employing individual exponential family distributions for each attribute of the tensor array. Entries of the tensor are connected by latent variables and share information across the different attributes through the latent variables. The assumption of heterogeneity makes Bayesian inference intractable, and we cast the EM algorithm approximated by the Laplace method and Gaussian process. We also extend the proposed algorithm for online learning. We apply our method to missing-values prediction and anomaly detection problems and show that our method outperforms conventional approaches that do not consider heterogeneity.

19.

A theory of online learning as online participation

Stefan Hrastinski 《Computers & Education》2009

In this paper, an initial theory of online learning as online participation is suggested. It is argued that online learner participation (1) is a complex process of taking part and maintaining relations with others, (2) is supported by physical and psychological tools, (3) is not synonymous with talking or writing, and (4) is supported by all kinds of engaging activities. Participation and learning are argued to be inseparable and jointly constituting. The implication of the theory is straightforward: If we want to enhance online learning, we need to enhance online learner participation.

20.

A chip multithreaded processor for network-facing workloads

Sanjiv Kapil McGhan H. Lawrendra J. 《Micro, IEEE》2004,24(2):20-30

Throughput computing is based on chip multithreading (CMT) processor design technology. In CMT technology, maximizing the amount of work accomplished per unit of time or other relevant resource, rather than minimizing the time needed to complete a given task or set of tasks, defines performance. By CMT standards, the best processor accomplishes the most work per second of time, per watt of expended power, per square millimeter of die area, and so on (that is, it operates most efficiently). The processor described is a member of Sun's first generation of CMT processors designed to efficiently execute network-facing workloads. Network-facing systems primarily service network clients and are often grouped together under the label "Web servers". The processor's dual-thread execution capability, compact die size, and minimal power consumption combine to produce high throughput performance per watt, per transistor, and per square millimeter of die area. Given the short design cycle Sun needed to create the processor, the result is a compelling early proof of the value of throughput computing.