期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

A dynamic execution time estimation model to save energy in heterogeneous multicores running periodic tasks

《Future Generation Computer Systems》2016

Nowadays, real-time embedded applications have to cope with an increasing demand of functionalities, which require increasing processing capabilities. With this aim real-time systems are being implemented on top of high-performance multicore processors that run multithreaded periodic workloads by allocating threads to individual cores. In addition, to improve both performance and energy savings, the industry is introducing new multicore designs such as ARM’s big.LITTLE that include heterogeneous cores in the same package.A key issue to improve energy savings in multicore embedded real-time systems and reduce the number of deadline misses is to accurately estimate the execution time of the tasks considering the supported processor frequencies. Two main aspects make this estimation difficult. First, the running threads compete among them for shared resources. Second, almost all current microprocessors implement Dynamic Voltage and Frequency Scaling (DVFS) regulators to dynamically adjust the voltage/frequency at run-time according to the workload behavior. Existing execution time estimation models rely on off-line analysis or on the assumption that the task execution time scales linearly with the processor frequency, which can bring important deviations since the memory system uses a different power supply.In contrast, this paper proposes the Processor–Memory (Proc–Mem) model, which dynamically predicts the distinct task execution times depending on the implemented processor frequencies. A power-aware EDF (Earliest Deadline First)-based scheduler using the Proc–Mem approach has been evaluated and compared against the same scheduler using a typical Constant Memory Access Time model, namely CMAT. Results on a heterogeneous multicore processor show that the average deviation of Proc–Mem is only by 5.55% with respect to the actual measured execution time, while the average deviation of the CMAT model is 36.42%. These results turn in important energy savings, by 18% on average and up to 31% in some mixes, in comparison to CMAT for a similar number of deadline misses. 相似文献

2.

Ali Reza Zamani Moustafa AbdelBaky Daniel Balouek‐Thomert J. J. Villalobos Ivan Rodero Manish Parashar 《Concurrency and Computation》2020,32(16)

相似文献

3.

大规模科学可视化的数据流网络优化分析

廖丽徐平均王弘堃《计算机研究与发展》2009,46(Z2)

对大规模科学数据的分析与理解最终要依赖于可视化手段来完成,数据的存储与组织方式是影响可视化效率的关键因素,特别是集成大量可视化与数据分析功能的大规模并行分布式可视化与数据分析系统、可视化流程及数据流网络中所需要的优化手段对数据的存储与组织提出更高的要求.通过分析针对大规模数据可视化所采用的方式方法总结对数据及相关信息的存储与组织需求,借助高效的数据组织方式加快科学可视化过程. 相似文献

4.

Hyunjin Lee Lei Jin Kiyeon Lee Socrates Demetriades Michael Moeng Sangyeun Cho 《Software》2010,40(3):239-258

Simulation is indispensable in computer architecture research. Researchers increasingly resort to detailed architecture simulators to identify performance bottlenecks, analyze interactions among different hardware and software components, and measure the impact of new design ideas on the system performance. However, the slow speed of conventional execution‐driven architecture simulators is a serious impediment to obtaining desirable research productivity. This paper describes a novel fast multicore processor architecture simulation framework called Two‐Phase Trace‐driven Simulation (TPTS), which splits detailed timing simulation into a trace generation phase and a trace simulation phase. Much of the simulation overhead caused by uninteresting architectural events is only incurred once during the cycle‐accurate simulation‐based trace generation phase and can be omitted in the repeated trace‐driven simulations. We report our experiences with tsim, an event‐driven multicore processor architecture simulator that models detailed memory hierarchy, interconnect, and coherence protocol based on the TPTS framework. By applying aggressive event filtering, tsim achieves an impressive simulation speed of 146 millions of simulated instructions per second, when running 16‐thread parallel applications. Copyright © 2010 John Wiley & Sons, Ltd. 相似文献

5.

Ravi Reddy Manumachu Alexey L. Lastovetsky 《Concurrency and Computation》2019,31(4)

Self‐adaptability is a highly preferred feature in HPC applications. A crucial building block of a self‐adaptable application is a data partitioning algorithm that must possess several essential qualities apart from low runtime and memory costs. On modern platforms composed of multicore CPU processors, data partitioning algorithms striving to solve the bi‐objective optimization problem for performance and energy (BOPPE) face a formidable challenge. They must take into account the new complexities inherent in these platforms such as severe resource contention and non‐uniform memory access (NUMA). Novel model‐based methods and data partitioning algorithms have been proposed that address the challenge. However, these methods take as input full functional performance and energy models (FPM and FEM), which have prohibitively high model construction costs. Therefore, they are not suitable for employment in self‐adaptable applications. In this paper, we present a self‐adaptable data partitioning algorithm called ADAPTALEPH, which solves BOPPE on homogeneous clusters of multicore CPUs. Unlike the state‐of‐the‐art solving BOPPE that take as inputs full FPM and FEM, it constructs partial FPM and FEM during its execution using all the available processors. It returns a locally Pareto‐optimal set of solutions, which are the heterogeneous workload distributions that achieve inter‐node optimization of data‐parallel applications for performance and energy. We experimentally study the efficiency of ADAPTALEPH for three data‐parallel applications, ie, matrix‐vector multiplication, matrix‐matrix multiplication, and fast Fourier transform, on a modern multicore CPU and simulations for homogeneous clusters of such CPUs. We demonstrate that the locally Pareto‐optimal front approaches the globally Pareto‐optimal front as the number of points in the partial discrete FPM and FEM functions are increased. The number of points in the partial FPM/FEM when the locally Pareto‐optimal front becomes the globally Pareto‐optimal front is considerably less than the number of points in the full FPM/FEM thereby suggesting development of methods that can leverage this finding to drastically reduce the model construction times. 相似文献

6.

中央空调能效监测平台设计

下载免费PDF全文

谢秀颖姜海明王少林王敏张道良《计算机系统应用》2016,25(9):285-289

针对目前中央空调系统存在的能效信息分散,不易共享的问题,设计了中央空调能效监测平台,根据中央空调系统现有的能效监测手段及能效计算方法,建立能效评价体系,并结合中央空调诊断本体库,展示中央空调系统各设备、子系统的能效监测结果及诊断决策. 该平台为中央空调能效监测提供了便利,为进一步的节能维护提供了条件. 相似文献

7.

Chao‐Tung Yang Chao‐Chin Wu Jen‐Hsiang Chang 《Concurrency and Computation》2011,23(8):721-744

Parallel loop self‐scheduling on parallel and distributed systems has been a critical problem and it is becoming more difficult to deal with in the emerging heterogeneous cluster computing environments. In the past, some self‐scheduling schemes have been proposed as applicable to heterogeneous cluster computing environments. In recent years, multicore computers have been widely included in cluster systems. However, previous researches into parallel loop self‐scheduling did not consider certain aspects of multicore computers; for example, it is more appropriate for shared‐memory multiprocessors to adopt Open Multi‐Processing (OpenMP) for parallel programming. In this paper, we propose a performance‐based approach using hybrid OpenMP and MPI parallel programming, which partition loop iterations according to the performance weighting of multicore nodes in a cluster. Because iterations assigned to one MPI process are processed in parallel by OpenMP threads run by the processor cores in the same computational node, the number of loop iterations allocated to one computational node at each scheduling step depends on the number of processor cores in that node. Experimental results show that the proposed approach performs better than previous schemes. Copyright © 2010 John Wiley & Sons, Ltd. 相似文献

8.

具有能量效率的容错多事件簇 总被引：1，自引：0，他引：1

肖伟徐明《计算机工程与应用》2010,46(2):14-18

在无线传感器网络中,容错性和高效性是衡量网络性能的重要指标,在应用中如何同时兼顾这两个因素一直是算法研究的关键问题。针对多事件的监测和感知处理的应用,特别是当事件的感知区域发生重叠情况时,提出了具有能量效率的容错事件簇算法AECA。算法中首先给出了既考虑节点剩余能量又考虑节点容错性的分布式的簇头选举方法;然后研究了事件簇之间具有重叠区域的情况下节点处理的策略。通过仿真实验表明,算法AECA能有效地提高传感器网络的容错性和生存周期,并具有可靠性和可扩展性的特点。相似文献

9.

《Concurrency and Computation》2017,29(8)

In this paper, we propose an energy‐aware task management mechanism designed for the forward‐in‐time algorithms running on multicore central processing units (CPUs), where the multidimensional positive definite advection transport algorithm stencil‐based algorithm is one of the representative examples. This mechanism is based on the dynamic voltage and frequency scaling technique and allows the reduction of energy consumption for an existing algorithm (or application) such that the predefined execution time is respected, without requiring any modifications in the algorithm itself. This paper also provides the formulation of a method for minimizing the energy consumption with time constraints, which is based on the adaptive scheduling with online modeling. Finally, using the autotuning technique, we provide the automation of the process for creation and determination of the best energy profile at runtime, even in the presence of additional CPU workloads. The experimental results on a 6‐core computing platform show that the proposed mechanism provides the energy savings of up to 1.43x when compared to the default Linux scaling governor. Also, we confirm the effectiveness of the self‐adaptive feature of the proposed mechanism, by showing its ability to maintain the requested execution time in spite of additional CPU workloads imposed by other applications. 相似文献

10.

一种可控簇规模的能耗均衡路由协议的设计

应可珍方飞施伟元毛科技陈庆章《传感技术学报》2018,31(3):429-435

无线传感器网络路由研究的主要难点在于节点负载不均衡,容易导致某些节点能量提前耗尽,从而使得传感器网络过早死亡.针对这一难点,本文提出了一种可控簇规模的能量均衡路由协议(CCERP).该协议综合考虑节点的剩余能量、节点与邻居节点的链路质量、节点的度选择簇首节点,然后根据节点距离Sink节点的最短跳数控制簇首竞争半径,从而控制簇规模,接着利用虚拟力模型进行普通节点成簇,最后簇首通过多跳路由将采集的数据发送至Sink节点.该协议在簇首选举、簇规模控制和簇首间路由都充分考虑了负载均衡,实验结果验证了该协议具有较好的能耗均衡性. 相似文献

11.

面向异构多核系统芯片的高效动态带宽划分方法

刘阳国陆俊林程旭易江芳佟冬刘锋《计算机辅助设计与图形学学报》2016,28(10):1786-1795

针对异构MPSoC中各主设备频繁争抢有限访存带宽、请求相互干扰、严重影响系统性能的问题,提出一种基于限流的动态DRAM带宽分配机制——TDBA.首先实时监测主设备访存特性,通过访存干扰程度评估将主设备分组;然后对造成严重干扰的主设备设置带宽限流阈值来防止其过度争抢带宽,并根据系统带宽使用情况动态调整该阈值,同时优先计算密集主设备的请求以进一步提高系统性能.将TDBA应用于真实异构MPSoC系统的实验结果表明,TDBA可以有效地降低访存干扰,明显提高系统性能. 相似文献

12.

面向数值模拟数据的HDF5性能优化

沈卫超曹立强夏芳宋磊《计算机研究与发展》2012,(Z1):314-318

大规模数值模拟数据对可视化分析提出了挑战,I/O是影响可视化交互性能的重要因素.HDF5是科学计算领域广泛采用的存储格式,介绍了HDF5的抽象数据模型、数据读写流程,并使用典型数值模拟数据测试了HDF5的读性能.测试发现HDF5的数据集定位开销较大.根据数值模拟数据的数据块以整数有规律编号的特点,通过在HDF5中增加数据块视图对象来提高读性能.测试表明,该方法可显著加速数据的读取性能. 相似文献

13.

Torvald Mrtensson Daniel Sthl Jan Bosch 《Journal of Software: Evolution and Process》2019,31(4)

Based on 25 interviews with participants from four case study companies that develop large‐scale software embedded systems, this paper presents the Test Activity Stakeholders (TAS) model. The TAS model shows how the continuous integration and delivery pipeline can be designed to include test activities that support four stakeholder interests: “Check changes,” “Secure stability,” “Measure progress,” and “Verify compliance.” The model is developed to show how each of the stakeholder interests are best supported by unit/component tests or system tests, by automated testing or manual testing and by tests executed in simulated environments or on real hardware. The TAS model may serve as a starting point for companies when evaluating and designing their continuous integration and delivery pipeline. The validation of the TAS model included twelve individuals from three case study companies. The validation showed that the TAS model is actionable and useful in practice, enabling the identification of stakeholders and showing where improvement efforts should be focused in order to support all stakeholder interests in the continuous integration and delivery pipeline. 相似文献

14.

Kevin Lee Georg Buss Daniel Veit 《Concurrency and Computation》2016,28(5):1527-1547

An increasing number of enterprise applications are intensive in their consumption of IT but are infrequently used. Consequently, either organizations host an oversized IT infrastructure or they are incapable of realizing the benefits of new applications. A solution to the challenge is provided by the large‐scale computing infrastructures of clouds and grids, which allow resources to be shared. A major challenge is the development of mechanisms that allow efficient sharing of IT resources. Market mechanisms are promising, but there is a lack of research in scalable market mechanisms. We extend the multi‐attribute combinatorial exchange mechanism with greedy heuristics to address the scalability challenge. The evaluation shows a trade‐off between efficiency and scalability. There is no statistical evidence for an influence on the incentive properties of the market mechanism. This is an encouraging result as theory predicts heuristics to ruin the mechanism's incentive properties. Copyright © 2015 John Wiley & Sons, Ltd. 相似文献

15.

Bao‐Yin Zhang Guang‐Wen Yang Wei‐Min Zheng 《Concurrency and Computation》2006,18(12):1541-1557

In this paper, we present Jcluster, an efficient Java parallel environment that provides some critical services, in particular automatic load balancing and high‐performance communication, for developing parallel applications in Java on a large‐scale heterogeneous cluster. In the Jcluster environment, we implement a task scheduler based on a transitive random stealing (TRS) algorithm. Performance evaluations show that the scheduler based on TRS can make any idle node obtain a task from another node with much fewer stealing times than random stealing (RS), which is a well‐known dynamic load‐balancing algorithm, on a large‐scale cluster. In the performance aspects of communication, with the method of asynchronously multithreaded transmission, we implement a high‐performance PVM‐like and MPI‐like message‐passing interface in pure Java. The evaluation of the communication performance is conducted among the Jcluster environment, LAM‐MPI and mpiJava on LAM‐MPI based on the Java Grande Forum's pingpong benchmark. Copyright © 2005 John Wiley & Sons, Ltd. 相似文献

16.

提升大规模集群上并行计算软件系统可靠性和服务性的方法与实践

林彦宇陈虎苗军韩佳龙媚赖路双《计算机工程与科学》2015,37(1):1-6

大规模集群上的并行计算软件需要具备处理部分节点、网络等失效的容错能力,也需要具有易于管理、维护、移植和可扩展的服务能力。针对星形计算模型,研究和开发了一套并行计算框架。利用调度节点内部的可变粒度分解器、相关队列等方法,实现了全系统容错,且具有较好的易用性、可移植性和可扩展性。系统目前可以实现300TFlops计算能力下连续运行超过150h,而且还具有进一步的可扩展能力。相似文献

17.

无线传感器网络中多事件簇的数据容错与能效研究

肖伟徐明《计算机工程与科学》2012,34(3):19-23

基于无线传感器网络中事件簇容错和能效的要求,本文给出多事件簇数据容错模式MED-FT。该模式首先利用剩余能量和事件可信度的积值,给出分布式簇头节点的选举方法;然后,提出了多事件簇重叠区域下节点的处理策略,并且建立了事件簇的数据容错补偿机制。仿真实验表明,具有数据容错模式的多事件簇不仅能获得更长的网络生存周期,并且能获得更好的数据正确性和容错性能。相似文献

18.

LI Kun JIANG Li li 《计算机工程与科学》2013,35(3):38

相似文献

19.

Kenneth G. Crowther Yacov Y. Haimes 《Systems Engineering》2005,8(4):323-341

Our modern era is characterized by a large‐scale web of interconnected and interdependent economic and infrastructure systems, coupled with threats of terrorism. This paper demonstrates the value of introducing interdependency analysis into various phases of risk assessment and management through application of the Inoperability Input–Output Model (IIM). The IIM estimates the cascading inoperability and economic losses that result from interdependencies within large‐scale economic and infrastructure systems. Based on real data and the Nobel Prize‐winning W. Leontief economic model, the IIM is a computationally efficient, inexpensive, holistic method for estimating economic impacts. Three illustrative case studies are presented. The first and second illustrate how the supply‐ and demand‐side IIM is used to calculate higher‐order effects from attacks to vulnerabilities and implementation of risk management policies in large‐scale economic systems. The final case study illustrates a more general use for interdependency analysis: to evaluate risk management options against multiple objectives. This study calculates a Pareto‐optimal or efficient frontier of solutions by integrating a simplified model of the costs of recovery to the Power sector derived from open‐source data with the IIM. Through these case studies, which use a database from the Bureau of Economic Analysis, we illustrate the value of interdependency analysis in the risk assessment and management process as an integral part of systems engineering. © 2005 Wiley Periodicals, Inc. Syst Eng 8: 323–341, 2005 相似文献

20.

基于灰色聚类的大规模群体语言评价信息集结研究

王翯华朱建军方志耕《控制与决策》2012,27(2):271-275

研究群体规模较大的情况下基于语言评价信息的决策方法,提出了基于灰色关联聚类的方法以降低大规模群体的协调困难;根据决策者所属群体类别差异,基于语言信息转化函数提出了类内评价指标权重确定方法,并测算了各群体类别的综合偏好;研究群体类间权重确定模型,建立了基于群体类别意见偏差最小的群体偏好集结模型;进而提出了决策者判断一致度定义以及决策者偏差协调方法.算例分析表明了该方法的应用步骤及可行性. 相似文献