1.
Extreme-scale computing is set to provide the infrastructure for the advances and breakthroughs that will solve some of the hardest problems in science and engineering. However, resilience and energy concerns loom as two of the major challenges for machines at that scale. The number of components that will be assembled in the supercomputers plays a fundamental role in these challenges. First, a large number of parts will substantially increase the failure rate of the system compared to the failure frequency of current machines. Second, those components have to fit within the power envelope of the installation and keep the energy consumption within operational margins. Extreme-scale machines will have to incorporate fault tolerance mechanisms and honor the energy and power restrictions. Therefore, it is essential to understand how fault tolerance and energy consumption interplay. This paper presents a comparative evaluation and analysis of energy consumption of three different rollback-recovery protocols: checkpoint/restart, message logging and parallel recovery. Our experimental evaluation shows parallel recovery has the minimum execution time and energy consumption. Additionally, we present an analytical model that projects that parallel recovery can reduce energy consumption by more than 37% compared to checkpoint/restart at extreme scale.
2.
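Analysis of Failure Characteristics and Fault-Tolerance Mechanisms of Large-Scale Computing Systems (total citations: 1, self-citations: 0, citations by others: 1)
This paper surveys the operational stability of several large-scale computing systems in China and abroad. First, using failure data from several representative systems, it analyzes the stability of current production systems in terms of failure modes and failure characteristics. Then, building on a summary of current approaches to system-level fault tolerance, it analyzes the challenges facing fault-tolerance mechanisms in future, larger-scale computing systems and possible solutions.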
3.
4.
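When debugging a program, a traditional debugger can only run the program forward and inspect its current state. This paper proposes a debugging method that lets a program run in reverse and return to any past point in time, extending the debugger's capabilities. The method is implemented by adding full logging and replay support to the Xen virtual machine and modifying the GDB debugger accordingly; the debugged target can be restored to any point in its execution. This reversible debugger can address large software and operating system kernel...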
5.
6.
A critical performance issue for a number of scientific and engineering applications is the efficient transfer of data to secondary storage. Languages such as High Performance Fortran (HPF) have been introduced to allow programming distributed-memory systems at a relatively high level of abstraction. However, the present version of HPF does not provide appropriate constructs for controlling the parallel I/O capabilities of these systems. In this paper, constructs to specify parallel I/O operations on multidimensional arrays in the context of HPF are proposed. The paper also presents implementation concepts that are based on the HPF compiler VFC and the parallel I/O run-time system Panda. Experimental performance results are discussed in the context of financial management and traffic simulation applications.
7.
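A main-memory database resides in volatile memory, so its data is easily lost and the failure-recovery component is critical. Taking system environment constraints into account, this paper designs a system recovery model that combines logging with shadow paging, and discusses how the logging protocol, fuzzy checkpointing strategy, reload algorithm, and recovery techniques are implemented. The method requires no additional hardware support, produces few log records, runs checkpoints concurrently with transaction processing, and reloads and recovers quickly, greatly improving the recovery efficiency of the main-memory database.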
8.
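Based on the characteristics of embedded system environments and their recovery needs, this paper proposes a design for an embedded main-memory database recovery subsystem based on logical logging. The subsystem uses a node configuration of one primary and two backups, which ensures that the state of a data object at recovery time is consistent with its state when the logical log was written. Validation experiments show that the subsystem effectively reduces the volume of log information, shortens system recovery time, and improves system performance.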
9.
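By comparing PostgreSQL's existing recovery algorithm with the ARIES algorithm, this paper argues that improving PostgreSQL with ARIES is necessary. Building on an analysis of several recovery-related implementation strategies in the existing PostgreSQL, it introduces the main data structures designed for adopting ARIES and describes how to modify the transaction recovery manager to support ARIES's basic transaction operations and the implementation flow of recovery management. This significantly improves PostgreSQL's transaction-processing capability.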
10.
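Research on Log-Structured File System Technology (total citations: 1, self-citations: 0, citations by others: 1)
This paper introduces a new disk management technique, the log-structured file system, which gathers file modifications into log entries and writes them to disk sequentially, speeding up both file writes and crash recovery. The entire disk is treated as a log, and the disk holds the index information needed to read log-structured files efficiently. To maintain the large free disk areas needed for fast writes, the disk is divided into segments, and a cleaner thread collects and compacts the live information scattered across the segments. Using a prototype log-structured file system, Sprite LFS, as an example, the paper analyzes each stage of log-structured file system design and implementation and compares it with the Unix file system, namely the Fast File System (FFS).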
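
The mechanism summarized in entry 10 (sequential log writes, an index for reads, segments, and a cleaner) can be illustrated with a toy in-memory sketch. This is not Sprite LFS: the class `TinyLogStore`, its methods, and the `segment_size` parameter are hypothetical names chosen for illustration, the "disk" is a Python list, recovery rescans the whole log rather than rolling forward from a checkpoint, and the cleaner is an ordinary method call rather than a background thread.

```python
class TinyLogStore:
    """Toy log-structured store: updates are appended, never overwritten in place."""

    def __init__(self, segment_size=4):
        self.segment_size = segment_size  # entries per segment (assumed; tiny for the demo)
        self.segments = [[]]              # stand-in for the disk: a list of append-only segments
        self.index = {}                   # name -> (segment number, slot) of the newest version

    def write(self, name, data):
        # Open a new segment when the tail segment is full.
        if len(self.segments[-1]) >= self.segment_size:
            self.segments.append([])
        seg = len(self.segments) - 1
        self.segments[seg].append((name, data))
        self.index[name] = (seg, len(self.segments[seg]) - 1)

    def read(self, name):
        # The index points straight at the newest version; no log scan is needed.
        seg, slot = self.index[name]
        return self.segments[seg][slot][1]

    def recover(self):
        # Rebuild the index after a simulated crash by replaying the log in order.
        self.index = {}
        for seg, entries in enumerate(self.segments):
            for slot, (name, _) in enumerate(entries):
                self.index[name] = (seg, slot)

    def clean(self):
        # Copy only live entries into fresh segments, dropping superseded versions.
        live = [(n, self.read(n)) for n in sorted(self.index, key=self.index.get)]
        self.segments = [[]]
        self.index = {}
        for name, data in live:
            self.write(name, data)


store = TinyLogStore(segment_size=2)
store.write("a", 1)
store.write("b", 2)
store.write("a", 3)   # supersedes the first version of "a"
store.clean()         # the stale ("a", 1) entry is discarded
```

Because every write goes to the tail of the log, a superseded version stays on "disk" until `clean()` runs, which is why a real design needs a cleaner to keep large free regions available.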