期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

王之元胡庆丰陈娟《计算机工程与科学》2009,31(11)

随着并行系统规模的扩大,高性能计算系统运行时消耗的能耗也在急剧增长,过高的能耗也给系统的可靠性、稳定性等方面带来严峻挑战。在这种情形下,能耗问题受到了前所未有的关注。因此,设计和研究高性能计算系统,需要在考虑高计算性能的同时兼顾系统低能耗的要求,这为高性能计算系统的度量模型提出了新的挑战。于是,大规模并行系统逐渐从"高性能"走向"高效能"的衡量标准。基于此,本文采用加速比度量指标,从系统可扩展角度将计算性能和能量消耗要素进行综合,提出了一种度量高性能计算系统综合性能的能耗并行加速比模型。该模型能够直观地反映并行计算系统的效能,旨在指导系统设计和应用研究。最后,通过对该模型的分析和模拟,验证了模型的有效性。相似文献

2.

并行计算系统度量指标综述

下载免费PDF全文

王之元杨学军《计算机工程与科学》2010,32(10):44-48

系统度量指标的研究一直是并行系统和应用设计的重要问题。本文首先通过对国内外并行计算系统度量指标的分析,将当前并行计算系统度量指标分为单一的计算性能度量指标和多要素综合的高效能度量指标两种。其次,总结了这些度量指标的研究现状,并指出这些度量指标研究中存在的一些问题以及需要考虑的难点。由于并行计算系统正在逐渐从"高性能"走向"高效能",本文主要考虑当前大规模并行计算系统的可靠性和能耗要素对系统的影响,从系统可扩展角度建立了可靠并行加速比和能耗并行加速比模型,并进一步扩展为度量并行计算系统效能的综合指标模型。最后指出了未来并行计算系统度量的研究方向。相似文献

3.

应用驱动的高效能计算机系统的研究与发展 总被引：1，自引：0，他引：1

洪学海詹剑锋樊建平张志宏《计算机研究与发展》2007,44(10):1633-1639

在分析并行应用为高效能计算机系统提出的挑战的基础上,针对Cray和IBM的相关计划,探讨了应用对高效能计算机研发计划、系统决策、系统架构以及系统软件的影响,总结了有代表性的千万亿次计算机计划及其系统,并从应用范围的角度对高效能计算机系统进行了分类;进一步综述了高效能计算机系统在体系结构、编程环境、管理以及鲁棒性等方面取得的进展.最后从应用的角度展望了高效能计算机系统的发展. 相似文献

4.

High Performance Inverse Preconditioning 总被引：1，自引：0，他引：1

George A. Gravvanis 《Archives of Computational Methods in Engineering》2009,16(1):77-108

The derivation of parallel numerical algorithms for solving sparse linear systems on modern computer systems and software platforms has attracted the attention of many researchers over the years. In this paper we present an overview on the design issues of parallel approximate inverse matrix algorithms, based on an anti-diagonal “wave pattern” approach and a “fish-bone” computational procedure, for computing explicitly various families of exact and approximate inverses for solving sparse linear systems. Parallel preconditioned conjugate gradient-type schemes in conjunction with parallel approximate inverses are presented for the efficient solution of sparse linear systems. Applications of the proposed parallel methods by solving characteristic sparse linear systems on symmetric multiprocessor systems and distributed systems are discussed and the parallel performance of the proposed schemes is given, using MPI, OpenMP and Java multithreading. 相似文献

5.

优化并行计算的性能评价

刘杰迟利华胡庆丰《计算机工程与设计》2000,21(6):4-7

传统的并行计算的性能评价模型是加速比,文中讨论了加速比的缺点和不足,在此基础上提出了一种新的优化并行计算的性能评价模型（我们称之为优化加速比）。利用优化加速比分析了NAS基准测试程序MG和FT在IBM SP2(66mhz/wn)上的性能。相似文献

6.

并行程序的优化与性能评价 总被引：5，自引：0，他引：5

下载免费PDF全文

刘杰迟利华胡庆丰《计算机工程与科学》2000,22(5):67-70

文中讨论了并行程序的优化问题,指出并行程序的优化应从数据划分、通信优化和串行优化三个方面着手。针对传统加速比的缺点和不足,我们提出了优化加速比模型来评价优化并行程序的性能;对ＮＡＳ基准测试程序ＭＧ和ＦＴ进行了优化,用优化加速比模型分析了上述两个程序在ＩＢＭＳＰ２上的性能。相似文献

7.

Accelerating incompressible flow computations with a Pthreads-CUDA implementation on small-footprint multi-GPU platforms

Julien C. Thibault Inanc Senocak 《The Journal of supercomputing》2012,59(2):693-719

Graphics processor units (GPU) that are originally designed for graphics rendering have emerged as massively-parallel “co-processors” to the central processing unit (CPU). Small-footprint multi-GPU workstations with hundreds of processing elements can accelerate compute-intensive simulation science applications substantially. In this study, we describe the implementation of an incompressible flow Navier–Stokes solver for multi-GPU workstation platforms. A shared-memory parallel code with identical numerical methods is also developed for multi-core CPUs to provide a fair comparison between CPUs and GPUs. Specifically, we adopt NVIDIA’s Compute Unified Device Architecture (CUDA) programming model to implement the discretized form of the governing equations on a single GPU. Pthreads are then used to enable communication across multiple GPUs on a workstation. We use separate CUDA kernels to implement the projection algorithm to solve the incompressible fluid flow equations. Kernels are implemented on different memory spaces on the GPU depending on their arithmetic intensity. The memory hierarchy specific implementation produces significantly faster performance. We present a systematic analysis of speedup and scaling using two generations of NVIDIA GPU architectures and provide a comparison of single and double precision computational performance on the GPU. Using a quad-GPU platform for single precision computations, we observe two orders of magnitude speedup relative to a serial CPU implementation. Our results demonstrate that multi-GPU workstations can serve as a cost-effective small-footprint parallel computing platform to accelerate computational fluid dynamics (CFD) simulations substantially. 相似文献

8.

A Computational Science IDE for HPC Systems: Design and Applications

David E. Hudak Neil Ludban Ashok Krishnamurthy Vijay Gadepally Siddharth Samsi John Nehrbass 《International journal of parallel programming》2009,37(1):91-105

相似文献

9.

Opinion helpfulness prediction in the presence of “words of few mouths” 总被引：1，自引：0，他引：1

Richong Zhang Thomas Tran Yongyi Mao 《World Wide Web》2012,15(2):117-138

This paper identifies a widely existing phenomenon in social media content, which we call the “words of few mouths” phenomenon. This phenomenon challenges the development of recommender systems based on users’ online opinions by presenting additional sources of uncertainty. In the context of predicting the “helpfulness” of a review document based on users’ online votes on other reviews (where a user’s vote on a review is either HELPFUL or UNHELPFUL), the “words of few mouths” phenomenon corresponds to the case where a large fraction of the reviews are each voted only by very few users. Focusing on the “review helpfulness prediction” problem, we illustrate the challenges associated with the “words of few mouths” phenomenon in the training of a review helpfulness predictor. We advocate probabilistic approaches for recommender system development in the presence of “words of few mouths”. More concretely, we propose a probabilistic metric as the training target for conventional machine learning based predictors. Our empirical study using Support Vector Regression (SVR) augmented with the proposed probability metric demonstrates advantages of incorporating probabilistic methods in the training of the predictors. In addition to this “partially probabilistic” approach, we also develop a logistic regression based probabilistic model and correspondingly a learning algorithm for review helpfulness prediction. We demonstrate experimentally the superior performance of the logistic regression method over SVR, the prior art in review helpfulness prediction. 相似文献

10.

产出率并行加速比模型

王之元《计算机工程》2011,37(5):10-12

针对并行计算系统的性能度量问题,在产出率度量模型的基础上,建立综合系统可靠性、通信、并行化控制和成本投入要素的产出率并行加速比模型,分析总结模型中各要素影响产出率并行加速比的关键因子,包括容错开销因子、通信开销因子、并行控制开销因子及成本开销因子,对上述关键因子进行模拟实验,以验证该模型的有效性。相似文献

11.

Robust scalability analysis and SPM case studies

Dejiang Jin Sotirios G. Ziavras 《The Journal of supercomputing》2008,43(3):199-223

Scalability has become an attribute of paramount importance for computer systems used in business, scientific and engineering applications. Although scalability has been widely discussed, especially for pure parallel computer systems, it conveniently focuses on improving performance when increasing the number of computing processors. In fact, the term “scalable” is so much abused that it has become a marketing tool for computer vendors independent of the system’s technical qualifications. Since the primary objective of scalability analysis is to determine how well a system can work on larger problems with an increase in its size, we introduce here a generic definition of scalability. For illustrative purposes only, we apply this definition to PC clusters, a rather difficult subject due to their long communication latencies. Since scalability does not solely depend on the system architecture but also on the application programs and their actual management by the run-time environment, for the sake of illustration, we evaluate scalability for programs developed under the super-programming model (SPM) (Jin and Ziavras in IEEE Trans. Parallel Distrib. Syst. 15(9):783–794, 2004; J. Parallel Distrib. Comput. 65(10):1281–1289, 2005; IEICE Trans. Inf. Syst. E87-D(7):1774–1781, 2004). 相似文献

12.

My personal history in the early explorations of computer graphics

Gary Demos 《The Visual computer》2005,21(12):961-978

Gary Demos discovered computer graphics while hearing a computer-generated film presentation at CalTech in 1970 by John Whitney Sr. Gary then began working under the direction of Ivan Sutherland in Utah to develop early computer graphics hardware and software. In 1974 Gary and John Whitney Jr. started the “Motion Picture Project” at Information International to produce computer generated simulated scenes for movies (Futureworld, Looker, and Tron) and commercials. These early computer-generated visuals were quite challenging given the level of software and hardware technology available in the 1970’s. In 1981 Gary and John left Information International to form Digital Productions, where they produced effects for the movies Last Starfighter and “2010”, which were both released in 1984. Digital Productions used the Cray XMP computer, together with the Digital Film Printer that they had developed at Information International. Following a hostile takeover by Omibus of Digital Productions in 1986, Whitney/Demos Productions was formed, using a Thinking Machine parallel computer. This paper describes the technical challenges and achievements of this early visual computing. 相似文献

13.

Collaborative image retrieval via regularized metric learning

Luo Si Rong Jin Steven C. H. Hoi Michael R. Lyu 《Multimedia Systems》2006,12(1):34-44

In content-based image retrieval (CBIR), relevant images are identified based on their similarities to query images. Most CBIR algorithms are hindered by the semantic gap between the low-level image features used for computing image similarity and the high-level semantic concepts conveyed in images. One way to reduce the semantic gap is to utilize the log data of users' feedback that has been collected by CBIR systems in history, which is also called “collaborative image retrieval.” In this paper, we present a novel metric learning approach, named “regularized metric learning,” for collaborative image retrieval, which learns a distance metric by exploring the correlation between low-level image features and the log data of users' relevance judgments. Compared to the previous research, a regularization mechanism is used in our algorithm to effectively prevent overfitting. Meanwhile, we formulate the proposed learning algorithm into a semidefinite programming problem, which can be solved very efficiently by existing software packages and is scalable to the size of log data. An extensive set of experiments has been conducted to show that the new algorithm can substantially improve the retrieval accuracy of a baseline CBIR system using Euclidean distance metric, even with a modest amount of log data. The experiment also indicates that the new algorithm is more effective and more efficient than two alternative algorithms, which exploit log data for image retrieval. 相似文献

14.

High-speed processing in wired-and-wireless integrated autonomous decentralized system and its application to IC card ticket system

Akio Shiibashi Naoki Mizoguchi Kinji Mori 《Innovations in Systems and Software Engineering》2007,3(1):53-60

The automatic fare collection systems need both high performance and high reliability. High performance is one of the most expected functions on the automatic fare collection gates (AFCGs) handling the highly dense passengers during rush hours. Reliability is also indispensable because the tickets are equivalent to money. For a wireless IC card ticket system, expected to improve the passengers’ convenience and to reduce the maintenance cost, is difficult to meet these two requirements because of wireless communications between an IC card and an AFCG. This paper introduces the autonomous decentralized system as the solution and how it is applied to the system. Then, two models are prepared and simulated to evaluate the efficiency, especially high-speed processing. The technologies are implemented into the “Suica” system at East Japan Railway Company and have proven the effectiveness. 相似文献

15.

SimK: A Large-Scale Parallel Simulation Engine

下载免费PDF全文

Jian-Wei Xu 《计算机科学技术学报》2009,24(6):1048-1060

Simulation is an important method to evaluate future computer systems. Currently microprocessor architecture has switched to parallel, but almost all simulators remained at sequential stage, and the advantages brought by multi-core or many-core processors cannot be utilized. This paper presents a parallel simulator engine (SimK) towards the prevalent SMP/CMP platform, aiming at large-scale fine-grained computer system simulation. In this paper, highly efficient synchronization, communication and buffer management policies used in SimK are introduced, and a novel lock-free scheduling mechanism that avoids using any atomic instructions is presented. To deal with the load fluctuation at light load case, a cooperated dynamic task migration scheme is proposed. Based on SimK, we have developed large-scale parallel simulators HppSim and HppNetSim, which simulate a full supercomputer system and its interconnection network respectively. Results show that HppSim and HppNetSim both gain sound speedup with multiple processors, and the best normalized speedup reaches 14.95X on a two-way quad-core server. 相似文献

16.

Performance-based parallel application toolkit for high-performance clusters 总被引：1，自引：1，他引：0

Kuan-Ching Li Tien-Hsiung Weng 《The Journal of supercomputing》2009,48(1):43-65

Advances in computer technology, encompassed with fast emerging of multicore processor technology, have made the many-core personal computers available and more affordable. The availability of network of workstations and cluster of many-core SMPs have made them an attractive solution for high performance computing by providing computational power equal or superior to supercomputers or mainframes at an affordable cost using commodity components. In order to search alternative ways to extract unused and idle computing power from these computing resources targeting to improve overall performance, as well as to fully utilize the underlying new hardware platforms, these are major topics in this field of research. In this research paper, the design rationale and implementation of an effective toolkit for performance measurement and analysis of parallel applications in cluster environments is introduced; not only generating parallel applications’ timing graph representation, but also to provide application execution’s performance data charts. The goal in developing this toolkit is to permit application developers have a better understanding of the application’s behavior among selected computing nodes purposed for that particular execution. Additionally, multiple execution results of a given application under development can be combined and overlapped, permitting application developers to perform “what-if” analysis, i.e., to deeper understand the utilization of allocated computational resources. Experimentations using this toolkit have shown its effectiveness on the development and performance tuning of parallel applications, extending the use in teaching of message passing, and shared memory model parallel programming courses.

Tien-Hsiung WengEmail:

相似文献

17.

数值并行计算可扩展性评价与测试 总被引：3，自引：1，他引：2

迟利华刘杰胡庆丰《计算机研究与发展》2005,42(6):1073-1078

分析了几种可扩展性能评价模型存在的问题,针对实际评价与测试的需要,提出了一种基于等平均负载的数值并行计算可扩展性评价模型．该评价模型对可扩展性能加速比和可扩展性进行了重新定义,给出了使用该模型的进行可扩展加速比和可扩展性测试的方法,结合曲线拟合或并行计算时间模型可以预测并行系统的可扩展性,对NPB BT,SP和矩阵乘法进行了可扩展性预测．相似文献

18.

Method of multimodal biometric data analysis for optimal efficiency evaluation of recognition algorithms and systems

V. V. Lobantsov I. A. Matveev A. B. Murynin 《Pattern Recognition and Image Analysis》2011,21(3):515-518

A primary consideration of this paper is to determine different factors influencing the reliability of performance evaluations of remote person recognition algorithms and systems. The authors suggest a method for determining and computing quantitative quality criteria of multimodal biometric data and consider the possibility of extrapolating test results to various practical applications. The functions of biometric data quality and biometric data artificiality that are introduced as a measure of proximity of the available biometric data to biometric data registered “naturally,” i.e., data of unaware and noncollaborative subjects, are under examination in this paper. 相似文献

19.

Requirement emergence computation of networked software 总被引：3，自引：0，他引：3

He Keqing Liang Peng Peng Rong Li Bing Liu Jing 《Frontiers of Computer Science in China》2007,1(3):322-328

Emergence Computation has become a hot topic in the research of complex systems in recent years. With the substantial increase in scale and complexity of network-based information systems, the uncertain user requirements from the Internet and personalized application requirement result in the frequent change for the software requirement. Meanwhile, the software system with non self-possessed resource become more and more complex. Furthermore, the interaction and cooperation requirement between software units and running environment in service computing increase the complexity of software systems. The software systems with complex system characteristics are developing into the “Networked Software” with characteristics of change-on-demand and change-with-cooperation. The concepts “programming”, “compiling” and “running” of software in common sense are extended from “desktop” to “network”. The core issue of software engineering is moving to the requirement engineering, which becomes the research focus of complex system software engineering. In this paper, we present the software network view based on complex system theory, and the concept of networked software and networked requirement. We propose the challenge problem in the research of emergence computation of networked software requirement. A hierarchical & cooperative unified requirement modeling framework URF (Unified Requirement Framework) and related RGPS (Role, Goal, Process and Service) meta-models are proposed. Five scales and the evolutionary growth mechanism in requirement emergence computation of networked software are given with focus on user-dominant and domain-oriented requirement, and the rules and predictability in requirement emergence computation are analyzed. A case study in the application of networked e-Business with evolutionary growth based on State design pattern is presented in the end. 相似文献

20.

Progress and Challenges in High Performance Computer Technology

下载免费PDF全文

Xue-Jun Yang Yong Dou and Qing-Feng Hu 《计算机科学技术学报》2006,21(5):674-681

High performance computers provide strategic computing power in the construction of national economy and defense, and become one of symbols of the country＇s overall strength. Over 30 years, with the supports of governments, the technology of high performance computers is in the process of rapid development, during which the computing performance increases nearly 3 million times and the processors number expands over 10 hundred thousands times. To solve the critical issues related with parallel efficiency and scalability, scientific researchers pursued extensive theoretical studies and technical innovations. The paper briefly looks back the course of building high performance computer systems both at home and abroad, and summarizes the significant breakthroughs of international high performance computer technology. We also overview the technology progress of China in the area of parallel computer architecture, parallel operating system and resource management, parallel compiler and performance optimization, environment for parallel programming and network computing. Finally, we examine the challenging issues, ＂memory wall＂, system scalability and ＂power wall＂, and discuss the issues of high productivity computers, which is the trend in building next generation high performance computers. 相似文献