Found 20 similar documents (search time: 31 ms)
1.
Mohammad R. Hajihashemi, Magda El-Shenawee 《Journal of Parallel and Distributed Computing》2010
A parallelized version of the level-set algorithm based on the MPI technique is presented. TM-polarized plane waves are used to illuminate two-dimensional perfect electric conducting targets. A variety of performance measures such as the efficiency, the load balance, the weak scaling, and the communication/computation times are discussed. For electromagnetic inverse scattering problems, retrieving the target’s arbitrary shape and location in real time is considered a main goal, even as a trade-off with algorithm efficiency. For the three cases considered here, a maximum speedup of 53X-84X is achieved when using 256 processors. However, the overall efficiency of the parallelized level-set algorithm is 21%–33% when using 256 processors and 26%–52% when using 128 processors. The effects of the bottlenecks of the level-set algorithm on the algorithm efficiency are discussed.
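As a quick sanity check of the figures quoted in the abstract above: parallel efficiency is the achieved speedup divided by the processor count. The speedup values come from the abstract; the helper function is ours, not the authors' code.

```python
def parallel_efficiency(speedup: float, processors: int) -> float:
    """Parallel efficiency: achieved speedup divided by processor count."""
    return speedup / processors

# Speedups of 53x and 84x on 256 processors, as reported in the abstract.
low = parallel_efficiency(53, 256)   # about 0.21, i.e. 21%
high = parallel_efficiency(84, 256)  # about 0.33, i.e. 33%
```

The 21%–33% efficiency range quoted in the abstract follows directly from the 53X-84X speedups on 256 processors.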
2.
Steve C. Chiu 《The Journal of supercomputing》2008,46(2):105-107
The abundance of parallel and distributed computing platforms, such as MPP, SMP, and Beowulf clusters, to name just a few, has added many more possibilities and challenges to high performance computing (HPC), parallel I/O, mass data storage, scalable architectures, and large-scale simulations, which traditionally belong to the realm of custom-tailored parallel systems. The intent of this special issue is to discuss problems and solutions, to identify new issues, and to help shape future research directions in these areas. From these perspectives, this special issue addresses the problems encountered at the hardware, architectural, and application levels, while providing conceptual as well as empirical treatments of the current issues in high performance computing and the I/O architectures and systems utilized therein.
3.
A high performance algorithm for static task scheduling in heterogeneous distributed computing systems
Effective task scheduling is essential for obtaining high performance in heterogeneous distributed computing systems (HeDCSs). However, finding an effective task schedule in HeDCSs requires the consideration of both the heterogeneity of processors and high interprocessor communication overhead, which results from non-trivial data movement between tasks scheduled on different processors. In this paper, we present a new high-performance scheduling algorithm, called the longest dynamic critical path (LDCP) algorithm, for HeDCSs with a bounded number of processors. The LDCP algorithm is a list-based scheduling algorithm that uses a new attribute to efficiently select tasks for scheduling in HeDCSs. The efficient selection of tasks enables the LDCP algorithm to generate high-quality task schedules in a heterogeneous computing environment. The performance of the LDCP algorithm is compared to two of the best existing scheduling algorithms for HeDCSs: the HEFT and DLS algorithms. The comparison study shows that the LDCP algorithm outperforms the HEFT and DLS algorithms in terms of schedule length and speedup. Moreover, the improvement in performance obtained by the LDCP algorithm over the HEFT and DLS algorithms increases as the inter-task communication cost increases. Therefore, the LDCP algorithm provides a practical solution for scheduling parallel applications with high communication costs in HeDCSs.
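A minimal sketch of the list-scheduling family this abstract belongs to (HEFT-like upward ranks plus earliest-finish-time assignment, not the LDCP attribute itself, which the paper defines separately). The task graph, costs, and uniform communication cost are illustrative.

```python
# Illustrative DAG: per-processor computation costs and successor edges.
cost = {"A": [3, 5], "B": [4, 4], "C": [2, 6], "D": [5, 3]}
succ = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
COMM = 2  # uniform communication cost between different processors

def rank(task, memo={}):
    """Upward rank: average cost plus longest path to an exit task."""
    if task not in memo:
        avg = sum(cost[task]) / len(cost[task])
        memo[task] = avg + max((COMM + rank(s) for s in succ[task]), default=0)
    return memo[task]

# Descending rank is a valid topological order (a task outranks its successors).
order = sorted(cost, key=rank, reverse=True)

ready = {p: 0 for p in range(2)}  # processor-available times
finish = {}                       # task -> (processor, finish time)
for t in order:
    best = None
    for p in ready:
        est = ready[p]
        for pred, kids in succ.items():  # honor data-ready times of predecessors
            if t in kids:
                f_proc, f_time = finish[pred]
                est = max(est, f_time + (0 if f_proc == p else COMM))
        eft = est + cost[t][p]
        if best is None or eft < best[1]:
            best = (p, eft)
    finish[t] = best
    ready[best[0]] = best[1]

makespan = max(time for _, time in finish.values())
```

LDCP replaces the static upward rank with a dynamically updated critical-path attribute, which is what drives the reported improvements over HEFT and DLS.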
4.
The aim of the proposed fault tolerant model is to attain reliability and high performance for distributed computing on the Internet. The novelty of this model lies in the integration of three unique schemes that work in unison within a single framework: consecutive message transmission, adaptive buffer control, and message balancing. Message balancing seeks to ensure that each message queue is served by the processor for an interval that depends on the current length of the queue. In the experiments, only two parameters, the current buffer length and the rate of change of the actual queue length, were used for proportional and derivative feedback control of adaptive buffer management. Test results indicate clearly that the model goes a considerable way towards achieving the stated aim.
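A hedged sketch of the proportional-derivative feedback idea described above: the service interval granted to a queue is adjusted from the current queue length (proportional term) and its rate of change (derivative term). The gains and base interval below are illustrative values, not the paper's.

```python
KP, KD, BASE = 0.5, 0.2, 1.0  # illustrative gains and base service interval

def service_interval(length: float, prev_length: float) -> float:
    """Interval the processor dedicates to this queue in the next cycle."""
    derivative = length - prev_length  # rate of change per cycle
    return max(0.0, BASE + KP * length + KD * derivative)

# A growing queue earns a longer service interval than a shrinking one
# of the same current length.
growing = service_interval(10, 6)     # 1.0 + 5.0 + 0.8 = 6.8
shrinking = service_interval(10, 14)  # 1.0 + 5.0 - 0.8 = 5.2
```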
5.
The paper presents a new approach for introducing computational science into high school curricula. It also discusses a set of real-life problems that are appropriate for these curricula because they can be described through simple models. The computer-based simulation of these systems requires an ad hoc environment, including a programming language, suitable for this target age. The paper proposes a new environment, the ORESPICS environment, including a new programming language. The sequential part of the language integrates the classical imperative constructs with a simple set of graphical primitives, mostly taken from the LOGO language. The concurrent part of the language is based on the message-passing paradigm. Solutions of some classical problems in ORESPICS are shown.
6.
Matching high performance approximate inverse preconditioning to architectural platforms
K. M. Giannoutakis G. A. Gravvanis B. Clayton A. Patil T. Enright J. P. Morrison 《The Journal of supercomputing》2007,42(2):145-163
In this paper we examine the performance of parallel approximate inverse preconditioning for solving finite element systems, using a variety of clusters with the Message Passing Interface (MPI) communication library, the Globus toolkit, and the Open MPI open-source software. The techniques outlined in this paper contain parameters that can be varied so as to tune the execution to the underlying platform. These parameters include the number of CPUs, the order of the linear system (n), and the “retention parameter” (δl) of the approximate inverse used as a preconditioner. Numerical results are presented for solving finite element sparse linear systems on platforms with various CPU types and counts, different compilers, different file system types, different MPI implementations, and different memory sizes.
7.
Atomistic simulations of thin film deposition, based on the lattice Monte Carlo method, provide insights into the microstructure evolution at the atomic level. However, large-scale atomistic simulation is limited on a single computer due to memory and speed constraints. Parallel computation, although promising in memory and speed, has not been widely applied in these simulations because of the intimidating overhead. The key issue in achieving optimal performance is, therefore, to reduce communication overhead among processors. In this paper, we propose a new parallel algorithm for the simulation of large-scale thin film deposition incorporating two optimization strategies: (1) domain decomposition with sub-domain overlapping and (2) asynchronous communication. This algorithm was implemented both on message-passing-processor systems (MPP) and on cluster computers. We found that both architectures are suitable for parallel Monte Carlo simulation of thin film deposition in either a distributed memory mode or a shared memory mode with message-passing libraries.
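A minimal sketch of the first optimization strategy named above, domain decomposition with sub-domain overlapping: a 1-D lattice is split among workers, and each sub-domain is extended by an overlap (ghost) region so neighbours' boundary sites are locally visible. Sizes and overlap width are illustrative.

```python
def decompose(n_sites: int, n_workers: int, overlap: int):
    """Split [0, n_sites) into contiguous sub-domains, each widened by
    `overlap` sites on either side (clamped to the lattice bounds)."""
    base = n_sites // n_workers
    domains = []
    for w in range(n_workers):
        lo = w * base
        hi = (w + 1) * base if w < n_workers - 1 else n_sites
        domains.append((max(0, lo - overlap), min(n_sites, hi + overlap)))
    return domains

doms = decompose(100, 4, overlap=2)
# Adjacent sub-domains share a few sites, so boundary updates can proceed
# while communication of the overlap regions happens asynchronously.
```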
8.
From microarrays and next generation sequencing to clinical records, the amount of biomedical data is growing at an exponential rate. Handling and analyzing these large amounts of data demands that computing power and methodologies keep pace. The goal of this paper is to illustrate how high performance computing methods in SAS can be easily implemented, without the need for extensive computer programming knowledge or access to supercomputing clusters, to help address the challenges posed by large biomedical datasets. We illustrate the utility of database connectivity, pipeline parallelism, multi-core parallel processing, and distributed processing across multiple machines. Simulation results are presented for parallel and distributed processing. Finally, a discussion of the costs and benefits of such methods compared to traditional HPC supercomputing clusters is given.
9.
Modeling and simulation of the energy consumption of VAV central air-conditioning systems
A central air-conditioning system is a complex nonlinear system with time delays, and it offers substantial energy-saving potential during operation; research on energy-saving optimization of central air conditioning should therefore be grounded in an energy-consumption model. Based on mathematical energy-consumption models of the individual components of a VAV central air-conditioning system, and taking the coupling relationships among these components into account, a simulation model relating the operating variables to overall system energy consumption was built and run with the Simulink toolbox in MATLAB, and the simulation results were analyzed and validated. The model can be used for parameter optimization in energy-saving research and is of practical significance for the energy-saving optimal control of central air-conditioning systems.
10.
11.
Mostafa I. Soliman, Ghada Y. Abozaid 《Journal of Parallel and Distributed Computing》2011,71(8):1075-1084
This paper describes the FPGA implementation of FastCrypto, which extends a general-purpose processor with a crypto coprocessor for encrypting/decrypting data. Moreover, it studies the trade-offs between FastCrypto performance and design parameters, including the number of stages per round, the number of parallel Advanced Encryption Standard (AES) pipelines, and the size of the queues. It also shows the effect of memory latency on FastCrypto performance. FastCrypto is implemented in the VHDL hardware description language on a Xilinx Virtex-5 FPGA. A throughput of 222 Gb/s at 444 MHz can be achieved on four parallel AES pipelines. To reduce power consumption, the frequency of the four parallel AES pipelines is reduced to 100 MHz while the other components run at 400 MHz. In this case, our results show a FastCrypto performance of 61.725 bits per clock cycle (b/cc) when a 128-bit single-port L2 cache memory is used. However, increasing the memory bus width to 256 bits, or using 128-bit dual-port memory, improves the performance to 112.5 b/cc (45 Gb/s at 400 MHz), which represents 88% of the ideal performance (128 b/cc).
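A quick check of the throughput arithmetic quoted in the abstract above: sustained bits per clock cycle times the clock frequency gives the data rate. The figures are taken from the abstract; the helper function is ours.

```python
def throughput_gbps(bits_per_cycle: float, freq_hz: float) -> float:
    """Data rate in Gb/s from sustained bits/cycle and clock frequency."""
    return bits_per_cycle * freq_hz / 1e9

wide_bus = throughput_gbps(112.5, 400e6)  # 45.0 Gb/s, as reported
fraction = 112.5 / 128                    # share of the 128 b/cc ideal
```

112.5 b/cc at 400 MHz indeed gives the reported 45 Gb/s, and 112.5/128 rounds to the quoted 88% of ideal.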
12.
This paper introduces a chipset for high-performance computing. Based on its design and implementation, environment parameters for the channels and the crossbar switch are abstracted, the test-model parameters are analyzed around the communication characteristics of high-performance computing, and the parameters relevant to performance evaluation are given. A hardware FPGA test platform and a software simulation environment were built, and the effect of each chipset environment parameter on communication latency and bandwidth was measured and analyzed. The conclusion is that a chipset for high-performance computing should maximize the transfer granularity of each transaction, and the channel parameters are determined accordingly.
13.
The HIRLAM (high resolution limited area modelling) limited-area atmospheric model was originally developed and optimized for shared memory vector-based computers, and has been used for operational weather forecasting on such machines for several years. This paper describes the algorithms applied to obtain a highly parallel implementation of the model, suitable for distributed memory machines. The performance results presented indicate that the parallelization effort has been successful, and the Norwegian Meteorological Institute will run the parallel version in production on a Cray T3E.
14.
Energy consumption in datacenters has recently become a major concern due to rising operational costs and scalability issues. Recent solutions to this problem propose the principle of energy proportionality, i.e., the amount of energy consumed by the server nodes should be proportional to the amount of work performed. For data parallelism and fault tolerance purposes, the most common file systems used in MapReduce-type clusters maintain a set of replicas for each data block. A covering subset is a group of nodes that together contain at least one replica of each data block needed for performing computing tasks. In this work, we develop and analyze algorithms to maintain energy proportionality by discovering a covering subset that minimizes energy consumption while placing the remaining nodes in low-power standby mode in a data parallel computing cluster. Our algorithms can also discover covering subsets in heterogeneous computing environments. In order to allow more data parallelism, we generalize our algorithms so that they can discover a k-covering subset, i.e., a set of nodes that contains at least k replicas of each data block. Our experimental results show that we can achieve substantial energy saving without significant performance loss in diverse cluster configurations and working environments.
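A hedged sketch of the covering-subset idea (k = 1): with each data block replicated on several nodes, a greedy pass picks the node covering the most still-uncovered blocks until every block has at least one replica in the chosen subset; the rest can go to standby. The replica placement below is illustrative, and greedy selection is a standard set-cover heuristic, not necessarily the paper's exact algorithm.

```python
# Illustrative replica map: block -> nodes holding a replica of it.
replicas = {
    "b1": {"n1", "n2"},
    "b2": {"n2", "n3"},
    "b3": {"n3", "n4"},
    "b4": {"n1", "n4"},
}

# Invert to node -> blocks it holds.
node_blocks = {}
for block, nodes in replicas.items():
    for n in nodes:
        node_blocks.setdefault(n, set()).add(block)

covered, subset = set(), []
while covered != set(replicas):
    # Greedily take the node that covers the most uncovered blocks.
    best = max(node_blocks, key=lambda n: len(node_blocks[n] - covered))
    subset.append(best)
    covered |= node_blocks[best]

# Nodes outside the covering subset can be put in low-power standby mode.
standby = set(node_blocks) - set(subset)
```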
15.
《Journal of Systems Architecture》2015,61(1):49-70
To meet the increasing complexity of mobile multimedia applications, the SoCs equipping modern mobile devices integrate powerful heterogeneous processing elements, among which Digital Signal Processors (DSP) and General Purpose Processors (GPP) are the most common. Due to the ever-growing gap between battery lifetime and hardware/software complexity, in addition to applications’ computing power needs, energy saving has become a crucial issue in the design of such architectures. In this context, we propose an end-to-end study of video decoding on both GPP and DSP. The study follows a two-step methodology: (1) a comprehensive characterization and evaluation of the performance and energy consumption of video decoding, and (2) the extraction of an accurate high-level energy model based on the characterization step. The characterization of video decoding follows an experimental methodology and was carried out on an embedded platform containing a GPP and a DSP. This step highlighted the importance of considering the end-to-end decoding flow when evaluating the energy efficiency of a video decoding application. The measurements obtained in this step were used to build a comprehensive analytical energy model for video decoding on both GPP and DSP. Thanks to a sub-model decomposition, the developed model estimates the energy consumption in terms of processor clock frequency and video bit-rate, in addition to a set of constant coefficients related to the video complexity, the operating system, and the considered hardware architecture. The obtained model gives very accurate results (R² = 97%) for both GPP and DSP energy consumption. Finally, based on the results of the modeling methodology, we show how one can rapidly build a video decoding energy model for a given target architecture without executing the full characterization steps described in this paper.
16.
洪昌建 《电脑与微电子技术》2013,(24):19-22,31
In a wireless sensor network, nodes closer to the sink carry more data-forwarding traffic, consume energy faster, and die first, creating an energy hole that ends the network's life prematurely. This paper studies the energy-hole problem in sensor networks, builds a network model with uniformly distributed nodes, and proposes a layered dynamic routing protocol. By analyzing the dynamic routing to obtain the energy load of the nodes in each network layer, an energy-allocation-based energy-hole avoidance algorithm, EABEHA, is then proposed. Simulation experiments show that the algorithm allocates the network's energy sensibly and, compared with algorithms such as Flooding and LEACH, significantly prolongs network lifetime.
17.
Seung Woo Son Konrad Malkowski Guilin Chen Mahmut Kandemir Padma Raghavan 《The Journal of supercomputing》2007,41(3):179-213
Reducing power consumption is quickly becoming a first-class optimization metric for many high-performance parallel computing platforms. One of the techniques employed by many prior proposals along this direction is voltage scaling, and past research has applied it to different components such as networks, CPUs, and memories. In contrast to most existing efforts on voltage scaling that target a single component (CPU, network, or memory), this paper proposes and experimentally evaluates a voltage/frequency scaling algorithm that considers CPUs and communication links in a mesh network at the same time. More specifically, it scales the voltages/frequencies of the CPUs in the nodes and the communication links among them in a coordinated fashion (instead of one after another) such that energy savings are maximized without impacting execution time. Our experiments with several tree-based sparse matrix computations reveal that the proposed integrated voltage scaling approach is very effective in practice and brings 13% and 17% energy savings over the pure CPU and pure communication link voltage scaling schemes, respectively. The results also show that our savings are consistent across different network sizes and different sets of voltage/frequency levels.
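A hedged sketch of why voltage/frequency scaling saves energy, as exploited by work like the above: dynamic power scales roughly as C·V²·f while execution time scales as 1/f, so energy per task scales roughly as V². The capacitance, voltages, frequencies, and cycle count below are illustrative, not values from the paper.

```python
def dynamic_energy(c: float, v: float, f: float, cycles: float) -> float:
    """Energy for a fixed cycle count under the C*V^2*f dynamic-power model."""
    power = c * v**2 * f  # dynamic power in watts
    time = cycles / f     # execution time in seconds
    return power * time   # energy = C * V^2 * cycles (frequency cancels)

full = dynamic_energy(1e-9, 1.2, 2.0e9, 1e9)    # nominal V/f pair
scaled = dynamic_energy(1e-9, 1.0, 1.5e9, 1e9)  # lower V/f pair
saving = 1 - scaled / full  # roughly 31% less energy, at a longer runtime
```

The coordinated CPU-plus-link approach in the paper applies this trade-off to several components at once while keeping execution time unchanged.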
18.
This paper introduces the performance characteristics of DDS (direct digital synthesis) and two implementation schemes. To address the drawbacks of DDS, a new DDS design based on multi-channel parallel techniques extends the output bandwidth while preserving high speed, stability, and spectral-spur performance.
19.
Rapid global population growth and technological progress have greatly increased the world's total electricity generation, and forecasting electric energy consumption plays an important role in power-system dispatch and generation management. To improve forecasting accuracy in the face of the complex temporal characteristics of energy-consumption data, this paper proposes A-DLSTM, a novel sandwich architecture that places an attention mechanism (Attention) inside a double-layer long short-term memory network (Double layer Long Short-Term Memory, DLSTM). The attention mechanism in the middle layer adaptively weights the different features within each time step, while the two LSTM layers capture the temporal information in the sequence to produce the forecast. The experimental data are roughly five years of electricity consumption of one household from the UCI machine learning repository, with hyperparameters tuned by grid search. Experiments comparing A-DLSTM with existing models on the energy-consumption data show that the proposed architecture achieves the best mean squared error, root mean squared error, mean absolute error, and mean absolute percentage error, and a heat-map analysis of the attention layer identifies the factors with the greatest influence on electricity-consumption prediction.
20.
In this paper, we study and compare grid and global computing systems and outline the benefits of a hybrid system called DIRAC. To evaluate DIRAC scheduling for high throughput computing, a new model is presented and a simulator was developed for many clusters of heterogeneous nodes belonging to a local network. These clusters are assumed to be connected to each other through a global network, and each cluster is managed via a local scheduler shared by many users. We validate our simulator by comparing the experimental and analytical results of an M/M/4 queuing system. Next, we compare against a real batch system and obtain an average error of 10.5% for the response time and 12% for the makespan. We conclude that the simulator is realistic and describes well the behaviour of a large-scale system. Thus we can study the scheduling of our DIRAC system in a high throughput context. We justify our decentralized, adaptive, and opportunistic approach in comparison to a centralized approach in such a context.
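The M/M/4 validation mentioned above has a closed-form counterpart: the mean response time of an M/M/c queue follows from the Erlang C formula, which a simulator's measured averages can be checked against. A minimal sketch, with illustrative arrival and service rates:

```python
from math import factorial

def mmc_response_time(lam: float, mu: float, c: int) -> float:
    """Mean time in system for an M/M/c queue (requires lam < c*mu)."""
    rho = lam / (c * mu)  # server utilization
    a = lam / mu          # offered load in Erlangs
    p0 = 1 / (sum(a**k / factorial(k) for k in range(c))
              + a**c / (factorial(c) * (1 - rho)))
    erlang_c = (a**c / factorial(c)) * p0 / (1 - rho)  # P(arrival must wait)
    wait = erlang_c / (c * mu - lam)                   # mean queueing delay
    return wait + 1 / mu                               # plus mean service time

# Example: 4 servers, arrival rate 3 jobs/s, service rate 1 job/s per server.
t = mmc_response_time(lam=3.0, mu=1.0, c=4)
```

Running the simulator with the same rates and comparing the simulated mean response time against `t` is exactly the kind of analytical cross-check described in the abstract.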