
1.

A. Averbuch R. Dekel E. Gabber 《Concurrency and Computation》1996,8(2):91-123

The Portable Parallelizing Fortran Compiler (PPFC) is an additional component for the portable programming environment developed at Tel-Aviv University for scientific code. This environment supports portable and efficient programming of diverse MIMD multiprocessors, both distributed- and shared-memory. Until now this environment has consisted of two tools: the Virtual Machine for MultiProcessors (VMMP) and the Portable Parallelizing Pascal Compiler (P³C). We have added the PPFC, which is an automatic parallelizing compiler for the Fortran language. The compiler is fully automatic (it does not require additional declarations to assist parallelization), is aimed at scientific code, which is characterized by loops operating on regular data structures, and produces efficient and portable code for a variety of multiprocessors from the same serial code. The parallel implementation uses the VMMP, which is a software package that provides a coherent set of services for explicitly parallel application programs running on diverse MIMD multiprocessors. VMMP is intended to simplify parallel program writing and to promote portable and efficient programming. The PPFC parallelized 12 out of the 24 Livermore Loops. It was also applied to parallelize all the 14 Fortran application programs that were parallelized by the P³C and achieved the same speed-ups and efficiencies. In most examples the PPFC achieved high speed-ups and efficiencies on all target multiprocessors. The PPFC emphasizes efficiency and code portability. Although PPFC employs a relatively simple data flow analysis, it produces efficient code for various widely used application programs.

2.

A Comparison of Co-Array Fortran and OpenMP Fortran for SPMD Programming

Alan J. Wallcraft 《The Journal of supercomputing》2002,22(3):231-250

Co-Array Fortran, formerly called F--, is a small set of extensions to Fortran 90/95 for Single-Program-Multiple-Data (SPMD) parallel processing. OpenMP Fortran is a set of compiler directives that provide a high level interface to threads in Fortran, with both thread-local and thread-shared memory. OpenMP is primarily designed for loop-level directive-based parallelization, but it can also be used for SPMD programs by spawning multiple threads as soon as the program starts and having each thread then execute the same code independently for the duration of the run. The similarities and differences between these two SPMD programming models are described. Co-Array Fortran can be implemented using either threads or processes, and is therefore applicable to a wider range of machine types than OpenMP Fortran. It has also been designed from the ground up to support the SPMD programming style. To simplify the implementation of Co-Array Fortran, a formal Subset is introduced that allows the mapping of co-arrays onto standard Fortran arrays of higher rank. An OpenMP Fortran compiler can be extended to support Subset Co-Array Fortran with relatively little effort.

3.

Static estimation and visual analysis methods for program execution time

孙昌爱 金茂忠 刘超 靳若明 《软件学报》2003,14(1):68-75

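Analyzing and evaluating the timing performance of software is an important topic in real-time software development. This paper proposes a visual analysis framework for program execution time based on control-flow graphs. It studies the automatic analysis of the correspondence between intermediate code segments and source statements, methods for extracting and computing the CPU cycle count of each source line, a point-to-point maximum-time analysis algorithm based on the control-flow graph, and a method for estimating absolute time from CPU cycles. A practical tool for static analysis and evaluation of program execution time, based on control-flow graphs, was designed and implemented. Finally, the work is compared with related research and summarized.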

4.

Criticality: static profiling for real-time programs

Florian Brandner Stefan Hepp Alexander Jordan 《Real-Time Systems》2014,50(3):377-410

With the increasing performance demand in real-time systems it becomes more and more important to provide feedback to programmers and software development tools on the performance-relevant code parts of a real-time program. So far, this information was limited to an estimation of the worst-case execution time (WCET) and its associated worst-case execution path (WCEP). However, both the WCET and the WCEP provide only partial information. Only code parts that are on one of the WCEPs are indicated to the programmer. No information is provided for all other code parts. To give a comprehensive view covering the entire code base, tools in the spirit of program profiling are required. This work proposes an efficient approach to compute worst-case timing information for all code parts of a program using a complementary metric, called criticality. Every statement of a program is assigned a criticality value, expressing how critical the code is with respect to the global WCET. This gives valuable information on how close the worst execution path passing through a specific program part is to the global WCEP. We formally define the criticality metric and investigate some of its properties with respect to dominance in control-flow graphs. Exploiting some of those properties, we propose an algorithm that reduces the overhead of computing the metric to cover complete programs. We also investigate ways to efficiently find only those code parts whose criticality is above a given threshold. Experiments using well-established real-time benchmark programs show an interesting distribution of the criticality values, revealing considerable amounts of highly critical as well as uncritical code. The metric thus provides ideal information to programmers and software development tools to optimize the worst-case execution time of these programs.
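As a rough illustration of the idea in this abstract, the sketch below computes a criticality-like value for each node of a small acyclic control-flow graph: the cost of the longest path passing through the node, divided by the global WCET. The graph, the per-node costs, and the function name `criticality` are hypothetical, and the definition is a simplification that may differ from the paper's formal one (loops, for example, are not handled).

```python
from functools import lru_cache


def criticality(succ, cost, entry, exit_node):
    """Return {node: longest path through node / global WCET} for an acyclic CFG.

    Assumes every node is a key of `succ`, is reachable from `entry`,
    and can reach `exit_node`.
    """
    # Build predecessor lists from the successor map.
    pred = {n: [] for n in succ}
    for n, targets in succ.items():
        for t in targets:
            pred[t].append(n)

    @lru_cache(maxsize=None)
    def longest_to(n):
        # Cost of the longest path from the entry up to and including n.
        if n == entry:
            return cost[n]
        return cost[n] + max(longest_to(p) for p in pred[n])

    @lru_cache(maxsize=None)
    def longest_from(n):
        # Cost of the longest path from n (inclusive) to the exit.
        if n == exit_node:
            return cost[n]
        return cost[n] + max(longest_from(s) for s in succ[n])

    wcet = longest_to(exit_node)
    # cost[n] is counted in both halves, so subtract it once.
    return {n: (longest_to(n) + longest_from(n) - cost[n]) / wcet for n in succ}


# Hypothetical diamond-shaped CFG: A branches to B or C, both join at D.
cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
cost = {"A": 2, "B": 10, "C": 3, "D": 5}
crit = criticality(cfg, cost, "A", "D")
```

Nodes on the worst-case path (A, B, D) get the value 1.0, while C, which lies only on a cheaper path, gets a value below 1.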

5.

Combining AOP and reflection to change software behavior dynamically

王小民 杨志辉 张雄 许满武 《计算机科学》2007,34(11):274-278

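The runtime environments of software systems are growing ever more complex, and this complexity has far exceeded what people can control. Object-oriented programming can leave concerns unseparated and code tangled, which greatly reduces the modularity and reusability of software. Aspect-oriented programming (AOP) separates concerns well and so modularizes software better. With reflection, a program can learn its own state at run time through introspection and adjust itself through intercession (modifying the program automatically at run time); that is, it gains the ability to acquire new behavior dynamically. We combine the advantages of the two approaches, using AspectJ and Java reflection so that software can change its behavior at run time according to its running conditions.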

6.

Inner loops in flowgraphs and code optimization

S. Vasudevan 《Acta Informatica》1982,17(2):143-155

Summary. A criterion is developed to define a hierarchy of inner loops in a program which constitute sections of the program which take up large proportions of the execution time; this hierarchy lends a dynamic loop structure to the program. It is assumed that the program has been given a flowgraph representation in which each vertex corresponds to a statement or a set of statements and the flow in each edge corresponds to the frequency of passage of control from one statement or set of statements to another. While developing this criterion an attempt is made to guarantee that moving a loop-invariant statement from an inner loop to a point outside of the loop would always yield more optimal code.

7.

On estimating the useful work distribution of parallel programs under P3T: a static performance estimator

Thomas Fahringer 《Concurrency and Computation》1996,8(4):261-282

In order to improve a parallel program's performance it is critical to evaluate how evenly the work contained in a program is distributed over all processors dedicated to the computation. Traditional work distribution analysis is commonly performed at the machine level. The disadvantage of this method is that it cannot identify whether the processors are performing useful or redundant (replicated) work. The paper describes a novel method of statically estimating the useful work distribution of distributed-memory parallel programs at the program level, which carefully distinguishes between useful and redundant work. The amount of work contained in a parallel program, which correlates with the number of loop iterations to be executed by each processor, is estimated by accurately modeling loop iteration spaces, array access patterns and data distributions. A cost function defines the useful work distribution of loops, procedures and the entire program. Lower and upper bounds of the described parameter are presented. The computational complexity of the cost function is independent of the program's problem size, statement execution and loop iteration counts. As a consequence, estimating the work distribution based on the described method is considerably faster than simulating or actually compiling and executing the program. Automatically estimating the useful work distribution is fully implemented as part of P³T, which is a static parameter based performance prediction tool under the Vienna Fortran Compilation System (VFCS). The Lawrence Livermore Loops are used as a test case to verify the approach.

8.

Fast WCET estimation based on distribution functions

周国昌 郭宝龙 高翔 王健 闫允一 《计算机科学》2016,43(5):157-161

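In real-time software systems, the analysis and evaluation of software timing performance is an important topic. As CPU architectures grow more complex, however, the traditional approach of simulating execution on the underlying hardware becomes increasingly difficult. A worst-case execution time (WCET) estimation method based on distribution functions takes a probabilistic view and can bypass complex low-level hardware modeling to estimate a program's WCET. First, TI TMS320C6713 DSP assembly code is partitioned into basic blocks, and a program flow graph is built with the basic blocks as nodes. Then a beta distribution is used to model the running time of each instruction, with its parameters determined by an improved Program Evaluation and Review Technique (PERT); after the instructions are summed, a normal distribution models the execution time of each basic block. Finally, a path-based method yields the WCET of the whole program. Experimental results show that the method is feasible and reasonable.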
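The pipeline in this abstract (beta-distributed instruction times, normally distributed basic-block times, path-based combination) can be sketched as follows. This is only an illustration under stated assumptions: it uses the classic PERT three-point formulas rather than the paper's improved variant, treats instruction times as independent, and bounds a path by mean plus `k` standard deviations. All function names and the sample timings are hypothetical.

```python
import math


def pert_moments(a, m, b):
    """Classic PERT estimate for a beta-distributed time.

    a, m, b are the optimistic, most likely, and pessimistic cycle counts.
    """
    mean = (a + 4 * m + b) / 6
    var = ((b - a) / 6) ** 2
    return mean, var


def block_moments(instructions):
    """Sum instruction moments; the block time is then treated as normal."""
    mean = sum(pert_moments(*ins)[0] for ins in instructions)
    var = sum(pert_moments(*ins)[1] for ins in instructions)
    return mean, var


def path_bound(blocks, k=3.0):
    """Upper estimate for one path: mean + k standard deviations."""
    mean = sum(block_moments(b)[0] for b in blocks)
    var = sum(block_moments(b)[1] for b in blocks)
    return mean + k * math.sqrt(var)


def wcet_estimate(paths, k=3.0):
    """Path-based estimate: the largest bound over all enumerated paths."""
    return max(path_bound(p, k) for p in paths)


# Hypothetical program with two paths; each block is a list of (a, m, b) triples.
block_x = [(1, 2, 3), (2, 2, 2)]
block_y = [(4, 5, 12)]
block_z = [(1, 1, 1)]
estimate = wcet_estimate([[block_x, block_z], [block_y, block_z]])
```

Enumerating every path is only practical for small flow graphs; a real tool would need loop bounds and a less naive path search.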

9.

CXTANNEAL: an improved program for estimating solute transport parameters

L. Li D. A. Barry J. Morris F. Stagnitti 《Environmental Modelling & Software》1999,14(6):109

CXTANNEAL is a program for analysing contaminant transport in soils. The code, written in Fortran 77, is a modified version of CXTFIT, a commonly used package for estimating solute transport parameters in soils. The improvement of the present code is that it includes simulated annealing as the optimization technique for curve fitting. Tests with hypothetical data show that CXTANNEAL performs better than the original code in searching for optimal parameter estimates. To reduce the computational time, a parallel version of CXTANNEAL (CXTANNEAL_P) was also developed.
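To make the phrase "simulated annealing as the optimization technique for curve fitting" concrete, here is a minimal generic sketch in Python. It is not the CXTANNEAL algorithm or its transport model: the linear model, the cooling schedule, the step size, and all names are assumptions chosen only to show the accept/reject mechanics on a least-squares objective.

```python
import math
import random


def sse(params, xs, ys, model):
    """Sum of squared errors between the model and the observations."""
    return sum((model(x, params) - y) ** 2 for x, y in zip(xs, ys))


def anneal(xs, ys, model, start, steps=5000, t0=1.0, cooling=0.999,
           step=0.1, seed=0):
    """Fit `params` by simulated annealing; returns (best_params, best_cost)."""
    rng = random.Random(seed)
    current = list(start)
    cur_cost = sse(current, xs, ys, model)
    best, best_cost = list(current), cur_cost
    temperature = t0
    for _ in range(steps):
        # Propose a random perturbation of every parameter.
        candidate = [p + rng.gauss(0.0, step) for p in current]
        cand_cost = sse(candidate, xs, ys, model)
        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature drops.
        if cand_cost < cur_cost or \
                rng.random() < math.exp((cur_cost - cand_cost) / temperature):
            current, cur_cost = candidate, cand_cost
            if cur_cost < best_cost:
                best, best_cost = list(current), cur_cost
        temperature *= cooling
    return best, best_cost


def line(x, p):
    # Hypothetical stand-in model: y = p[0] * x + p[1].
    return p[0] * x + p[1]


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # synthetic data from y = 2x + 1
initial_cost = sse([0.0, 0.0], xs, ys, line)
best_params, best_cost = anneal(xs, ys, line, [0.0, 0.0])
```

Because the best state seen so far is tracked separately from the current state, the returned cost can never exceed the starting cost, even though the search itself sometimes moves uphill.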

10.

An approach to genuine dynamic linking

W. Wilson Ho Ronald A. Olsson 《Software》1991,21(4):375-390

This paper describes a new approach to dynamic link/unlink editing. The basis of this approach is a library of link editing functions that can add compiled object code to or remove such code from a process any time during its execution. Loading modules, searching libraries, resolving external references, and allocating storage for global and static data structures are all performed at run time. This approach provides the efficiency of native machine code execution along with the flexibility to modify a program during its execution, thereby making many new applications possible. This paper also describes three sample applications of these dynamic link editing functions: program customization, incremental program development, and support for debugging and testing. A prototype of this approach is implemented under UNIX as a library package called dld for the C programming language and is available for VAX, Sun 3 and SPARCstation machines.

11.

Using SPEC CPU2006 to evaluate the sequential and parallel code generated by commercial and open-source compilers

Aldea Sergio Llanos Diego R. González-Escribano Arturo 《The Journal of supercomputing》2012,59(1):486-498

12.

Subprogram inlining: a study of its effects on program execution time

Davidson J.W. Holler A.M. 《IEEE Transactions on Software Engineering》1992,18(2):89-102

13.

Development of dynamic protection against timing channels

Shahrzad Kananizadeh Kirill Kononenko 《International Journal of Information Security》2017,16(6):641-651

Information systems face many threats, such as covert channels, which declassify hidden information by, e.g., analyzing the program execution time. Such threats exist at various stages of the execution of instructions. Even if software developers are able to neutralize these threats in source code, new attack vectors can arise in compiler-generated machine code from these representations. Existing approaches for preventing vulnerabilities have numerous restrictions related to both their functionality and the range of threats that can be found and removed. This study presents a technique for removing threats and generating safer code using dynamic compilation in an execution environment by combining information from program analysis of the malicious code and re-compiling such code to run securely. The proposed approach stores summary information in the form of rules that can be shared among analyses. The annotations enable us to conduct the analyses to mitigate threats. Developers can update the analyses and control the volume of resources that are allocated to perform these analyses by changing the precision. The authors’ experiments show that the binary code created by applying the suggested method is of high quality.

14.

A Data Parallel Scientific Modeling Language

《Journal of Parallel and Distributed Computing》1994,21(1):46-60

The data parallel meta language (DPML) and its associated Fortran source code rewriter (DP77) support architecture independent, high performance climate and weather prediction models. The language allows the data domains over which a program operates, the communication patterns required between elements of those data domains, and some or all of the calculations of a program to be expressed at a very high level. DPML uses explicit data parallelism to express the inherent parallelism of the models, with the result that programs are easily compilable into target machine code. DP77 uses information from the DPML program to translate Fortran routines into the host specific Fortran form required for their parallel execution within the model. This paper describes the general strategy behind the development of DPML, discusses its language features using examples drawn from climate modelling, and provides details of the mechanism it uses for incorporating Fortran into data parallel programs. Encouraging results are reported for DPML versions of the standard weather benchmark models executing on vector, SIMD, and MIMD (shared memory) machines. While the paper is set within the framework of climate modelling, the technique has obvious wider implications.

15.

The MOLDY short-range molecular dynamics package

G.J. Ackland K. D'Mellow S.L. Daraszewicz D.J. Hepburn M. Uhrin K. Stratford 《Computer Physics Communications》2011,182(12):2587-2604

We describe a parallelised version of the MOLDY molecular dynamics program. This Fortran code is aimed at systems which may be described by short-range potentials and specifically those which may be addressed with the embedded atom method. This includes a wide range of transition metals and alloys. MOLDY provides a range of options in terms of the molecular dynamics ensemble used and the boundary conditions which may be applied. A number of standard potentials are provided, and the modular structure of the code allows new potentials to be added easily. The code is parallelised using OpenMP and can therefore be run on shared memory systems, including modern multicore processors. Particular attention is paid to the updates required in the main force loop, where synchronisation is often required in OpenMP implementations of molecular dynamics. We examine the performance of the parallel code in detail and give some examples of applications to realistic problems, including the dynamic compression of copper and carbon migration in an iron–carbon alloy.

Program summary

Program title: MOLDY
Catalogue identifier: AEJU_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEJU_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: GNU General Public License version 2
No. of lines in distributed program, including test data, etc.: 382 881
No. of bytes in distributed program, including test data, etc.: 6 705 242
Distribution format: tar.gz
Programming language: Fortran 95/OpenMP
Computer: Any
Operating system: Any
Has the code been vectorised or parallelized?: Yes. OpenMP is required for parallel execution
RAM: 100 MB or more
Classification: 7.7
Nature of problem: Moldy addresses the problem of many atoms (of order 10⁶) interacting via a classical interatomic potential on a timescale of microseconds. It is designed for problems where statistics must be gathered over a number of equivalent runs, such as measuring thermodynamic properties, diffusion, radiation damage, fracture, twinning deformation, nucleation and growth of phase transitions, sputtering etc. In the vast majority of materials, the interactions are non-pairwise, and the code must be able to deal with many-body forces.
Solution method: Molecular dynamics involves integrating Newton's equations of motion. MOLDY uses Verlet (for good energy conservation) or predictor–corrector (for accurate trajectories) algorithms. It is parallelised using OpenMP. It also includes a static minimisation routine to find the lowest energy structure. Boundary conditions for surfaces, clusters, grain boundaries, thermostat (Nose), barostat (Parrinello–Rahman), and externally applied strain are provided. The initial configuration can be either a repeated unit cell or have all atoms given explicitly. Initial velocities are generated internally, but it is also possible to specify the velocity of a particular atom. A wide range of interatomic force models are implemented, including embedded atom, Morse or Lennard-Jones. Thus the program is especially well suited to calculations of metals.
Restrictions: The code is designed for short-ranged potentials, and there is no Ewald sum. Thus for long range interactions where all particles interact with all others, the order-N scaling will fail. Different interatomic potential forms require recompilation of the code.
Additional comments: There is a set of associated open-source analysis software for postprocessing and visualisation. This includes local crystal structure recognition and identification of topological defects.
Running time: A set of test modules for running time are provided. The code scales as order N. The parallelisation shows near-linear scaling with number of processors in a shared memory environment. A typical run of a few tens of nanometers for a few nanoseconds will run on a timescale of days on a multiprocessor desktop.
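The summary mentions Verlet integration "for good energy conservation". The sketch below illustrates that property on a one-dimensional harmonic oscillator using the velocity form of the Verlet scheme. It is written in Python for brevity and is unrelated to MOLDY's Fortran source; the choice of the velocity variant, the force law, the time step, and all names are assumptions.

```python
def velocity_verlet(x, v, force, mass, dt, steps):
    """Advance one particle in 1-D with the velocity Verlet scheme."""
    a = force(x) / mass
    for _ in range(steps):
        # Position update uses the current velocity and acceleration.
        x = x + v * dt + 0.5 * a * dt * dt
        # Velocity update averages the old and new accelerations.
        a_new = force(x) / mass
        v = v + 0.5 * (a + a_new) * dt
        a = a_new
    return x, v


# Hypothetical test system: harmonic oscillator with unit mass and stiffness.
K = 1.0
M = 1.0


def spring_force(x):
    return -K * x


def energy(x, v):
    return 0.5 * M * v * v + 0.5 * K * x * x


x0, v0 = 1.0, 0.0
x1, v1 = velocity_verlet(x0, v0, spring_force, M, dt=0.01, steps=1000)
energy_drift = abs(energy(x1, v1) - energy(x0, v0))
```

With this time step the total energy after 1000 steps stays very close to its starting value, which is the behaviour that makes Verlet-type integrators attractive for long molecular dynamics runs.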

16.

Tools to aid in discovering parallelism and localizing arithmetic in Fortran programs

Wayne R. Cowell Christopher P. Thompson 《Software》1990,20(1):25-47

We describe a collection of software tools that analyse and transform Fortran programs. The analysis tools detect parallelism in blocks of code and are primarily intended to aid in adapting existing programs to execute on multiprocessors. The transformation tools are aimed at eliminating data dependencies, thereby introducing parallelism, and at localizing arithmetic in registers, of primary interest in adapting programs to execute on machines that can be memory bound (common for machines with vector architecture). The tools are unified conceptually by their use of a set of conditions for data independence; these conditions have been implemented so as to combine tool analysis with user/tool interaction. We include timing results from applying the tools to programs intended for execution on two machines with different architectures: a Sequent Balance and a CRAY-2. The tools are written in Fortran in the tool-writing environment provided by Toolpack and are easily incorporated into a Toolpack installation.

17.

Parallelization support for coupled grid applications with small meshes

Lorie M. Liebrock Ken Kennedy 《Concurrency and Computation》1996,8(8):581-615

Composite grid problems arise in important application areas, e.g. reactor simulation. Related physical phenomena are inherently parallel and their simulations are computationally intensive. Unfortunately, parallel languages, such as High Performance Fortran, provide little support for these problems. We illustrate topological connections via a coupling statement, develop a programming style and transformation system to support composite grid code development, and develop an algorithm that automatically determines distributions for composite grid problems with small meshes. A mesh is classified as small if the amount of computational work associated with the mesh is less than the amount of work to be assigned to a single processor. Precompiler transformations, such as cloning for alignment specification, are described. Excerpts from a High Performance Fortran program before and after transformation illustrate user programming style and transformation issues. Our distribution algorithm's alignment and distribution specifications are input to the transformed High Performance Fortran program, which applies the mapping for execution of the simulation code. Some advantages of this approach are: transformations are applied before compilation and allow communication optimization; data distribution may be determined for any number of problems without recompilation; user determined distribution for parallelization is unnecessary; portability is improved. We validate the topology-based data distribution algorithm using a number of reactor configurations. Two random distribution algorithms provide a basis of comparison with measures of load balance and communication cost. Experiments show that the topology-based distribution algorithm almost always obtains load balance at least as good as, and often significantly better than, random algorithms while reducing the total communication per iteration from 50% to as much as a factor of ten.

18.

Execution time estimation for cloud applications based on parameter variation

郑顾平 王秋萍 《计算机工程与应用》2017,53(11):95-99

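Based on the number of statements a program executes, a method is proposed for estimating program execution time as its parameters vary. The task's source code is analyzed and its resource needs assessed to obtain the number of statements the task executes; automatic curve fitting gives the trend of the statement count as the parameter changes; and actual runs of the program reveal the relationship between statement count and running time, which gives the trend of running time as the parameter changes. The method is validated with a Pascal Towers of Hanoi program: the running times obtained under varying parameters are very close to the actual running times, with a very small error rate.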
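The approach in this abstract (count executed statements for several parameter values, fit a curve, then extrapolate) can be illustrated on the Towers of Hanoi example it mentions. In the sketch below the "statement count" is approximated by the number of disk moves, and the curve is a straight-line fit to log(count + 1); both choices, and all names, are assumptions made for illustration rather than the paper's procedure.

```python
import math


def count_moves(n, src="A", dst="C", via="B"):
    """Count disk moves made by the recursive Towers of Hanoi solution."""
    if n == 0:
        return 0
    return (count_moves(n - 1, src, via, dst)
            + 1  # move the largest disk from src to dst
            + count_moves(n - 1, via, dst, src))


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


# Sample the count for small parameter values, then fit log(count + 1) vs n.
sizes = list(range(1, 11))
counts = [count_moves(n) for n in sizes]
slope, intercept = fit_line(sizes, [math.log(c + 1) for c in counts])


def predict_count(n):
    """Extrapolate the fitted trend to a parameter value that was not sampled."""
    return math.exp(intercept + slope * n) - 1
```

To turn a predicted count into a time estimate, one would multiply it by a per-statement cost measured from a few real runs; that calibration step is omitted here.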

19.

A comparison of two Fortran dialects for expressing parallel solutions for a problem in linear algebra

M. Clint J. S. Weston C. W. Bleakney 《Parallel Computing》1992,18(12):1325-1333

Recently, AMT has issued an extended version of Fortran-Plus [1] which allows software to be developed without the developer needing to take explicit account of the grid size of the target processor. Fortran-Plus and its extension, Fortran-Plus Enhanced [2], have been developed for use on the AMT DAP 510 array processor. This machine has 1024 processors arranged in a square grid with nearest neighbour and wraparound connections. It is interesting to enquire whether the performance of code generated by the Fortran-Plus Enhanced compiler is, for a particular application, superior to that generated by the Fortran-Plus compiler from a program which recognises and is tailored to fit the characteristic features of the DAP 510. In this paper the performances of two implementations of an algorithm for the eigensolution of real tridiagonal symmetric matrices are compared. The algorithm is characterised by its heavy use of matrix operations, all of which can be efficiently implemented on an array processor. Some of the constituent operations commonly occur in other applications while others are specific to the problem being addressed.

20.

A Fortran language system for mutation-based software testing

K. N. King A. Jefferson Offutt 《Software》1991,21(7):685-718
