Similar Documents
Found 20 similar documents (search time: 281 ms)
1.
Automatic process partitioning is the operation of automatically rewriting an algorithm as a collection of tasks, each operating primarily on its own portion of the data, to carry out the computation in parallel. Hybrid shared-memory systems provide a hierarchy of globally accessible memories. To achieve high performance on such machines one must carefully distribute the work and the data so as to keep the workload balanced while optimizing access to nonlocal data. In this paper we consider a semi-automatic approach to process partitioning in which the compiler, guided by advice from the user, automatically transforms programs into such an interacting set of tasks. This approach is illustrated with a picture-processing example written in BLAZE, which is transformed by the compiler into a task system maximizing locality of memory reference. Research supported by an IBM Graduate Fellowship. Research supported under NASA Contract No. 520-1398-0356. Research supported by NASA Contract No. NAS1-18107 while the last two authors were in residence at ICASE, NASA Langley Research Center.

2.
In this paper we propose a fast method for solving waveguide problems. In particular, we consider the guide to be inhomogeneous and allow propagation of waves of higher-order modes. Such techniques have been applied successfully to acoustic wave propagation problems with a single mode and finite length. This paper extends the approach to electromagnetic waveguides with several modes and infinite length. The method is described and results of computations are presented. Research was supported by the National Aeronautics and Space Administration under NASA Contract No. NAS1-18107 while the first author was in residence at ICASE, NASA Langley Research Center, Hampton, VA 23665-5225, and by NASA Grant No. NAG-1-624.

3.
This paper presents an analytically robust, globally convergent approach to managing the use of approximation models of varying fidelity in optimization. By robust global behaviour we mean the mathematical assurance that the iterates produced by the optimization algorithm, started at an arbitrary initial iterate, will converge to a stationary point or local optimizer for the original problem. The approach presented is based on the trust-region idea from nonlinear programming and is shown to be provably convergent to a solution of the original high-fidelity problem. The proposed method for managing approximations in engineering optimization suggests ways to decide when the fidelity, and thus the cost, of the approximations might be fruitfully increased or decreased in the course of the optimization iterations. The approach is quite general: we make no assumptions on the structure of the original problem, in particular no assumptions of convexity or separability, and place only mild requirements on the approximations. The approximations used in the framework can be of any nature appropriate to an application; for instance, they can be represented by analyses, simulations, or simple algebraic models. This paper introduces the approach and outlines the convergence analysis. This research was supported by Department of Energy grant DEFG03-95ER25257 and Air Force Office of Scientific Research grant F49620-95-1-0210. This research was supported by the National Aeronautics and Space Administration under NASA Contract No. NAS1-19480 while the author was in residence at the Institute for Computer Applications in Science and Engineering (ICASE), NASA Langley Research Center, Hampton, VA 23681, USA. This research was supported by the Air Force Office of Scientific Research grant F49620-95-1-0210 and by the National Aeronautics and Space Administration under NASA Contract No. NAS1-19480 while the author was in residence at the Institute for Computer Applications in Science and Engineering (ICASE), NASA Langley Research Center, Hampton, VA 23681, USA.
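The trust-region management idea described above can be sketched in a few lines. This is a one-dimensional illustration under assumed conventions, not the paper's algorithm: the surrogate is a hypothetical cheap model `f_low` made first-order consistent with the expensive `f_high` at the current iterate via an additive correction, and the standard 0.75/0.25 thresholds govern when the trust radius grows or shrinks.

```python
import numpy as np

def deriv(f, x, h=1e-6):
    # central finite difference, used only to build the corrected surrogate
    return (f(x + h) - f(x - h)) / (2.0 * h)

def manage_fidelity(f_high, f_low, x0, delta0=1.0, iters=20):
    """Trust-region management of a low-fidelity model (1-D sketch)."""
    x, delta = float(x0), float(delta0)
    for _ in range(iters):
        # additive correction: surrogate matches f_high's value and slope at x
        a = f_high(x) - f_low(x)
        b = deriv(f_high, x) - deriv(f_low, x)
        m = lambda c: f_low(c) + a + b * (c - x)
        # approximately minimize the surrogate inside the trust region
        cands = np.linspace(x - delta, x + delta, 41)
        s = cands[np.argmin([m(c) for c in cands])]
        pred = m(x) - m(s)                      # predicted reduction
        if pred <= 0.0:
            delta *= 0.5                        # model proposes no progress: shrink
            continue
        rho = (f_high(x) - f_high(s)) / pred    # actual vs. predicted reduction
        if rho > 0.75:
            delta *= 2.0                        # model trustworthy: enlarge region
        elif rho < 0.25:
            delta *= 0.5                        # model poor: contract region
        if rho > 0.0:
            x = float(s)                        # accept any improving step
    return x
```

With `f_high(x) = (x - 3)**2` and the deliberately biased cheap model `f_low(x) = (x - 2.9)**2`, the corrected surrogate steers the iterates to the true high-fidelity optimum at 3.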

4.
In irregular scientific computational problems one is periodically forced to choose a delay point, where some overhead cost is incurred to ensure correctness or to improve subsequent performance. Examples of delay points are problem remappings and global synchronizations. One sometimes has considerable latitude in choosing the placement and frequency of delay points; we consider the problem of scheduling delay points so as to minimize the overall execution time. We illustrate the problem with two examples: a regridding method that changes the problem discretization during the course of the computation, and a method for solving sparse triangular systems of linear equations. We show that one can optimally choose delay points in polynomial time using dynamic programming. However, the cost models underlying this approach are often unknown. We consequently examine a scheduling heuristic based on maximizing performance locally, and empirically show it to be nearly optimal on both problems. We explain this phenomenon analytically by identifying underlying assumptions which imply that overall performance is maximized asymptotically if local performance is maximized. This research was supported in part by the National Aeronautics and Space Administration under NASA Contract NAS1-18107 while the author consulted at ICASE, Mail Stop 132C, NASA Langley Research Center, Hampton, Virginia 23665. Supported in part by NASA Contract NAS1-18107, the Office of Naval Research under Contract No. N00014-86-K-0654, and NSF Grant DCR 8106181.
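The dynamic-programming formulation for delay-point placement can be sketched as follows, assuming a hypothetical cost model in which `step_cost(k, j)` gives the cost of computational step `k` when the most recent delay point preceded step `j` (costs typically grow with `k - j`, e.g. as a data distribution degrades), and `delay_cost` is the fixed overhead of one delay.

```python
def optimal_delay_points(n_steps, step_cost, delay_cost):
    """Place delay points to minimize total time (dynamic-programming sketch)."""
    INF = float("inf")
    # best[i]: minimal time for steps 0..i-1, ending with a delay after step i-1
    best = [0.0] + [INF] * n_steps
    choice = [0] * (n_steps + 1)
    for i in range(1, n_steps + 1):
        for j in range(i):
            # run steps j..i-1 with no intervening delay, then pay one delay
            run = sum(step_cost(k, j) for k in range(j, i))
            t = best[j] + run + delay_cost
            if t < best[i]:
                best[i], choice[i] = t, j
    # recover where the delay points were placed
    points, i = [], n_steps
    while i > 0:
        points.append(i)
        i = choice[i]
    return best[n_steps], sorted(points)
```

For instance, if each step's cost doubles with its distance from the last delay (`step_cost(k, j) = 2**(k - j)`) and a delay costs 1, the optimizer inserts an intermediate remapping rather than letting the per-step cost explode.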

5.
The Fourier spectral method can achieve exponential accuracy, both at the approximation level and for solving partial differential equations, if the solutions are analytic. For a linear PDE with discontinuous solutions, the Fourier spectral method produces poor point-wise accuracy without post-processing, but still maintains exponential accuracy for all moments against analytic functions. In this note we assess the accuracy of the Fourier spectral method applied to nonlinear conservation laws through a numerical case study. We have found that the moments against analytic functions are no longer very accurate. However, the numerical solution does contain accurate information, which can be extracted by a Gegenbauer-polynomial-based post-processing. Research supported by ARO Grants DAAL03-91-G-0123 and DAAH04-94-G-0205, NSF Grant DMS-9211820, NASA Grant NAG1-1145 and Contract NAS1-19480 while the first author was in residence at ICASE, NASA Langley Research Center, Hampton, Virginia 23681-0001, and AFOSR Grant 93-0090.

6.
A concurrent processing algorithm is developed for a materially nonlinear analysis of hollow square and rectangular structural sections and implemented on a special purpose multiprocessor computer at NASA Langley Research Center referred to as the Finite Element Machine (FEM). The cross-sectional thrust-moment-curvature relations are generated concurrently using a tangent stiffness approach, and yield surfaces are obtained that represent the interaction between axial load and biaxial moments. For the study, a maximum speed-up factor of 7.69 is achieved on eight processors.

7.
Since 1988 NASA Langley Research Center has supported a formal methods research program. From its inception, a primary goal of the program has been to transfer formal methods technology into aerospace industries focusing on applications in commercial air transport. The overall program has been described elsewhere. This paper gives an account of the technology transfer strategy and its evolution.

8.
In wind tunnel testing, model attitude is a key element of experimental data correction, and the accuracy of its measurement has a significant effect on the experimental results. Motivated by practical wind tunnel testing, this paper proposes a model attitude measurement technique based on binocular stereo vision and describes its application in the wind tunnel. Tests show that the method achieves accuracy comparable to the vision-based attitude measurement system used in the 31-inch Mach 10 wind tunnel at NASA Langley Research Center, and it has broad application prospects.

9.
《Real》2002,8(2):157-172
The high-speed civil transport (HSCT) aircraft has been designed with limited cockpit visibility. To compensate, the National Aeronautics and Space Administration (NASA) has proposed an external visibility system (XVS) to aid pilots. XVS obtains video images using high-resolution cameras mounted on and directed outside the aircraft. Images captured by the XVS are analyzed automatically in real time to alert pilots to potential flight-path hazards; the system is thus useful in helping pilots avoid air collisions. In this study, a system was configured to capture image sequences from an on-board high-resolution digital camera at a live video rate, record the images into a high-speed disk array through a fiber channel, and process the images using a Datacube MaxPCI machine with multiple pipelined processors to perform real-time obstacle detection. In this paper, we describe the design, implementation, and evaluation of this computer vision system. Using this system, real-time obstacle detection was performed and digital image data were obtained successfully in flight tests conducted at NASA Langley Research Center in January and September 1999. The system is described in detail so that other researchers can easily replicate the work.

10.
Wood, W.A.; Kleb, W.L. IEEE Software, 2003, 20(3):30-36
Can we successfully apply XP (Extreme Programming) in a scientific research context? A pilot project at the NASA Langley Research Center tested XP's applicability in this context. Since the cultural environment at a government research center differs from the customer-centric business view, eight of XP's 12 practices seemed incompatible with the existing research culture. Despite initial awkwardness, the authors determined that XP can function in situations for which it appears to be ill suited.

11.
In previous work we studied the concurrent implementation of a numerical model, CONDIFP, developed for the analysis of depth-averaged convection–diffusion problems. Initial experiments were conducted on the Intel Touchstone Delta system, using up to 512 processors and different problem sizes. As for other computation-intensive applications, the results demonstrated an asymptotic trend toward unit efficiency as the computational load comes to dominate the communication load. This paper reports further numerical experiments, in both one and two space dimensions with various choices of initial and boundary conditions, carried out on the Intel Paragon XP/S Model L38 with the aim of illustrating the parallel solver's versatility and reliability.

12.
Parallel implementation of large-scale structural optimization
Advances in computer technology and performance allow researchers to pose useful optimization problems that were previously too large for consideration. For example, NASA Langley Research Center is investigating the large structural optimization problems that arise in aircraft design. The total number of design variables and constraints for these nonlinear optimization problems is now an order of magnitude larger than anything previously reported. To find solutions in a reasonable amount of time, a coarse-grained parallel-processing algorithm is recommended. This paper studies the effects of problem size on sequential and parallel versions of this algorithm. For initial testing of this algorithm, a hub-frame optimization problem is devised such that the size of the problem can be adjusted by adding members and load cases. Numerous convergence histories demonstrate that the algorithm performs correctly and in a robust manner. Timing profiles for a wide range of randomly generated problems highlight the changes in the subroutine timings that are caused by the increase in problem size. The potential benefits and drawbacks associated with the parallel approach are summarized.

13.
We consider the problem of optimally assigning the modules of a parallel/pipelined program over the processors of a multiple-processor system under certain restrictions on the interconnection structure of the program as well as the multiple-computer system. We show that for a variety of such problems, it is possible to determine whether a partition of the modular program exists in which the load on any processor is within a certain bound. This method, when combined with a binary search over a fixed range, provides an optimal solution to the partitioning problem. The specific problems we consider are partitioning of (1) a chain-structured parallel program over a chain-like computer system, (2) multiple chain-like programs over a host-satellite system, and (3) a tree-structured parallel program over a host-satellite system. For a problem with N modules and M processors, the complexity of our algorithm is no worse than O(M log(N) log(W_T/ε)), where W_T is the cost of assigning all modules to one processor and ε is the desired accuracy. This algorithm improves on the best previously known algorithm, which runs in O(MN log(N)) time. This research was supported by a grant from the Division of Research Extension and Advisory Services, University of Engineering and Technology Lahore, Pakistan. Further support was provided by NASA Contracts NAS1-17070 and NAS1-18107 while the author was resident at the Institute for Computer Applications in Science and Engineering (ICASE), NASA Langley Research Center, Hampton, Virginia, USA.
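The bound-testing plus binary-search scheme described in this abstract can be illustrated for the simplest case, a chain-structured program on a chain of processors. This sketch uses a greedy feasibility test over hypothetical module weights; it is an illustration of the general scheme, not the paper's specific algorithm.

```python
def feasible(weights, m, bound):
    """Can the chain of module weights be cut into <= m contiguous
    blocks, each with total load <= bound? Greedy packing decides."""
    blocks, load = 1, 0.0
    for w in weights:
        if w > bound:
            return False          # a single module already exceeds the bound
        if load + w > bound:
            blocks += 1           # start a new processor's block
            load = 0.0
        load += w
    return blocks <= m

def chain_partition(weights, m, eps=1e-6):
    """Binary-search the smallest feasible per-processor load bound."""
    lo, hi = max(weights), float(sum(weights))   # infeasible .. always feasible
    while hi - lo > eps:
        mid = (lo + hi) / 2.0
        if feasible(weights, m, mid):
            hi = mid
        else:
            lo = mid
    return hi
```

For weights [1, 2, 3, 4, 5] on two processors, the best contiguous split is [1, 2, 3] | [4, 5], so the search converges to a bottleneck load of 9.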

14.
A concurrent processing algorithm is developed for materially nonlinear stability analysis of imperfect columns with biaxial partial rotational end restraints. The algorithm for solving the governing nonlinear ordinary differential equations is implemented on a multiprocessor computer called the finite element machine, developed at the NASA Langley Research Center. Numerical results are obtained on up to nine concurrent processors. A substantial computational gain is achieved in using the parallel processing approach.

15.
Major problems are faced in the aerospace industry today concerning safety in the crowded skies around airports and continuing increases in fuel prices. Lockheed-California Company in collaboration with the NASA Langley Research Center has been working on a development that tackles both of these problems — an airborne four-dimensional computer capability for the L-1011 Tristar jetliner. A trial installation plane is flying with colour electronic displays on a portion of its instrument panel to achieve these ends. The display system is intended to control flights to such a degree that arrival times can be predicted to within a matter of seconds, substantially reducing the congestion and delays of today's airways. Accurate on-board prediction of arrival times in conjunction with an en route traffic metering technique should make air traffic flow much more efficient and lead to a substantial reduction in fuel consumption.

16.
Numerical experiments on the accuracy of ENO and modified ENO schemes
In this paper we make further numerical experiments assessing an accuracy-degeneracy phenomenon reported by A. Rogerson and E. Meiburg (this issue, 1990). We also propose a modified ENO scheme, which recovers the correct order of accuracy for all the test problems with smooth initial conditions and gives results comparable to the original ENO schemes for discontinuous problems. Research supported by NSF Grant No. DMS88-10150, NASA Langley Contract No. NAS1-18605, and AFOSR Grant No. 90-0093. Computation supported by NAS.

17.
Ultralightweight Thread (uThread) is a library package designed and optimized for user-level management of parallelism in a single application program running on distributed-memory computers. Existing process-management systems incur an unnecessarily high cost when used for the type of parallelism exploited within an application. By reducing the overhead of ownership protection and frequent context switches, uThread achieves both simplicity and performance. In addition, uThread provides various scheduling support to balance the system load. The uThread package reduces the cost of parallelism management to nearly the lower bound. This package has been successfully run on most distributed-memory computers, including the Intel iPSC/860 and Touchstone Delta, the nCUBE, and the TMC CM-5. This research was supported by NSF grant CCR-9109114.
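The core idea behind cheap user-level threads — that a context switch can be an ordinary in-process control transfer with no kernel trap or ownership check — can be illustrated with a minimal cooperative round-robin scheduler built on Python generators. This is a conceptual sketch only, not the uThread API; the names `run` and `worker` are hypothetical.

```python
from collections import deque

def run(tasks):
    """Round-robin cooperative scheduler: each 'thread' is a generator
    that yields at its voluntary switch points. Switching is just a
    function return/resume in user space, which is why user-level
    thread packages can drive switch overhead so low."""
    ready = deque(tasks)
    trace = []
    while ready:
        task = ready.popleft()
        try:
            trace.append(next(task))  # run the thread to its next yield
            ready.append(task)        # still alive: requeue at the tail
        except StopIteration:
            pass                      # thread finished; drop it
    return trace

def worker(name, steps):
    # a toy 'thread' that performs `steps` units of work
    for i in range(steps):
        yield f"{name}:{i}"           # voluntary context-switch point
```

Running two workers interleaves their steps fairly: `run([worker("a", 2), worker("b", 2)])` yields the trace `["a:0", "b:0", "a:1", "b:1"]`.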

18.
We have developed a flexible hybrid decomposition parallel implementation of the first-principles molecular dynamics algorithm of Car and Parrinello. The code allows the problem to be decomposed either spatially, over the electronic orbitals, or any combination of the two. Performance statistics for 32, 64, 128 and 512 Si atom runs on the Touchstone Delta and Intel Paragon parallel supercomputers and comparison with the performance of an optimized code running the smaller systems on the Cray Y-MP and C90 are presented.

19.
The large number of protein sequences, provided by genomic projects at an increasing pace, constitutes a challenge for large-scale computational studies of protein structure and thermodynamics. Grid technology is well suited to this challenge, since it provides a way to access the resources needed by compute- and data-intensive applications. In this paper, we show how to adapt an algorithm for the prediction of protein thermodynamics to the Grid using the GridWay tool. GridWay allows the resolution of large computational experiments by reacting to events dynamically generated by both the Grid and the application. Eduardo Huedo, Ph.D.: He is a Computer Engineer (1999) and holds a Ph.D. in Computer Architecture (2004) from the Universidad Complutense de Madrid (UCM). He is a Scientist in the Advanced Computing Laboratory at Centro de Astrobiología (CSIC-INTA), associated with the NASA Astrobiology Institute. In 2000 he held an appointment as a Summer Student in High Performance Computing and Applied Mathematics at ICASE (NASA Langley Research Center). His research areas are Performance Management and Tuning, High Performance Computing, and Grid Technology. Ugo Bastolla, Ph.D.: He received his degree and Ph.D. in Physics at Rome University, with L. Peliti and G. Parisi respectively. Interested from the beginning in biologically motivated problems, he has studied models of population genetics, Boolean networks, neural networks, the statistical mechanics of polymers, and ecology and biodiversity. His main research interest is the study of protein folding thermodynamics and evolution. He developed an effective energy function allowing prediction of protein folding thermodynamics, and applied it to protein structure prediction, to simulating protein evolution, and to analyzing protein sequences from a thermodynamic point of view. He is currently in the Bioinformatics Unit of the Centro de Astrobiología of Madrid. Rubén S. Montero, Ph.D.: He received his B.S. in Physics (1996), M.S. in Computer Science (1998), and Ph.D. in Computer Architecture (2002) from the Universidad Complutense de Madrid (UCM). He has been an Assistant Professor of Computer Architecture and Technology at UCM since 1999. He has held several research appointments at ICASE (NASA Langley Research Center), where he worked on computational fluid dynamics, parallel multigrid algorithms, and cluster computing. His research interests lie mainly in Grid technology, in particular adaptive scheduling, adaptive execution, and distributed algorithms. Ignacio M. Llorente, Ph.D.: He received his B.S. in Physics (1990), M.S. in Computer Science (1992), and Ph.D. in Computer Architecture (1995) from the Universidad Complutense de Madrid (UCM), and an Executive M.B.A. from Instituto de Empresa (2003). He is an Associate Professor of Computer Architecture and Technology in the Department of Computer Architecture and System Engineering at UCM and a Senior Scientist at Centro de Astrobiología (CSIC-INTA), associated with the NASA Astrobiology Institute. Since 1997 he has held several appointments as a Consultant in High Performance Computing and Applied Mathematics at ICASE (NASA Langley Research Center). His research areas are Information Security, High Performance Computing, and Grid Technology.

20.
The complete exchange (or all-to-all personalized) communication pattern occurs frequently in many important parallel computing applications. It is the densest form of communication, because every processor needs to communicate with all other processors. This can result in severe link contention and degrade performance considerably. Hence, it is necessary to use efficient algorithms in order to get good performance over a wide range of message and multiprocessor sizes. In this paper we present several algorithms to perform complete exchange on the Thinking Machines CM-5 and the Intel Touchstone Delta multiprocessors. Since these machines have different architectures and communication capabilities, different algorithms are needed to get the best performance on each of them. We present four algorithms for the CM-5 and six algorithms for the Delta. Complete exchange algorithms generally assume that the number of processors is a power of two; however, on the Delta the number of processors allocated to a user need not be a power of two. We propose algorithms that are applicable even to non-power-of-two meshes on the Delta. We have developed analytical models to estimate the performance of the algorithms on the basis of system parameters. Performance results on the CM-5 and Delta are also presented and analyzed.
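One standard complete-exchange algorithm for power-of-two machine sizes is the pairwise (XOR) exchange: in step s, processor i exchanges its personalized message with partner i XOR s. The sketch below generates that schedule; it illustrates the pattern's structure and is not claimed to be one of the paper's specific CM-5 or Delta algorithms.

```python
def pairwise_exchange_schedule(p):
    """Complete-exchange schedule for p processors, p a power of two.

    Step s (s = 1 .. p-1) pairs processor i with partner i ^ s. Because
    XOR with a fixed s is an involution, each step is a perfect matching
    (pairs exchange simultaneously, with no processor double-booked),
    and over the p - 1 steps every processor meets every other exactly once.
    """
    assert p > 1 and p & (p - 1) == 0, "p must be a power of two"
    return [[i ^ s for i in range(p)] for s in range(1, p)]
```

For p = 8 the schedule has 7 steps; in every step, the partner relation is symmetric (if i sends to j, j sends to i), which is what makes the step contention-free on a full-duplex network.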
