Similar Documents
20 similar documents found (search time: 31 ms)
1.
P systems are inherently parallel and non-deterministic theoretical computing devices defined within the field of Membrane Computing. Many P system simulators have been presented in this area, but they are inefficient because they cannot handle the parallelism of these devices. Nowadays, we are witnessing the consolidation of the GPU as a parallel framework for general-purpose computation. In this paper, we analyse GPUs as an alternative parallel architecture for improving the performance of P system simulation, illustrated by the case study of a family of P systems that provides an efficient and uniform solution to the SAT problem. First, we develop a simulator that fully simulates the computation of the P system, demonstrating that GPUs are well suited to this task. We then adapt this simulator to the idiosyncrasies of the GPU architecture, improving on the performance of the initial simulator.
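As a rough CUDA illustration of the space-for-time trade behind such P-system SAT solutions, the sketch below checks all 2^n truth assignments of a small CNF formula in parallel, one thread per assignment. It is a hypothetical stand-in for the membrane-level simulation, not the authors' simulator; the formula and the clause encoding are invented for the example.

    // One thread per truth assignment; the exponential workspace of the P
    // system becomes an exponential thread grid. Illustration only.
    #include <cstdio>
    #include <cuda_runtime.h>

    #define N_VARS 20                 // 2^20 assignments, one thread each
    #define N_CLAUSES 3

    // Toy clause encoding: +v is variable v, -v its negation, 0 an unused slot.
    __constant__ int clauses[N_CLAUSES][3] = {
        { 1, -2,  3}, {-1,  2,  0}, { 2, -3,  0}
    };

    __global__ void satKernel(int *found) {
        unsigned a = blockIdx.x * blockDim.x + threadIdx.x;   // truth assignment
        for (int c = 0; c < N_CLAUSES; ++c) {
            bool sat = false;
            for (int l = 0; l < 3; ++l) {
                int lit = clauses[c][l];
                if (lit == 0) continue;
                int v = (lit > 0 ? lit : -lit) - 1;
                if ((lit > 0) == (bool)((a >> v) & 1u)) { sat = true; break; }
            }
            if (!sat) return;                  // assignment fails clause c
        }
        atomicExch(found, (int)a + 1);         // record a satisfying assignment
    }

    int main() {
        int *d_found, h_found = 0;
        cudaMalloc(&d_found, sizeof(int));
        cudaMemset(d_found, 0, sizeof(int));
        satKernel<<<(1 << N_VARS) / 256, 256>>>(d_found);
        cudaMemcpy(&h_found, d_found, sizeof(int), cudaMemcpyDeviceToHost);
        if (h_found) printf("SAT, e.g. assignment %d\n", h_found - 1);
        else printf("UNSAT\n");
        cudaFree(d_found);
        return 0;
    }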

2.
In this paper we present a new environment called MERPSYS that allows simulation of parallel application execution times on cluster-based systems. The environment offers a modeling application that uses the Java language extended with methods representing message-passing communication routines. It also offers a graphical interface for building a system model incorporating various hardware components such as CPUs, GPUs and interconnects, and it allows various formulas to model the execution and communication times of particular blocks of code. A simulator engine within the MERPSYS environment simulates execution of an application consisting of processes with various codes, to which distinct labels are assigned. The simulator runs one Java thread per label and scales computation and communication times accordingly. This approach allows fast coarse-grained simulation of large applications on large-scale systems. We have performed tests and verification of results from the simulator for three real parallel applications implemented with C/MPI and run on real HPC clusters: a master-slave code computing similarity measures of points in a multidimensional space, a geometric single-program-multiple-data application simulating heat distribution, and a divide-and-conquer application performing merge sort. In all cases the simulator gave results very similar to the real ones on configurations of up to 1000 processes. Furthermore, it allowed us to predict execution times on configurations beyond the hardware resources available to us.
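A minimal host-side sketch of the scaling idea (plain C++): instead of executing every process, the estimated time of a labelled block is computed analytically from its operation count and message volume. The Device parameters and the blockTime helper are invented for the example, not MERPSYS APIs.

    // Coarse-grained time estimation per labelled block:
    //   t = flop / device_speed + messages * (latency + bytes / bandwidth)
    #include <cstdio>

    struct Device { double flops; double latency; double bandwidth; };

    double blockTime(double flop, double msgs, double bytes, const Device &d) {
        double compute = flop / d.flops;
        double comm    = msgs * (d.latency + bytes / d.bandwidth);
        return compute + comm;
    }

    int main() {
        Device node {1e11, 2e-6, 1e10};   // 100 GFLOPS, 2 us, 10 GB/s (assumed)
        // Master-slave similarity computation, scaled to p slave processes.
        for (int p = 10; p <= 1000; p *= 10) {
            double perSlave = blockTime(1e12 / p, 2, 8e6, node);
            printf("p=%4d  estimated slave time: %.3f s\n", p, perSlave);
        }
        return 0;
    }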

3.
This work proposes an electro-mechanical simulator of cardiac tissue whose main feature is its low computational cost, which is necessary for real-time simulations and on-the-fly applications. To achieve this, we used cellular automata and mass-spring systems to model cardiac behavior, and we parallelized the code to run on graphics processing units (GPUs) with the Compute Unified Device Architecture (CUDA). Even running sequentially, our simulator was considerably faster than traditional partial-differential-equation simulators. In addition, we performed various load tests to evaluate the behavior of our code on GPUs and identified its potential and its bottlenecks.
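A minimal CUDA sketch of the mass-spring half of such a simulator: one thread per spring accumulates Hooke forces, one thread per node performs an explicit Euler step. The 1-D layout, names and constants are illustrative only; the paper couples this mechanical part with a cellular-automaton electrical model.

    #include <cuda_runtime.h>

    struct Spring { int a, b; float rest, k; };

    __global__ void springForces(const float *pos, float *force,
                                 const Spring *s, int nSprings) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= nSprings) return;
        float d = pos[s[i].b] - pos[s[i].a];          // 1-D for brevity
        float f = s[i].k * (d - s[i].rest);           // Hooke's law
        atomicAdd(&force[s[i].a],  f);
        atomicAdd(&force[s[i].b], -f);
    }

    __global__ void integrate(float *pos, float *vel, const float *force,
                              float mass, float dt, int nNodes) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= nNodes) return;
        vel[i] += (force[i] / mass) * dt;             // explicit Euler step
        pos[i] += vel[i] * dt;
    }

    int main() {
        const int nNodes = 2, nSprings = 1;
        float h_pos[nNodes] = {0.f, 1.2f};
        Spring h_s[nSprings] = {{0, 1, 1.0f, 10.f}};  // one stretched spring
        float *pos, *vel, *force; Spring *springs;
        cudaMalloc(&pos, sizeof(h_pos));
        cudaMalloc(&vel, sizeof(h_pos));
        cudaMalloc(&force, sizeof(h_pos));
        cudaMalloc(&springs, sizeof(h_s));
        cudaMemcpy(pos, h_pos, sizeof(h_pos), cudaMemcpyHostToDevice);
        cudaMemcpy(springs, h_s, sizeof(h_s), cudaMemcpyHostToDevice);
        cudaMemset(vel, 0, sizeof(h_pos));
        for (int step = 0; step < 1000; ++step) {     // fixed-step time loop
            cudaMemset(force, 0, sizeof(h_pos));
            springForces<<<1, 32>>>(pos, force, springs, nSprings);
            integrate<<<1, 32>>>(pos, vel, force, 1.0f, 1e-3f, nNodes);
        }
        cudaDeviceSynchronize();
        return 0;
    }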

4.
Membrane Computing is a discipline aiming to abstract formal computing models, called membrane systems or P systems, from the structure and functioning of living cells as well as from the cooperation of cells in tissues, organs, and other higher-order structures. This framework provides polynomial-time solutions to NP-complete problems by trading space for time; simulating these systems efficiently poses challenges in three different respects: the intrinsic massive parallelism of P systems, an exponential computational workspace, and a workload that is not floating-point intensive. In this paper, we analyze the simulation of a family of recognizer P systems with active membranes that solves the Satisfiability problem in linear time on different instances of Graphics Processing Units (GPUs). For efficient handling of the exponential workspace created by the P system computation, we enable different data policies to increase memory bandwidth and exploit data locality through tiling and dynamic queues. The parallelism inherent in the target P system is also managed to demonstrate that GPUs offer a valid alternative for high-performance computing at considerably lower cost. Furthermore, scalability is demonstrated up to the largest problem size we were able to run, and on Fermi, the new hardware generation from Nvidia, for a total speed-up exceeding four orders of magnitude when running our simulations on the Tesla S2050 server.

5.
Face detection is a key component in applications such as security surveillance and human–computer interaction systems, and real-time recognition is essential in many scenarios. The Viola–Jones algorithm is an attractive means of meeting the real-time requirement and has been widely implemented on custom hardware, FPGAs and GPUs. We demonstrate a GPU implementation that achieves competitive performance but with low development costs. Our solution treats the irregularity inherent in the algorithm using a novel dynamic warp scheduling approach that eliminates thread divergence. This scheme also employs a thread-pool mechanism, which significantly reduces the cost of creating, switching, and terminating threads. Compared to static thread scheduling, our dynamic warp scheduling approach reduces the execution time by a factor of 3. To maximize detection throughput, we also run on multiple GPUs, realizing 95.6 FPS on 5 Fermi GPUs.
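Dynamic warp scheduling of this kind belongs to the family of persistent-thread work-queue patterns. The CUDA sketch below shows that pattern in its simplest form: warps repeatedly dequeue batches of detection windows from a global counter instead of being statically assigned, so a warp that rejects its windows early immediately picks up new work. classifyWindow is a stub standing in for the cascade; this illustrates the general pattern, not the paper's scheduler.

    #include <cstdio>
    #include <cuda_runtime.h>

    __device__ int nextWindow = 0;        // global work-queue head

    __device__ bool classifyWindow(int w) {
        return (w % 97) == 0;             // stand-in for the classifier cascade
    }

    __global__ void detectPersistent(int nWindows, int *hits) {
        int lane = threadIdx.x % 32;
        while (true) {
            int base;
            if (lane == 0) base = atomicAdd(&nextWindow, 32); // one dequeue per warp
            base = __shfl_sync(0xffffffffu, base, 0);         // broadcast to warp
            if (base >= nWindows) return;                     // warp-uniform exit
            int w = base + lane;
            if (w < nWindows && classifyWindow(w)) atomicAdd(hits, 1);
        }
    }

    int main() {
        int *hits, h = 0;
        cudaMalloc(&hits, sizeof(int));
        cudaMemset(hits, 0, sizeof(int));
        detectPersistent<<<8, 128>>>(100000, hits);   // a few persistent blocks
        cudaMemcpy(&h, hits, sizeof(int), cudaMemcpyDeviceToHost);
        printf("%d candidate windows\n", h);
        cudaFree(hits);
        return 0;
    }

Keeping the exit condition warp-uniform (all lanes test the broadcast base) is what allows the full-mask __shfl_sync on every iteration.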

6.
Recently, cellular neural networks (CNNs) have been demonstrated to be a highly effective paradigm applicable in a wide range of areas. Typically, CNNs are implemented using VLSI circuits, but this unavoidably requires additional hardware. On the other hand, CNNs can also be implemented purely in software; this, however, results in very low performance for large CNN problem sizes. Nowadays, conventional desktop computers are usually equipped with programmable graphics processing units (GPUs) that support parallel data processing. This paper introduces a GPU-based CNN simulator. In detail, we carefully organize the CNN data as 4-channel textures and efficiently implement the CNN computation as fragment programs running in parallel on a GPU. In this way, we create a high-performance but low-cost CNN simulator. Experimentally, we demonstrate that the resulting GPU-based CNN simulator runs 8–17 times faster than a CPU-based CNN simulator.
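The paper maps the computation onto 4-channel textures and fragment shaders; purely for illustration, the same per-cell update can be written as a CUDA kernel. The sketch below discretizes the standard CNN state equation dx/dt = -x + A*y + B*u + z with output y = 0.5(|x+1| - |x-1|), using 3x3 feedback (A) and control (B) templates; template values and grid size are placeholders.

    #include <cuda_runtime.h>

    __constant__ float A[9], B[9];        // 3x3 feedback / control templates
    __constant__ float z;                 // bias

    __device__ float out(float x) { return 0.5f * (fabsf(x + 1) - fabsf(x - 1)); }

    __global__ void cnnStep(const float *x, const float *u, float *xNext,
                            int w, int h, float dt) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        int j = blockIdx.y * blockDim.y + threadIdx.y;
        if (i >= w || j >= h) return;
        float acc = z;
        for (int dj = -1; dj <= 1; ++dj)               // 3x3 neighbourhood
            for (int di = -1; di <= 1; ++di) {
                int ii = min(max(i + di, 0), w - 1);   // clamped boundary
                int jj = min(max(j + dj, 0), h - 1);
                int t = (dj + 1) * 3 + (di + 1);
                acc += A[t] * out(x[jj * w + ii]) + B[t] * u[jj * w + ii];
            }
        int idx = j * w + i;
        xNext[idx] = x[idx] + dt * (-x[idx] + acc);    // explicit Euler
    }

    int main() {
        const int w = 64, h = 64;
        float hA[9] = {0,0,0, 0,2,0, 0,0,0}, hB[9] = {0,0,0, 0,1,0, 0,0,0}, hz = 0;
        cudaMemcpyToSymbol(A, hA, sizeof(hA));
        cudaMemcpyToSymbol(B, hB, sizeof(hB));
        cudaMemcpyToSymbol(z, &hz, sizeof(hz));
        float *x, *u, *xn;
        cudaMalloc(&x, w*h*sizeof(float)); cudaMalloc(&u, w*h*sizeof(float));
        cudaMalloc(&xn, w*h*sizeof(float));
        cudaMemset(x, 0, w*h*sizeof(float)); cudaMemset(u, 0, w*h*sizeof(float));
        dim3 blk(16, 16), grd((w + 15) / 16, (h + 15) / 16);
        for (int s = 0; s < 100; ++s) {
            cnnStep<<<grd, blk>>>(x, u, xn, w, h, 0.1f);
            float *tmp = x; x = xn; xn = tmp;          // ping-pong buffers
        }
        cudaDeviceSynchronize();
        return 0;
    }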

7.
We present an open and flexible software infrastructure that embeds physical hosts in a simulated network. In real-time network simulation, real-world implementations of distributed applications and network services run together with a network simulator that operates in real time; real network packets are injected into the simulation system and subjected to the simulated network conditions computed as both real and virtual traffic traverses the network and competes for network resources. Our real-time simulation infrastructure is implemented on top of Open Virtual Private Network (OpenVPN), modified and customized to bridge traffic between the physical hosts and the simulated network. We identify the performance advantages and limitations of our approach via a set of experiments. We also present two interesting application scenarios that show the capabilities of the real-time simulation infrastructure.
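For a flavor of the bridging step, here is a minimal Linux sketch (plain C++) that reads raw Ethernet frames from a TAP device and hands them to a stubbed simulator entry point. In the authors' infrastructure this role is played by the modified OpenVPN; the device name sim0 and the injectIntoSimulation hook are invented for the example, and the program needs the tun module and appropriate privileges.

    #include <cstdio>
    #include <cstring>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <net/if.h>
    #include <linux/if_tun.h>

    void injectIntoSimulation(const unsigned char *frame, ssize_t len) {
        printf("frame of %zd bytes enters simulated network\n", len);  // stub
    }

    int main() {
        int fd = open("/dev/net/tun", O_RDWR);
        if (fd < 0) { perror("open"); return 1; }
        ifreq ifr {};
        ifr.ifr_flags = IFF_TAP | IFF_NO_PI;           // raw Ethernet frames
        strncpy(ifr.ifr_name, "sim0", IFNAMSIZ - 1);   // hypothetical name
        if (ioctl(fd, TUNSETIFF, &ifr) < 0) { perror("ioctl"); return 1; }
        unsigned char buf[2048];
        for (;;) {                                     // capture loop
            ssize_t n = read(fd, buf, sizeof buf);
            if (n <= 0) break;
            injectIntoSimulation(buf, n);              // subject to sim delays
        }
        close(fd);
        return 0;
    }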

8.
In this work, we developed a parallel algorithm to speed up the resolution of differential matrix Riccati equations using a backward differentiation formula algorithm based on a fixed-point method. Differential matrix Riccati equations play an especially important role in several applications such as optimal control, filtering, and estimation. In some cases the problem can be large, and it is worthwhile to speed it up as much as possible. Recently, modern graphics processing units (GPUs) have been used as a way to improve performance. In this paper, we used an approach based on general-purpose computing on graphics processing units, using NVIDIA GPUs with unified architecture. To do this, a special version of the Basic Linear Algebra Subprograms for GPUs, called CUBLAS, and a package to solve linear systems on GPUs (three different packages were studied) have been used. Moreover, we developed a MATLAB toolkit so that our implementation can be used from MATLAB: if the user has a suitable graphics card, the performance of the implementation is improved; if not, the algorithm can also be run on the machine's CPU. Experimental results on an NVIDIA Quadro FX 5800 are shown. Copyright © 2011 John Wiley & Sons, Ltd.
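Each fixed-point sweep in a solver of this kind is dominated by dense matrix products, which is exactly what CUBLAS accelerates. The sketch below reduces the update to a single illustrative product Y = A·X per iteration via cublasDgemm; the actual BDF/Riccati update is more involved, and the matrices here are placeholders. Link with -lcublas.

    #include <cublas_v2.h>
    #include <cuda_runtime.h>
    #include <utility>
    #include <vector>

    int main() {
        const int n = 512;
        std::vector<double> hA(n * n, 0.001), hX(n * n, 1.0);
        double *A, *X, *Y;
        cudaMalloc(&A, n * n * sizeof(double));
        cudaMalloc(&X, n * n * sizeof(double));
        cudaMalloc(&Y, n * n * sizeof(double));
        cudaMemcpy(A, hA.data(), n * n * sizeof(double), cudaMemcpyHostToDevice);
        cudaMemcpy(X, hX.data(), n * n * sizeof(double), cudaMemcpyHostToDevice);

        cublasHandle_t h;
        cublasCreate(&h);
        const double one = 1.0, zero = 0.0;
        for (int it = 0; it < 10; ++it) {             // fixed-point sweeps
            cublasDgemm(h, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                        &one, A, n, X, n, &zero, Y, n);   // Y = A * X
            std::swap(X, Y);                          // Y becomes next iterate
        }
        cublasDestroy(h);
        cudaFree(A); cudaFree(X); cudaFree(Y);
        return 0;
    }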

9.
In this paper we propose an alternative use of commercial object-oriented discrete-event simulators. We attempt to compensate for the intrinsic inaccuracy of simulators in modelling real systems affected by fuzziness. The strategy adopted, called pseudo-fuzzy discrete-event simulation, models fuzziness through a set of classic simulation runs that trace an output fuzzy performance function. The idea behind the proposed approach is to use the simulator as a fuzzy operator that embeds some stochastic functions. A benchmark industrial setting has been used to build a reference simulation model and to evaluate the proposed simulation strategy for a specific working case.
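A hypothetical sketch of the strategy: a fuzzy input (here a triangular fuzzy processing time) is sampled at the endpoints of each alpha-cut, one classic simulation run is performed per endpoint, and the collected outputs trace the fuzzy performance function. simulateThroughput stands in for a full discrete-event run; all numbers are invented.

    #include <cstdio>

    // Stand-in for one classic discrete-event simulation run.
    double simulateThroughput(double procTime) { return 3600.0 / procTime; }

    int main() {
        double lo = 8.0, peak = 10.0, hi = 13.0;      // triangular fuzzy number
        for (double alpha = 0.0; alpha <= 1.0; alpha += 0.25) {
            double left  = lo + alpha * (peak - lo);  // alpha-cut interval
            double right = hi - alpha * (hi - peak);
            // Throughput decreases with processing time, so the output
            // interval at this membership level is [f(right), f(left)].
            printf("alpha=%.2f  throughput in [%.1f, %.1f]\n", alpha,
                   simulateThroughput(right), simulateThroughput(left));
        }
        return 0;
    }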

10.
Debugging low-level-language software can be a difficult business: the bare microprocessor lacks a user interface, and facilities such as simulators provided on another machine can be tedious to use on running or partially working programs. A resolution of this difficulty lies in the harmonious use of a mainframe-based debugging system and a real microprocessor. This paper describes tools and techniques for the development of low-level software for the Intel 8080 microprocessor employing both simulated and real microprocessors. The debugging system allows the user to set up a virtual microprocessor into which programs can be loaded and run, and from which a flexible trace of the executing program can be obtained. The debugging system is built into a general-purpose multi-access operating system, an approach that makes it possible to provide the system cheaply to a large number of users. Additionally, users have access to all the general facilities (such as editors) provided by the operating system itself. The system has been implemented on a minicomputer with 60 VDU terminals, all of which can be used for interaction with the 8080 simulator or for general interaction with the mainframe operating system. The system has also been implemented in a self-simulating version that runs on a real Intel 8080 microprocessor. Many of the VDUs on the minicomputer are controlled by Intel 8080 microprocessors, and it is possible to load and run large Intel 8080 programs which were assembled (and tested) on the mainframe, or to run the self-simulator-based debugging system. In this way a range of complementary debugging environments is provided: the simulator on the mainframe with its access to backing store (and, hence, the ability to save trace information), the self-simulator on the microprocessor (with its better performance for the single user), and the actual Intel 8080 itself. The user can easily move a partially tested program between environments and thus use the environment which best suits his current phase of testing.
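A toy sketch (plain C++) of the virtual-microprocessor idea: a fetch-execute loop over a few 8080 opcodes that emits a trace line per instruction, which is the kind of flexible tracing such a simulator provides. The three opcodes used (0x3E MVI A, 0x3C INR A, 0x76 HLT) match the real 8080 encoding; everything else is deliberately minimal.

    #include <cstdio>
    #include <cstdint>

    int main() {
        uint8_t mem[16] = {0x3E, 0x05, 0x3C, 0x3C, 0x76};   // small test program
        uint8_t a = 0; uint16_t pc = 0;
        for (;;) {
            uint8_t op = mem[pc];
            printf("PC=%04X  OP=%02X  A=%02X\n", pc, op, a); // execution trace
            switch (op) {
                case 0x3E: a = mem[pc + 1]; pc += 2; break;  // MVI A, imm
                case 0x3C: a += 1; pc += 1; break;           // INR A
                case 0x76: return 0;                         // HLT
                default:   pc += 1; break;                   // unimplemented
            }
        }
    }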

11.
Many-core accelerators are being deployed more and more frequently to improve system processing capabilities. In such systems, application mapping must be enhanced to maximize utilization of the underlying architecture. In graphics processing units (GPUs) especially, mapping the kernels that make up a multi-kernel application has a great impact on overall performance, since kernels may exhibit different characteristics on different CPUs and GPUs. While some kernels run faster on GPUs, others perform better on CPUs. Thus, heterogeneous execution may yield better performance than executing the application only on a CPU or only on a GPU. In this paper, we investigate two approaches: a novel profiling-based adaptive kernel mapping algorithm that assigns each kernel of an application to the proper device, and a Mixed-Integer Programming (MIP) formulation that determines the optimal mapping. We utilize profiling information for kernels on different devices and generate a map that identifies which kernel should run where in order to improve the overall performance of an application. Initial experiments show that our approach can efficiently map kernels onto CPUs and GPUs, and that it outperforms CPU-only and GPU-only approaches.
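A minimal sketch of the profiling-based idea (plain C++): time each kernel once per device, then assign it to the faster one. Kernel bodies are stubs, and this greedy per-kernel choice ignores the inter-kernel data-transfer and dependency constraints that the paper's MIP formulation would capture.

    #include <chrono>
    #include <cstdio>
    #include <functional>
    #include <utility>
    #include <vector>

    double timeIt(const std::function<void()> &f) {
        auto t0 = std::chrono::steady_clock::now();
        f();
        std::chrono::duration<double> dt = std::chrono::steady_clock::now() - t0;
        return dt.count();
    }

    int main() {
        // Each entry holds a CPU variant and a GPU variant of one kernel.
        std::vector<std::pair<std::function<void()>, std::function<void()>>>
            kernels = {
                { []{ /* CPU version of kernel 0 */ }, []{ /* GPU version */ } },
                { []{ /* CPU version of kernel 1 */ }, []{ /* GPU version */ } },
            };
        for (size_t k = 0; k < kernels.size(); ++k) {
            double cpu = timeIt(kernels[k].first);
            double gpu = timeIt(kernels[k].second);   // should include transfers
            printf("kernel %zu -> %s\n", k, cpu < gpu ? "CPU" : "GPU");
        }
        return 0;
    }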

12.
In this paper, we describe an interactive real-time simulation of granular, spherical particles that is able to run on a single workstation. The simulation is based on a discrete element method approach and is fully implemented in the Open Computing Language (OpenCL), enabling execution on CPUs and GPUs alike. The simulation results are visualised using DirectX 10 and instancing. Furthermore, we enable the user to control the visualisation and the simulation in a very intuitive way by supporting user tracking and speech recognition, both using the Microsoft Kinect sensor. We also compare the performance of different implementation strategies on both CPUs and GPUs, and, as a sample application, we simulate the Brazil nut effect.

13.
Many languages for computer systems simulation (like GPSS and CSim) use a stochastic model of systems with the provision of adding procedural code for those aspects of the system that cannot be captured easily by a stochastic model. However, they do not support the hierarchical simulation of complex systems well. Complex computer systems may have to be simulated at various levels of abstraction in the interests of tractability: the flexibility of being able to move freely between the different levels of abstraction is very desirable. For example, in the area of computer architecture, one might have analytical models, detailed simulation models and trace-driven models. In addition, these languages do not have user-friendly interfaces for specification of the simulated system. In this paper, we discuss the design and implementation of a package for hierarchical simulation of non-real-time computer systems: a Simulator Generator from a Graphical System Specification (SIGGSYS). A new language for system specification has been designed. In addition, the package has the following components:
• A graphical user interface to aid specification of the system to be simulated.
• A back end that generates C++ code implementing a simulator for the specified system.
• A complete object library, along with header files, that implements a functionally complete set of C++ base classes which can be built upon.
C++ has been chosen as the intermediate language so that the modeller can use its support for object-oriented programming. © 1997 John Wiley & Sons, Ltd. A minimal sketch of the kind of generated C++ event core such a package builds on is shown below.
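The class and member names here are illustrative, not SIGGSYS's actual base classes: a time-ordered event queue and a run loop, the skeleton onto which generated model code would be grafted.

    #include <cstdio>
    #include <functional>
    #include <queue>
    #include <vector>

    struct Event {
        double time;
        std::function<void()> action;
        bool operator>(const Event &o) const { return time > o.time; }
    };

    class Simulator {
        std::priority_queue<Event, std::vector<Event>, std::greater<Event>> q;
        double now = 0;
    public:
        void schedule(double t, std::function<void()> a) { q.push({t, a}); }
        void run() {                      // process events in time order
            while (!q.empty()) {
                Event e = q.top(); q.pop();
                now = e.time;
                e.action();
            }
        }
        double time() const { return now; }
    };

    int main() {
        Simulator sim;
        sim.schedule(2.0, [&]{ printf("job arrives at t=%.1f\n", sim.time()); });
        sim.schedule(5.0, [&]{ printf("job departs at t=%.1f\n", sim.time()); });
        sim.run();
        return 0;
    }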

14.
Modern graphics processing units (GPUs) have been at the leading edge of increasing parallelism over the last 10 years. This fact has encouraged the use of GPUs in a broader range of applications, where developers are required to leverage this technology with new programming models that ease the task of writing programs to run efficiently on GPUs. In this paper, we discuss the main guidelines for assisting the developer when porting sequential scientific code to modern GPUs. These guidelines were derived by porting L-BFGS, the limited-memory BFGS algorithm for large-scale optimization, available as Harwell routine VA15. The specific interest in the L-BFGS algorithm arises from the fact that it is the computational module with the longest running time in an Oceanographic Data Assimilation application on which some of the authors are working.
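For reference, this is the standard L-BFGS two-loop recursion, the core computation such a port accelerates, written here on the host in plain C++ over a few stored correction pairs. On the GPU, each dot product becomes a parallel reduction and each vector update an elementwise kernel; this sketch is not the VA15 routine itself.

    #include <vector>

    using Vec = std::vector<double>;

    double dot(const Vec &a, const Vec &b) {
        double r = 0;
        for (size_t i = 0; i < a.size(); ++i) r += a[i] * b[i];
        return r;
    }

    // Approximates d = -H*g from the last m correction pairs (s_k, y_k).
    Vec twoLoop(const Vec &g, const std::vector<Vec> &S, const std::vector<Vec> &Y) {
        size_t m = S.size(), n = g.size();
        Vec q = g, alpha(m);
        for (size_t k = m; k-- > 0; ) {               // first loop: newest->oldest
            alpha[k] = dot(S[k], q) / dot(Y[k], S[k]);
            for (size_t i = 0; i < n; ++i) q[i] -= alpha[k] * Y[k][i];
        }
        double gamma = m ? dot(S[m-1], Y[m-1]) / dot(Y[m-1], Y[m-1]) : 1.0;
        for (size_t i = 0; i < n; ++i) q[i] *= gamma; // initial Hessian scaling
        for (size_t k = 0; k < m; ++k) {              // second loop: oldest->newest
            double beta = dot(Y[k], q) / dot(Y[k], S[k]);
            for (size_t i = 0; i < n; ++i) q[i] += (alpha[k] - beta) * S[k][i];
        }
        for (size_t i = 0; i < n; ++i) q[i] = -q[i];  // descent direction
        return q;
    }

    int main() {
        Vec g = {1.0, 2.0};
        std::vector<Vec> S = {{0.1, 0.0}}, Y = {{0.2, 0.1}};
        Vec d = twoLoop(g, S, Y);
        (void)d;
        return 0;
    }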

15.
A computer model of the patient end-tidal CO2 controller system has been developed and tested in simulation trials. It is intended to aid in finding appropriate PI (proportional-integral) controller settings by means of computer simulation instead of real experiments with the system; the latter approach is costly, time consuming and sometimes impossible to perform. The simulator consists of two equations: the patient equation and the PI controller equation. The software has been written in the C language and can be run on an IBM-PC/XT. Some examples of the simulation trials, illustrating the choice of controller settings, are given.
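A sketch of that two-equation structure under stated assumptions: the patient is reduced to a first-order response of end-tidal CO2 to the control input, and a discrete PI law drives it to a setpoint. The gains, time constant and plant gain below are illustrative placeholders, not the paper's equations or values.

    #include <cstdio>

    int main() {
        double etCO2 = 30.0, target = 40.0;                   // mmHg
        double Kp = 0.05, Ki = 0.01, integ = 0.0, dt = 1.0;   // assumed gains
        double tau = 60.0, gain = 200.0;                      // assumed dynamics
        for (int t = 0; t < 600; ++t) {
            double e = target - etCO2;
            integ += e * dt;
            double u = Kp * e + Ki * integ;                   // PI controller
            etCO2 += dt * (gain * u - etCO2) / tau;           // patient equation
            if (t % 60 == 0) printf("t=%3ds  etCO2=%.2f  u=%.4f\n", t, etCO2, u);
        }
        return 0;
    }

Re-running the loop with different Kp and Ki values is exactly the kind of cheap trial the simulator is meant to replace real experiments with.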

16.
17.
Graphics processing units (GPUs) are an attractive choice for designing high-performance and energy-efficient software systems, because GPUs are capable of executing massively parallel applications. However, the performance of GPUs is limited by contention in the memory subsystem, often resulting in substantial delays and effectively reducing the parallelism. In this paper, we propose GRAB, an automated debugger to aid the development of efficient GPU kernels. GRAB systematically detects, classifies and discovers the root causes of memory-performance bottlenecks in GPUs. We have implemented GRAB and evaluated it with several open-source GPU kernels, including two real-life case studies. We demonstrate the usage of GRAB by improving GPU kernels on real NVIDIA Tegra K1 hardware, a GPU widely used in mobile and handheld devices. The guidance obtained from GRAB leads to an overall improvement of up to 64%.

18.
This paper introduces work done to improve a sophisticated Underwater Robotic Vehicle (URV) inspection and repair system for submerged structures. It was undertaken as part of a research programme grant to pursue research and development of technologies and systems for the advancement of knowledge and for possible commercial exploitation relevant to the oil and gas industry. In particular, the paper focuses on the development of a unified pilot training and control system that incorporates an advanced man–machine interface for improving operator dexterity. Few formalised training procedures exist for URV pilots; in spite of the high cost, most URV pilots receive their training on the job. Training simulators can be viewed as a viable solution to this problem, and some attention has been paid to addressing it, notably the Imetrix URV-Mentor system, which focuses on VE simulation and on-line tutoring. Simulators, however, represent additional cost and in some ways lack the realism of working on the real system. In the R2C project, the researchers proposed a novel simulator configuration: a dual-purpose topside control system that can be used for training as well as for on-line operation of an actual URV. In the simulator configuration, the physical URV is replaced by a simulator module which accepts actual commands from the control system and responds with a simulated URV status, using a dynamic model of the URV. The simulator module thus behaves much like the actual URV, accepting commands and responding with status information. The perceived advantages of such a system are lower system cost as well as more realistic testing and simulation of the relevant processes.

19.
Graphics Processing Units (GPUs) have become increasingly powerful over the last decade. Programs taking advantage of this architecture can achieve large performance gains, and almost all new solutions and initiatives in high-performance computing are aimed in that direction. To write programs that can offload computation onto the GPU and utilize its power, new technologies are needed. The introduction of the Open Computing Language (OpenCL), a standard for cross-platform, parallel programming of modern processors, is a step in the right direction. Code written with OpenCL can run on a wide variety of platforms, adapting to the underlying architecture. It is versatile yet easy to learn due to its similarities with the C programming language. In this paper, we review the current state of the art in the use of GPUs and OpenCL for parallel computations. We use an implementation of the n-body simulation to illustrate some important considerations in developing OpenCL programs.
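The paper's running example is written in OpenCL; for consistency with the other sketches here, an equivalent CUDA version of the same all-pairs n-body acceleration step is shown instead (one thread per body, softened gravitational interaction, mass stored in the w component). The initialization is a placeholder.

    #include <cuda_runtime.h>

    __global__ void accel(const float4 *p, float4 *a, int n, float soft2) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        float3 acc = make_float3(0.f, 0.f, 0.f);
        for (int j = 0; j < n; ++j) {                // all-pairs interaction
            float dx = p[j].x - p[i].x;
            float dy = p[j].y - p[i].y;
            float dz = p[j].z - p[i].z;
            float r2 = dx*dx + dy*dy + dz*dz + soft2;
            float inv = rsqrtf(r2);
            float s = p[j].w * inv * inv * inv;      // w holds the mass
            acc.x += dx * s; acc.y += dy * s; acc.z += dz * s;
        }
        a[i] = make_float4(acc.x, acc.y, acc.z, 0.f);
    }

    int main() {
        const int n = 1024;
        float4 *p, *a;
        cudaMalloc(&p, n * sizeof(float4));
        cudaMalloc(&a, n * sizeof(float4));
        cudaMemset(p, 0, n * sizeof(float4));        // placeholder initial state
        accel<<<(n + 255) / 256, 256>>>(p, a, n, 1e-4f);
        cudaDeviceSynchronize();
        cudaFree(p); cudaFree(a);
        return 0;
    }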

20.
Graphics processing units (GPUs) are becoming increasingly important in today's platforms as their growing generality allows them to be used as powerful co-processors. In previous work, the authors showed that GPUs may be integrated into real-time systems by treating them as shared resources, allocated to real-time tasks through mutual-exclusion locking protocols. In this paper, an asymptotically optimal k-exclusion locking protocol is presented for globally-scheduled job-level static-priority (JLSP) systems. This protocol may be used to manage a pool of resources, such as GPUs, in such systems.
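For orientation only, here is the basic k-exclusion idea the paper generalizes, sketched as a counting semaphore over a pool (plain C++): at most k tasks hold a GPU at once. The paper's contribution is a protocol with asymptotically optimal priority-inversion blocking bounds under global JLSP scheduling, which a plain semaphore like this does not provide.

    #include <cstdio>
    #include <mutex>
    #include <condition_variable>
    #include <thread>
    #include <vector>

    class GpuPool {
        std::mutex m;
        std::condition_variable cv;
        int available;
    public:
        explicit GpuPool(int k) : available(k) {}
        void acquire() {
            std::unique_lock<std::mutex> lk(m);
            cv.wait(lk, [&]{ return available > 0; });   // block if all k busy
            --available;
        }
        void release() {
            { std::lock_guard<std::mutex> lk(m); ++available; }
            cv.notify_one();
        }
    };

    int main() {
        GpuPool pool(2);                                 // k = 2 GPUs
        std::vector<std::thread> tasks;
        for (int t = 0; t < 5; ++t)
            tasks.emplace_back([&pool, t]{
                pool.acquire();
                printf("task %d holds a GPU\n", t);
                pool.release();
            });
        for (auto &th : tasks) th.join();
        return 0;
    }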
