Similar Documents
 20 similar documents found (search time: 31 ms)
1.
To address the complex spatio-temporal aggregation mechanism and long training cycles of process neural networks, a data-parallel training algorithm for process neural networks is proposed. The method uses gradient-descent batch training, designs the algorithm in the MPI parallel programming model, and realizes cluster parallel computing across multiple machines on a local area network. The paper presents the data-parallel training algorithm and its implementation mechanism, conducts comparative experiments with training function sample sets of different sizes and varying numbers of processes, and analyzes algorithmic properties such as speedup and parallel efficiency. Experimental results show that, with a parallel granularity chosen appropriately for the network and sample sizes, the algorithm can substantially improve the training efficiency of process neural networks.
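The data-parallel scheme the abstract describes — each process computes the gradient of the loss over its own shard of the sample set, and the partial gradients are summed before a single, identical weight update — can be sketched as follows. This is a sequential simulation of the MPI pattern, not the paper's implementation; the one-parameter model and all names are illustrative.

```python
# Sketch of data-parallel batch gradient descent: each "process" computes
# the gradient over its shard; the shards' partial gradients sum to the
# full-batch gradient, so one global update per epoch suffices.
def local_gradient(w, shard):
    """Partial gradient of the squared error sum((w*x - y)^2) over one shard."""
    return sum(2 * (w * x - y) * x for x, y in shard)

def data_parallel_step(w, samples, n_procs, lr):
    shards = [samples[i::n_procs] for i in range(n_procs)]  # scatter the data
    grad = sum(local_gradient(w, s) for s in shards)        # allreduce (sum)
    return w - lr * grad                                    # same update everywhere

samples = [(x, 3.0 * x) for x in range(1, 9)]  # true slope is 3
w = 0.0
for _ in range(50):
    w = data_parallel_step(w, samples, n_procs=4, lr=0.001)
```

Because the batch gradient is a plain sum over samples, the sharded computation is exactly equivalent to serial full-batch training; only the work per process shrinks.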

2.
The resolution of combinatorial optimization problems can greatly benefit from the parallel and distributed processing which is characteristic of neural network paradigms. Nevertheless, the fine grain parallelism of the usual neural models cannot be implemented in an entirely efficient way either in general-purpose multicomputers or in networks of computers, which are nowadays the most common parallel computer architectures. Therefore, we present a parallel implementation of a modified Boltzmann machine where the neurons are distributed among the processors of the multicomputer, which asynchronously compute the evolution of their subset of neurons using values for the other neurons that might not be updated, thus reducing the communication requirements. Several alternatives to allow the processors to work cooperatively are analyzed and their performance detailed. Among the proposed schemes, we have identified one that allows the corresponding Boltzmann Machine to converge to solutions with high quality and which provides a high acceleration over the execution of the Boltzmann machine in uniprocessor computers.
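The core idea of the abstract — partition the neurons among processors and let each processor update its own subset against a possibly stale snapshot of the others, exchanging values only occasionally — can be sketched like this. This is a sequential simulation under an illustrative annealing schedule; the weights, partition, and temperatures are assumptions, not the paper's.

```python
# Sketch of a partitioned, asynchronous Boltzmann machine sweep: each
# "processor" owns a block of neurons and updates them against a snapshot
# of the remaining neurons that may be out of date.
import math
import random

random.seed(0)
N = 8
W = [[0.0] * N for _ in range(N)]
for i in range(N):                       # symmetric random couplings
    for j in range(i + 1, N):
        W[i][j] = W[j][i] = random.uniform(-1, 1)

state = [random.choice([0, 1]) for _ in range(N)]
parts = [range(0, 4), range(4, 8)]       # neurons owned by each "processor"

def worker_sweep(owned, snapshot, T):
    """Update owned neurons; non-owned values come from a stale snapshot."""
    local = {i: snapshot[i] for i in owned}
    for i in owned:
        net = sum(W[i][j] * (local[j] if j in local else snapshot[j])
                  for j in range(N) if j != i)
        p_on = 1.0 / (1.0 + math.exp(-net / T))      # stochastic activation
        local[i] = 1 if random.random() < p_on else 0
    return local

for T in [4.0, 2.0, 1.0, 0.5]:           # annealing: lower T each sweep
    snapshot = list(state)               # values exchanged once per sweep
    updates = [worker_sweep(p, snapshot, T) for p in parts]
    for upd in updates:                  # merge: every neuron has one owner
        for i, v in upd.items():
            state[i] = v
```

The communication saving comes from the single snapshot exchange per sweep instead of one exchange per neuron update, at the cost of occasionally using stale neighbor values.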

3.
To address the long processing times and insufficient memory that neural networks encounter on relatively large-scale data such as bioinformatics databases, and building on an analysis of current distributed neural network learning, a new distributed cooperative training algorithm for neural networks based on agents and a slicing strategy is proposed. By effectively partitioning the training samples and the training process, learning over the whole sample set is distributed across a cluster of neural networks for cooperative training; at the same time, a competitive selection mechanism lets better-performing training individuals migrate within the neural network population and obtain more resources for learning. Theoretical analysis shows that the method not only raises the success rate of convergence to the target solution, but also offers high parallel performance, accelerating the approach to the target solution. Finally, the method is applied to protein secondary structure prediction, a problem in bioinformatics. The results show that the distributed learning algorithm both handles learning over large-scale sample sets effectively and improves the performance of the trained neural networks.

4.
To address the complex spatio-temporal aggregation mechanism and long training cycles of process neural networks, a data-parallel training algorithm for process neural networks is proposed. The method uses gradient-descent batch training, designs the algorithm in the MPI parallel programming model, and realizes cluster parallel computing across multiple machines on a local area network. The paper presents the data-parallel training algorithm and its implementation mechanism, conducts comparative experiments with training function sample sets of different sizes and varying numbers of processes, and analyzes algorithmic properties such as speedup and parallel efficiency. Experimental results show that, with a parallel granularity chosen appropriately for the network and sample sizes, the algorithm can substantially improve the training efficiency of process neural networks.

5.
Multi-class pattern classification has many applications including text document classification, speech recognition, object recognition, etc. Multi-class pattern classification using neural networks is not a trivial extension from two-class neural networks. This paper presents a comprehensive and competitive study in multi-class neural learning with a focus on issues including neural network architecture, encoding schemes, training methodology and training time complexity. Our study includes multi-class pattern classification using either a system of multiple neural networks or a single neural network, and modeling pattern classes using one-against-all, one-against-one, one-against-higher-order, and P-against-Q. We also discuss implementations of these approaches and analyze the training time complexity associated with each approach. We evaluate six different neural network system architectures for multi-class pattern classification along the dimensions of imbalanced data, large number of pattern classes, and large vs. small training data, through experiments conducted on well-known benchmark data.
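The one-against-all and one-against-one decompositions compared in the paper each turn a K-class problem into a set of binary subproblems; they differ in how many networks must be trained and how much data each sees. A minimal sketch of the two enumerations (class names are illustrative):

```python
# Enumerate the binary subproblems induced by two multi-class encodings.
from itertools import combinations

def one_against_all(classes):
    """One binary task per class: 'c' vs. every other class."""
    return [(c, tuple(o for o in classes if o != c)) for c in classes]

def one_against_one(classes):
    """One binary task per unordered pair of classes."""
    return list(combinations(classes, 2))

classes = ["A", "B", "C", "D"]
ova = one_against_all(classes)   # K tasks; each trains on all the data
ovo = one_against_one(classes)   # K*(K-1)/2 tasks; each sees only two classes
```

For K = 4 this gives 4 one-against-all networks versus 6 one-against-one networks, which is the kind of architecture/training-time trade-off the paper quantifies.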

6.
The development of intelligent transportation systems (ITS) and the resulting need for the solution of a variety of dynamic traffic network models and management problems require faster‐than‐real‐time computation of shortest path problems in dynamic networks. Recently, a sequential algorithm was developed to compute shortest paths in discrete time dynamic networks from all nodes and all departure times to one destination node. The algorithm is known as algorithm DOT and has an optimal worst‐case running‐time complexity. This implies that no algorithm with a better worst‐case computational complexity can be discovered. Consequently, in order to derive algorithms to solve all‐to‐one shortest path problems in dynamic networks, one would need to explore avenues other than the design of sequential solution algorithms only. The use of commercially‐available high‐performance computing platforms to develop parallel implementations of sequential algorithms is an example of such avenue. This paper reports on the design, implementation, and computational testing of parallel dynamic shortest path algorithms. We develop two shared‐memory and two message‐passing dynamic shortest path algorithm implementations, which are derived from algorithm DOT using the following parallelization strategies: decomposition by destination and decomposition by transportation network topology. The algorithms are coded using two types of parallel computing environments: a message‐passing environment based on the parallel virtual machine (PVM) library and a multi‐threading environment based on the SUN Microsystems Multi‐Threads (MT) library. We also develop a time‐based parallel version of algorithm DOT for the case of minimum time paths in FIFO networks, and a theoretical parallelization of algorithm DOT on an ‘ideal’ theoretical parallel machine. 
Performances of the implementations are analyzed and evaluated using large transportation networks, and two types of parallel computing platforms: a distributed network of Unix workstations and a SUN shared‐memory machine containing eight processors. Satisfactory speed‐ups in the running time of sequential algorithms are achieved, in particular for shared‐memory machines. Numerical results indicate that shared‐memory computers constitute the most appropriate type of parallel computing platforms for the computation of dynamic shortest paths for real‐time ITS applications.
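The all-to-one, all-departure-times problem solved by algorithm DOT can be sketched as a backward recursion over a time-expanded network: C[i][t] is the least travel time from node i to the destination when departing at time t. The tiny network, the horizon, and the "costs stay constant after the horizon" assumption below are illustrative, not the paper's test data.

```python
# Backward dynamic-programming sketch of all-to-one discrete-time dynamic
# shortest paths (in the spirit of algorithm DOT, not its exact form).
INF = float("inf")

def dynamic_shortest_paths(n, dest, arc_time, T):
    """arc_time(i, j, t) -> travel time of arc (i, j) departing at t, or None."""
    C = [[INF] * (T + 1) for _ in range(n)]
    for t in range(T + 1):
        C[dest][t] = 0
    def cost(j, t):                       # after the horizon, reuse time-T costs
        return C[j][min(t, T)]
    for t in range(T, -1, -1):            # backward in departure time
        changed = True
        while changed:                    # relax until stable at this t
            changed = False
            for i in range(n):
                for j in range(n):
                    d = arc_time(i, j, t)
                    if d is not None and d + cost(j, t + d) < C[i][t]:
                        C[i][t] = d + cost(j, t + d)
                        changed = True
    return C

# 3 nodes: 0 -> 1 -> 2, plus a direct arc 0 -> 2 that is slow at "rush hour" t=0
def arc_time(i, j, t):
    table = {(0, 1): 1, (1, 2): 1, (0, 2): 5 if t == 0 else 2}
    return table.get((i, j))

C = dynamic_shortest_paths(3, dest=2, arc_time=arc_time, T=3)
```

The decomposition strategies the paper parallelizes map naturally onto this structure: by destination (independent runs of the whole recursion) or by network topology (partitioning the inner relaxation loop over i).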

7.
In recent years, deep convolutional neural networks have been widely applied in fields such as image recognition and speech recognition, with excellent results. A deep convolutional neural network is a convolutional neural network with many layers; it has tens of millions of parameters to learn and a large computational cost, making training very time-consuming. To address this, this paper proposes a multi-GPU parallel framework for deep convolutional neural networks, and designs and implements a model-parallel engine. Relying on the powerful cooperative parallel computing capability of multiple GPUs and exploiting the parallelism inherent in training deep convolutional neural networks, the framework achieves fast and efficient training of deep convolutional neural networks.

8.
Parallel programming is elusive. The relative performance of different parallel implementations varies with machine architecture, system and problem size. How to compare different implementations over a wide range of machine architectures and problem sizes has not been well addressed due to its difficulty. Scalability has been proposed in recent years to reveal scaling properties of parallel algorithms and machines. In this paper, the relation between scalability and execution time is carefully studied. The concepts of crossing point analysis and range comparison are introduced. Crossing point analysis finds slow/fast performance crossing points of parallel algorithms and machines. Range comparison compares performance over a wide range of ensemble and problem size via scalability and crossing point analysis. Three algorithms from scientific computing are implemented on an Intel Paragon and an IBM SP2 parallel computer. Experimental and theoretical results show how the combination of scalability, crossing point analysis, and range comparison provides a practical solution for scalable performance evaluation and prediction. While our tests were conducted on homogeneous parallel computers, the proposed methodology applies to heterogeneous and network computing as well.
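Crossing point analysis can be illustrated with two execution-time models: one algorithm with low overhead but poorer algorithmic scaling, and one with better scaling but a higher startup cost. The quadratic and linearithmic models below are assumptions for illustration, not the paper's measured algorithms.

```python
# Sketch of crossing point analysis: find the problem size at which the
# faster of two algorithm/machine pairs changes.
import math

def t_a(n, p):   # low overhead, O(n^2) work shared by p processors
    return n * n / p

def t_b(n, p):   # O(n log n) work, but a fixed startup/communication cost
    return n * math.log2(n) / p + 200

def crossing_point(p, sizes):
    """First problem size at which algorithm B becomes the faster one."""
    for n in sizes:
        if t_b(n, p) < t_a(n, p):
            return n
    return None

sizes = [2 ** k for k in range(4, 14)]
n_cross = crossing_point(p=4, sizes=sizes)   # B wins from n = 32 onward here
```

Range comparison then amounts to repeating this over a grid of processor counts and problem sizes, so that each implementation is recommended only inside the region where it actually wins.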

9.
With the increasing speed of computers and communication links, and the successful convergence of both fields, computers connected by high speed links now represent an enormously large distributed computing system. At the same time, communication between man and machine is also becoming more diverse and personalized. Networking issues such as evolution of user services, seamless communication between hosts, failure recovery and integration of new technologies arise daily. Problem-specific approaches and corresponding solutions are available at considerable cost. However, a common requirement is adaptability of the computer network to a variety of changes. In this paper, we propose Flexible Computer Communication Networks (FN) as a uniform solution to most of these networking problems. The framework of Flexible Networks can be considered as an intelligent shell enclosing existing networking architectures. An agent-oriented implementation of a flexible network is outlined. The conversion of existing networks to flexible networks is shown to be incremental, and therefore practicable.

10.
A Grid environment can be viewed as a virtual computing architecture that provides the ability to perform higher throughput computing by taking advantage of many computers geographically dispersed and connected by a network. Bioinformatics applications stand to gain in such a distributed environment in terms of increased availability, reliability and efficiency of computational resources. There is already considerable research in progress toward applying parallel computing techniques on bioinformatics methods, such as multiple sequence alignment, gene expression analysis and phylogenetic studies. In order to cope with the dimensionality issue, most machine learning methods either focus on specific groups of proteins or reduce the size of the original data set and/or the number of attributes involved. Grid computing could potentially provide an alternative solution to this problem, by combining multiple approaches in a seamless way. In this paper we introduce a unifying methodology coupling the strengths of the Grid with the specific needs and constraints of the major bioinformatics approaches. We also present a tool that implements this process and allows researchers to assess the computational needs for a specific task and optimize the allocation of available resources for its efficient completion.

11.
PJVM: a Java-based object-oriented distributed parallel processing system   Total citations: 3 (self-citations: 0, citations by others: 3)
This paper describes the composition and main features of the PJVM system, and discusses how to extend Java's object libraries to solve problems that arise in parallel distributed processing in heterogeneous environments, such as synchronizing access to shared variables by multiple processes or threads and managing distributed shared memory. Several examples demonstrate the feasibility of the multiple programming interfaces and computation modes the system provides, opening new application areas for Java and laying a foundation for large-scale distributed parallel processing of very large problems over heterogeneous wide-area networks.

12.
The minimum dominating set and minimum connected dominating set problems on graphs have important applications in networking and in parallel and distributed computing, and both are computationally NP-hard. An OTIS network is a class of composite network that can take an arbitrary graph as its factor network; because it inherits the good properties of its factor network, it has become one of the architectural forms for scalable, modular, fault-tolerant large-scale parallel computer systems. This paper studies how to construct small dominating sets and connected dominating sets for OTIS networks. Based on the construction rules of OTIS networks, algorithms for computing dominating sets and connected dominating sets of an OTIS network are derived from the corresponding algorithms on the factor network. The performance of these algorithms is analyzed theoretically and verified through examples.
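For context on the underlying problem, a standard greedy heuristic for minimum dominating sets repeatedly picks the vertex whose closed neighbourhood covers the most still-undominated vertices. This is the classic logarithmic-factor approximation on general graphs, sketched below on an illustrative 6-cycle; it is not the paper's OTIS-specific construction.

```python
# Greedy heuristic for the (NP-hard) minimum dominating set problem.
def greedy_dominating_set(adj):
    """adj: dict vertex -> set of neighbours. Returns a dominating set."""
    n = len(adj)
    undominated = set(range(n))
    D = set()
    while undominated:
        # pick the vertex whose closed neighbourhood covers the most
        # vertices that are not yet dominated
        v = max(range(n), key=lambda u: len(undominated & (adj[u] | {u})))
        D.add(v)
        undominated -= adj[v] | {v}
    return D

# a 6-cycle: every vertex dominates itself and its two neighbours,
# so two opposite vertices suffice
adj = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
D = greedy_dominating_set(adj)
```

The paper's approach instead exploits OTIS structure: a dominating set computed on the factor network is lifted, via the OTIS construction rules, to a dominating set of the composite network.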

13.
Convolutional neural networks are usually trained serially with the standard error backpropagation algorithm; as data sizes grow, serial training on a single machine becomes time-consuming and occupies substantial system resources. To train convolutional neural networks effectively on massive data, a parallel BP neural network training model based on the MapReduce framework is proposed. The model combines the standard and the accumulated error backpropagation algorithms: the large data set is split into several subsets, which are processed in parallel at the cost of a small loss of accuracy, and an extended MNIST data set is used for image recognition tests. Experimental results show that the algorithm adapts well to varying data sizes and can improve the training efficiency of convolutional neural networks.
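The MapReduce-style training round the abstract describes — mappers emit per-subset weight gradients keyed by parameter, a reducer sums them, and one accumulated update is applied — can be sketched as follows. A one-parameter model stands in for the CNN, and all names are illustrative.

```python
# Sketch of one MapReduce round of accumulated-error backpropagation.
from collections import defaultdict

def map_phase(w, subset):
    """Mapper: emit (param_key, partial_gradient) pairs for one data subset."""
    g = sum(2 * (w * x - y) * x for x, y in subset)   # squared-error gradient
    return [("w", g)]

def reduce_phase(pairs):
    """Reducer: sum the partial gradients per parameter key."""
    acc = defaultdict(float)
    for key, g in pairs:
        acc[key] += g
    return acc

samples = [(x, 2.0 * x) for x in range(1, 11)]        # true slope is 2
splits = [samples[i::4] for i in range(4)]            # 4 mapper inputs
w = 0.0
for _ in range(40):                                   # one round per "job"
    pairs = [kv for s in splits for kv in map_phase(w, s)]
    w -= 0.0005 * reduce_phase(pairs)["w"]
```

Because the accumulated gradient is a sum over subsets, the reduce step reconstructs the full-batch gradient exactly; the accuracy loss the abstract mentions arises only when updates are applied per subset rather than per round.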

14.
The use of virtualized parallel and distributed computing systems is rapidly becoming the mainstream due to the significant benefit of high energy-efficiency and low management cost. Processing network operations in a virtual machine, however, incurs a lot of overhead from the arbitration of network devices between virtual machines, inherently by the nature of the virtualized architecture. Since data transfer between server nodes frequently occurs in parallel and distributed computing systems, the high overhead of networking may induce significant performance loss in the overall system. This paper introduces the design and implementation of a novel networking mechanism with low overhead for virtualized server nodes. By sacrificing isolation between virtual machines, which is insignificant in distributed or parallel computing systems, our approach significantly reduces the processing overhead in networking operations by up to 29% of processor load, along with up to 36% of processor cache miss. Furthermore, it improves network bandwidth by up to 8%, especially when transmitting large packets. As a result, our prototype enhances the performance of real-world workloads by up to 12% in our evaluation.

15.

In the present article, delay and systems of delay differential equations are treated using feed-forward artificial neural networks. We have solved multiple problems using neural network architectures with different depths. The neural networks are trained using the extreme learning machine algorithm to satisfy the delay differential equations and associated initial/boundary conditions. Further, numerical rates of convergence of the proposed algorithm are reported based on the variation of error in the obtained solution for different numbers of training points. Emphasis is on analysing whether deeper network architectures trained with the extreme learning machine algorithm can perform better than shallow network architectures for approximating the solutions of delay differential equations.


16.
Bringing direct and protected network multiprogramming into mainstream cluster computing requires innovations in three key areas: application programming interfaces, network virtualization systems, and lightweight communication protocols for high-speed interconnects. The AM-II API extends traditional active messages with support for client-server computing and facilitates the construction of parallel clients and distributed servers. Our virtual network segment driver enables a large number of arbitrary sequential and parallel applications to access network interface resources directly in a concurrent but fully protected manner. The NIC-to-NIC communication protocols provide reliable and at-most-once message delivery between communication endpoints. The NIC-to-NIC protocols perform well as the number of endpoints and the number of hosts in the cluster are scaled. The flexibility afforded by the underlying protocols enables a diverse set of timely research efforts. Other Berkeley researchers are actively using this system to investigate implicit techniques for the coscheduling of communicating processes, an essential part of high-performance communications in multiprogrammed clusters of uni- and multiprocessor servers. Other researchers are extending the active message protocols described here for clusters of symmetric multiprocessors, using so-called multiprotocol techniques and multiple network interfaces per machine.

17.
The paper deals with the identification of recurrent neural networks (RNNs) for simulating the air–fuel ratio (AFR) dynamics in the intake manifold of a spark ignition (SI) engine. RNNs are derived from the well-established static multilayer perceptron feedforward neural networks (MLPFF), which have been widely adopted for steady-state mapping of SI engines. The main contribution of this work is the development of a procedure that allows identifying an RNN-based AFR simulator with high generalization from a limited training data set. The procedure has been tested by comparing RNN simulations with AFR transients generated using a nonlinear dynamic engine model. The results show that training the network with inputs that are uncorrelated and distributed over the entire engine operating domain improves model generalization and reduces the experimental burden. Potential areas of application of the procedure developed include the use of RNNs as virtual AFR sensors (e.g. engine or individual AFR prediction) and the implementation of RNNs in the framework of model-based control architectures.

18.
Neural network ensembles   Total citations: 175 (self-citations: 2, citations by others: 175)
By training multiple neural networks and combining their conclusions, neural network ensembles can significantly improve the generalization ability of a learning system. They not only support deeper research into machine learning and neural computing, but also help ordinary engineers apply neural network techniques to real-world problems. Ensembles are therefore regarded as an engineering-oriented neural computing technology with broad application prospects, and have become a research focus in machine learning and neural computing. This paper surveys the international state of research on neural network ensembles from three aspects, namely implementation methods, theoretical analysis, and application results, and discusses several problems in this area that merit further study.
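The combination step at the heart of an ensemble — several trained predictors vote for classification, or are averaged for regression — can be sketched in a few lines. The "networks" below are trivial stand-in threshold functions, used only to show the mechanism.

```python
# Sketch of ensemble combination: majority voting and simple averaging.
from collections import Counter

def vote(members, x):
    """Majority vote over the ensemble members' class predictions."""
    return Counter(m(x) for m in members).most_common(1)[0][0]

def average(members, x):
    """Simple averaging for regression-style outputs."""
    return sum(m(x) for m in members) / len(members)

# three imperfect stand-in classifiers; two of the three are right at x = 5
members = [lambda x: "pos" if x > 3 else "neg",
           lambda x: "pos" if x > 4 else "neg",
           lambda x: "pos" if x > 7 else "neg"]
label = vote(members, 5)
```

The generalization gain the survey discusses comes from exactly this effect: individual members' errors partially cancel, so the combined prediction is more stable than any single network's, provided the members are both accurate and diverse.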

19.
Presented here is a set of methods and tools developed to provide transportable measurements of performance in heterogeneous networks of machines operating together as a single virtual heterogeneous machine (VHM). The methods are work-based rather than time-based, and yield significant analytic information. A technique for normalizing the measure of useful work performed across a heterogeneous network is proposed and the reasons for using a normalized measure are explored. It is shown that work-based performance measures are better than time-based ones because they may be (1) taken while a task is currently executing on a machine; (2) taken without interrupting production operation of the machine network; (3) used to compare disparate tasks, and (4) used to perform second-order analysis of machine network operation. This set of performance tools has been used to monitor the utilization of high-performance computing networks, provide feedback on algorithm design and determine the veracity of computing performance models.

20.
MPI: an environment for distributed parallel computing   Total citations: 3 (self-citations: 0, citations by others: 3)
1 Introduction. Over the past few decades, the availability of massively parallel and very large-scale parallel machines has advanced considerably. For various reasons, most of these machines adopt distributed-memory or distributed shared-memory architectures. To give users the necessary support, vendors developed their own proprietary message-passing packages or libraries, such as Intel's NX, IBM's EUI, Parasoft's Express, and Oak Ridge's PVM. These offer similar functionality and excellent performance on their respective platforms, but application programs
