1.
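When a traditional genetic algorithm is applied to compute-intensive tasks, the execution time of the fitness function grows rapidly, so convergence becomes very slow as the population size or the number of generations increases. To address this, a hybrid "coarse-grained / master-slave" parallel genetic algorithm (HBPGA) is designed and implemented on the Sunway TaihuLight supercomputer, currently ranked first on the TOP500 list. The algorithm model adopts a two-level parallel architecture that combines the MPI and Athread programming models. Compared with traditional genetic algorithms implemented on a single core or on multi-core clusters with one level of parallelism, it achieves two-level parallelism on the Sunway many-core processor and obtains better performance and a higher speedup. In the experiments, with 16×64 slave cores, the maximum speedup reaches 544 and the slave-core speedup exceeds 31.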
2.
Han Qingchang, Yang Hailong, Dun Ming, Luan Zhongzhi, Gan Lin, Yang Guangwen, Qian Depei 《The Journal of supercomputing》2021,77(5):4533-4564
Tile low-rank general matrix multiplication (TLR GEMM) is a novel method of matrix multiplication on large data-sparse matrices, which can significantly reduce...
3.
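The theory and computational methods of three-dimensional seismic acoustic waves are the foundation of geological exploration research; 3D seismic acoustic forward modeling is carried out by analyzing how acoustic waves propagate in different media. Forward modeling with the staggered-grid finite-difference equations for 3D seismic acoustic waves suffers from practical problems such as heavy numerical computation and high memory consumption. To address them, a programming model for 3D seismic acoustic forward modeling is proposed for the domestic heterogeneous many-core processor (SW26010) of the Sunway TaihuLight supercomputer, with optimization strategies based on process-level parallelism across processors and thread-level parallelism across computing cores. The DMA (direct memory access) communication mode is studied, and a multi-angle optimization strategy is proposed that includes 2.5D pipelined task partitioning and the overlapping of communication with computation. Experimental results show that the strategy effectively alleviates the bandwidth bottleneck, exploits the processor's computing power, and resolves the low parallel efficiency of finite-difference programs on the SW26010 heterogeneous many-core processor. In large-scale tests using 266,240 computing cores, the program still maintains stable performance, reaching 5.5 GFlops for field-value updates.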
4.
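Research on a time series clustering algorithm based on piecewise linear dynamic time warping. Total citations: 4, self-citations: 0, citations by others: 4
Time series are an important class of complex data, and knowledge discovery in time series is becoming one of the active research topics in knowledge discovery. Euclidean distance and its extensions are widely used as similarity measures for comparing time series, but these distance measures are not robust to the data. Dynamic time warping is a pattern-matching algorithm based on nonlinear dynamic programming, but its computational complexity is quite high. This paper proposes a dynamic time warping algorithm based on the piecewise linear representation of time series, which matches sequences by computing the shortest warping path between the piecewise linear sequence data. A comparison of clustering results under different distance measures on the synthetic control time series data shows that the proposed algorithm is highly accurate, is strongly robust to amplitude differences, noise, and linear drift, greatly reduces computational complexity, and has good application value.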
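The dynamic-programming recurrence behind dynamic time warping can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the function names `dtw_distance` and `piecewise_means` are hypothetical, and per-segment means are used as a simple stand-in for the piecewise linear representation the abstract describes.

```python
import math


def dtw_distance(a, b):
    """Classic dynamic time warping distance, O(len(a) * len(b)) time."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible predecessor cells
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def piecewise_means(series, segment_len):
    """Assumed stand-in for a piecewise representation: one mean per segment."""
    reduced = []
    for start in range(0, len(series), segment_len):
        chunk = series[start:start + segment_len]
        reduced.append(sum(chunk) / len(chunk))
    return reduced
```

Because the cost table has one cell per pair of points, warping the shortened sequences instead of the raw series reduces the work roughly by the square of the segment length, which is the kind of saving the abstract refers to.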
5.
6.
《Journal of Parallel and Distributed Computing》1993,18(4):411-422
The behavior of n interacting processes synchronized by the "Time Warp" rollback mechanism is analyzed under the constraint that the total amount of memory to execute the program is limited. In Time Warp, a protocol called "cancelback" has been proposed to reclaim storage when the system runs out of memory. A discrete state, continuous time Markov chain model for Time Warp augmented with the cancelback protocol is developed for a shared memory system with n homogeneous processors and homogeneous workload with constant message population. The model allows one to predict speedup as the amount of available memory is varied. The performance predicted by the model is validated through performance measurements on an operational Time Warp system executing on a shared-memory multiprocessor using a workload similar to that in the model. It is observed that if the sequential simulation requires m message buffers, Time Warp with a small fraction of message buffers beyond m performs almost as well as Time Warp with unlimited memory.
7.
8.
Carothers C.D., Fujimoto R.M. 《Parallel and Distributed Systems, IEEE Transactions on》2000,11(3):299-317
Time Warp is an optimistic protocol for synchronizing parallel discrete event simulations. To achieve performance in a multiuser network of workstations (NOW) environment, Time Warp must continue to operate efficiently in the presence of external workloads caused by other users, processor heterogeneity, and irregular internal workloads caused by the simulation model. However, these performance problems can cause a Time Warp program to become grossly unbalanced, resulting in slower execution. The key observation asserted in this article is that each of these performance problems, while different in source, has a similar manifestation. For a Time Warp program to be balanced, the amount of wall clock time necessary to advance an LP one unit of simulation time should be about the same for all LPs. Using this observation, we devise a single algorithm that mitigates these performance problems and enables the “background” execution of Time Warp programs on heterogeneous distributed computing platforms in the presence of external as well as irregular internal workloads.
9.
Manfred Wamsler 《Engineering with Computers》2009,25(2):131-138
In modal frequency response analysis, the dynamic analyst is often faced with the structure’s dynamic behavior, the modal contributions included, over a frequency window rather than at a single frequency. Therefore a new method in modal frequency response analysis has been developed for computing both complex modal-contributions and real, actual modal-contributions over a frequency range. Contributions from normal modes to displacement, velocity, or acceleration of a set of selected evaluation points (grid-component combinations) are considered. The focus lies on identifying the major actual-contributions from normal modes and the frequency range they are active in. The method is valid for all branches of mechanical engineering. With the thorough knowledge of the dominant modal-contributions to the physical motion response of relevant structure locations and the modal contributions’ frequency history, the traditional design process can substantially be enhanced. It is worthwhile to notice that by the use of the presented procedure the dynamic analyst may find innovative redesigns which the automatic structural optimizers are not able to find. Examples are given to demonstrate the application, the strength of the coupling between modes, the influence of base and force excitation on the modal contributions and, finally, some recommendations on how to reduce undesired structural responses.
10.
《Journal of Parallel and Distributed Computing》1996,37(2):134-145
Time Warp is an optimistic synchronization protocol used for parallel discrete event simulation. While Time Warp has the potential to reduce the execution time of large simulations, it has been plagued by a variety of problems, namely: 1. Instability due to thrashing effects caused by echoing and cascading rollbacks. 2. Memory bottlenecks due to state saving and excessive optimism. 3. Inefficient scheduling algorithms for scheduling Time Warp processes on each processing node. These problems have inhibited the widespread use of Time Warp as a general purpose synchronization algorithm. The general trend of researchers attempting to solve these problems has been to statically limit the optimism of Time Warp. Unfortunately, these attempts have achieved only limited success. This is because a static set of parameters may perform well for one simulation but not for another. This paper attacks the problem using adaptive mechanisms to control optimism, using an index of performance called useful work. This research presents solutions for the above mentioned problems, by: 1. Stabilizing Time Warp using adaptive bounded time windows. 2. Reducing memory usage and overall execution time by using an adaptive mechanism to vary the checkpoint interval. 3. Scheduling Time Warp processes with the useful work parameter to favor more productive processes. Using this new performance index called Useful Work, several modifications to Time Warp are implemented to stabilize and improve Time Warp. Thus, this new improved Time Warp synchronization mechanism termed Parameterized Time Warp provides an integrated adaptive solution to optimistic Parallel Discrete Event Simulation. Empirical work showing that PTW outperforms an equivalent Time Warp simulation executing under similar partitioning and load conditions is also presented.
11.
12.
One important problem in deterministic scheduling theory is to schedule a set of independent jobs on a set of parallel processors without any preemption. When the jobs have fixed due dates, the objective often is to minimize the maximum lateness. The problem is NP-Complete[7]. In this paper a fast heuristic procedure is developed to solve this problem. It is applicable to both equal and unequal processors. The “average” behavior of the procedure is tested against a truncated Branch and Bound algorithm in a large scale computational study consisting of about 10,000 different examples. The results show that the procedure is highly efficient.
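The abstract does not spell out its heuristic, so the sketch below uses a different, well-known technique as an assumed stand-in: earliest-due-date (EDD) list scheduling, where each job in due-date order goes to the processor on which it would finish soonest. The function name `edd_list_schedule` and the `(work, due_date)` job format are hypothetical.

```python
def edd_list_schedule(jobs, speeds):
    """Greedy non-preemptive schedule on parallel processors.

    jobs   : list of (work, due_date) pairs
    speeds : list of processor speeds (all equal for identical processors)
    Returns (max_lateness, assignment), where assignment holds
    (job_index, processor_index, finish_time) in scheduling order.
    """
    free_at = [0.0] * len(speeds)  # time at which each processor becomes idle
    max_lateness = float("-inf")
    assignment = []
    # consider jobs in order of increasing due date (EDD rule)
    for idx, (work, due) in sorted(enumerate(jobs), key=lambda item: item[1][1]):
        # choose the processor that would complete this job earliest
        p = min(range(len(speeds)), key=lambda q: free_at[q] + work / speeds[q])
        finish = free_at[p] + work / speeds[p]
        free_at[p] = finish
        assignment.append((idx, p, finish))
        max_lateness = max(max_lateness, finish - due)
    return max_lateness, assignment
```

A greedy rule like this is fast but carries no optimality guarantee for the NP-complete problem, which is why such heuristics are usually judged empirically against a bounded exact search, as in the study described.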
13.
Rhonda Righter 《Systems & Control Letters》1988,10(4)
We consider a sequencing problem in which there are n jobs to be processed nonpreemptively on m nonidentical processors. The processing time of the j-th processor is exponentially distributed with rate μj, where μ1 ≥ μ2 ≥ … ≥ μm. Job i incurs a holding cost at rate ci per unit time while still in the system, where c1 ≥ c2 ≥ … ≥ cn. We show that to minimize total expected holding costs (weighted flowtime), it is optimal to take the fastest (lowest indexed) available processor, say processor j, and assign job k to it if k>(Σij−1μi)/μj−j k−1. After each assignment the jobs are renumbered (so that job k+1 becomes job k, etc.), and the procedure is repeated with the next fastest available processor, etc. Note that the policy does not depend on the values of the holding costs ci. This result is a generalization of the result of Agrawala et al. (1984) for minimizing expected flowtime, i.e., minimizing total holding cost when the holding costs of all the jobs are the same. We give a simpler proof of the more general result.
14.
15.
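Classification of imbalanced datasets is one of the active research problems in machine learning. In recent years, researchers have proposed many theories and algorithms to improve the performance of traditional classification techniques on imbalanced datasets; among them, using a threshold-determination criterion to set the threshold in a neural network is an important approach. Commonly used threshold-determination criteria have shortcomings: for example, they cannot make the minority-class and majority-class accuracies reach their best values at the same time, or they favor majority-class accuracy too strongly. A new threshold-determination criterion is therefore proposed, under which the minority-class and majority-class accuracies can both reach their best values simultaneously, unaffected by the class ratio of the samples. A classifier is trained by combining a neural network with a genetic algorithm; used both as the threshold-selection condition and as the evaluation criterion for the classifier, the new criterion yields good results.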
16.
17.
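Thrashing in the LSF scheduling algorithm during real-time scheduling leads to poor real-time performance and wasted system resources. To solve this problem, a task importance coefficient is introduced into the LSF algorithm, and a cloud model is used to represent the task importance coefficient and the slack quantitatively. A two-dimensional cloud model, jointly determined by the two task-feature cloud models (the importance-coefficient cloud and the slack cloud), sets a preemption threshold for each task: a ready task may preempt the current task only if its priority is higher than the current task's preemption threshold. Simulation results show that the cloud-model-optimized LSF algorithm not only resolves thrashing effectively but also lets urgent and important tasks run first.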
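The effect of a preemption threshold on thrashing can be sketched with a small unit-time simulation. This is an illustration under stated assumptions, not the paper's method: LSF is taken to mean least-slack-first, priority is modeled as lower slack, and the per-task threshold that the paper derives from a two-dimensional cloud model is replaced here by a single hypothetical constant `margin`. The names `Task` and `simulate` are also hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    remaining: int  # units of work still to run
    deadline: int   # absolute deadline

    def slack(self, now):
        # slack (laxity) = time left until the deadline minus remaining work
        return self.deadline - now - self.remaining


def simulate(tasks, margin):
    """Run in unit ticks; return (execution order, number of preemptions)."""
    tasks = [Task(t.name, t.remaining, t.deadline) for t in tasks]  # work on copies
    now, current, order, switches = 0, None, [], 0
    while any(t.remaining > 0 for t in tasks):
        ready = [t for t in tasks if t.remaining > 0]
        best = min(ready, key=lambda t: t.slack(now))
        if current is None or current.remaining == 0:
            nxt = best  # processor is free: plain least-slack choice
        elif best is not current and best.slack(now) < current.slack(now) - margin:
            nxt = best  # challenger beats the assumed preemption threshold
        else:
            nxt = current  # otherwise the running task keeps the processor
        if current is not None and nxt is not current and current.remaining > 0:
            switches += 1  # count only preemptions of an unfinished task
        current = nxt
        current.remaining -= 1
        order.append(current.name)
        now += 1
    return order, switches
```

With two tasks of equal work and deadline, `margin=0` reproduces the back-and-forth switching of plain least-slack-first, while a positive margin lets the running task finish; both tasks still meet their deadlines in this small case.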
18.
Shifei Ding, Yanan Zhang, Jinrong Chen, Weikuan Jia 《Neural computing & applications》2013,23(2):293-297
Elman neural networks provide dynamic mapping when processing complex non-linear data. Because the Elman neural network inherits some features of the back-propagation neural network, it has several defects: it easily falls into local minima, it uses a fixed learning rate, and the number of hidden-layer neurons is uncertain. These defects affect processing accuracy. We therefore optimize the weights, thresholds, and number of hidden-layer neurons of Elman networks with a genetic algorithm. This improves the training speed and generalization ability of Elman neural networks and yields the optimal algorithm model. Instance analysis shows that the new algorithm is superior to the traditional model in terms of convergence rate, predicted value error, number of successful training runs, etc. This indicates the effectiveness of the new algorithm, which deserves further popularization.
19.