首页 | 官方网站   微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 46 毫秒
1.
All-to-all communication patterns occur in many important parallel algorithms. This paper presents new algorithms for all-to-all communication patterns (all-to-all broadcast and all-to-all personalized exchange) for wormhole switched 2D/3D torus- and mesh-connected multiprocessors. The algorithms use message combining to minimize message start-ups at the expense of larger message sizes. The unique feature of these algorithms is that they are the first algorithms that we know of that operate in a bottom-up fashion rather than a recursive, top-down manner. For a 2d×2d torus or mesh, the algorithms for all-to-all personalized exchange have time complexity of O(23d). An important property of the algorithms is the O(d) time due to message start-ups, compared with O(2d) for current algorithms. This is particularly important for modern parallel architectures where the start-up cost of message transmissions still dominates, except for very large block sizes. Finally, the 2D algorithms for all-to-all personalized exchange are extended to O(24d) algorithms in a 2d×2d×2d3D torus or mesh. These algorithms also retain the important property of O(d) time due to message start-ups  相似文献   

2.
给定m台平行机(同型机),n个工件,寻找一种分配方案,使得把这n个工件分配到m台机器后,整体完工时间尽可能短,这个NP-难问题被称为经典排序问题。如果每个工件的加工时间满足一定的条件,则有望能在多项式时间内有效地得到最优的分配方案。Yue等对加工时间满足整除性质的经典排序问题考虑了一种新的算法,该算法总是能得到这种特殊情况的最优分配。该算法在多项式时间内能够得到最优分配,是对于一般的经典排序问题的近似算法。文章在此基础上,考虑该新算法在一般问题上的近似比。文中考虑了这个新算法的两种版本,分别得到了3/2和2-1/2 q(q∈Z+)的近似比。紧例子表明,文中对算法的两个版本的分析都是最优的。  相似文献   

3.
Service is provided to a set of parallel queues by a single server. The service of queue i may be initiated only at certain time instances {tni}n=1 that constitute the connectivity instances for queue i. The service of different customers cannot overlap. Scheduling is required to resolve potential contention of services initiated at closely spaced, closer than the service time, connectivity instances. At any time t, the future connectivity instances are available for scheduling. An anticipative policy is given, which at time t schedules the transmissions until a certain future time t+h. The length of the scheduling horizon h is selected based on the backlog at t. The allocation of the server in the interval [t, t+h], is done in accordance to the backlogs of the individual queues at t. The throughput region of the system is characterized, and it is shown that the policy we propose achieves maximum throughput. The policy has a low implementation complexity which is bounded for all the achievable throughput vectors. The average delay and the scheduling complexity are studied by simulation, and the trade-off between the two is demonstrated. The above scheduling problem arises in the access layer of the cross-links of a satellite network  相似文献   

4.
In an enterprise grid computing environments, users have access to multiple resources that may be distributed geographically. Thus, resource allocation and scheduling is a fundamental issue in achieving high performance on enterprise grid computing. Most of current job scheduling systems for enterprise grid computing provide batch queuing support and focused solely on the allocation of processors to jobs. However, since I/O is also a critical resource for many jobs, the allocation of processor and I/O resources must be coordinated to allow the system to operate most effectively. To this end, we present a hierarchical scheduling policy paying special attention to I/O and service-demands of parallel jobs in homogeneous and heterogeneous systems with background workload. The performance of the proposed scheduling policy is studied under various system and workload parameters through simulation. We also compare performance of the proposed policy with a static space–time sharing policy. The results show that the proposed policy performs substantially better than the static space–time sharing policy.  相似文献   

5.
In this paper we consider the unbounded single machine parallel batch scheduling problem with family jobs and release dates to minimize makespan. We show that this problem is strongly NP-hard, and give an O(n(n/m+1)m) time dynamic programming algorithm and an O(mkk+1P2k−1) time dynamic programming algorithm, where n is the number of jobs, m is the number of families, k is the number of distinct release dates and P is the sum of the processing times of all families. We further give a heuristic with a performance ratio 2. We also give a polynomial-time approximation scheme for the problem.  相似文献   

6.
Recurrence formulations for various problems, such as finding an optimal order of matrix multiplication, finding an optimal binary search tree, and optimal triangulation of polygons, assume a similar form. A. Gibbons and W. Rytter (1988) gave a CREW PRAM algorithm to solve such dynamic programming problems. The algorithm uses O(n6/log n) processors and runs in O(log2 n) time. In this article, a modified algorithm is presented that reduces the processor requirement to O(n6/log 5n) while maintaining the same time complexity of O(log2 n)  相似文献   

7.
The mesh-connected computer with multiple buses (MC-CMB) is a well-known parallel organization, providing broadcast facilities in each row and each column. In this paper, we propose a 2D generalized MCCMB (2-GMCCMB) for the purpose of increasing the efficiency of executing some important applications of prefix computations such as solving Linear recurrences and tridiagonaI systems, etc. A k1n1 ×k1n2 2-GMCCMB is constructed from a k 1n1×k1n2 mesh organization by enhancing the power of each disjoint n1×n2 submesh with multiple buses (sub-2-MCCMB). Given n data, a prefix computation can be performed in O(n1/10) time on an n3/5×n2/5 2-GMCCMB, where each disjoint sub-2-MCCMB is of size n1/2×n3/10. This time bound is faster than the previous time bound of O(n1/8) for the same computation on an n5/8×n3/8 2-MCCMB. Furthermore, the time bound of our parallel prefix algorithm can be further reduced to O(n1/11) if fewer processors are used. Our result can be extended to the d-dimensional GMCCMB, giving a time bound of O(n1 (d2(d)+d)/) for any constant d; here, we omit the constant factors. This time bound is less than the previous time bound of O(n1(d2(d))/) on the d-dimensional MCCMB  相似文献   

8.
Many DAG scheduling algorithms generate schedules that require prohibitively large number of processors. To address this problem, we propose a generic algorithm, SC, to minimize the processor requirement of any given valid schedule. SC preserves the schedule length of the original schedule and reduces processor count by merging processor schedules and removing redundant duplicate tasks. To the best of our knowledge, this is the first algorithm to address this highly unexplored aspect of DAG scheduling. On average, SC reduced the processor requirement 91, 82, and 72 percent for schedules generated by PLW, TCSD, and CPFD algorithms, respectively. SC algorithm has a low complexity (O{N}3) compared to most duplication-based algorithms. Moreover, it decouples processor economization from schedule length minimization problem. To take advantage of these features of SC, we also propose a scheduling algorithm SDS, having the same time complexity as SC. Our experiments demonstrate that schedules generated by SDS are only 3 percent longer than CPFD (O{N}4), one of the best algorithms in that respect. SDS and SC together form a two-stage scheduling algorithm that produces schedules with high quality and low processor requirement, and has lower complexity than the comparable algorithms that produce similar high-quality results.  相似文献   

9.
Studies the complexity of the problem of allocating m modules to n processors in a distributed system to minimize total communication and execution costs. When the communication graph is a tree, Bokhari has shown that the optimum allocation can be determined in O(mn2) time. Recently, this result has been generalized by Fernandez-Baca, who has proposed an allocation algorithm in O(mnk+1) when the communication graph is a partial k-tree. The author shows that in the case where communication costs are uniform, the module allocation problem can be solved in O(mn) time if the communication graph is a tree. This algorithm is asymptotically optimum  相似文献   

10.
Recently it has been noticed that for semigroup computations and for selection, rectangular meshes with multiple broadcasting yield faster algorithms than their square counterparts. The contribution of the paper is to provide yet another example of a fundamental problem for which this phenomenon occurs. Specifically, we show that the problem of computing the convex hull of a set of n sorted points in the plane can be solved in O(n1/8 log 3/4) time on a rectangular mesh with multiple broadcasting of size n3/8 log1/4 n×n5/8/log1/4n. The fastest previously known algorithms on a square mesh of size √n×√n run in O(n1/6) time in case the n points are pixels in a binary image, and in O(n1/6log3/2 n) time for sorted points in the plane  相似文献   

11.
以无人机网络的资源分配为研究对象,研究了基于强化学习的多无人机网络动态时隙分配方案,在无人机网络中,合理地分配时隙资源对改善无人机资源利用率具有重要意义;针对动态时隙分配问题,根据调度问题的限制条件,建立了多无人机网络时隙分配模型,提出了一种基于近端策略优化(PPO)强化学习算法的时隙分配方案,并进行强化学习算法的环境映射,建立马尔可夫决策过程(MDP)模型与强化学习算法接口相匹配;在gym仿真环境下进行模型训练,对提出的时隙分配方案进行验证,仿真结果验证了基于近端策略优化强化学习算法的时隙分配方案在多无人机网络环境下可以高效进行时隙分配,提高网络信道利用率,提出的方案可以根据实际需求适当缩短训练时间得到较优分配结果。  相似文献   

12.
多星任务调度是具有NP-hard特性的优化问题,随着卫星资源规模和任务需求规模的双重增长,传统调度方法求解效率不高.在轨卫星在常年运行过程中积累了丰富的调度数据.针对大规模多星任务调度场景,建立多星多波束任务调度模型,并提出数据驱动的多星任务网络预测调度算法对其求解.以分割的思想,实现多星场景下任务可调度性预测.从历史调度数据中,提取设定的3个静态特征和5个动态特征,构建并训练预测网络,预测任务被不同卫星完成的概率,并以冲突避免、负载均衡等为原则,得到初始任务和资源卫星的分配方案.进一步设计双链结构的进化算法,以双链编码形式表征上述关系,配合设计的交叉、修复等进化算子,优化初始方案中的任务序列与资源分配关系,输出最终任务调度方案.仿真结果表明,与改进蚁群算法、混合遗传算法和数据驱动并行调度算法相比,所提出算法在运行时间、方案收益和卫星负载均衡3方面均有较好的表现.  相似文献   

13.
Given N matrices A1, A2,...,AN of size NtimesN, the matrix chain product problem is to compute A1timesA2times...timesAN. Given an NtimesN matrix A, the matrix powers problem is to calculate the first N powers of A, that is, A, A2, A3,..., AN. We solve the two problems on distributed memory systems (DMSs) with p processors that can support one-to-one communications in T(p) time. Assume that the fastest sequential matrix multiplication algorithm has time complexity O(Nalpha), where the currently best value of a is less than 2.3755. Let p be arbitrarily chosen in the range 1lesplesNalpha+1/(log N)2. We show that the two problems can be solved by a DMS with p processors in Tchain(N,p)=O((Nalpha+1/p)+T(p))((N2(2+1/alpha/p2/alpha)(log+p/N)1-2/alpha+log+((p log N)/Nalpha)) and Tpower (N,p)=O(Nalpha+1/p+T(p)((N2(1+1/alpha)/p2/alpha)(log+p/2 log N)1-2/alpha+(log N)2))) times, respectively, where the function log+ is defined as follows: log+ x=log x if xges1 and log+ x=1 if 0相似文献   

14.
A cross-bridge reconfigurable array of processors is a parallel processing system which has the ability to change dynamically the supported interconnection scheme during the execution of an algorithm. Based on this architecture, several O(1) time basic operations such as the transpose, the untranspose, the shift, the unshift and the prefix sum of a binary sequence are first proposed. Then, these basic operations can be used to find the kth smallest element of N m bits unsigned integers in O(m) time using N processors and to sort N data items in O(1) time using O(N5/3) processors instead of using O(N2) processors as those proposed by other researchers  相似文献   

15.
This paper presents a new parallel algorithm for routing unicast (one-to-one) assignments in Benes networks. Parallel routing algorithms for such networks were reported earlier, but these algorithms were designed primarily to route permutation assignments. The routing algorithm presented in this paper removes this restriction without an increase in the order of routing cost or routing time. We realize this new routing algorithm on two different topologies. The algorithm routes a unicast assignment involving O(k) pairs of inputs and outputs in O(lg 2 k+lg n) time on a completely connected network of n processors and in O(lg4 k+lg2 k lg n) time on an extended shuffle-exchange network of n processors. Using O(n lg n) professors, the same algorithm can be pipelined to route α unicast assignments each involving O(k) pairs of inputs and outputs, in O(lg2 k+lg n+(α-1) lg k) time on a completely connected network and in O(lg4 k+lg2 k lg n+(α-1)(lg 3 k+lg k lg n)) time on the extended shuffle-exchange network. These yield an average routing time of O(lg k) in the first case, and O(lg3 k+1g k lg n) in the second case, for all α⩾lg n. These complexities indicate that the algorithm given in this paper is as fast as Nassimi and Sahni's algorithm for unicast assignments, and with pipelining, it is faster than the same algorithm at least by a factor of O(lg n) on both topologies. Furthermore, for sparse assignments, i.e., when k=O(1), it is the first algorithm which has an average routing time of O(1g n) on a topology with O(n) links  相似文献   

16.
Processor allocation and job scheduling are two complementary techniques for improving the performance of multiprocessors. It has been observed that all the hypercube allocation policies with the FCFS scheduling provide only incremental performance improvement. A greater impact on the performance can be obtained by efficient job scheduling. This paper presents an effort in that direction by introducing a new scheduling algorithm called lazy scheduling for hypercubes, The motivation of this scheme is to eliminate the limitations of the FCFS scheduling. This is done by maintaining separate queues for different job sizes and delaying the allocation of a job if any other job(s) of the same dimension is(are) running in the system. Processor allocation is done using the buddy strategy. The scheduling and allocation complexity is O(n) for an n-cube. Simulation studies show that the performance is dramatically enhanced by using the lazy scheduling scheme as compared to the FCFS scheduling. Comparison with a recently proposed scheme called scan indicates that the lazy scheme performs better than the scan scheduling under a wide range of workloads.  相似文献   

17.
针对设备到设备(D2D)通信资源分配中的时隙调度时延以及信道增益变化导致吞吐率下降的问题,提出了一种公平性时隙调度(FTDS)算法。首先,基于频谱复用模式建立系统模型,并归纳为一组合优化问题;然后,在模型的次优求解中,FTDS算法将调度周期划分为多个等长的时隙,根据优先级策略将D2D用户分配至不同时隙调度,从而适应D2D用户多于蜂窝用户的应用场景;同时,为了权衡服务质量(QoS)与系统吞吐率的关系,构造一满足性权值与传输速率相互制约,共同决定用户调度优先级。仿真实验中,FTDS算法相比TDS、RANDOM算法,吞吐率平均增幅分别达到11.09%和40.64%,且FTDS算法下D2D用户被调度频次累积分布更为集中;同时,相比TDS算法调度时延最大降低31.22%。仿真实验表明,FTDS算法拥有更优的吞吐率性能、更公平的调度机制、更小的调度时延。  相似文献   

18.

Big data analytics in cloud environments introduces challenges such as real-time load balancing besides security, privacy, and energy efficiency. This paper proposes a novel load balancing algorithm in cloud environments that performs resource allocation and task scheduling efficiently. The proposed load balancer reduces the execution response time in big data applications performed on clouds. Scheduling, in general, is an NP-hard problem. Our proposed algorithm provides solutions to reduce the search area that leads to reduced complexity of the load balancing. We recommend two mathematical optimization models to perform dynamic resource allocation to virtual machines and task scheduling. The provided solution is based on the hill-climbing algorithm to minimize response time. We evaluate the performance of proposed algorithms in terms of response time, turnaround time, throughput metrics, and request distribution with some of the existing algorithms that show significant improvements.

  相似文献   

19.
A new approach for dynamic job scheduling in mesh-connected multiprocessor systems, which supports a multiuser environment, is proposed in this paper. Our approach combines a submesh reservation policy with a priority-based scheduling policy to obtain high performance in terms of high throughput, high utilization, and low turn-around times for jobs. This high performance is achieved at the expense of scheduling jobs in a strictly fair, FCFS fashion; in fact, the algorithm is parameterized to allow trade-offs between performance and (short-term) POPS fairness. The proposed scheduler can be used with any submesh allocation policy. A fast and efficient implementation of the proposed scheduler has also been presented. The performance of the proposed scheme has been compared with the FCFS policy, the only existing scheduling strategy for meshes, to demonstrate the effectiveness of the proposed approach. Simulation results indicate that our scheduling strategy outperforms the FCFS policy significantly. Specifically, our strategy significantly reduces the average waiting delay of jobs over the FCFS policy. The fast implementation of the proposed scheduler results in low allocation and deallocation time overhead, as well as low space overhead  相似文献   

20.
We compare five implementations of the Jacobi method for diagonalizing a symmetric matrix. Two of these, the classical Jacobi and sequential sweep Jacobi, have been used on sequential processors. The third method, the parallel sweep Jacobi, has been proposed as the method of choice for parallel processors. The fourth and fifth methods are believed to be new. They are similar to the parallel sweep method but use different schemes for selecting the rotations.

The classical Jacobi method is known to take O(n4) time to diagonalize a matrix of order n. We find that the parallel sweep Jacobi run on one processor is about as fast as the sequential sweep Jacobi. Both of these methods take O(n3 log2n) time. One of our new methods also takes O(n3 log2n) time, but the other one takes only O(n3) time. The choice among the methods for parallel processors depends on the degree of parallelism possible in the hardware. The time required to diagonalize a matrix on a variety of architectures is modeled.

Unfortunately for proponents of the Jacobi method, we find that the sequential QR method is always faster than the Jacobi method. The QR method is faster even for matrices that are nearly diagonal. If we perform the reduction to tridiagonal form in parallel, the QR method will be faster even on highly parallel systems.  相似文献   


设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号