首页 | 官方网站   微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
The problem of finding an optimal product sequence for sequential multiplication of a chain of matrices (the matrix chain ordering problem, MCOP) is well-known. We consider the problem of finding an optimal product schedule for evaluating a chain of matrix products on a parallel computer (the matrix chain scheduling problem, MCSP). The difference between MCSP and MCOP is that MCOP pertains to a product sequence for single processor systems and MCSP pertains to a sequence of concurrent matrix products for parallel systems. The approach of parallelizing each matrix product after finding an optimal product sequence for single processor systems does not always guarantee minimum evaluation time on parallel systems since each parallelized matrix product may use processors inefficiently. We introduce a new processor scheduling algorithm for MCSP which reduces the evaluation time of a chain of matrix products on a parallel computer, even at the expense of a slight increase in the total number of operations. Given a chain of n matrices and a matrix product utilizing at most P/k processors in a P-processor system, the proposed algorithm approaches k(n-1)/(n+klog(k)-k) times the performance of parallel evaluation using the optimal sequence found for MCOP. Experiments performed on a Fujitsu AP1000 multicomputer also show that the proposed algorithm significantly decreases the time required to evaluate a chain of matrix products in parallel systems.  相似文献   

2.
面向移动设备的3D图形处理器设计   总被引:2,自引:0,他引:2  
提出一种面向移动设备的3D图形处理器的设计方法,从图形算法和硬件架构两个层次进行优化.对图形算法进行C语言的仿真模拟,并设计高效的具有并行和流水线结构的图形处理器架构.该架构采用定点的数据通道,拥有一个可编程的顶点处理器和基于像素块的光栅扫描转换模块,降低电路复杂度的同时提高了整体性能.该设计已经在FPGA上验证,并给出了实验结果.实验结果显示该图形处理器结构可以满足移动设备的图形应用要求,具有可行性.  相似文献   

3.
The processor allocation problem requires recognizing and locating a free subcube that can accommodate a request for a subcube of a specified size for an incoming task. Methods reported in the literature fall into two strategies: bottom-up or bit mapped technique (BMT) and top-downer available cube technique (ACT). Our algorithm that solves the allocation problem in faulty hypercubes falls into the category of ACT's which offer the advantage over BMT's of quickly recognizing whether or not a requested subcube is available in the list of fault-free subcubes. We introduce new algebraic functions and the concept of separation factor to select a subcube for allocation. The notion of overlap-syndrome, defined in the text, quantifies the overlap among free subcubes. Our technique has full subcube recognition ability and thus recognizes more subcubes as compared to bit mapped techniques: Buddy, Gray code and its variants. The advantages of our approach over some of the existing ACT's in terms of fragmentation and overall completion time are described in the text and in simulation results  相似文献   

4.
Processor Design in 3D Die-Stacking Technologies   总被引:4,自引:0,他引:4  
Three-dimensional integration is an emerging fabrication technology that vertically stacks multiple integrated chips. The benefits include an increase in device density; much greater flexibility in routing signals, power, and clock; the ability to integrate disparate technologies; and the potential for new 3D circuit and microarchitecture organizations. This article provides a technical introduction to the technology and its impact on processor design. Although our discussions here primarily focus on high-performance processor design, most of the observations and conclusions apply to other microprocessor market segments.  相似文献   

5.
The quality of a 3D video display depends on virtual view synthesis process which is affected by the bit allocation criterion. The performance of a bit allocation algorithm is dependent on various encoding parameters like quantization parameter, motion vector, mode selection, and so on. Rate-distortion optimization (RDO) is used to efficiently allocate bits with minimum distortion. In 3D video, rate-distortion (RD) property of synthesized view is used to assign bits between texture video and depth map. Existing literature on bit allocation methods use mean square error (MSE) as distortion metric which is not suitable for measuring perceptual quality. In this paper, we propose structural similarity (SSIM)-based joint bit allocation scheme to enhance visual quality of 3D video. Perceptual quality of a synthesized view depends on texture and depth map quality. Thus, SSIM-based RDO is performed on both texture and depth map where SSIM is used as distortion metric in mode decision and motion estimation. SSIM-based distortion model for synthesized view is determined experimentally. As SSIM cannot be related to quantization step, SSIM-MSE relation is used to convert distortion model in terms of MSE. The Lagrange multiplier method is used to solve the bit allocation problem. The proposed algorithm is implemented using 3DV-ATM as well as HEVC. RD curves show reduction in bitrate with an improvement in SSIM of synthesized view.  相似文献   

6.
A new approach for dynamic processor allocation in hypercube multicomputers which supports a multi-user environment is proposed. A dynamic binary tree is used for processor allocation along with an array of free lists. Two algorithms are proposed based on this approach, capable of efficiently handling cubic as well as noncubic allocation. Time complexities for both allocation and deallocation are shown to be polynomial, a significant improvement over the existing exponential and even super-exponential algorithms. Unlike existing schemes, the proposed strategies are best-fit strategies within their search space. Simulation results indicate that the proposed strategies outperform the existing ones in terms of parameters such as average delay in honoring a request, average allocation time, average deallocation time, and memory overhead  相似文献   

7.
分析了MIMO下行链路中的多用户调度和功率分配方法,讨论了各种功率分配方法的多用户调度性能,并通过仿真实验对性能进行了比较。实验结果表明,在低信噪比条件下,二维注水功率分配的性能最佳,而在高信噪比条件下,各种功率分配方法的性能差别很小。  相似文献   

8.
Interaction between the phases of register allocation and instruction scheduling are often considered in publications devoted to optimizations for the final stage of compilation. Typically, it is proposed to adapt one of the phase for needs of another without their combination into a single unit. However, their integration can essentially reduce the time of operation and enhance the performance of the resulting code. This study describes an attempt to combine these phases as completely as possible with account for the features of static scheduling for VLIW-architectures.  相似文献   

9.
基于不同分配策略的云计算任务调度以及任务分配与调度的主要目的,提出了一种新的算法—求解3-SAT问题的基于任务分配与调度的GSAT算法。该算法将3-SAT问题中的每一个变量形成一个任务,在GSAT算法的基础上,引入任务分配与调度指导贪心搜索;同时,在保留原有贪心搜索的前提下,根据任务分配与调度的思想和3-SAT问题的特点,设计了两种新的策略—分配策略和调度策略共同完成整个贪心搜索过程。以标准的SATLAB库中变量个数从 20~250的3 700个不同规模的标准Uniform Random 3-SAT 问题对新的算法的性能进行了合理的测试,并与高效和普通性能改进的GSAT算法的结果作了比较,结果表明,该算法具有更高的成功率和更少的翻转次数。  相似文献   

10.
The performance of a multiprocessor architecture is determined both by the way the program is partitioned into processes and by the way these processes are allocated to different processors. In the fine-grain dataflow model, where each process consists of a single instruction, decomposition of a program into processes is achieved automatically by compilation. This paper investigates the effectiveness of fine-grain decomposition in the context of the prototype dataflow machine now in operation at the University of Manchester. The current machine is a uniprocessor, known as the Single-Ring Dataflow Machine, comprising a single processing element which contains several units connected together in a pipelined ring. A Multi-ring Dataflow Machine (MDM) containing several such processing elements connected together via an interprocessor switching network, is currently under investigation. The paper describes a method of allocating dataflow instructions to processing elements in the MDM, and examines the influence of this method on selection of a switching network. Results obtained from simulation of the MDM are presented. They show that programs are executed efficiently when their parallelism is matched to the parallelism of the machine hardware.  相似文献   

11.
Wu  Ying-Jhih  Yu  Shuo-Ting  Lai  Kuan-Chou  Chhabra  Amit  Chang  Hsi-Ya  Huang  Kuo-Chan 《The Journal of supercomputing》2020,76(12):10212-10239
The Journal of Supercomputing - Most modern parallel programs are written with the moldable property. However, most existing parallel computing systems treat such parallel programs as rigid jobs...  相似文献   

12.
Diverse applications in manufacturing, logistics, health care, telecommunications, and computing require that renewable resources be dynamically scheduled to handle distinct classes of job service requests arriving randomly over slotted time. These dynamic stochastic resource scheduling problems are analytically and computationally intractable even when the number of job classes is relatively small. In this paper, we formally introduce two types of problems called allocation and advanced scheduling, and formulate their Markov decision process (MDP) models. We establish that these MDPs are “weakly coupled” and exploit this structural property to develop an approximate dynamic programming method that uses Lagrangian relaxation and constraint generation to efficiently make good scheduling decisions. In fact, our method is presented for a general class of large-scale weakly coupled MDPs that we precisely define. Extensive computational experiments on hundreds of randomly generated test problems reveal that Lagrangian decisions outperform myopic decisions with a statistically significant margin. The relative benefit of Lagrangian decisions is much higher for advanced scheduling than for allocation scheduling.  相似文献   

13.
Berth allocation is an important port operation problem for container terminals. This paper studies how to develop a robust schedule for berth allocation that incorporates a degree of anticipation of uncertainty (e.g., vessels’ arrival time and operation time) during the schedule’s execution. This study proposes a bi-objective optimization model for minimizing cost and maximizing robustness of schedules. A heuristic is also developed for solving the bi-objective model in large-scale problem cases. Numerical experiments are conducted to validate the effectiveness and efficiency of the proposed model and method. Managerial implications are also discussed.  相似文献   

14.
针对嵌入式系统中大多数任务执行算法不考虑目标成本问题,提出了一种基于多目标全局约束的任务分配和调度算法。算法使用约束逻辑编程来对任务执行资源如处理单元、通信设备以及代码和数据存储量的使用进行多目标全局约束。算法假设ROM和RAM分别用于代码存储和数据存储,算法还考虑数据在数据存储器中的位置。实验结果表明,尽管在多个约束条件下,提出的任务分配和调度算法无论在代码存储和数据存储量使用方面,还是在对任务有效求解方面都能取得比普遍采用的贪婪调度算法更好的结果。  相似文献   

15.
Scheduling is a fundamental issue in achieving high performance on metacomputers and computational grids. For the first time, the job scheduling problem for grid computing on metacomputers is studied as a combinatorial optimization problem. A cost model is proposed for modeling communication heterogeneity on computational grids. A processor allocation algorithm is developed which always finds an optimal processor allocation that minimizes the effective execution time of a job when the job is being scheduled. It is proven that the list scheduling (LS) algorithm can achieve reasonable worst-case performance bound in grid environments supporting distributed supercomputing with large applications. We compare the performance of various job scheduling and processor allocation algorithms for grid computing on metacomputers. We evaluate the performance of 128 combinations of two job scheduling algorithms, four initial job ordering strategies, four processor allocation algorithms, and four metacomputers by extensive simulation. It is found that the combination of largest job first (LJF) initial job ordering and minimum effective execution time (MEET) or largest machine first (LMF) processor allocation algorithm yields the best average-case performance, and the choice of FCFS and LS depends on the range of job sizes. It is also observed that communication heterogeneity does have significant impact on schedule lengths.  相似文献   

16.
This work is motivated by the rapid increase in design complexity of many multi-function System on Chips. It proposes solutions to both resolve the hardware contention issues of non-preemptive processing elements shared among tasks, and to optimize cost. A software solution based on start time management is proposed to interleave task execution on processing elements. Algorithms are proposed to determine the required processing elements of selected types, when there is no knowledge on the release time of any task. For tasks whose release orders are known a priori, an optimal algorithm is presented if processing elements have the same cost, otherwise, if processing elements do not have the same cost, a pseudo polynomial-time algorithm based on dynamic programming is presented. The performance of the algorithms is also evaluated for general cases.  相似文献   

17.
QCDOC is a massively parallel supercomputer with tens of thousands of nodes distributed on a six-dimensional torus network. The 6D structure of the network provides the needed communication resources for many communication-intensive applications. In this paper, we present a parallel algorithm for three-dimensional Fast Fourier Transform and its implementation for a 4096-node QCDOC prototype. Two techniques have been used to increase its parallel performance: simultaneous multi-dimensional communication and communication-and-computation overlapping. Benchmarking experiments suggest that 3D FFTs of size 128×128×128 can scale well on such platforms up to 4096 nodes. Our performance results suggest stronger scalability on QCDOC than on IBM BlueGene/L supercomputer.  相似文献   

18.
Mapping linear workflow applications onto a set of homogeneous processors can be optimally solved in polynomial time for the throughput objective with fewer processors than stages. This result holds true even when setup times occur in the execution and homogeneous buffers are available for the storage of intermediate results. In this kind of application, several computation stages are interconnected as a linear application graph, and each stage holds a buffer of limited size where intermediate results are stored and a processor setup time occurs when passing from one stage to another. In this paper, we tackle the problem in which the buffer sizes are not given beforehand and must be fixed before the execution to maximize the throughput within each processor. The goal of this work is to minimize the cost induced by the setup times by allocating buffers that are proportinal in size to each other. We present a closed formula to compute the optimal buffer allocation in the case of nondecreasing setup costs in the linear application. For the case of unsorted setup times, we provide competitive heuristics that are validated via extensive simulation. Three nonscalable brute force algorithms are also provided to compare heuristic approaches to optimal ones for small applications and to evaluate the relevance of our approach.  相似文献   

19.
We study the fair allocation of bandwidth in multicast networks with multirate capabilities. In multirate transmission, each source encodes its signal in layers. The lowest layer contains the most important information and all receivers of a session should receive it. If a receiver's data path has additional bandwidth, it receives higher layers which leads to a better quality of reception. The bandwidth allocation objective is to distribute the layers fairly. We present a computationally simple, decentralized scheduling policy that attains the maxmin fair rates without using any knowledge of traffic statistics and layer bandwidths. This policy learns the congestion level from the queue lengths at the nodes, and adapts the packet transmissions accordingly. When the network is congested, packets are dropped from the higher layers; therefore, the more important lower layers suffer negligible packet loss. We present analytical and simulation results that guarantee the maxmin fairness of the resulting rate allocation, and upper bound the packet loss rates for different layers.  相似文献   

20.
Expert system for scheduling in an airline gate allocation   总被引:4,自引:0,他引:4  
Scheduling is an important technique encompassing a wide application area. Because of the complex interrelations among the resources, knowledge, and various other constraints, scheduling has many difficulties. Artificial Intelligence technology has been applied to solve the scheduling problem. As AI techniques are efficient in representing knowledge and dealing with heuristics, it is an adequate approach to model and to solve scheduling problems. We have implemented the ramp scheduling system, called RACES (Ramp Activity Coordination Expert System), to solve complex and dynamic aircraft parking problems. RACES was developed from the domain knowledge and experience which were acquired from the domain experts. Domain knowledge and experience are important factors in controlling the scheduling procedure. RACES divides the problem into sub-problems and experimental heuristics in the knowledge acquisition process. The system independently processes scheduling for the divided sub-problems and shares variables and domains. During the scheduling, the system selects or confines the search space with domain filtering techniques by exploiting the characteristics of various constraints and knowledge. RACES produces a user-driven near-optimal solution by means of a trade-off scheduling method using heuristics between the size of aircraft and the best-fit time. For 400 daily flights, RACES made parking schedules for aircraft in about 20 s compared with 4–5 h by human experts.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号