首页 | 官方网站   微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Parallel merge sort is useful for sorting a large quantity of data progressively. The merge sort should be parallelized carefully since the conventional algorithm has poor performance due to the successive reduction of the number of participating processors by half, and down to one in the last merging stage. The proposed load-balanced merge sort utilizes all processors throughout the computation. It evenly distributes data to all processors in each stage. Thus every processor is forced to work in all phases. Significant performance enhancement has been achieved up to a speedup of (P–1)/log P where P is the number of processors. Experimental results demonstrate a speedup of 9.6 (upper bound of 10.7) on 32-processor Cray T3E when sorting 4M 32-bit integers, and a speed up of 2.3 (upper bound of 2.8) on an 8-node PC cluster.  相似文献   

2.
处理器阵列的容错重构技术是片上网络多核、众核高性能体系结构的可靠性技术之一。现有的最大逻辑阵列并行重构技术仅对单条逻辑列的构造实现了并行化,而对多条逻辑列的同步并行仍未见可行算法。依据处理器阵列的潜在并行性,在分治策略的基础上,提出了一种阵列分块的并行重构算法。算法对处理器阵列实施横向分块划分,对每个阵列块进行并行重构,并对所得逻辑子阵列进行归并,实现了多条逻辑列的同步并行重构。与现有的并行算法相比,新算法同样能够生成最大逻辑列,并且减少了通信开销与计算中的数据冗余,有效提高了运行速度。实验结果表明,在物理阵列大小为64×64的处理器阵列上,运行速度比现有并行算法提高39.55%,并且具有良好的可扩展性。  相似文献   

3.
陈宏建  陈崚  秦玲  徐晓华  屠莉 《计算机工程》2004,30(24):17-18,191
在Y.Pan提出的基于流水光总线阵列模型(LARPBS)上使用N个处理器对N个元素进行排序在最好情况下以O(logN)时间,最坏情况下以O(N)时间完成的并行排序算法的基础上,提出了一种LARPBS模型上的可扩展的快速并行排序算法,对N个元素进行排序,使用p(1≤P≤N)个处理器在最好情况下以O(NlogN/p)时间,最坏情况下以O(N^2/p)时间完成排序。另外还提出了一种LARPBS模型上改进的快速高效并行排序算法,该算法对N个元素进行排序使用N个处理器在最好情况下以O(log√N)时间、最坏情况下以O(√N)时间完成排序。  相似文献   

4.
The computational complexity of a parallel algorithm depends critically on the model of computation. We describe a simple and elegant rule-based model of computation in which processors apply rules asynchronously to pairs of objects from a global object space. Application of a rule to a pair of objects results in the creation of a new object if the objects satisfy the guard of the rule. The model can be efficiently implemented as a novel MIMD array processor architecture, the Intersecting Broadcast Machine. For this model of computation, we describe an efficient parallel sorting algorithm based on mergesort. The computational complexity of the sorting algorithm isO(nlog2 n), comparable to that for specialized sorting networks and an improvement on theO(n 1.5) complexity of conventional mesh-connected array processors.  相似文献   

5.
文章提出了一种LARPBS模型上的并行归并排序算法,利用该算法对长度为N的序列进行排序,使用N~(1+)着(0<着<1)个处理机可以在O((loglogN)~2)时间完成。  相似文献   

6.
In this paper we study the sorting performance of a 128-processor CRAY T3D and discuss the efficient use of the toroidal network connecting the processors. The problems we consider range from that of sorting one word per processor to sorting the entire memory of the machine, and we give efficient algorithms for each case. In addition, we give both algorithms that make assumptions about the distribution of the data and those that make no assumptions. The clear winner, if data can be assumed to be uniformly distributed, is a method that we call a hash-and-chain sort. The time for this algorithm to sort one million words per processor over 64 processors is less than two seconds, which compares favorably to about four seconds using a 4-processor CRAY C90 and about 17 seconds using a 64-processor Thinking Machines CM-5.  相似文献   

7.
划分点定位并行排序算法   总被引:5,自引:0,他引:5  
提出并分析了划分点定位并行排序(parallel sorting by divide-point locating)算法。在算法中,输入数据被平均划分并分配给所有处理机,因此每个处理机具有相同的工作负载。给出了网络分布计算环境下PSDL算法的实验结果,并与PSRS算法进行了对比。理论分析和实验结果表明,PSDL算法是一种高效率、高扩展性的并行排序算法。  相似文献   

8.
该文介绍了带有宽总线网络的可重构计算模型(RAPWBN)的基本结构及其二进制值的前缀和操作,提出了一种快速并行排序算法,对长度为N的序列进行排序,在具有N2个处理器和N条行总线的RAPWBN模型上,若总线带宽ω>logN字节,可以在O(1)时间完成排序。该算法的成本达到了最优。  相似文献   

9.
In this paper, we propose a new I/O overhead free Givens rotations based parallel algorithm for solving a system of linear equations. The algorithm uses a new technique called two-sided elimination and requires an N×(N+1) mesh-connected processor array to solve N linear equations in (5N-log N-4) time steps. The array is well suited for VLSI implementation as identical processors with simple and regular interconnection pattern are required. We also describe a fault-tolerant scheme based on an algorithm based fault tolerance (ABFT) approach. This scheme has small hardware and time overhead and can tolerate up to N processor failures  相似文献   

10.
We present two fast algorithms for sorting on a linear array with a reconfigurable pipelined bus system (LARPBS), one of the recently proposed parallel architectures based on optical buses. In our first algorithm, we sort N numbers in O(log N log log N) worst-case time using N processors. In our second algorithm, we sort N numbers in O((log log N)2) worst-case time using N1+ε processors, for any fixed ε such that 0 < ε < 1. Our algorithms are based on a novel deterministic sampling scheme for merging two sorted arrays of length N each in O(log log N) time on an LARPBS with N processors. To our knowledge, the previous best sorting algorithm on this architecture has a running time of O((log N)2) using N processors  相似文献   

11.
Many sorting algorithms have been studied in the past, but there are only a few algorithms that can effectively exploit both single‐instruction multiple‐data (SIMD) instructions and thread‐level parallelism. In this paper, we propose a new high‐performance sorting algorithm, called aligned‐access sort (AA‐sort), that exploits both the SIMD instructions and thread‐level parallelism available on today's multicore processors. Our algorithm consists of two phases, an in‐core sorting phase and an out‐of‐core merging phase. The in‐core sorting phase uses our new sorting algorithm that extends combsort to exploit SIMD instructions. The out‐of‐core algorithm is based on mergesort with our novel vectorized merging algorithm. Both phases can take advantage of SIMD instructions. The key to high performance is eliminating unaligned memory accesses that would reduce the effectiveness of SIMD instructions in both phases. We implemented and evaluated the AA‐sort on PowerPC 970MP and Cell Broadband Engine platforms. In summary, a sequential version of the AA‐sort using SIMD instructions outperformed IBM's optimized sequential sorting library by 1.8 times and bitonic mergesort using SIMD instructions by 3.3 times on PowerPC 970MP when sorting 32 million random 32‐bit integers. Also, a parallel version of AA‐sort demonstrated better scalability with increasing numbers of cores than a parallel version of bitonic mergesort on both platforms. Copyright © 2011 John Wiley & Sons, Ltd.  相似文献   

12.
并行归并排序算法   总被引:3,自引:0,他引:3  
构造效率为O(1)的并行算法是一个引人注目的问题。[1]和[2]分别提出了并行度为O(logn)和O(n^1/2)的、效率为O(1)的并行排序算法。本文提出一种新的并行排序算法,其效率为O(1),而并行步数小于[1]和[2]的算法的并行步数。经过改进后,在保持效率为O(1)的情况下,可进一步将并行度扩大到O(n^1/2log n)。  相似文献   

13.
基于流水总线的可重构线性阵列系统(LARPBS)是一种建立在光总线上的并行计算模型,许多研究工作者已经在该模型上设计出了一些高效的并行算法。文章提出了一种基于LARPBS模型上Vnliant并行归并的实现算法,利用该法对长度为N的序列进行排序,最坏情况下可以使用N个处理器在O(logNloglogN)时间完成。  相似文献   

14.
A new parallel algorithm is proposed for fat image labeling using local operators on image pixels. The algorithm can be implemented on an n×n mesh-connected computer such that, for any integer k in the range [1, log (2n)], the algorithm requires Θ(kn1k/) bits of local memory per processor and takes Θ(kn) time. Bit-serial processors and communication links can be used without affecting the asymptotic time complexity of the algorithm. The time complexity of the algorithm has very small leading constant factors, which makes it superior to previous mesh computer labeling algorithms for most practical image sizes (e.g. up to 4096×4096 images). Furthermore, the algorithm is based on using stacks that can be realized using very fast shift registers within each processing element  相似文献   

15.
A sorting algorithm, dubbed MeshSort, for multidimensional mesh-connected multiprocessors is introduced. Bitonic Sort and ShearSort are shown to be special cases of MeshSort. MeshSort thus provides some insight into the operation of parallel sorting. It requires operations only along orthogonal vectors of processors, simplifying the control of the multiprocessor. This allows MeshSort to be used on any reduced architecture where a multidimensional memory structure is interconnected with a lower dimensional structure of processors. A modified version of MeshSort, called FastMeshSort, is presented. This algorithm applies the same basic principle as MeshSort, and is almost as simple to implement, but achieves much better performance. The modified algorithm is shown to be very efficient for reasonably sized meshes. FastMeshSort is presented as a practical sorting and routing algorithm for real multidimensional mesh-connected multiprocessors. The algorithms can easily be extended to other multiprocessor structures  相似文献   

16.
The problem of merging k (k⩾2) sorted lists is considered. We give an optimal parallel algorithm which takes O((n log k/p)+log n) time using p processors on a parallel random access machine that allows concurrent reads and exclusive writes, where n is the total size of the input lists. This algorithm achieves O(log n) time using p=n log k/log n processors. Most of the previous log n research for this problem has been focused on the case when k=2. Very recently, parallel solutions for the case when k=2 have been reported. Our solution is the first logarithmic time optimal parallel algorithm for the problem when k⩾2. It can also be seen as a unified optimal parallel algorithm for sorting and merging. In order to support the algorithm, a new processor assignment strategy is also presented  相似文献   

17.
基于流水光总线的可重构线性阵列系统(LARPBS)是一种建立在光总线上的并行计算模型。本文提出了一种基于LARPBS模型的快速排序并行算法,该算法使用n个处理器,对关 键字位数固定的n个记录可以在O(1)时间完成排序;对于关键字位数不固定的n个记录,可以在O(d)时间完成排序,这里d为关键字的最大位数。  相似文献   

18.
一种新的分"档"统计插入排序算法   总被引:10,自引:3,他引:7  
提出了一种谓之数据代码转换,分“档”统计,迁移插入的新排序方法,给出了该排序算法的描述,时间复杂度分析用C语言编写程序进行算法比较的实验结果,算法分析和实验结果都表明:在待排序数据均匀分布的情况下,分“档”统计插入排序方法的时间复杂度为O(N),并且排序速度明显优于快速排序,分段快速排序,按位段分块排序等算法。  相似文献   

19.
均匀分布数据的分"档"统计插入排序算法研究   总被引:20,自引:0,他引:20  
1.引言 排序(sorting)是计算机程序设计中的一种重要运算,它的功能是将一个数据元素(或记录)的任意序列,重新排列成一个接关键字有序的序列.其精确定义如下: 设{Ai}=(A1,A2,…,AN)为一有限的数据集合,其中人是集合中的元素,又叫做记录.若 Ai的某个标识特征为 Ki,称凡为 Ai的关键字,又称为健或标识符.{Ai}也称为记录集合.所谓排序,就是依照集合中元素的关键宇特征,把元素排列顺序重新加以组织,使新的有序集合{Aj}对于任意的,有或. 由于排序是计算机科学中一项复杂而重要的技术,…  相似文献   

20.
在介绍带有宽总线网络的可重构计算模型(RAPWBN)的二进制值的前缀和操作的基础上,提出了该模型上的抽取压缩操作算法,并由此得到了该模型上的并行归并排序算法。在具有N个处理器和N条行总线的RAPWBN模型上,若总线带宽ω>log N字节,对长度为N的序列进行归并排序,在最坏情况下以O(logN·loglogN)时间完成。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号