1.

Parallel on-line parsing in constant time per word

Klaas Sikkel 《Theoretical computer science》1993,120(2):303-310

An on-line parser processes each word as soon as it is typed by the user, without waiting for the end of the sentence. Thus, in an interactive system, a sentence will be parsed almost immediately after the last word has been presented.

The complexity of an on-line parser is determined by the resources needed for the analysis of a single word, as it is assumed that previous words have been processed already. Sequential parsing algorithms like CYK or Earley need O(n²) time for the nth word. A parallel implementation in O(n) time on O(n) processors is straightforward. In this paper a novel parallel on-line parser is presented that needs O(1) time on O(n²) processors.
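The O(n²) per-word cost of a sequential on-line CYK parser can be illustrated with a minimal sketch. It is not the parallel parser of the paper; it assumes a grammar in Chomsky normal form supplied as two hypothetical dictionaries, `lexical` and `binary`, and the class and method names are chosen here for illustration only. When the nth word arrives, the recognizer fills one new chart column of n cells, and each cell tries up to n split points.

```python
from collections import defaultdict


class OnlineCYK:
    """Incremental CYK recognizer for a grammar in Chomsky normal form (illustrative)."""

    def __init__(self, lexical, binary):
        # lexical: word -> set of nonterminals A with a rule A -> word (assumed input format)
        # binary: (B, C) -> set of nonterminals A with a rule A -> B C (assumed input format)
        self.lexical = lexical
        self.binary = binary
        self.chart = defaultdict(set)  # (i, j) -> nonterminals deriving words i..j-1
        self.n = 0                     # number of words processed so far

    def add_word(self, word):
        """Process one word by filling chart column j = n + 1."""
        j = self.n + 1
        self.chart[(j - 1, j)] = set(self.lexical.get(word, ()))
        # Spans ending at j, shortest first, so that cell (k, j) is ready when needed.
        for i in range(j - 2, -1, -1):
            cell = self.chart[(i, j)]
            for k in range(i + 1, j):          # split point between (i, k) and (k, j)
                for b in self.chart[(i, k)]:
                    for c in self.chart[(k, j)]:
                        cell |= self.binary.get((b, c), set())
        self.n = j

    def accepts(self, start="S"):
        """True if the words seen so far form a sentence derived from `start`."""
        return start in self.chart[(0, self.n)]
```

Each call to `add_word` touches only the new column, which is why the work for the nth word is quadratic in n for a fixed grammar, while earlier columns are never revisited.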

2.

Constant-time algorithm for computing the Euclidean distance maps of binary images on 2D meshes with reconfigurable buses

Yi Pan Keqin Li 《Information Sciences》1999,120(1-4):209-221

The computation of Euclidean distance maps (EDM), also called Euclidean distance transform, is a basic operation in computer vision, pattern recognition, and robotics. Fast computation of the EDM is needed since most of the applications using the EDM require real-time computation. It is shown in L. Chen and H.Y.H. Chuang [Information Processing Letters, 51, pp. 25–29 (1994)] that a lower bound Ω(n²) is required for any sequential EDM algorithm due to the fact that in any EDM algorithm each of the n² pixels has to be scanned at least once. Recently, many parallel EDM algorithms have been proposed to speed up its computation. Chen and Chuang proposed an algorithm for computing the EDM on an n×n mesh in O(n) time [L. Chen and H.Y.H. Chuang, Parallel Computing, 21, pp. 841–852 (1995)]. Clearly, the VLSI complexities of both the sequential and the mesh algorithm described in L. Chen and H.Y.H. Chuang [Parallel Computing, 21, pp. 841–852 (1995)] are AT²=O(n⁴), where A is the VLSI layout area of the design and T is the computation time using area A when implemented in VLSI. In this paper, we propose a new and faster parallel algorithm for computing the EDM on the reconfigurable VLSI mesh model. For the same problem, our algorithm runs in O(1) time on a two-dimensional n²×n² reconfigurable mesh. We show that the VLSI complexity of our algorithm is the same as those of the above sequential algorithm and the mesh algorithm, while it uses much less time. To the best of our knowledge, this is the first constant-time EDM algorithm on any parallel computational model.
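As a point of reference for what an EDM contains, the following is a brute-force sketch, not the constant-time mesh algorithm of the paper. It assumes the usual definition in which each pixel receives the Euclidean distance to the nearest feature pixel, and it assumes feature pixels are encoded as 1 in a list of lists; the function name is hypothetical.

```python
import math


def euclidean_distance_map(image):
    """Brute-force EDM: distance from every pixel to its nearest 1-pixel (assumed encoding)."""
    rows, cols = len(image), len(image[0])
    # Collect the coordinates of all feature pixels once.
    feature = [(r, c) for r in range(rows) for c in range(cols) if image[r][c] == 1]
    if not feature:
        raise ValueError("image contains no feature (1) pixels")
    # For every pixel, take the minimum distance over all feature pixels.
    return [
        [min(math.hypot(r - fr, c - fc) for fr, fc in feature) for c in range(cols)]
        for r in range(rows)
    ]
```

On an n×n image this takes up to O(n⁴) distance evaluations, well above the Ω(n²) lower bound quoted in the abstract, so it is useful only as a correctness baseline for faster algorithms.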

3.

Spacetime-minimal systolic arrays for Gaussian elimination and the Algebraic path problem

Abdelhamid Benaini Yves Robert 《Parallel Computing》1990,15(1-3):211-225

In this paper, we derive time-minimal systolic arrays for Gaussian elimination and the Algebraic Path Problem (APP) that use a minimal number of processors. For a problem of size n, we obtain an execution time T(n) = 3n −1 using A(n) = n²/4+O(n) processors for Gaussian elimination, and T(n) = 5n −2 and A(n) = n³/+O(n) for the APP.

4.

Parallel clustering algorithms (total citations: 3; self-citations: 0; citations by others: 3)

Xiaobo Li Zhixi Fang 《Parallel Computing》1989,11(3):275-290

Clustering techniques play an important role in exploratory pattern analysis, unsupervised learning and image segmentation applications. Many clustering algorithms, both partitional clustering and hierarchical clustering, require intensive computation, even for a modest number of patterns. This paper presents two parallel clustering algorithms. For a clustering problem with N = 2ⁿ patterns and M = 2^m features, the time complexity of the traditional partitional clustering algorithm on a single processor computer is O(MNK), where K is the number of clusters. The proposed algorithm on an SIMD computer with MN processors has a time complexity O(K(n + m)). The time complexity of the proposed single-link hierarchical clustering algorithm is reduced from O(MN²) of the uniprocessor algorithm to O(nN) with MN processors.

5.

A parallel Householder tridiagonalization stratagem using scattered square decomposition

H. Y. Chang S. Utku M. Salama D. Rapp 《Parallel Computing》1988,6(3):297-311

The parallel stratagem in this paper uses scattered square decomposition, introduced by G. Fox, for its data assignment and then exploits parallelism in the solution steps of the sequential Householder tridiagonalization algorithm. One may condense a real symmetric full matrix A of order n into a tridiagonal form by the stratagem on concurrent machines where N(= D²) processors are used. Expressions for efficiency and speedup are given for the evaluation of the stratagem. An alternative stratagem which requires less data transmission but more computations is also discussed. The results show that the Householder method of tridiagonalization may be implemented efficiently on a concurrent machine by scattered square decomposition, provided that the number of matrix elements contained in each processor is much larger than the number of processors of the concurrent machine, and the ratio of the time to transmit one data item from one processor to any other processor to the time to perform a floating-point arithmetic operation is small enough.

6.

Optimal and nearly optimal algorithms for approximating polynomial zeros

V.Y. Pan 《Computers & Mathematics with Applications》1996,31(12):97-138

We substantially improve the known algorithms for approximating all the complex zeros of an nth-degree polynomial p(x). Our new algorithms save both Boolean and arithmetic sequential time, versus the previous best algorithms of Schönhage [1], Pan [2], and Neff and Reif [3]. In parallel (NC) implementation, we dramatically decrease the number of processors, versus the parallel algorithm of Neff [4], which was the only NC algorithm known for this problem so far. Specifically, under the simple normalization assumption that the variable x has been scaled so as to confine the zeros of p(x) to the unit disc {x : |x| ≤ 1}, our algorithms (which promise to be practically effective) approximate all the zeros of p(x) within the absolute error bound 2^−b, by using order of n arithmetic operations and order of (b + n)n² Boolean (bitwise) operations (in both cases up to within polylogarithmic factors). The algorithms allow their optimal (work-preserving) NC parallelization, so that they can be implemented by using polylogarithmic time and the orders of n arithmetic processors or (b + n)n² Boolean processors. All the cited bounds on the computational complexity are within polylogarithmic factors from the optimum (in terms of n and b) under both arithmetic and Boolean models of computation (in the Boolean case, under the additional (realistic) assumption that n = O(b)).

7.

Parallel nested dissection

John M. Conroy 《Parallel Computing》1990,16(2-3):139-156

Nested dissection is a very popular direct method for solving sparse linear systems that arise from finite difference and finite element methods. Worley and Schreiber [16] give a fine grain algorithm for a square array of processors. Their algorithm uses O(N²) processors, each with O(N) memory, to factor an N² by N² sparse matrix whose graph is an N × N mesh. The efficiency of their method is between 1/46 and 1/12. George et al. [6] [8] give a medium grain algorithm for hypercube architecture, while George et al. [7] give an algorithm for shared memory machines. These papers present a column oriented approach which can exploit O(N) parallelism and yield efficiencies up to 50%. Lucas [11] also gives a column oriented scheme which achieves up to 75% efficiency and O(N) parallelism. In this paper, we present a medium to fine grain algorithm for a P × P array of processors with local memory. This algorithm can exploit up to O(N²) parallelism. The efficiency of the fine grain version is comparable to [16], while as a medium grain algorithm it achieves about 49% efficiency. The strength of the method is due to three factors: its ability to pipeline much of the computation, overlapping computation and communication, and the use of level 3 BLAS like primitives. In addition to its high efficiency, its memory requirement is optimal: only O(N² log N/P²) words of memory are needed per processor.

8.

Linear rotation based algorithm and systolic architecture for solving linear system equations

I-Chang Jou 《Parallel Computing》1989,11(3):367-379

A linear rotation based algorithm is proposed for solving systems of linear equations, Ax = b. This algorithm modifies the conventional Gaussian elimination method and can avoid the problems of numerical singularity and ill conditioning. In this study, a trapezoidal systolic array of n²/2 + n − 2 processors and a linear array of n processors are implemented for this algorithm. The trapezoidal systolic array performs the triangularization of a matrix A by using the modified linear rotation algorithm, while the linear array performs the backward substitution for evaluating the solution x. The computing time for solving a linear equation system will be O(5n) time units. Also, an implicit representation of the elimination factor by means of the sign parameter sequence instead of a numerical value is introduced for simplifying the hardware complexity. This systolic architecture is simple, uniform, and regular, and therefore well suited to implementation as a VLSI chip.

9.

A parallel two-list algorithm for the knapsack problem (total citations: 10; self-citations: 0; citations by others: 10)

Der-Chyuan Lou Chin-Chen Chang 《Parallel Computing》1997,22(14):1985-1996

An n-element knapsack problem has 2ⁿ possible solutions to search over, so the task can be accomplished in 2ⁿ trials if an exhaustive search is used. Due to the exponential time needed to solve the knapsack problem, the problem is considered to be very hard. In the past decade, much effort has been made to find techniques which could lead to practical algorithms with reasonable running time. In 1994, Chang et al. proposed a brilliant parallel algorithm, which needs O(2^{n/8}) processors to solve the knapsack problem in O(2^{n/2}) time; that is, the cost of Chang et al.'s parallel algorithm is O(2^{5n/8}). In this paper, we propose a parallel algorithm that improves on Chang et al.'s parallel algorithm by reducing the time complexity to O(2^{3n/8}) with the same O(2^{n/8}) processors. Thus, the proposed parallel algorithm has a cost of O(2^{n/2}), an improvement over the previous literature. We believe that the proposed parallel algorithm is pragmatically feasible now that multiprocessor systems are becoming more and more popular.
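The sequential two-list idea that the parallel algorithm builds on can be sketched as follows. This is the classical meet-in-the-middle search, not the parallel algorithm of the paper, and it assumes the subset-sum form of the knapsack problem (find a subset of the weights adding up exactly to a target). The function names are hypothetical. Each half of the items yields a list of 2^{n/2} subset sums; one list is scanned in ascending order and the other in descending order.

```python
def subset_sums(items):
    """Return all (sum, mask) pairs over subsets of items; 2**len(items) entries."""
    sums = [(0, 0)]
    for idx, w in enumerate(items):
        # Every existing subset appears once without and once with item idx.
        sums += [(s + w, mask | (1 << idx)) for s, mask in sums]
    return sums


def two_list_knapsack(weights, target):
    """Return indices of a subset of weights summing to target, or None (illustrative)."""
    half = len(weights) // 2
    left = sorted(subset_sums(weights[:half]))                 # ascending sums
    right = sorted(subset_sums(weights[half:]), reverse=True)  # descending sums
    i = j = 0
    while i < len(left) and j < len(right):
        total = left[i][0] + right[j][0]
        if total == target:
            lmask, rmask = left[i][1], right[j][1]
            chosen = [k for k in range(half) if (lmask >> k) & 1]
            chosen += [half + k for k in range(len(weights) - half) if (rmask >> k) & 1]
            return chosen
        if total < target:
            i += 1  # sum too small: move to a larger left sum
        else:
            j += 1  # sum too large: move to a smaller right sum
    return None
```

The scan visits at most 2 · 2^{n/2} list positions, which is where the 2^{n/2} factor in the costs quoted above comes from; the memory for the two lists is of the same order.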

10.

Two minimum spanning forest algorithms on fixed-size hypercube computers

Sajal K. Das Narsingh Deo Sushil Prasad 《Parallel Computing》1990,15(1-3):179-187

Two parallel algorithms for finding a minimum spanning forest (MSF) of a weighted undirected graph on hypercube computers, consisting of a fixed number of processors, are presented. One algorithm is suited for sparse graphs, the other for dense graphs. Our design strategy is based on successive elimination of non-MSF edges. The input graph is partitioned equally among different processors, which then repeatedly eliminate non-MSF edges and merge results to gradually construct the desired MSF of the entire graph. Low communication overhead is achieved by restricting the message flow to neighboring processors in the hypercube topology. The correctness of our approach is due to a theorem which states that with totally ordered edges, if an edge of an arbitrary subgraph does not belong to its MSF, then it does not belong to the MSF of the entire graph. For a graph of n vertices and m edges, our first algorithm finds an MSF in O((m log m)/p) time using p processors for p ≤ (m log m)/n(1 + log(m/n)). The second algorithm, efficient for dense graphs, requires O(n²/p) time for p ≤ n/log n.
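The elimination-and-merge strategy can be simulated sequentially in a short sketch. It is not the hypercube algorithm of the paper: the p processors are simulated by a loop, Kruskal's algorithm is an assumed choice for the local MSF step, and the function names are hypothetical. Edges are `(weight, u, v)` tuples, so sorting them gives the total order that the correctness theorem requires.

```python
def kruskal_msf(n, edges):
    """Minimum spanning forest on vertices 0..n-1 using Kruskal's algorithm."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    forest = []
    for w, u, v in sorted(edges):  # tuple order breaks weight ties consistently
        ru, rv = find(u), find(v)
        if ru != rv:               # edge joins two components: keep it
            parent[ru] = rv
            forest.append((w, u, v))
    return forest


def partitioned_msf(n, edges, p):
    """Simulate p >= 1 processors: local elimination of non-MSF edges, then pairwise merges."""
    # Each simulated processor keeps only the MSF edges of its own share.
    parts = [kruskal_msf(n, edges[i::p]) for i in range(p)]
    while len(parts) > 1:
        # Neighbouring partial results are merged and pruned again.
        parts = [
            kruskal_msf(n, sum(parts[i:i + 2], []))
            for i in range(0, len(parts), 2)
        ]
    return parts[0]
```

Because an edge discarded from a local MSF can never belong to the global MSF, every merge round works on at most 2(n − 1) surviving edges per pair, which keeps the data exchanged between processors small.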

11.

Parallel marching Poisson solvers

Marian Vajteršic 《Parallel Computing》1984,1(3-4):325-330

The paper presents parallel algorithms for solving the Poisson equation at N² mesh points. The methods based on marching techniques are structured for efficient parallel realization. Using orthogonal decomposition properties of the arising matrices, the algorithms can be formulated in terms of transformed vectors. On an MIMD computer with not more than N processors, the computations can be performed in horizontal slices with minimal synchronization requirements. Considering an SIMD machine with N² processors, the complexity bound O(log N) has been achieved, whereby the single marching requires 10 log N steps only.

12.

On fast planning of suboptimal paths amidst polygonal obstacles in plane

Nageswara S. V. Rao 《Theoretical computer science》1995,140(2):265-289

The problem of planning a path for a point robot from a source point s to a destination point d so as to avoid a set of polygonal obstacles in the plane is considered. Using well-known methods, a shortest path from s to d can be computed with a time complexity of O(n²) where n is the total number of obstacle vertices. The focus here is on

(a) planning paths faster at the expense of settling for suboptimal path lengths, and
(b) performance analysis of simple and/or well-known suboptimal methods.

A method that enables a hierarchical implementation of any path planning algorithm with no increase in the worst-case time complexity is presented; this implementation enables fast planning of simple paths. Then methods are presented based on the Voronoi diagrams, trapezoidal decomposition and triangulation, which compute (suboptimal) paths in O(n√log n) time with the preprocessing costs of O(n log n), O(n²) and O(n log n), respectively. Using existing navigational algorithms for unknown terrains, algorithms that run in O(n log n) time (after preprocessing) and yield suboptimal paths are presented. For all these algorithms, upper bounds on the path lengths are estimated in terms of the shortest of the obstacles, etc.

13.

Preference feature extraction based on the Nyström method

Yang Meijiao Liu Jinglei 《Journal of Computer Applications》2018,38(9):2515-2522

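To address the low efficiency of feature extraction from movie ratings, a Nyström method combined with QR decomposition is proposed. First, sampling is performed with an adaptive method; then QR decomposition is applied to the inner matrix, and the decomposed matrix is recombined with the inner matrix and eigendecomposed. The approximation quality of the Nyström method is closely tied to the number of landmark points selected and to the way they are selected: a set of representative landmark points is chosen to keep the sampled approximation close, and the adaptive sampling method guarantees the accuracy of the approximation. QR decomposition ensures the stability of the matrix and improves the accuracy of preference feature extraction. The higher the accuracy of preference feature extraction, the more stable the recommender system and the more accurate its recommendations. Finally, feature extraction experiments were carried out on a real dataset of audience movie ratings containing 480189 users and 17770 movies. The results show that, for the same number of landmark points, both the accuracy and the efficiency of the algorithm improve to some extent: compared with the computation before sampling, the time complexity falls from O(n³) to O(nc²) (c ≪ n); compared with the standard Nyström method, the error is kept below 25%.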

14.

Parallel processing approaches to edge relaxation

Eva Leung Xiaobo Li 《Pattern recognition》1988,21(6):547-558

This paper describes several parallel algorithms for image edge relaxation on array processors with different numbers of processing elements (PEs) connected by a mesh or hypercube network. The time complexity of Prager's original edge relaxation scheme is O(N²) per iteration using floating-point operations on a sequential machine, where N² is the number of pixels in the image. Modifications to the scheme are made so that no multiplications are employed and only integer operations are required. Moreover, with parallel processing, the time complexity per iteration is reduced to some constant value. A time complexity analysis of two parallel algorithms is performed. Although the algorithm on an array processor with 4N² PEs achieves a higher degree of parallelism, the algorithm with N² PEs is preferred. Further modifications to the latter algorithm are made to accommodate fewer PEs.

15.

Graph link prediction method based on graph sketching

You Jie Li Jin Zhang Sai Li Ting 《CAAI Transactions on Intelligent Systems》2019,14(4):761-768

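Existing link prediction algorithms have high complexity and are ill-suited to link prediction on large-scale graphs. To address this, this paper optimizes existing link prediction methods with a graph sketching approximation technique and proposes a link prediction method based on graph sketching. The method reduces the computational complexity of link prediction from O(n³) to O(n²k²log²n). To further improve the efficiency of link prediction, a Spark-based parallel implementation is given. Tests on real graph datasets show that the method effectively improves efficiency while preserving link prediction accuracy.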

16.

The unbounded single machine parallel batch scheduling problem with family jobs and release dates to minimize makespan (total citations: 4; self-citations: 0; citations by others: 4)

J. J. Yuan Z. H. Liu C. T. Ng T. C. E. Cheng 《Theoretical computer science》2004,320(2-3):199-212

In this paper we consider the unbounded single machine parallel batch scheduling problem with family jobs and release dates to minimize makespan. We show that this problem is strongly NP-hard, and give an O(n(n/m+1)^m) time dynamic programming algorithm and an O(mk^{k+1}P^{2k−1}) time dynamic programming algorithm, where n is the number of jobs, m is the number of families, k is the number of distinct release dates and P is the sum of the processing times of all families. We further give a heuristic with a performance ratio of 2. We also give a polynomial-time approximation scheme for the problem.

17.

On space-efficient algorithms for certain NP-complete problems

A. Ferreira 《Theoretical computer science》1993,120(2):311-315

Some recent results claimed the existence of a class of algorithms for certain NP-complete problems, with running time O(n^{lg k} 2^{n/2}) and storage requirements O(k 2^{n/k}), for 2 ≤ k ≤ n. In this note we show that those results do not hold, implying that an algorithm with time O(n 2^{n/2}) and space O(2^{n/4}) is still the best-known solution for this class of NP-complete problems.

18.

Efficient enumeration of all minimal separators in a graph

Hong Shen Weifa Liang 《Theoretical computer science》1997,180(1-2):169-180

This paper presents an efficient algorithm for enumerating all minimal a-b separators separating given non-adjacent vertices a and b in an undirected connected simple graph G = (V, E). Our algorithm requires O(n³R_ab) time, which improves the known result of O(n⁴R_ab) time for solving this problem, where |V| = n and R_ab is the number of minimal a-b separators. The algorithm can be generalized for enumerating all minimal A-B separators that separate non-adjacent vertex sets A, B ⊂ V, and it requires O(n²(n − n_A − n_B)R_AB) time in this case, where n_A = |A|, n_B = |B| and R_AB is the number of all minimal A-B separators. Using the algorithm above as a routine, an efficient algorithm for enumerating all minimal separators of G separating G into at least two connected components is constructed. The algorithm runs in time O(n³R⁺_Σ + n⁴R_Σ), which improves the known result of O(n⁶R_Σ) time, where R_Σ is the number of all minimal separators of G and R_Σ ≤ R⁺_Σ = ∑_{1≤i<j≤n, (v_i, v_j)∉E} R_{v_iv_j} ≤ (n(n − 1)/2 − m)R_Σ. Efficient parallelization of these algorithms is also discussed. It is shown that the first algorithm requires at most O((n/log n)R_ab) time and the second one runs in time O((n/log n)R⁺_Σ + n log n R_Σ) on a CREW PRAM with O(n³) processors.

19.

A new parallel algorithm for parsing arithmetic infix expressions

Y. N. Srikant Priti Shankar 《Parallel Computing》1987,4(3):291-304

A new parallel algorithm for transforming an arithmetic infix expression into a parse tree is presented. The technique is based on a result due to Fischer (1980) which enables the construction of the parse tree by appropriately scanning the vector of precedence values associated with the elements of the expression. The algorithm presented here is suitable for execution on a shared memory model of an SIMD machine with no read/write conflicts permitted. It uses O(n) processors and has a time complexity of O(log²n) where n is the expression length. Parallel algorithms for generating code for an SIMD machine are also presented.

20.

Distributed selectsort sorting algorithms on broadcast communication networks

Jau-Hsiung Huang Leonard Kleinrock 《Parallel Computing》1990,16(2-3):183-190

In this paper, a distributed selectsort algorithm and a parameterized selectsort algorithm are presented to be applied on distributed systems for cases when N P where N is the number of elements to be sorted and P is the number of processors in the system. The distributed system considered in this paper uses a broadcasting channel for communication between processors. We show that the number of messages required for the parameterized selectsort algorithm is independent of N and is of complexity O(P), which is optimal in a distributed system with P processors. Furthermore, the amount of communication required in terms of elements is N + O(P³) and the computation time complexity is O((N/P)lgN + P²lg(N/P)). Hence, when N P³, the computation time complexity is O((N/P)lgN), which is optimal using P processors. In addition, this parameterized algorithm provides a parameter K whose value can be chosen to trade off processing, memory, and communication requirements. It is shown that this parameterized algorithm can reduce the communication requirements significantly while only slightly increasing the computation requirements.