期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Efficient mapping of backpropagation algorithm onto a network ofworkstations

Sudhakar V. Siva Ram Murthy C. 《IEEE transactions on systems, man, and cybernetics. Part B, Cybernetics》1998,28(6):841-848

In this paper, we present an efficient technique for mapping a backpropagation (BP) learning algorithm for multilayered neural networks onto a network of workstations (NOW's). We adopt a vertical partitioning scheme, where each layer in the neural network is divided into p disjoint partitions, and map each partition onto an independent workstation in a network of p workstations. We present a fully distributed version of the BP algorithm and also its speedup analysis. We compare the performance of our algorithm with a recent work involving the vertical partitioning approach for mapping the BP algorithm onto a distributed memory multiprocessor. Our results on SUN 3/50 NOW's show that we are able to achieve better speedups by using only two communication sets and also by avoiding some redundancy in the weights computation for one training cycle of the algorithm. 相似文献

2.

Scalable distributed on-the-fly symbolic model checking

Shoham Ben-David Orna Grumberg Tamir Heyman Assaf Schuster 《International Journal on Software Tools for Technology Transfer (STTT)》2003,4(4):496-504

This paper presents a scalable method for parallel symbolic on-the-fly model checking in a distributed memory environment. Our method combines a scheme for on-the-fly model checking for safety properties with a scheme for scalable reachability analysis. We suggest an efficient, BDD-based algorithm for a distributed construction of a counterexample. The extra memory requirement for counterexample generation is evenly distributed among the processes by a memory balancing procedure. At no point during computation does the memory of a single process contain all the data. This enhances scalability. Collaboration between the parallel processes during counterexample generation reduces memory utilization for the backward step. We implemented our method on a standard, loosely- connected environment of workstations, using a high-performance model checker. Our initial performance evaluation, carried out on several large circuits, shows that our method can check models that are too large to fit in the memory of a single node. Our on-the-fly approach may find counterexamples even when the model is too large to fit in the memory of the parallel system. 相似文献

3.

Symbolic systems,explicit properties: on hybrid approaches for LTL symbolic model checking

Roberto Sebastiani Stefano Tonetta Moshe Y. Vardi 《International Journal on Software Tools for Technology Transfer (STTT)》2011,13(4):319-335

In this work we study hybrid approaches to LTL symbolic model checking; that is, approaches that use explicit representations of the property automaton, whose state space is often quite manageable, and symbolic representations of the system, whose state space is typically exceedingly large. We compare the effects of using, respectively, (i) a purely symbolic representation of the property automaton, (ii) a symbolic representation, using logarithmic encoding, of explicitly compiled property automaton, and (iii) a partitioning of the symbolic state space according to an explicitly compiled property automaton. We apply this comparison to three model-checking algorithms: the doubly-nested fixpoint algorithm of Emerson and Lei, the reduction of emptiness to reachability of Biere et al., and the singly-nested fixpoint algorithm of Bloem et al. for weak automata. The emerging picture from our study is quite clear, hybrid approaches outperform pure symbolic model checking, while partitioning generally performs better than logarithmic encoding. The conclusion is that the hybrid approaches benefit from state-of-the-art techniques in semantic compilation of LTL properties. Partitioning gains further from the fact that the image computation is applied to smaller sets of states. 相似文献

4.

Fault-Tolerant Parallel Scheduling of Tasks on a Heterogeneous High-Performance Workstation Cluster 总被引：1，自引：1，他引：0

Kwok Yu-Kwong 《The Journal of supercomputing》2001,19(3):299-314

We propose a new approach, called cluster-based search (CBS), for scheduling large task graphs in parallel on a heterogeneous cluster of workstations connected by a high-speed network (e.g., using an ATM switch at OC-3 speed). The CBS algorithm uses a parallel random neighborhood search which works by refining multiple different initial schedules simultaneously using different workstations. The workstations communicate periodically to exchange their best solutions found thus far in order to direct the search to more promising regions in the search space. Heterogeneity of machines is exploited by the biased partitioning of the search space. The parallel random neighborhood search is fault-tolerant in that the workload of a failed workstation is automatically redistributed to other workstations so that the search can continue. We have implemented the CBS algorithm as a core function of our on-going development of SSI middleware for a Sun workstation cluster. 相似文献

5.

Efficient distributed SAT and SAT-based distributed Bounded Model Checking

Malay K. Ganai Aarti Gupta Zijiang Yang Pranav Ashar 《International Journal on Software Tools for Technology Transfer (STTT)》2006,8(4-5):387-396

SAT-based Bounded Model Checking (BMC), though a robust and scalable verification approach, still is computationally intensive, requiring large memory and time. Even with the recent development of improved SAT solvers, the memory limitation of a single server rather than time can become a bottleneck for doing deeper BMC search for large designs. Distributing computing requirements of BMC over a network of workstations can overcome the memory limitation of a single server, albeit at increased communication cost. In this paper, we present (a) a method for distributed SAT over a network of workstations using a Master/Client model where each Client workstation has an exclusive partition of the SAT problem and uses knowledge of partition topology to communicate with other Clients, (b) a method for distributing SAT-based BMC using the distributed SAT. For the sake of scalability, at no point in the BMC computation does a single workstation have all the information. We experimented on a network of heterogeneous workstations interconnected with a standard Ethernet LAN. To illustrate, on an industrial design with ∼13 K FFs and ∼0.5 million gates, the non-distributed BMC on a single workstation (with 4 GB memory) ran out of memory after reaching a depth of 120; on the other hand, our SAT-based distributed BMC over 5 similar workstations was able to go up to 323 steps with a communication overhead of only 30%. 相似文献

6.

Scalable processing and autocovariance computation of big functional data

下载免费PDF全文

Nieves R. Brisaboa Ricardo Cao José R. Paramá Fernando Silva‐Coira 《Software》2018,48(1):123-140

This paper presents 2 main contributions. The first is a compact representation of huge sets of functional data or trajectories of continuous‐time stochastic processes, which allows keeping the data always compressed even during the processing in main memory. It is oriented to facilitate the efficient computation of the sample autocovariance function without a previous decompression of the data set, by using only partial local decoding. The second contribution is a new memory‐efficient algorithm to compute the sample autocovariance function. The combination of the compact representation and the new memory‐efficient algorithm obtained in our experiments the following benefits. The compressed data occupy in the disk 75% of the space needed by the original data. The computation of the autocovariance function used up to 13 times less main memory, and run 65% faster than the classical method implemented, for example, in the R package. 相似文献

7.

并行分布式体绘制算法的设计 总被引：1，自引：0，他引：1

潘志庚钱喜武《小型微型计算机系统》1994,15(10):35-39

体绘制是一种需要大量计算资源和内存资源的可视化任务，本文提出一种并行分布式体绘制算法，对绘制任务进行适当剖分，使用连网的工作站进行计算，有效地加快了图形绘制速度。相似文献

8.

Communication and memory optimal parallel data cube construction

Jin R. Vaidyanathan K. Yang G. Agrawal G. 《Parallel and Distributed Systems, IEEE Transactions on》2005,16(12):1105-1119

Data cube construction is a commonly used operation in data warehouses. Because of the volume of data that is stored and analyzed in a data warehouse and the amount of computation involved in data cube construction, it is natural to consider parallel machines for this operation. This paper addresses a number of algorithmic issues in parallel data cube construction. First, we present an aggregation tree for sequential (and parallel) data cube construction, which has minimally bounded memory requirements. An aggregation tree is parameterized by the ordering of dimensions. We present a parallel algorithm based upon the aggregation tree. We analyze the interprocessor communication volume and construct a closed form expression for it. We prove that the same ordering of the dimensions in the aggregation tree minimizes both the computational and communication requirements. We also describe a method for partitioning the initial array and prove that it minimizes the communication volume. Finally, in the cases when memory may be a bottleneck, we describe how tiling can help scale sequential and parallel data cube construction. Experimental results from implementation of our algorithms on a cluster of workstations show the effectiveness of our algorithms and validate our theoretical results. 相似文献

9.

Parallel implementation of back-propagation algorithm in networks of workstations

Suresh S. Omkar S.N. Mani V. 《Parallel and Distributed Systems, IEEE Transactions on》2005,16(1):24-34

This work presents an efficient mapping scheme for the multilayer perceptron (MLP) network trained using back-propagation (BP) algorithm on network of workstations (NOWs). Hybrid partitioning (HP) scheme is used to partition the network and each partition is mapped on to processors in NOWs. We derive the processing time and memory space required to implement the parallel BP algorithm in NOWs. The performance parameters like speed-up and space reduction factor are evaluated for the HP scheme and it is compared with earlier work involving vertical partitioning (VP) scheme for mapping the MLP on NOWs. The performance of the HP scheme is evaluated by solving optical character recognition (OCR) problem in a network of ALPHA machines. The analytical and experimental performance shows that the proposed parallel algorithm has better speed-up, less communication time, and better space reduction factor than the earlier algorithm. This work also presents a simple and efficient static mapping scheme on heterogeneous system. Using divisible load scheduling theory, a closed-form expression for number of neurons assigned to each processor in the NOW is obtained. Analytical and experimental results for static mapping problem on NOWs are also presented. 相似文献

10.

Incremental Volume Rendering Using Hierarchical Compression

Michael B. Haley Edwin H. Blake 《Computer Graphics Forum》1996,15(3):45-55

We present a new algorithm here for efficient incremental rendering of volumetric datasets. The primary goal of this algorithm is to give average workstations the ability to efficiently render volume data received over relatively low bandwidth network links in such a way that rapid user feedback is maintained. Common limitations of workstation rendering of volume data include: large memory overheads, the requirement of expensive rendering hardware, and high speed processing ability. The rendering algorithm presented here overcomes these problems by making use of the efficient Shear-Warp Factorisation method which does not require specialised graphics hardware. However the original Shear-Warp algorithm suffers from a high memory overhead and does not provide for incremental rendering which is required should rapid user feedback be maintained. Our algorithm represents the volumetric data using a hierarchical data structure which provides for the incremental classification and rendering of volume data. This exploits the multiscale nature of the octree data structure. The algorithm reduces the memory footprint of the original Shear-Warp Factorisation algorithm by a factor of more than two, while maintaining good rendering performance. These factors make our octree algorithm more suitable for implementation on average desktop workstations for the purposes of interactive exploration of volume models over a network. Results from tests using typical volume datasets will be presented which demonstrate the ability of the algorithm to achieve high rendering rates for both incremental rendering and standard rendering while reducing the runtime memory requirements. 相似文献

11.

Dynamic Load Balancing for the Distributed Mining of Molecular Structures 总被引：1，自引：0，他引：1

Di Fatta G. Berthold M.R. 《Parallel and Distributed Systems, IEEE Transactions on》2006,17(8):773-785

In molecular biology, it is often desirable to find common properties in large numbers of drug candidates. One family of methods stems from the data mining community, where algorithms to find frequent graphs have received increasing attention over the past years. However, the computational complexity of the underlying problem and the large amount of data to be explored essentially render sequential algorithms useless. In this paper, we present a distributed approach to the frequent subgraph mining problem to discover interesting patterns in molecular compounds. This problem is characterized by a highly irregular search tree, whereby no reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely, a dynamic partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiver-initiated load balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer Institute's HIV-screening data set, where we were able to show close-to linear speedup in a network of workstations. The proposed approach also allows for dynamic resource aggregation in a nondedicated computational environment. These features make it suitable for large-scale, multidomain, heterogeneous environments, such as computational grids. 相似文献

12.

Distributed reachability testing of concurrent programs

Richard H. Carver Yu Lei 《Concurrency and Computation》2010,22(18):2445-2466

Reachability testing is an approach to verifying concurrent programs. During reachability testing, every partially ordered synchronization sequence of a program with a given input is exercised exactly once. In this paper, we present the design and implementation of a distributed reachability testing algorithm for a cluster of workstations. This algorithm allows different test sequences to be exercised concurrently by different workstations without any synchronization, and without any duplication of sequences among workstations. Dynamic load balancing is performed using a work‐stealing scheme. A novel aspect of this scheme is that work‐stealing requests progress in rounds. This round‐based structure identifies overloaded workstations to target for work stealing. Empirical studies show good speedup for four benchmark Java programs and one Lotos specification. Copyright © 2010 John Wiley & Sons, Ltd. 相似文献

13.

大图上跳数受限的可达查询算法研究与优化

邢耕源徐云《计算机系统应用》2021,30(10):164-170

判断有向图上两个顶点之间是否存在一条路径是一个经典问题,而对于一些路由规划和图分析等实际应用,要求查找是否存在跳数受限的可达路径,这是一个变种的图可达查询问题.对于大图上跳数受限的查询算法,不仅仅要对大图查询的时间效率和空间效率进行权衡,而且还要利用跳数受限的特性进行优化.普通的可达查询算法存在小度数顶点索引项占用空间过多的问题,造成空间浪费严重.为此我们提出了一种面向跳数受限的2-hop部分索引方法,采用改进的索引方法并结合局部搜索,实现跳数受限的有效可达性查询.实验结果表明,在Orkut社交网络数据集上与已有算法相比,该算法索引空间节省了32％,同时查询时间略微增加,使得我们算法可以计算更大规模图的跳数受限可达问题. 相似文献

14.

Computing the correlation dimension on a network of workstations

Angelo Corana 《Concurrency and Computation》1998,10(10):737-762

We present a parallel algorithm for computing the correlation dimension (D₂) from a time series generated by a dynamic system, using the method of correlation integrals, which essentially requires the computation of distances among a set of points in the state space. The parallelization is suitable for coarse-grained multiprocessor systems with distributed memory and is carried out using a virtually shared memory model. The algorithm simultaneously gives all the correlation integrals at various state space dimensions needed to estimate the D₂. Two versions are discussed: the first computes all distances between points; the second computes only distances less than a fixed ϵ, and employs a box-assisted approach and linked lists for an efficient search of neighbouring points. The algorithms, coded in Fortran 77, are tested on a heterogeneous network of workstations consisting of various DEC Alphas of different powers, interconnected by Ethernet; the Network Linda parallel environment is used. A detailed analysis of performance is carried out using the generalization of speed-up and efficiency for heterogeneous systems. The algorithms are fully asynchronous and so intrinsically balanced. In almost all the situations they provide a unitary efficiency. The second version greatly reduces the computational work, thus making it possible to tackle D₂ estimation even for medium and high-dimensional systems, where an extremely large number of points is involved. The algorithms can also be employed in other applicative contexts requiring the efficient computation of distances among a large set of points. The method proposed for the analysis of performance can be applied to similar problems. © 1998 John Wiley & Sons, Ltd. 相似文献

15.

一种基于MapReduce的OpenFlow网络属性并行验证算法

刘艺雷程张红旗杨英杰代向东《计算机应用研究》2016,33(7)

针对OpenFlow网络中流表配置错误引起转发回路、路由黑洞和访问控制规则失效等问题,提出一种并行的基于MapReduce的OpenFlow网络属性验证算法。通过在Map阶段划分规则等价类,在Reduce阶段为规则等价类构建基于交换机端口谓词的网络转发图并分析可达性,实现对网络属性的并行验证。同时,通过采用原子谓词将传统可达性分析中的规则匹配域多维集合运算转换为整数集合运算,以进一步提高可达性分析效率。此外,基于原子谓词的谓词表达方式可消除交换机端口谓词集合中的冗余项,降低存储开销。最后,通过理论分析和仿真实验验证了算法的正确性及在时间和存储开销方面的优越性。相似文献

16.

Distributed generic approximate sparse inverses

George A. Gravvanis Christos K. Filelis-Papadopoulos 《The Journal of supercomputing》2014,70(1):365-384

The need for accuracy in the solution of linear systems derived from the discretization of partial differential equations leads to large sparse linear systems. The solution of sparse linear systems requires efficient scalable methods. Iterative solvers require efficient parallel preconditioning methods to solve effectively sparse linear systems. Herewith, a new parallel algorithm for the generic approximate sparse inverse matrix method for distributed memory systems is proposed. The computation of the distributed generic approximate sparse inverse matrix is based on a column-wise approach, which allows the separation to independent problems that can be handled in parallel without synchronization points or intermediate communications. This is achieved by reforming the generic approximate sparse inverse matrix algorithm and its process of computation with a new partial solution method for the computation of the nonzero elements of each column dictated by the approximate inverse sparsity pattern. Moreover, an algorithmic scheme is proposed for the efficient distribution of data amongst the available workstations, along with a load balancing scheme for problems with large standard deviation in the number of nonzero elements per column. Numerical results are presented for the proposed schemes for various model problems. 相似文献

17.

A Parallel Adaptive Gauss-Jordan Algorithm

Melab N. Talbi E.-G. Petiton S. 《The Journal of supercomputing》2000,17(2):167-185

This paper presents a parallel adaptive version of the block-based Gauss-Jordan algorithm, utilized to invert large matrices. This version includes a characterization of the workload and a mechanism of its folding/unfolding. Furthermore, this paper proposes a work scheduling strategy and an application-oriented solution for the fault tolerance problem. The application is implemented and experimented with MARS¹ in dedicated and non-dedicated environments. The results show that an absolute efficiency of 92% is possible on a cluster of DEC/ALPHA processors interconnected by a Gigaswitch network and an absolute efficiency of 67% can be obtained on an Ethernet network of SUN-Sparc 4 workstations. Moreover, the algorithm is tested on a meta-system including both the two parks of machines. Finally, an out-of-core solution for the algorithm is proposed. This solution allows a gain of 66% of data input operations and reduces the central memory space required for storing the data space of the algorithm by a factor q, where q is the dimension of the matrix to be inverted in terms of data blocks. 相似文献

18.

Distributed Verification of Multi-threaded C++ Programs

Stefan Edelkamp Shahid Jabbar Damian Sulewski 《Electronic Notes in Theoretical Computer Science》2008,198(1):33

Verification of multi-threaded C++ programs poses three major challenges: the large number of states, states with huge sizes, and time intensive expansions of states. This paper presents our efforts in addressing these issues by combining an efficient use of hard disk with the distribution of the state space on several computing nodes. The approach is applicable to clusters and multi-core machines with single or multiple hard disks. We exploit the concept of a signature of a state that allows the full state vector to stay on secondary memory. For a distributed exploration of the state space, we report the lessons learned from using different partitioning schemes, including Holzmann and Bosnacki's [G. Holzmann and D. Bosnacki. The design of a multi-core extension of the Spin Model Checker. IEEE Trans. on Software Engineering, 2007. To Appear] depth-slicing method, and their effects on blind and directed search algorithms.Empirical evaluation is done on our experimental C++ verification tool StEAM, which is capable of detecting errors such as segmentation faults, deadlocks, access conflicts, etc. The distributed algorithms are realized through MPI on a cluster of workstations. 相似文献

19.

Partitioning and Handling Massive Models for Interactive Collision Detection 总被引：1，自引：0，他引：1

A. Wilson E. Larsen D. Manocha & M. C. Lin 《Computer Graphics Forum》1999,18(3):319-330

We describe an approach for interactive collision detection and proximity computations on massive models composed of millions of geometric primitives. We address issues related to interactive data access and processing in a large geometric database, which may not fit into main memory of typical desktop workstations or computers. We present a new algorithm using overlap graphs for localizing the "regions of interest" within a massive model, thereby reducing runtime memory requirements. The overlap graph is computed off-line, pre-processed using graph partitioning algorithms, and modified on the fly as needed. At run time, we traverse localized sub-graphs to check the corresponding geometry for proximity and pre-fetch geometry and auxiliary data structures. To perform interactive proximity queries, we use bounding-volume hierarchies and take advantage of spatial and temporal coherence. Based on the proposed algorithms, we have developed a system called IMMPACT and used it for interaction with a CAD model of a power plant consisting of over 15 million triangles. We are able to perform a number of proximity queries in real-time on such a model. In terms of model complexity and application to large models, we have improved the performance of interactive collision detection and proximity computation algorithms by an order of magnitude. 相似文献

20.

对于模糊ARTMAP的研究及改进

刘进涛王万森《微机发展》2006,16(11):16-18

在模糊artmap网络中,Fa2层每增加一个的神经元就表示增加了一个聚类,这是该网络的优点同时也增加了网络的负荷。文中对模糊artmap算法进行了简要介绍并分析了算法的收敛复杂度。在此基础上,提出数据集划分法以提高网络的收敛速度。最后通过试验比较算法改进前后的网络收敛速度的差异,证明改进后网络的收敛速度明显提高。相似文献