Similar Documents
20 similar documents found.
1.
A polyhedron P is castable if its boundary can be partitioned by a plane into two polyhedral terrains. Castable polyhedra can be manufactured easily using two cast parts, where each cast part can be removed from the object without breaking the cast part or the object. If we assume that the cast parts are each removed by a single translation, it is shown that for a simple polyhedron with n vertices, castability can be decided in time and linear space using a simple algorithm. A more complicated algorithm solves the problem in time and space, for any fixed ε > 0. In the case where the cast parts are to be removed in opposite directions, a simple O(n^2)-time algorithm is presented. Finally, if the object is a convex polyhedron and the cast parts are to be removed in opposite directions, a simple algorithm is presented. Received June 1, 1994; revised May 25, 1995.

2.
Compute unified device architecture (CUDA) is a software development platform that allows us to run C-like programs on the NVIDIA graphics processing unit (GPU). This paper presents an acceleration method for cone beam reconstruction using CUDA-compatible GPUs. The proposed method accelerates the Feldkamp, Davis, and Kress (FDK) algorithm using three techniques: (1) off-chip memory access reduction for saving the memory bandwidth; (2) loop unrolling for hiding the memory latency; and (3) multithreading for exploiting multiple GPUs. We describe how these techniques can be incorporated into the reconstruction code. We also show an analytical model to understand the reconstruction performance on multi-GPU environments. Experimental results show that the proposed method runs at 83% of the theoretical memory bandwidth, achieving a throughput of 64.3 projections per second (pps) for reconstruction of a 512^3-voxel volume from 360 512^2-pixel projections. This performance is 41% higher than the previous CUDA-based method and is 24 times faster than a CPU-based method optimized by vector intrinsics. Some detailed analyses are also presented to understand how effectively the acceleration techniques increase the reconstruction performance of a naive method. We also demonstrate out-of-core reconstruction for large-scale datasets, up to a 1024^3-voxel volume.
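A toy version of the kind of analytical model mentioned above can make the bandwidth-bound argument concrete. The sketch below is illustrative only: the function name and all numeric inputs are assumptions, not the paper's model or measurements.

```python
# Illustrative sketch only: if FDK backprojection is memory-bound, throughput is
# limited by how many bytes each projection pass must move through off-chip memory.
# All numbers in the example are placeholders, not measurements from the paper.

def projections_per_second(volume_voxels, bytes_per_voxel_update, effective_bandwidth_gbs):
    """Bandwidth-bound upper limit on projections/s if every voxel is updated once per projection."""
    bytes_per_projection = volume_voxels * bytes_per_voxel_update
    return effective_bandwidth_gbs * 1e9 / bytes_per_projection

# With assumed values (a 512^3 volume, 8 bytes of off-chip traffic per voxel update,
# 100 GB/s sustained bandwidth) this gives roughly 93 projections/s:
# projections_per_second(512**3, 8, 100)
```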

3.
In this paper, a simple and efficient algorithm is proposed for manifold-guaranteed out-of-core simplification of large meshes with controlled topological type. By dual-sampling the input large mesh model, the proposed algorithm utilizes a set of Hermite data (sample points with normals) as an intermediate model representation, which allows the topological structure of the mesh model to be encoded implicitly and thus makes it particularly suitable for out-of-core mesh simplification. Benefiting from the construction of an in-core signed distance field, the proposed algorithm possesses a set of features including manifoldness of the simplified meshes, toleration of nonmanifold mesh data input, topological noise removal, topological type control, and preservation of sharp features and boundaries. A novel, detailed implementation of the proposed algorithm is presented, and experimental results demonstrate that very large meshes can be simplified quickly on a low-cost off-the-shelf PC with tightly bounded approximation errors and with time and space efficiency.

4.
Fast rendering of large-scale scenes is one of the important research topics in virtual reality. To accelerate scene rendering, level-of-detail (LOD) models and visibility culling are commonly used, but existing algorithms have limitations when handling large-scale scenes. This paper proposes a new algorithm for fast rendering of large-scale scenes. Based on a hierarchical partition of the scene, the algorithm uses a topology-modifying mesh simplification method to compute continuous hierarchical levels of detail (HLOD) for the scene hierarchy. In the real-time rendering stage, the HLOD is refined globally and locally in a view-dependent manner and combined with fast, effective view-frustum culling, which greatly accelerates scene rendering. Experimental results show that the algorithm is simple and effective, and it can be further extended to an out-of-core setting.
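To make the view-dependent refinement step concrete, the sketch below shows a standard screen-space-error test for selecting HLOD nodes; the node attributes (center, error, children), the culling callback, and the tolerance test are assumptions, not the paper's exact criterion.

```python
# Illustrative sketch only: a conventional screen-space-error test for view-dependent
# HLOD refinement. `node.center`, `node.error` (object-space error in world units) and
# `node.children` are assumed attributes; `visible` is a view-frustum culling callback.
import math

def screen_space_error(geometric_error, distance, fov_y, viewport_height):
    """Project an object-space error (world units) to pixels at the given distance."""
    if distance <= 0.0:
        return float("inf")
    pixels_per_unit = viewport_height / (2.0 * distance * math.tan(fov_y / 2.0))
    return geometric_error * pixels_per_unit

def refine(node, eye, fov_y, viewport_height, tolerance_px, visible, out):
    """Collect the coarsest visible nodes whose projected error is within tolerance."""
    if not visible(node):                      # global culling against the view frustum
        return
    d = math.dist(eye, node.center)
    if not node.children or screen_space_error(node.error, d, fov_y, viewport_height) <= tolerance_px:
        out.append(node)                       # coarse enough (or a leaf): render this node
    else:
        for child in node.children:            # otherwise refine locally
            refine(child, eye, fov_y, viewport_height, tolerance_px, visible, out)
```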

5.
马军, 杨波, 马绍汉. 软件学报 (Journal of Software), 2000, 11(2): 260-264
The minimum rectilinear Steiner tree (MRST) problem, i.e., finding an optimal Manhattan-metric Steiner tree, is a combinatorial optimization problem arising in VLSI routing and network communication, and it is NP-hard. This paper gives an O(n^2)-time approximation algorithm for the problem whose worst-case approximation ratio is strictly less than 3/2. Computational experiments show that the average cost of the trees it produces differs from that of an optimal algorithm by only 0.8%. With minor modifications, the algorithm can be applied to the Steiner problem in three- or higher-dimensional Manhattan spaces, and it is easy to implement in parallel and distributed environments.
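As a point of reference for the 3/2 bound, the sketch below computes a rectilinear minimum spanning tree with a Prim-style O(n^2) loop; by Hwang's theorem its cost is at most 3/2 of the optimal rectilinear Steiner tree. It is a baseline for comparison, not the paper's algorithm.

```python
# Minimal baseline sketch, not the paper's algorithm: a Prim-style minimum spanning
# tree under Manhattan (L1) distance, the usual yardstick that rectilinear Steiner
# approximation algorithms improve on.

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def rectilinear_mst(points):
    """Return MST edges (as index pairs) over the terminals; O(n^2) time."""
    n = len(points)
    in_tree = [False] * n
    best = [float("inf")] * n       # cheapest connection of each point to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = manhattan(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges
```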

6.
This paper presents a unified framework that optimizes out-of-core programs by exploiting locality and parallelism, and reducing communication overhead. For out-of-core problems where the data set sizes far exceed the size of the available in-core memory, it is particularly important to exploit the memory hierarchy by optimizing the I/O accesses. We present algorithms that consider both iteration space (loop) and data space (file layout) transformations in a unified framework. We show that the performance of an out-of-core loop nest containing references to out-of-core arrays can be improved by using a suitable combination of file layout choices and loop restructuring transformations. Our approach considers array references one by one and attempts to optimize each reference for parallelism and locality. When there are references for which parallelism optimizations do not work, communication is vectorized so that data transfer can be performed before the innermost loop. Results from hand-compiles on IBM SP-2 and Intel Paragon distributed-memory message-passing architectures show that this approach reduces the execution times and improves the overall speedups. In addition, we extend the base algorithm to work with file layout constraints and show how it is useful for optimizing programs that consist of multiple loop nests.
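The sketch below illustrates the basic idea of matching the loop order to the file layout for an out-of-core array; it uses numpy.memmap and assumed names, and it is not the paper's compiler framework.

```python
# Illustrative sketch only (not the paper's compiler framework): processing an
# out-of-core 2D array tile by tile. Traversing tiles in the array's storage order
# (row-major here) keeps the I/O largely sequential, which is the kind of agreement
# between loop order and file layout the framework tries to establish.
import numpy as np

def scale_out_of_core(path, shape, tile=1024, factor=2.0, dtype=np.float64):
    """Multiply an existing on-disk raw array (row-major, given shape) by `factor`."""
    a = np.memmap(path, dtype=dtype, mode="r+", shape=shape)
    rows, cols = shape
    for i in range(0, rows, tile):          # outer loop over row tiles
        for j in range(0, cols, tile):      # inner loop matches the row-major layout
            block = a[i:i + tile, j:j + tile]
            block *= factor                 # in-core computation on the current tile
    a.flush()                               # write dirty tiles back to disk
```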

7.
Building on a scene partition, this paper proposes organizing the scene with a continuous hierarchical level-of-detail (HLOD) model and subdividing the viewpoint space. For each viewpoint cell, a list of scene-graph nodes, called the cell-front, is computed that just satisfies the screen-space pixel error bound for any viewpoint within the cell; the out-of-core storage of the model is then redesigned around the cell-front. Rendering uses multiple threads: the rendering thread performs visibility culling and view-dependent selective rendering on the current cell-front, while the prefetching thread uses a viewpoint-cell-based strategy that schedules data loading according to how the cell-front changes between neighboring viewpoint cells. While preserving rendering quality, the algorithm achieves real-time interactive display of large out-of-core scenes on an ordinary PC.

8.
Parallel Computing, 2014, 40(10): 754-767
The processing of massive amounts of data on clusters with a finite amount of memory has become an important problem facing the parallel/distributed computing community. While MapReduce-style technologies provide an effective means for addressing various problems that fit within the MapReduce paradigm, there are many classes of problems for which this paradigm is ill-suited. In this paper we present a runtime system for traditional MPI programs that enables the efficient and transparent out-of-core execution of distributed-memory parallel programs. This system, called BDMPI, leverages the semantics of MPI's API to orchestrate the execution of a large number of MPI processes on far fewer compute nodes, so that the running processes maximize the amount of computation that they perform with the data fetched from the disk. BDMPI enables the development of efficient out-of-core parallel distributed-memory codes without the high engineering and algorithmic complexities associated with multiple levels of blocking. BDMPI achieves significantly better performance than existing technologies on a single node as well as on a small cluster, and performs within 30% of optimized out-of-core implementations.
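For contrast, the sketch below shows the hand-blocked style that BDMPI is meant to make unnecessary: a plain mpi4py program in which each rank explicitly streams its partition from disk in memory-sized chunks. The file name, chunk size, and reduction are illustrative assumptions, not part of BDMPI.

```python
# A sketch of hand-written out-of-core blocking in plain MPI, the kind of explicit
# chunking that BDMPI aims to make unnecessary. File name and chunk size are assumptions.
import numpy as np
from mpi4py import MPI

def chunked_sum(path, n_elems, chunk=1_000_000):
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    per_rank = (n_elems + size - 1) // size       # contiguous partition per process
    lo, hi = rank * per_rank, min((rank + 1) * per_rank, n_elems)
    data = np.memmap(path, dtype=np.float64, mode="r", shape=(n_elems,))
    local = 0.0
    for start in range(lo, hi, chunk):            # explicit, hand-written blocking
        local += float(data[start:min(start + chunk, hi)].sum())
    return comm.allreduce(local, op=MPI.SUM)      # combine the partial sums
```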

9.
We give substantially improved exact exponential-time algorithms for a number of NP-hard problems. These algorithms are obtained using a variety of techniques, which include: obtaining exact algorithms by enumerating maximal independent sets in a graph, obtaining exact algorithms from parameterized algorithms, and a variant of the usual branch-and-bound technique which we call the "colored" branch-and-bound technique. These techniques are simple in that they avoid detailed case analyses and yield algorithms that can be easily implemented. We show the power of these techniques by applying them to several NP-hard problems and obtaining new improved upper bounds on the running time. The specific problems that we tackle are: (1) the Odd Cycle Transversal problem in general undirected graphs, (2) the Feedback Vertex Set problem in directed graphs of maximum degree 4, (3) the Feedback Arc Set problem in tournaments, (4) the 4-Hitting Set problem, and (5) the Minimum Maximal Matching and the Edge Dominating Set problems. The algorithms that we present for these problems are the best known and are a substantial improvement over previous best results. For example, for Minimum Maximal Matching we give an O*(1.4425^n) algorithm, improving the previous best result of O*(1.4422^m) [35]. For the Odd Cycle Transversal problem, we give an O*(1.62^n) algorithm which improves the previous time bound of O*(1.7724^n) [3].
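The sketch below shows only the enumeration primitive mentioned above, listing maximal independent sets via a plain Bron-Kerbosch recursion on the complement adjacency; it is a building block, not any of the paper's algorithms.

```python
# A small sketch of the enumeration primitive: listing all maximal independent sets
# of a graph by running Bron-Kerbosch with non-adjacency in place of adjacency
# (maximal independent sets of G are the maximal cliques of its complement).

def maximal_independent_sets(adj):
    """adj: dict mapping each vertex to the set of its neighbours."""
    vertices = set(adj)
    non_adj = {v: (vertices - adj[v]) - {v} for v in vertices}   # complement adjacency

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            yield set(current)                    # maximal: nothing more can be added
            return
        for v in list(candidates):
            yield from expand(current | {v},
                              candidates & non_adj[v],
                              excluded & non_adj[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    yield from expand(set(), vertices, set())

# Example: the path a-b-c has maximal independent sets {a, c} and {b}:
# list(maximal_independent_sets({'a': {'b'}, 'b': {'a', 'c'}, 'c': {'b'}}))
```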

10.
Recently ElGindy and Avis (EA) presented an O(n) algorithm for solving the two-dimensional hidden-line problem in an n-sided simple polygon. In this paper we show that their algorithm can be used to solve other geometric problems. In particular, triangulating an L-convex polygon and finding the convex hull of a simple polygon can be accomplished in O(n) time, whereas testing a simple polygon for L-convexity can be done in O(n^2) time.

11.
Melab, N., Talbi, E.-G., Petiton, S. The Journal of Supercomputing, 2000, 17(2): 167-185
This paper presents a parallel adaptive version of the block-based Gauss-Jordan algorithm, used to invert large matrices. This version includes a characterization of the workload and a mechanism for folding/unfolding it. Furthermore, this paper proposes a work scheduling strategy and an application-oriented solution to the fault tolerance problem. The application is implemented and evaluated with MARS in dedicated and non-dedicated environments. The results show that an absolute efficiency of 92% is possible on a cluster of DEC/ALPHA processors interconnected by a Gigaswitch network, and an absolute efficiency of 67% can be obtained on an Ethernet network of SUN-Sparc 4 workstations. Moreover, the algorithm is tested on a meta-system including both pools of machines. Finally, an out-of-core solution for the algorithm is proposed. This solution saves 66% of the data input operations and reduces the main memory required for storing the data space of the algorithm by a factor of q, where q is the dimension of the matrix to be inverted in terms of data blocks.
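A minimal sequential NumPy sketch of block-based Gauss-Jordan inversion is given below; it assumes every diagonal block remains nonsingular (no block pivoting) and omits the parallel scheduling, folding/unfolding, and fault-tolerance machinery described above.

```python
# Sequential sketch of blocked Gauss-Jordan inversion on the augmented matrix [A | I].
# Assumes each b x b diagonal block stays nonsingular; not the parallel adaptive version.
import numpy as np

def block_gauss_jordan_inverse(A, b):
    """Invert the (q*b) x (q*b) matrix A using b x b blocks."""
    n = A.shape[0]
    assert n % b == 0
    q = n // b
    M = np.hstack([A.astype(float), np.eye(n)])   # augmented [A | I]
    for k in range(q):
        rk = slice(k * b, (k + 1) * b)
        piv_inv = np.linalg.inv(M[rk, rk])        # invert the current pivot block
        M[rk, :] = piv_inv @ M[rk, :]             # normalise the pivot block row
        for i in range(q):
            if i == k:
                continue
            ri = slice(i * b, (i + 1) * b)
            M[ri, :] -= M[ri, rk] @ M[rk, :]      # eliminate block column k
    return M[:, n:]                               # right half now holds A^{-1}

# Sanity check on a diagonally dominant matrix:
# A = np.random.rand(8, 8) + 8 * np.eye(8)
# assert np.allclose(block_gauss_jordan_inverse(A, 2) @ A, np.eye(8))
```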

12.
In this paper we study the problems of detecting holes and antiholes in general undirected graphs, and we present algorithms for these problems. For an input graph G on n vertices and m edges, our algorithms run in O(n + m^2) time and require O(nm) space; we thus provide a solution to the open problem posed by Hayward et al. asking for an O(n^4)-time algorithm for finding holes in arbitrary graphs. The key element of the algorithms is the use of the depth-first-search traversal on appropriate auxiliary graphs in which moving between any two adjacent vertices is equivalent to walking along a P_4 (i.e., a chordless path on four vertices) of the input graph or of its complement, respectively. The approach can be generalized so that for a fixed constant k ≥ 5 we obtain an O(n^{k-1})-time algorithm for the detection of a hole (antihole resp.) on at least k vertices. Additionally, we describe a different approach which allows us to detect antiholes in graphs that do not contain chordless cycles on five vertices in O(n + m^2) time requiring O(n + m) space. Again, for a fixed constant k ≥ 6, the approach can be extended to yield O(n^{k-2})-time and O(n^2)-space algorithms for detecting holes (antiholes resp.) on at least k vertices in graphs which do not contain holes (antiholes resp.) on k - 1 vertices. Our algorithms are simple and can be easily used in practice. Finally, we also show how our detection algorithms can be augmented so that they return a hole or an antihole whenever such a structure is detected in the input graph; the augmentation takes O(n + m) time and space.
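For small graphs, a brute-force checker can serve as a reference for what "hole" means here; the sketch below grows chordless paths and reports a chordless cycle on at least five vertices. It is exponential in the worst case and does not implement the paper's O(n + m^2) auxiliary-graph technique.

```python
# Brute-force reference checker (exponential in the worst case), useful only for
# small graphs: it searches for a hole, i.e. a chordless cycle on >= 5 vertices.

def find_hole(adj):
    """adj: dict vertex -> set of neighbours. Returns a hole as a vertex list, or None."""
    def extend(path, on_path):
        first, last = path[0], path[-1]
        for v in adj[last]:
            if v in on_path:
                continue
            if any(u in adj[v] for u in path[1:-1]):   # chord to an interior vertex
                continue
            closes = first in adj[v]
            if closes and len(path) >= 4:
                return path + [v]                      # chordless cycle on >= 5 vertices
            if not closes:                             # a chord back to `first` can never stay chordless
                found = extend(path + [v], on_path | {v})
                if found:
                    return found
        return None

    for u in adj:
        for w in adj[u]:
            found = extend([u, w], {u, w})
            if found:
                return found
    return None

# A 5-cycle contains a hole; a 4-cycle or a triangle does not.
```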

13.
With the rapid development of 3D scanning and modeling technology, the data volume of 3D scenes has grown so dramatically that they can no longer be loaded into memory at once and are difficult to render interactively. We developed QuickWalk, an interactive walkthrough system for large out-of-core scenes, which integrates view-dependent level-of-detail techniques, visibility culling, and in-core/out-of-core scheduling of scene data. It effectively reduces the memory required at run time and lets the CPU, the GPU, and I/O all work at full efficiency. Experiments show that, while preserving rendering quality, the system achieves real-time display and interactive manipulation of large out-of-core scenes on a current commodity PC.

14.
We present a novel approach for interactive rendering of massive 3D models. Our approach integrates adaptive sampling-based simplification, visibility culling, out-of-core data management, and level-of-detail. We use a unified scene-graph representation for the acceleration techniques. In preprocessing, we subdivide large objects and build a BVH clustering hierarchy. We use a novel adaptive sampling method, AdaptiveVoxels, to generate LOD models; it reduces the preprocessing cost, and our out-of-core rendering algorithm improves rendering efficiency. We have implemented our algorithm on a desktop PC. We can render massive CAD and isosurface models, consisting of hundreds of millions of triangles, interactively with little loss in image quality.

15.
We present a memory- and time-efficient surface reconstruction algorithm for large sets of points, without normal information, that may not fit within the main memory. Our algorithm treats the input points as forming layers of surface patches, and then performs reconstruction by working on multiple layers of surface patches concurrently based on an out-of-core octree. Additionally, the memory usage of the algorithm can be pre-estimated from a few simple quantities related to the spatial coherence of the input points. Tests of the algorithm on large point sets verify that it also produces good outputs.

16.
Algorithms for a Class of Isotonic Regression Problems
The isotonic regression problem has applications in statistics, operations research, and image processing. In this paper a general framework for the isotonic regression algorithm is proposed. Under this framework, we discuss the isotonic regression problem in the case where the directed graph specifying the order restriction is a directed tree with n vertices. A new algorithm is presented for this case, which can be regarded as a generalization of the PAV algorithm of Ayer et al. Using a simple tree structure such as the binomial heap, the algorithm can be implemented in O(n log n) time, improving the previously best known O(n^2)-time algorithm. We also present linear-time algorithms for special cases where the directed graph is a path or a star. Received September 2, 1997; revised January 2, 1998, and February 16, 1998.
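The sketch below shows the classic PAV algorithm on a totally ordered sequence (the path case mentioned above), pooling adjacent violators with a block stack; the weighted-mean bookkeeping is standard and not taken from the paper.

```python
# Classic pool-adjacent-violators (PAV) on a totally ordered sequence: the path case
# of isotonic regression, not the paper's generalization to directed trees.

def pav(y, w=None):
    """Non-decreasing least-squares fit to y with optional positive weights w."""
    if w is None:
        w = [1.0] * len(y)
    blocks = []                                   # each block: [weighted sum, weight, count]
    for yi, wi in zip(y, w):
        blocks.append([yi * wi, wi, 1])
        # Pool adjacent violators: merge while the previous block's mean exceeds the
        # last one's (compared without division: s_prev / w_prev > s_last / w_last).
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            s, ww, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += ww
            blocks[-1][2] += c
    fit = []
    for s, ww, c in blocks:
        fit.extend([s / ww] * c)                  # every element of a block gets the block mean
    return fit

# Example: pav([1, 3, 2, 4]) -> [1.0, 2.5, 2.5, 4.0]
```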

17.
Advances in Large-Scale 3D Terrain Visualization Algorithms
Large-scale terrain visualization is an indispensable part of simulating large outdoor environments and has been a research hotspot in visualization in recent years, with wide applications in games, simulation, virtual reality, and geographic information systems. This paper focuses on the research methods, latest progress, and open problems in this area, both domestic and international. Adaptive terrain modeling methods are described from two perspectives, data fitting and model simplification; according to how well they suit modern graphics hardware, terrain simplification algorithms are classified into CPU-oriented fine-grained LOD algorithms and GPU-oriented coarse-grained LOD algorithms. The spatial discontinuity problems that arise during modeling and their various solutions are described, out-of-core techniques supporting the rendering of large datasets are discussed in detail, and finally the development trends and future research priorities of terrain visualization modeling are summarized and analyzed.

18.
A new method for bandwidth allocation in large-scale communication networks is proposed. The decomposition-coordination approach from large-scale systems theory is applied to the bandwidth allocation optimization problem: the optimization problem of a large network is decomposed into optimization problems over interrelated small subnetworks. The overall bandwidth allocation problem is solved in three stages: decomposition, coordinated optimization, and merged optimization. Computational results show that, compared with existing algorithms, the decomposition-coordination large-scale bandwidth management algorithm (DCLPBM) achieves high accuracy while reducing time and space complexity. Because the coordination mechanism it uses is simple, DCLPBM is easy to extend to distributed computing environments, and it has broad prospects from the perspective of managing the network by means of the network itself.

19.
We present a technique for steganography in polygonal meshes. Our method hides a message in the indexed representation of a mesh by permuting the order in which faces and vertices are stored. The permutation is relative to a reference ordering that encoder and decoder derive from the mesh connectivity in a consistent manner. Our method is distortion-free because it does not modify the geometry of the mesh. Compared to previous steganographic methods for polygonal meshes, our capacity is up to an order of magnitude better. Our steganography algorithm is universal and can be used instead of the standard permutation steganography algorithm on arbitrary datasets. The standard algorithm runs in Ω(n^2 log^2 n log log n) time and achieves optimal O(n log n) bit capacity on datasets with n elements. In contrast, our algorithm runs in O(n) time, achieves a capacity that is only one bit per element less than optimal, and is extremely simple to implement.
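The sketch below shows the standard permutation-steganography primitive that the abstract compares against: an integer message is embedded as one of the n! orderings of a reference sequence via the factorial number system, giving roughly log2(n!) = O(n log n) bits. The paper's O(n)-time near-optimal scheme is not reproduced here.

```python
# Standard permutation steganography via the factorial number system: the message
# selects one of the n! permutations of a reference ordering shared by encoder and
# decoder. This is the baseline scheme, not the paper's O(n)-time algorithm.

def embed(message, reference):
    """Encode 0 <= message < len(reference)! as a permutation of `reference`."""
    items = list(reference)           # reference ordering shared by both sides
    out = []
    for i in range(len(items), 0, -1):
        digit = message % i           # next factorial-base digit
        message //= i
        out.append(items.pop(digit))  # emit the digit-th remaining element
    return out

def extract(permuted, reference):
    """Recover the embedded integer from the permuted sequence."""
    items = list(reference)
    message, base = 0, 1
    for i, x in enumerate(permuted):
        digit = items.index(x)
        items.pop(digit)
        message += digit * base
        base *= len(reference) - i
    return message

# Round trip: extract(embed(4097, list(range(8))), list(range(8))) == 4097
```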

20.
In this paper, we present a framework for synthesizing I/O-efficient out-of-core programs for block recursive algorithms, such as the fast Fourier transform (FFT) and block matrix transposition algorithms. Our framework uses an algebraic representation based on tensor products and other matrix operations. The programs are optimized for the striped Vitter and Shriver two-level memory model, in which data can be distributed using various cyclic(B) distributions in contrast to the normally used physical track distribution cyclic(B_d), where B_d is the physical disk block size. We first introduce tensor bases to capture the semantics of block-cyclic data distributions of out-of-core data and also data access patterns to out-of-core data. We then present program generation techniques for tensor products and matrix transposition. We accurately represent the number of parallel I/O operations required for the synthesized programs for tensor products and matrix transposition as a function of tensor bases and data distributions. We introduce an algorithm to determine the data distribution which optimizes the performance of the synthesized programs. Further, we formalize the procedure of synthesizing efficient out-of-core programs for tensor product formulas with various block-cyclic distributions as a dynamic programming problem. We demonstrate the effectiveness of our approach through several examples. We show that the choice of an appropriate data distribution can reduce the number of passes over out-of-core data by as much as eight times for a tensor product, and the dynamic programming approach can greatly reduce the number of passes for the overall tensor product formulas.
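The sketch below illustrates what a cyclic(B) distribution across D disks means at the element level; the indexing convention is an assumption for illustration, not the paper's tensor-basis formalism.

```python
# Illustration of a cyclic(B) distribution: consecutive runs of B elements form blocks,
# and blocks are dealt to D disks round-robin. The indexing convention is assumed for
# illustration only.

def cyclic_map(i, B, D):
    """Map global element index i to (disk, local block on that disk, offset in block)."""
    block = i // B                    # which block the element falls in
    return block % D, block // D, i % B

def elements_on_disk(n, B, D, disk):
    """All global indices stored on `disk` under cyclic(B) across D disks."""
    return [i for i in range(n) if (i // B) % D == disk]

# With n = 16, B = 2, D = 2:
#   disk 0 holds [0, 1, 4, 5, 8, 9, 12, 13], disk 1 holds [2, 3, 6, 7, 10, 11, 14, 15].
# Changing B changes which accesses of a tensor-product computation hit the same disk
# in the same pass, which is what the distribution-selection step optimizes.
```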
