
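Full-text access type

Paid full text	261 articles
Free	27 articles
Free within China	41 articles

Subject classification

Industrial technology	329 articles

Publication year

2022	4 articles
2021	3 articles
2020	5 articles
2019	3 articles
2018	6 articles
2017	11 articles
2016	11 articles
2015	22 articles
2014	30 articles
2013	22 articles
2012	25 articles
2011	44 articles
2010	31 articles
2009	24 articles
2008	14 articles
2007	21 articles
2006	13 articles
2005	16 articles
2004	10 articles
2003	6 articles
2002	5 articles
2001	1 article
2000	2 articles

329 results found; search time 15 ms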


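Parallel Computing for Infrared Dim Small Target Detection Based on MPI+OpenMP

He Weiwei, Wu Jing, Zeng Yaoyuan 《Computer and Modernization》 2014,(7):53-57

To monitor the entire motion of a dim, small infrared target effectively, several wavebands must be observed simultaneously. Multi-band detection, however, greatly increases computation time and cannot meet the real-time requirements of practical target detection. To address this problem, the paper proposes a hierarchical parallel method based on MPI+OpenMP that exploits the strengths of both the message-passing model and the shared-memory model, and tests it on a cluster of multiprocessor nodes. Experimental results show that the parallel program reaches a speedup of 8.61 at the same detection probability, greatly improving the efficiency of target detection.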

Huge-scale molecular dynamics simulation of multibubble nuclei

Hiroshi Watanabe, Masaru Suzuki, Nobuyasu Ito 《Computer Physics Communications》 2013

We have developed molecular dynamics codes for a short-range interaction potential that adopt both the flat-MPI and MPI/OpenMP hybrid parallelizations on the basis of a full domain decomposition strategy. Benchmark simulations involving up to 38.4 billion Lennard-Jones particles were performed on Fujitsu PRIMEHPC FX10, consisting of 4800 SPARC64 IXfx 1.848 GHz processors, at the Information Technology Center of the University of Tokyo, and a performance of 193 teraflops was achieved, which corresponds to a 17.0% execution efficiency. Cavitation processes were also simulated on PRIMEHPC FX10 and SGI Altix ICE 8400EX at the Institute of Solid State Physics of the University of Tokyo, which involved 1.45 billion and 22.9 million particles, respectively. Ostwald-like ripening was observed after the multibubble nuclei. Our results demonstrate that direct simulations of multiscale phenomena involving phase transitions from the atomic scale are possible and that the molecular dynamics method is a promising method that can be applied to petascale computers.
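The 17.0% execution efficiency quoted in the abstract above can be checked with a short calculation. The sketch below assumes that each SPARC64 IXfx processor has 16 cores and that each core completes 8 double-precision floating-point operations per cycle; neither figure appears in the abstract, so both are labeled as assumptions. The function name `peak_flops` is illustrative.

```python
def peak_flops(n_processors, clock_hz, cores_per_processor, flops_per_cycle):
    """Theoretical peak floating-point rate of the whole machine, in FLOPS."""
    return n_processors * clock_hz * cores_per_processor * flops_per_cycle


# Figures taken from the abstract.
N_PROCESSORS = 4800
CLOCK_HZ = 1.848e9
ACHIEVED_FLOPS = 193e12

# Assumptions (not stated in the abstract).
CORES_PER_PROCESSOR = 16
FLOPS_PER_CYCLE = 8

peak = peak_flops(N_PROCESSORS, CLOCK_HZ, CORES_PER_PROCESSOR, FLOPS_PER_CYCLE)
efficiency = ACHIEVED_FLOPS / peak

print(f"peak = {peak / 1e12:.1f} TFLOPS, efficiency = {efficiency:.1%}")
# prints: peak = 1135.4 TFLOPS, efficiency = 17.0%
```

Under these assumptions the quoted 193 teraflops is consistent with the reported 17.0% efficiency.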

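Implementation of an OpenMP-Based Parallel JPEG2000 Decoding Algorithm (cited by: 1; self-citations: 1; citations by others: 0)

Wu Hao, Deng Jiaxian, Huang Yan 《Communications Technology》 2011,44(4):10-12,15

To increase JPEG2000 decoding speed, high-speed parallel JPEG2000 decoding was implemented with OpenMP (Open specifications for Multi Processing) on a multi-core processor platform. OpenMP is used to run the T1 decoder and the inverse discrete wavelet transform of the JPEG2000 decoding process in parallel, which reduces the run time of these two stages and therefore the overall decoding time. Experimental results show that OpenMP is a simple and effective parallel programming tool: with decoded image quality unchanged, the proposed parallel decoding algorithm is significantly faster than the single-threaded serial algorithm.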

Compiling Vector Pascal to the XeonPhi

Mozhgan Chimeh, Paul Cockshott, Susanne B. Oehler, Ashkan Tousimojarad, Tian Xu 《Concurrency and Computation》 2015,27(17):5060-5075

Intel's XeonPhi is a highly parallel x86 architecture chip. It has a number of novel features which make it a particularly challenging target for the compiler writer. This paper describes the techniques used to port the Glasgow Vector Pascal Compiler to this architecture and assesses its performance by comparing the XeonPhi with 3 other machines running the same algorithms. Copyright © 2015 John Wiley & Sons, Ltd.

Low‐level PGAS computing on many‐core processors with TSHMEM

Bryant C. Lam, Alan D. George, Herman Lam, Vikas Aggarwal 《Concurrency and Computation》 2015,27(17):5288-5310

Diminishing returns from increased clock frequencies and instruction‐level parallelism have forced computer architects to adopt architectures that exploit wider parallelism through multiple processor cores. While emerging many‐core architectures have progressed at a remarkable rate, concerns arise regarding the performance and productivity of numerous parallel‐programming tools for application development. Development of parallel applications on many‐core processors often requires developers to familiarize themselves with unique characteristics of a target platform while attempting to maximize performance and maintain correctness of their applications. The family of partitioned global address space (PGAS) programming models comprises the current state of the art in balancing performance and programmability. One such PGAS approach is SHMEM, a lightweight, shared‐memory programming library that has demonstrated high performance and productivity potential for parallel‐computing systems with distributed‐memory architectures. In the paper, we present research, design, and analysis of a new SHMEM infrastructure specifically crafted for low‐level PGAS on modern and emerging many‐core processors featuring dozens of cores and more. Our approach (with a new library known as TSHMEM) is investigated and evaluated atop two generations of Tilera architectures, which are among the most sophisticated and scalable many‐core processors to date, and is intended to enable similar libraries atop other architectures now emerging. In developing TSHMEM, we explore design decisions and their impact on parallel performance for the Tilera TILE‐Gx and TILEPro many‐core architectures, and then evaluate the designs and algorithms within TSHMEM through microbenchmarking and applications studies with other communication libraries. 
Our results with barrier primitives provided by the Tilera libraries show dissimilar performance between the TILE‐Gx and TILEPro; therefore, TSHMEM's barrier design takes an alternative approach and leverages the on‐chip mesh network to provide consistent low‐latency performance. In addition, our experiments with TSHMEM show that naive collective algorithms consistently outperformed linear distributed collective algorithms when executed in an SMP‐centric environment. In leveraging these insights for the design of TSHMEM, our approach outperforms the OpenSHMEM reference implementation, achieves similar to positive performance over OpenMP and OSHMPI atop MPICH, and supports similar libraries in delivering high‐performance parallel computing to emerging many‐core systems. Copyright © 2015 John Wiley & Sons, Ltd.

A comparative performance study of common and popular task‐centric programming frameworks

Artur Podobas, Mats Brorsson, Karl‐Filip Faxén 《Concurrency and Computation》 2015,27(1):1-28

Programmers today face a bewildering array of parallel programming models and tools, making it difficult to choose an appropriate one for each application. An increasingly popular programming model supporting structured parallel programming patterns in a portable and composable manner is the task‐centric programming model. In this study, we compare several popular task‐centric programming frameworks, including Cilk Plus, Threading Building Blocks, and various implementations of OpenMP 3.0. We have analyzed their performance on the Barcelona OpenMP Tasking Suite benchmarks on both a 48‐core AMD Opteron 6172 server and a 64‐core TILEPro64 embedded many‐core processor. Our results show that OpenMP offers the highest flexibility for programmers, but this flexibility comes at a cost. Frameworks supporting only a specific and more restrictive model, such as Cilk Plus and Threading Building Blocks, are generally more efficient both in terms of performance and energy consumption. However, Intel's implementation of OpenMP tasks performs the best and closest to the specialized run‐time systems. Copyright © 2013 John Wiley & Sons, Ltd.

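Performance Analysis of Multithreaded Programs Based on the OpenMP Programming Model

Li Mei 《Electronic Design Engineering》 2014,(23):42-44

Parallel programs have greatly improved application execution efficiency, and performance must be taken into account when designing programs for multi-core systems. This paper focuses on how parallelization overhead, load balancing, and thread synchronization overhead affect the performance of multi-core, multithreaded programs under the OpenMP programming model.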
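The load-balancing effect named in the abstract above can be illustrated with a small simulation. This is a minimal sketch and not the paper's method: it models a block partition in the spirit of OpenMP's `schedule(static)` and a one-iteration-at-a-time assignment in the spirit of `schedule(dynamic)`, and it ignores scheduling and synchronization overhead entirely. The function names and the triangular workload are hypothetical.

```python
import heapq


def static_makespan(costs, n_threads):
    """Finish time when iterations are split into contiguous, equal-sized blocks."""
    chunk = -(-len(costs) // n_threads)  # ceiling division
    return max(sum(costs[t * chunk:(t + 1) * chunk]) for t in range(n_threads))


def dynamic_makespan(costs, n_threads):
    """Finish time when each iteration goes to whichever thread is free first."""
    finish_times = [0] * n_threads
    heapq.heapify(finish_times)
    for cost in costs:
        earliest = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest + cost)
    return max(finish_times)


# Hypothetical workload: iteration i costs i time units (1..16).
costs = list(range(1, 17))
n_threads = 4

static_time = static_makespan(costs, n_threads)
dynamic_time = dynamic_makespan(costs, n_threads)
ideal_time = sum(costs) / n_threads

print(f"static: {static_time}, dynamic: {dynamic_time}, ideal: {ideal_time}")
# prints: static: 58, dynamic: 40, ideal: 34.0
```

With the block partition, the thread that receives the most expensive iterations determines the finish time. The dynamic assignment gets closer to the ideal, but in a real OpenMP program it pays a per-chunk scheduling cost that this model leaves out.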

CLOMP: Accurately Characterizing OpenMP Application Overheads

Greg Bronevetsky, John Gyllenhaal, Bronis R. de Supinski 《International Journal of Parallel Programming》 2009,37(3):250-265

Despite its ease of use, OpenMP has failed to gain widespread use on large scale systems, largely due to its failure to deliver sufficient performance. Our experience indicates that the cost of initiating OpenMP regions is simply too high for the desired OpenMP usage scenario of many applications. In this paper, we introduce CLOMP, a new benchmark to characterize this aspect of OpenMP implementations accurately. CLOMP complements the existing EPCC benchmark suite to provide simple, easy to understand measurements of OpenMP overheads in the context of application usage scenarios. Our results for several OpenMP implementations demonstrate that CLOMP identifies the amount of work required to compensate for the overheads observed with EPCC. We also show that CLOMP captures limitations for OpenMP parallelization on SMT and NUMA systems. Finally, CLOMPI, our MPI extension of CLOMP, demonstrates which aspects of OpenMP interact poorly with MPI when MPI helper threads cannot run on the NIC.
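The idea of an "amount of work required to compensate for the overheads" can be made concrete with a simple cost model. The sketch below is an illustration only and is not how CLOMP measures anything: it assumes a fixed cost per parallel region and perfect speedup of the work inside it, so the parallel time is work / threads + overhead. The function names and the 10-microsecond overhead are hypothetical.

```python
def parallel_time(work, region_overhead, n_threads):
    """Modeled run time of one parallel region (same time unit as the inputs)."""
    return work / n_threads + region_overhead


def break_even_work(region_overhead, n_threads):
    """Smallest serial work for which the modeled parallel region is not slower.

    Solving work / n_threads + overhead = work for work gives
    work = overhead * n_threads / (n_threads - 1).
    """
    if n_threads < 2:
        raise ValueError("a parallel region needs at least 2 threads to pay off")
    return region_overhead * n_threads / (n_threads - 1)


# Hypothetical numbers: 10 microseconds to start a region, 4 threads.
overhead_us = 10.0
threads = 4
threshold_us = break_even_work(overhead_us, threads)

print(f"break-even work: {threshold_us:.2f} microseconds")
```

In this model a loop body with less serial work than the threshold runs faster without the parallel region. Real implementations deviate from it, which is the gap a measured benchmark is meant to expose.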

OpenMDSP: Extending OpenMP to Program Multi-Core DSPs


He Jiangzhou, Chen Wenguang, Chen Guangri, Zheng Weimin, Tang Zhizhong, Ye Handong 《Journal of Computer Science and Technology》 2014,29(2):316-331

Multi-core digital signal processors (DSPs) are widely used in wireless telecommunication, core network transcoding, industrial control, and audio/video processing technologies, among others. In comparison with general-purpose multi-processors, multi-core DSPs normally have a more complex memory hierarchy, such as on-chip core-local memory and non-cache-coherent shared memory. As a result, efficient multi-core DSP applications are very difficult to write. The current approach used to program multi-core DSPs is based on proprietary vendor software development kits (SDKs), which only provide low-level, non-portable primitives. While it is acceptable to write coarse-grained task-level parallel code with these SDKs, writing fine-grained data parallel code with SDKs is tedious and error-prone. We believe that it is desirable to possess a high-level and portable parallel programming model for multi-core DSPs. In this paper, we propose OpenMDSP, an extension of OpenMP designed for multi-core DSPs. The goal of OpenMDSP is to fill the gap between the OpenMP memory model and the memory hierarchy of multi-core DSPs. We propose three classes of directives in OpenMDSP, including 1) data placement directives that allow programmers to control the placement of global variables conveniently, 2) distributed array directives that divide a whole array into sections and promote the sections into core-local memory to improve performance, and 3) stream access directives that promote big arrays into core-local memory section by section during parallel loop processing while hiding the latency of data movement by the direct memory access (DMA) of a DSP. We implement the compiler and runtime system for OpenMDSP on Freescale MSC8156. The benchmarking results show that seven of nine benchmarks achieve a speedup of more than a factor of 5 when using six threads.


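A P2P-Based Hybrid Parallel Algorithm for Online Verification of Protection Settings

Liu Gaoming, Song Wei, Qiu Xiangdong 《Southern Power System Technology》 2014,8(2):60-64

As large interconnected power grids continue to grow, particularly with the successful commissioning of the "Sanhua" (North China, Central China, East China) UHV synchronous grid, traditional centralized computation runs into the limits of hardware computing capacity. This paper proposes a P2P-based hybrid parallel algorithm for online verification of protection settings. It uses the peer-to-peer communication of P2P network technology to exchange information between regions on an equal footing, describes the MPI+OpenMP hybrid parallel programming model and the algorithm design in detail, analyzes the parallelism of online verification, and implements two-level parallelism at the process level and the thread level. Finally, the hybrid parallel algorithm is tested and compared on a P2P-based distributed parallel computing platform, and the results show that the proposed algorithm is correct and effective.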
