Similar Documents
18 similar documents found (search time: 93 ms)
1.
In scientific computing, data sizes grow rapidly as the accuracy requirements of numerical simulation increase, and traditional DRAM-based main memory is hard to scale in capacity because of its cost; persistent memory, which has attracted growing attention in recent years, promises to solve this problem. Persistent memory is a complement between DRAM and SSD: compared with DRAM it offers larger capacity and better cost-effectiveness, but lower performance. To evaluate its application performance, Intel persistent memory is assessed on computational fluid dynamics (CFD), an important area of scientific computing. In the experiments, the persistent memory is configured in Memory Mode, the easiest mode to use, which requires no source-code changes; the test programs cover memory benchmarks and three common CFD algorithms. The results show that, in Memory Mode, introducing persistent memory incurs a performance loss relative to a pure-DRAM configuration for the different CFD algorithms, and the loss grows as the data size increases; on the other hand, deploying persistent memory enables a single server to support numerical simulations at very large data scales.
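The DRAM-versus-Memory-Mode comparison described above can be illustrated with a minimal bandwidth micro-benchmark: run the same STREAM-triad-style kernel on both configurations while sweeping the working-set size. The sketch below (NumPy, with array sizes chosen for illustration) is an assumption about how such a sweep might look, not the paper's actual benchmark suite.

```python
# STREAM-triad-style sketch: run the same code on a DRAM-only server and on one
# with persistent memory in Memory Mode, sweeping the working-set size.
import time
import numpy as np

def triad_gbps(n, repeats=5):
    b = np.random.rand(n)
    c = np.random.rand(n)
    scalar = 3.0
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        a = b + scalar * c                      # the triad kernel
        best = min(best, time.perf_counter() - t0)
    return 3 * n * 8 / best / 1e9               # approx. bytes moved -> GB/s

for n in (10**7, 10**8, 10**9):                 # ~80 MB, ~800 MB, ~8 GB per array
    print(f"n = {n:>10}: {triad_gbps(n):6.1f} GB/s")
```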

2.
左琦  付宇卓  程秀兰  黄洋 《计算机工程》2006,32(1):237-239,275
To improve performance, the cache techniques widely used in general-purpose processors have been introduced into embedded processors. Using a simulation-based approach, this paper analyzes how several key cache structure parameters affect cache performance in embedded application environments. The analysis also considers the impact of different main-memory implementations.
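A simulation-based study of cache structure parameters typically replays an address trace through a parameterized cache model. The sketch below shows the core of such a model for a set-associative cache with LRU replacement; the geometry, trace, and parameter sweep are illustrative assumptions, not the simulator used in the paper.

```python
from collections import OrderedDict

class SetAssociativeCache:
    """Tiny trace-driven cache model: counts hits/misses for a given geometry."""
    def __init__(self, size_bytes, line_bytes, ways):
        self.line = line_bytes
        self.ways = ways
        self.sets = size_bytes // (line_bytes * ways)
        self.tags = [OrderedDict() for _ in range(self.sets)]  # LRU order per set
        self.hits = self.misses = 0

    def access(self, addr):
        block = addr // self.line
        idx, tag = block % self.sets, block // self.sets
        s = self.tags[idx]
        if tag in s:                     # hit: refresh LRU position
            s.move_to_end(tag)
            self.hits += 1
        else:                            # miss: insert, evict LRU line if the set is full
            self.misses += 1
            if len(s) >= self.ways:
                s.popitem(last=False)
            s[tag] = True

# Sweep one structure parameter (associativity) over the same address trace.
trace = [(i % 600) * 32 for i in range(10000)]
for ways in (1, 2, 4, 8):
    c = SetAssociativeCache(size_bytes=16 * 1024, line_bytes=32, ways=ways)
    for a in trace:
        c.access(a)
    print(ways, "ways: hit rate =", c.hits / (c.hits + c.misses))
```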

3.
This paper improves the memory usage of the Boolean algorithm when mining massive data, using sparse matrices and landmarks to speed up mining and save memory. Experiments show that, at the same data scale, the improved method produces the same mining results as the original Boolean algorithm; when the data volume reaches 100 MB, the conventional Boolean algorithm can no longer run because it exhausts memory, whereas the proposed method still runs normally.
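The memory saving comes from representing the transaction-item matrix sparsely instead of as a dense Boolean matrix. The sketch below illustrates the general idea with per-item sparse tid-sets and intersection-based support counting; the landmark mechanism and the paper's exact Boolean mining algorithm are not reproduced here.

```python
# Support counting over a sparse Boolean matrix: each item keeps only the set
# of transaction ids that contain it, instead of a dense 0/1 column.
from functools import reduce

transactions = [
    {"milk", "bread"},
    {"milk", "butter"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
]

# Build the sparse representation: item -> set of transaction ids.
tidsets = {}
for tid, items in enumerate(transactions):
    for item in items:
        tidsets.setdefault(item, set()).add(tid)

def support(itemset):
    """Support = size of the intersection of the items' tid-sets."""
    return len(reduce(set.intersection, (tidsets[i] for i in itemset)))

print(support({"milk", "bread"}))   # 2
print(support({"butter"}))          # 3
```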

4.
Based on the characteristics of data-parallel computer architectures built on PIM (Processor-In-Memory) technology and the application requirements of multimedia computing, a data-parallel language, PIMC, is proposed for embedded SIMD (Single Instruction Multiple Data) computing. The formal definition of PIMC is briefly discussed, and its use is illustrated with the mean-filter algorithm for data-parallel image processing. Together with a large number of other data-parallel programming examples, it is shown that the language can correctly describe data-parallel implementations of basic multimedia processing algorithms on PIM-based SIMD parallel computers.
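PIMC itself is not publicly available, but the mean-filter example mentioned above maps naturally to a data-parallel formulation. The NumPy sketch below expresses the same computation as whole-array operations, as a stand-in for the SIMD data-parallel style rather than PIMC syntax.

```python
# 3x3 mean filter in data-parallel style: each output pixel is the average of
# its neighborhood, computed with whole-array shifts instead of per-pixel loops.
import numpy as np

def mean_filter_3x3(img):
    padded = np.pad(img.astype(np.float64), 1, mode="edge")
    acc = np.zeros(img.shape, dtype=np.float64)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            acc += padded[1 + dy : 1 + dy + img.shape[0],
                          1 + dx : 1 + dx + img.shape[1]]
    return acc / 9.0

img = np.random.randint(0, 256, size=(8, 8))
print(mean_filter_3x3(img))
```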

5.
This paper surveys research on in-memory data management techniques for big data. It traces how data management technology has developed and how its landscape has changed in the big-data environment; analyzes the opportunities and research challenges that in-memory data management faces in this new environment; introduces related frontier research, including distributed programming models, hybrid storage architectures, and in-memory data management; and concludes with an outlook on technical and managerial developments.

6.
郭棉  李绮琦 《计算机应用》2019,39(12):3590-3596
To address the long network latency and high energy consumption of cloud computing and the limited computing resources of edge servers, a drift-plus-penalty computation offloading strategy with edge-cloud cooperation (DPCO) is proposed to improve the quality of service (QoS) of latency-sensitive Internet of Things (IoT) applications. First, an IoT-edge-cloud system model is established, and the traffic pattern, the transmission and computation delays experienced by computing tasks, and the computation and transmission energy consumed by the system are modeled mathematically. Then, an edge-cloud cooperative offloading optimization model is built, with system energy consumption and average task delay as optimization objectives and the queue stability of the edge servers as a constraint. Next, taking the optimization objectives as the penalty function, the drift-plus-penalty properties of the offloading optimization model are derived based on Lyapunov stability theory. Finally, the DPCO offloading algorithm is proposed based on these results: in each time slot it selects the offloading decision that minimizes the current drift-plus-penalty function, thereby reducing the long-term energy consumption per unit time and the average system delay. Compared with the light-flow fog processing (LFP), baseline edge computing (EC), and baseline cloud computing (CC) strategies, DPCO achieves the lowest system energy consumption, about 2/3 of that of CC, and the smallest average task delay, down to 1/5 of that of CC. Experimental results show that DPCO effectively reduces the energy consumption of the edge-cloud computing system and the end-to-end delay of computing tasks, meeting the QoS requirements of latency-sensitive IoT applications.
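The per-slot decision rule of a drift-plus-penalty scheme can be sketched concretely: for each candidate offloading action, evaluate the queue-backlog change plus V times the penalty (here energy plus weighted delay) and pick the minimizer. The units, cost values, and candidate actions below are simplified assumptions for illustration, not the exact DPCO formulation.

```python
# One time slot of a drift-plus-penalty offloading decision (toy units: Mb, J, s).
# Q: edge-queue backlog; V: weight trading queue stability against the penalty.

def dpp_decide(task_mb, Q, V, actions):
    """actions: name -> (energy_J, delay_s, queued_at_edge)."""
    best_name, best_score = None, float("inf")
    for name, (energy, delay, at_edge) in actions.items():
        arrivals = task_mb if at_edge else 0.0   # data added to the edge queue
        service = 5.0                            # data the edge drains per slot (Mb)
        drift = Q * (arrivals - service)         # Lyapunov drift term (scaled)
        penalty = energy + 0.5 * delay           # weighted energy + delay
        score = drift + V * penalty              # drift-plus-penalty objective
        if score < best_score:
            best_name, best_score = name, score
    return best_name

actions = {
    "process_at_edge": (0.8, 0.05, True),
    "offload_to_cloud": (1.5, 0.20, False),
}
print(dpp_decide(task_mb=1.0, Q=2.0, V=10.0, actions=actions))   # small backlog: edge
print(dpp_decide(task_mb=1.0, Q=20.0, V=10.0, actions=actions))  # large backlog: cloud
```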

7.
With the growth of big-data applications, the volume of data to be processed has increased sharply; to process data in time and respond to customers quickly, enterprises are widely deploying in-memory computing systems such as Apache Spark. However, terabyte-scale memory not only raises server cost but also drives up power consumption. Because the power consumption and capacity density of DRAM are limited by process technology and cannot keep up with the rapidly growing memory demand of in-memory computing, researchers have turned to emerging non-volatile memory (NVM). Heterogeneous memory composed of DRAM and NVM offers low cost, low power, and high capacity density, but since NVM has inferior read/write performance, how to place data appropriately across the heterogeneous memory is a key research problem. This work systematically analyzes the memory-access characteristics of Spark applications and, combined with the memory-usage behavior of OpenJDK, proposes a programming framework for managing data placement between DRAM and NVM. Through simple calls to the provided interfaces, application developers can lay out data appropriately in heterogeneous memory. With only 20%-25% of the DRAM plus a large amount of NVM, the system achieves about 90% of the performance obtained with an equal amount of pure DRAM. The framework can thus exploit heterogeneous memory to satisfy the ever-growing scale of in-memory computing, while improving the performance/price ratio several-fold compared with DRAM-only configurations.
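The framework described above steers hot data to DRAM and cold or capacity-bound data to NVM through developer-visible interfaces. The Python sketch below is a hypothetical stand-in for such a placement layer (the real framework works inside OpenJDK and Spark, and its API names are not reproduced here); it only illustrates a frequency-based placement decision under a DRAM budget.

```python
# Hypothetical placement layer: decide per object whether it should live in
# DRAM (hot, latency-critical) or NVM (cold / large, capacity-oriented).
from dataclasses import dataclass

@dataclass
class ObjectStats:
    name: str
    size_mb: float
    accesses_per_sec: float

DRAM_BUDGET_MB = 4096   # e.g. only 20-25% of the working set fits in DRAM

def plan_placement(objects):
    """Greedy: hottest objects (accesses per MB) go to DRAM until the budget is used."""
    placement, used = {}, 0.0
    for obj in sorted(objects, key=lambda o: o.accesses_per_sec / o.size_mb, reverse=True):
        if used + obj.size_mb <= DRAM_BUDGET_MB:
            placement[obj.name] = "DRAM"
            used += obj.size_mb
        else:
            placement[obj.name] = "NVM"
    return placement

objs = [
    ObjectStats("shuffle_buffer", 1024, 5000),
    ObjectStats("cached_rdd_partitions", 8192, 800),
    ObjectStats("broadcast_vars", 512, 3000),
]
print(plan_placement(objs))
```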

8.
The finite-volume lattice Boltzmann method (LBM) extends the applicability of the standard LBM to unstructured meshes, but it requires more memory and computation than the standard LBM. To address this, the finite-volume LBM algorithm is improved by optimizing the order of computation and simplifying the computational equations. Analysis and experimental results show that the improved algorithm reduces memory usage without increasing the amount of computation, and in some cases also greatly reduces computation time.

9.
10.
This paper presents, from an overall perspective, a security architecture for IC-card information encryption, comprehensively analyzes and studies the encryption techniques used in that architecture, designs a more secure hybrid encryption scheme for IC cards based on the standard DES algorithm and KEELOQ, and implements the scheme in software, providing a more complete and more secure solution for the research and deployment of IC cards.

11.
A key characteristic of cloud computing is elasticity, automatically adjusting system resources to an application's workload. Reactive and horizontal approaches represent the traditional means to offer this capability, in which rule-condition-action statements and upper and lower thresholds are used to instantiate or consolidate compute nodes and virtual machines. Although elasticity can be beneficial for many HPC (high-performance computing) scenarios, it also imposes significant challenges in the development of applications. In addition to issues related to how we can incorporate this new feature in such applications, there is a problem associated with the performance and resource pair and, consequently, with energy consumption. Further exploring this last difficulty, we must be capable of analyzing elasticity effectiveness as a function of employed thresholds, with clear metrics to compare elastic and non-elastic executions properly. In this context, this article explores elasticity metrics in two ways: (i) the use of a cost function that combines application time with different energy models; (ii) the extension of speedup and efficiency metrics, commonly used to evaluate parallel systems, to cover cloud elasticity. To accomplish (i) and (ii), we developed an elasticity model known as AutoElastic, which reorganizes resources automatically across synchronous parallel applications. The results, obtained with the AutoElastic prototype using the OpenNebula middleware, are encouraging. Considering a CPU-bound application, an upper threshold close to 70% was the best option for obtaining good performance with a non-prohibitive elasticity cost. In addition, the value of 90% for this threshold was the best option when we plan an efficiency-driven execution. Copyright © 2015 John Wiley & Sons, Ltd.
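The two contributions listed in (i) and (ii) can be written down concretely: a cost that combines execution time with an energy model, and elastic counterparts of speedup and efficiency that compare an elastic run against a fixed-resource baseline. The Python sketch below shows plausible forms of these metrics under simple assumptions; it is not the exact AutoElastic formulation.

```python
# Elasticity metrics sketch: time-energy cost, plus elastic speedup and
# efficiency measured against a fixed-resource (non-elastic) baseline.

def energy(power_watts_per_vm, vm_seconds):
    """Simple energy model: VM-seconds consumed times per-VM power."""
    return power_watts_per_vm * vm_seconds                  # Joules

def cost(exec_time_s, energy_j):
    return exec_time_s * energy_j                           # time-energy product

def elastic_speedup(t_nonelastic, t_elastic):
    return t_nonelastic / t_elastic

def elastic_efficiency(t_nonelastic, t_elastic, avg_vms_elastic, vms_nonelastic):
    # Normalize the speedup by the average resources the elastic run consumed.
    return elastic_speedup(t_nonelastic, t_elastic) * vms_nonelastic / avg_vms_elastic

# Example: elastic run finishes faster while using 6 VMs on average vs a fixed 4.
t_fixed, t_elastic = 1200.0, 800.0                 # seconds
vm_seconds_elastic = 6 * 800.0
print("speedup   :", elastic_speedup(t_fixed, t_elastic))
print("efficiency:", elastic_efficiency(t_fixed, t_elastic, 6, 4))
print("cost      :", cost(t_elastic, energy(150.0, vm_seconds_elastic)))
```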

12.
Embedded systems are routinely deployed in critical infrastructures nowadays, therefore their security is increasingly important. This, combined with the pressing requirement of deploying massive numbers of low-cost and low-energy embedded devices, stimulates the evolution of lightweight cryptography and other green-computing security mechanisms. New crypto-primitives are being proposed that offer moderate security and produce compact implementations. In this article, we present a lightweight authenticated encryption scheme based on the integrated hardware implementation of the lightweight block cipher PRESENT and the lightweight hash function SPONGENT. The presented combination of a cipher and a hash function is appropriate for implementing authenticated encryption schemes that are commonly utilized in one-way and mutual authentication protocols. We exploit their inner structure to discover hardware elements usable by both primitives, thus reducing the circuit's size. The integrated versions demonstrate a 27% reduction in hardware area compared to the simple combination of the two primitives. The resulting solution is ported on a field-programmable gate array (FPGA) and a complete security application with input/output from a universal asynchronous receiver/transmitter (UART) gate is created. In comparison with similar implementations in hardware and software, the proposed scheme represents a better overall status.

13.
The last two decades have seen a revolution in telecom technology with the evolution of three wireless mobile communication standards, namely, GPRS to 3G, 3G to 4G, and 4G to 5G. 5G offers faster download speeds and enables high connectivity between devices such as mobile phones, displays, smart homes, and smart cars because of its high reliability and high bandwidths (up to 10 Gbps). However, at the same time, data and personal information are also more susceptible to theft because of the high connectivity. Such threats can be addressed using electronic data encryption with the Advanced Encryption Standard (AES). Because of their reconfigurable and parallel architectures, Field-Programmable Gate Arrays (FPGAs) are becoming popular in VLSI design flows, enabling pre-silicon validation of designs at faster data rates in real time. FPGAs also serve as platforms for software development in the pre-silicon environment owing to their high speeds. The design community is also relying heavily on High-Level Synthesis (HLS) tools in VLSI design flows; HLS platforms allow new designs to be derived from conventional functional specifications, streamlining the design process. We propose a high-throughput FPGA implementation of the AES algorithm based on High-Level Synthesis. The implementation uses a 128-bit key and is highly suited for telecom applications such as 5G. We developed and tested the setup and then used the Vivado HLS tool to evaluate various HLS directives on the implementation. The generated Verilog RTL was verified and implemented on Xilinx Kintex 7 and Virtex 6 FPGAs. Using the same resources, we achieved significantly better results than existing methods reported by other investigators. We also verified the design's functionality by checking the ciphertext output of our design against a reference design's output for the same input plaintext.
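The functional check mentioned at the end, comparing the hardware ciphertext against a software reference for the same plaintext and key, can be reproduced on the host with a standard AES library. The sketch below uses PyCryptodome and the FIPS-197 Appendix C.1 test vector; how the hardware output is obtained (here a placeholder function) is an assumption.

```python
# Host-side reference check for an AES-128 core: encrypt a known FIPS-197 test
# vector in software and compare with the ciphertext produced by the design.
from Crypto.Cipher import AES   # pip install pycryptodome

KEY       = bytes.fromhex("000102030405060708090a0b0c0d0e0f")
PLAINTEXT = bytes.fromhex("00112233445566778899aabbccddeeff")
EXPECTED  = bytes.fromhex("69c4e0d86a7b0430d8cdb78070b4c55a")  # FIPS-197 C.1

def read_hw_ciphertext() -> bytes:
    """Placeholder: in a real flow this would read the FPGA / RTL simulation output."""
    return EXPECTED

reference = AES.new(KEY, AES.MODE_ECB).encrypt(PLAINTEXT)
assert reference == EXPECTED, "software reference disagrees with FIPS-197 vector"
assert read_hw_ciphertext() == reference, "hardware ciphertext mismatch"
print("AES-128 functional check passed")
```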

14.
This paper describes an approach to carry out performance analysis of parallel embedded applications. The approach is based on measurement, but in addition, the idea of driving the measurement process (application instrumentation and monitoring) by a behavioral model is introduced. Using this model, highly comprehensible performance information can be collected. The whole approach is based on this behavioral model, one instrumentation method and two tools, one for monitoring and the other for visualization and analysis. Each of these is briefly described, and the steps to carry out performance analysis using them are clearly defined. They are explained by means of a case study. Finally, one method to evaluate the intrusiveness of the monitoring approach is proposed, and the intrusiveness results for the case study are presented.

15.
As network-enabled embedded devices and Java grow in their popularity, embedded system researchers start seeking ways to make these devices Java-enabled. However, it is a challenge to apply Java technology to these devices due to their shortage of resources. In this paper, we propose EJVM (Economic Java Virtual Machine), an economic way to run Java programs on network-enabled and resource-limited embedded devices. Espousing the architecture proposed by distributed JVM, we store all Java codes on the server to reduce the storage needs of the client devices. In addition, we use two novel techniques to reduce the client-side memory footprints: server-side class representation conversion and on-demand bytecode loading. Finally, we maintain client-side caches and provide performance evaluation on different caching policies. We implement EJVM by modifying a freely available JVM implementation, Kaffe. From the experiment results, we show that EJVM can reduce Java heap requirements by about 20–50% and achieve 90% of the original performance. Copyright © 2001 John Wiley & Sons, Ltd.
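The two client-side techniques described, on-demand bytecode loading and a bounded class cache, can be sketched generically: classes are fetched from the server only when first referenced and kept in a small LRU cache. The sketch below is a policy illustration in Python, not EJVM or Kaffe code; the fetch function stands in for the network protocol.

```python
# On-demand class loading with a bounded LRU cache, as a policy sketch.
from collections import OrderedDict

class OnDemandClassCache:
    def __init__(self, fetch_from_server, capacity=32):
        self.fetch = fetch_from_server        # callable: name -> converted class bytes
        self.capacity = capacity
        self.cache = OrderedDict()            # name -> class data, in LRU order
        self.fetches = 0

    def load(self, name):
        if name in self.cache:                # already resident: refresh LRU order
            self.cache.move_to_end(name)
            return self.cache[name]
        data = self.fetch(name)               # miss: pull the class from the server
        self.fetches += 1
        if len(self.cache) >= self.capacity:  # evict the least recently used class
            self.cache.popitem(last=False)
        self.cache[name] = data
        return data

def fake_server_fetch(name):
    return b"converted bytecode for " + name.encode()

loader = OnDemandClassCache(fake_server_fetch, capacity=2)
for cls in ["java/lang/String", "java/util/List", "java/lang/String", "java/io/File"]:
    loader.load(cls)
print("server fetches:", loader.fetches)      # 3: the repeated class hits the cache
```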

16.
Cloud Computing has evolved to become an enabler for delivering access to large scale distributed applications running on managed network-connected computing systems. This makes it possible to host Distributed Enterprise Information Systems (dEISs) in cloud environments, while enforcing strict performance and quality of service requirements, defined using Service Level Agreements (SLAs). SLAs define the performance boundaries of distributed applications, and are enforced by a cloud management system (CMS) dynamically allocating the available computing resources to the cloud services. We present two novel VM-scaling algorithms focused on dEIS systems, which optimally detect the most appropriate scaling conditions using performance models of distributed applications derived from constant-workload benchmarks, together with SLA-specified performance constraints. We simulate the VM-scaling algorithms in a cloud simulator and compare them against trace-based performance models of dEISs. We compare a total of three SLA-based VM-scaling algorithms (one using prediction mechanisms) based on a real-world application scenario involving a large, variable number of users. Our results show that it is beneficial to use autoregressive predictive SLA-driven scaling algorithms in cloud management systems for guaranteeing performance invariants of distributed cloud applications, as opposed to using only reactive SLA-based VM-scaling algorithms.
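The comparison in the abstract, reactive threshold-based scaling versus scaling driven by an autoregressive forecast of the load, can be illustrated with a compact simulation. The AR(1) forecast, thresholds, and synthetic workload below are assumptions for illustration only, not the paper's trace-based dEIS models.

```python
# Reactive vs. predictive (AR(1)) VM scaling on a synthetic load trace.
import random

def reactive(load, vms, up=0.8, down=0.3):
    util = load / vms                       # load expressed in "VM-equivalents" of work
    if util > up:
        return vms + 1
    if util < down and vms > 1:
        return vms - 1
    return vms

def predictive(history, target_util=0.7, phi=0.9):
    # AR(1) one-step forecast: next ~= mean + phi * (last - mean)
    mean = sum(history) / len(history)
    forecast = mean + phi * (history[-1] - mean)
    return max(1, round(forecast / target_util))

random.seed(0)
load, hist, vms_r, vms_p = 2.0, [2.0], 3, 3
for t in range(20):
    load = max(0.5, load + random.uniform(-0.4, 0.8))   # slowly rising workload
    hist.append(load)
    vms_r = reactive(load, vms_r)
    vms_p = predictive(hist[-10:])
    print(f"t={t:2d} load={load:4.1f}  reactive VMs={vms_r}  predictive VMs={vms_p}")
```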

17.
18.
Several papers have addressed ellipse detection as a first step for several computer vision applications, but most of the proposed solutions are too slow to be applied in real time on large images or with limited hardware resources. This paper presents a novel algorithm for fast and effective ellipse detection and demonstrates its superior speed performance on large and challenging datasets. The proposed algorithm relies on an innovative selection strategy of arcs which are candidates to form ellipses and on the use of the Hough transform to estimate parameters in a decomposed space. The final aim of this solution is to represent a building block for a new generation of smart-phone applications which need fast and accurate ellipse detection even with limited computational resources.
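The paper estimates ellipse parameters from selected arcs via a Hough transform in a decomposed parameter space; that specific estimator is not reproduced here. As a generic stand-in, the sketch below fits a conic to points sampled from candidate arcs by direct least squares and tests the ellipse condition, a common final fitting step once arcs have been grouped.

```python
# Generic conic fit for points sampled from candidate arcs:
# a*x^2 + b*x*y + c*y^2 + d*x + e*y = 1  (least squares, constant term fixed to -1).
import numpy as np

def fit_conic(points):
    x, y = points[:, 0], points[:, 1]
    A = np.column_stack([x * x, x * y, y * y, x, y])
    coeffs, *_ = np.linalg.lstsq(A, np.ones_like(x), rcond=None)
    return coeffs                       # (a, b, c, d, e)

def is_ellipse(coeffs):
    a, b, c, _, _ = coeffs
    return b * b - 4 * a * c < 0        # discriminant test for an ellipse

# Points sampled from an ellipse centered at (3, 2) with semi-axes 4 and 2.
t = np.linspace(0, 2 * np.pi, 50)
pts = np.column_stack([3 + 4 * np.cos(t), 2 + 2 * np.sin(t)])
print("ellipse?", is_ellipse(fit_conic(pts)))
```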
