Similar Documents
20 similar documents found.
1.
The study and development of chip multi-processors (CMPs) are of utmost importance for the creation of future technologies. Devising a theoretical analysis of the power/performance of micro-architecture models for CMPs remains a challenge. This paper addresses this problem by (1) introducing an analytical model for quantitatively measuring the power and performance of a processor, (2) analyzing the effects of resource division on power consumption and performance when executing a given benchmark, and (3) predicting the optimum number of cores on which to run the benchmark. Our analytically derived results show that, to achieve power/performance gains, the optimum number of cores must be between 8 and 16.
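To make this kind of core-count analysis concrete, the toy model below assumes Amdahl's law for speedup and a hypothetical power model (fixed static power plus a constant dynamic power per active core), then picks the core count that maximizes performance per watt. It is a sketch under those assumptions, not the paper's analytical model; `parallel_fraction`, `p_static`, and `p_core` are illustrative parameters.

```python
def speedup(n_cores: int, parallel_fraction: float) -> float:
    """Amdahl's law: speedup on n cores when `parallel_fraction` of the work parallelizes."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_cores)


def chip_power(n_cores: int, p_static: float, p_core: float) -> float:
    """Hypothetical power model: fixed static/uncore power plus dynamic power per active core."""
    return p_static + p_core * n_cores


def best_core_count(candidates, parallel_fraction: float, p_static: float, p_core: float) -> int:
    """Return the candidate core count with the highest performance per watt (speedup / power)."""
    return max(
        candidates,
        key=lambda n: speedup(n, parallel_fraction) / chip_power(n, p_static, p_core),
    )


if __name__ == "__main__":
    cores = [1, 2, 4, 8, 16, 32, 64]
    # Illustrative parameters only; they are not taken from the paper.
    print(best_core_count(cores, parallel_fraction=0.95, p_static=10.0, p_core=1.0))
```

The serial fraction caps speedup while power keeps growing linearly with the core count, so performance per watt peaks at a moderate core count; changing the assumed parameters moves that peak.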

2.
With Moore’s law supplying billions of transistors on-chip, embedded systems are undergoing a transition from single-core to multi-core to exploit this high transistor density for high performance. However, the optimal layout of these multiple cores along with the memory subsystem (caches and main memory) to satisfy power, area, and stringent real-time constraints is a challenging design endeavor. The short time-to-market constraint of embedded systems exacerbates this design challenge and necessitates architectural modeling of embedded systems, which reduces time-to-market by expediting the mapping of target applications to devices/architectures. In this paper, we present a queueing-theoretic approach for modeling multi-core embedded systems that provides quick and inexpensive performance evaluation, in terms of both time and resources, compared with developing multi-core simulators and running benchmarks on them. We verify our queueing-theoretic modeling approach by running SPLASH-2 benchmarks on the SuperESCalar simulator (SESC). Results reveal that our queueing-theoretic model qualitatively evaluates multi-core architectures accurately, with an average difference of 5.6% compared with the SESC simulator's evaluations of the same architectures. Our modeling approach can be used to characterize the performance per watt and performance per unit area of multi-core embedded architectures with varying numbers of processor cores and cache configurations, providing a comparative analysis.
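Queueing-theoretic models of this kind are assembled from elementary queues. As a minimal sketch, assuming each core's cache misses form a Poisson stream and a single shared memory controller behaves as an M/M/1 server (`shared_memory_latency` and its parameters are hypothetical), the mean memory response time follows from W = 1/(μ − λ) and Little's law L = λW:

```python
from dataclasses import dataclass


@dataclass
class MM1:
    """M/M/1 queue: Poisson arrivals at rate `lam`, exponential service at rate `mu`."""
    lam: float
    mu: float

    @property
    def utilization(self) -> float:
        return self.lam / self.mu

    def mean_response_time(self) -> float:
        """Mean time in system, W = 1 / (mu - lam); finite only when lam < mu."""
        if self.lam >= self.mu:
            raise ValueError("unstable queue: arrival rate must be below service rate")
        return 1.0 / (self.mu - self.lam)

    def mean_in_system(self) -> float:
        """Little's law: L = lam * W."""
        return self.lam * self.mean_response_time()


def shared_memory_latency(n_cores: int, miss_rate_per_core: float, mem_service_rate: float) -> float:
    """Hypothetical model: per-core Poisson miss streams merge into one M/M/1 memory queue."""
    return MM1(n_cores * miss_rate_per_core, mem_service_rate).mean_response_time()


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, shared_memory_latency(n, miss_rate_per_core=0.1, mem_service_rate=1.0))
```

Because the superposition of independent Poisson streams is again Poisson, n cores with per-core rate λ₀ present an aggregate rate of nλ₀, and latency grows sharply as nλ₀ approaches μ. This is the kind of saturation effect such a model exposes cheaply as cores are added, without running a cycle-level simulator.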

3.
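Directly interconnecting multi-core processors to build multi-way parallel systems has long been a primary means of increasing the parallelism of high-performance computers. This work focuses on the QoS design of the direct-connect interface of multi-core processors, which carries cross-chip cache-coherence packets efficiently and reliably, enabling an SMP system with shared main memory. The key techniques of the QoS design at each protocol layer of the direct-connect interface are described in detail, a reusable verification platform is built based on the UVM methodology, and the correctness of the QoS design is verified through simulation...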

4.
5.
Nowadays, multi-core processors are the main technology used in desktop PCs, laptop computers, and mobile hardware platforms. As the number of cores on a chip keeps increasing, so does design complexity, with a growing impact on both the power and the performance of the processor. In multi-processors, the number of cores and various parameters, such as issue width, number of instructions, and execution time, are key design factors for balancing thread-level parallelism against instruction-level parallelism. In this paper, we perform a comprehensive simulation study that aims to find the optimum number of processor cores in desktop/laptop computing processor models with shallow pipeline depth. This paper also explores the trade-off between the number of cores and the other parameters used in multi-processors in terms of power–performance gains, and analyzes the impact of 3D stacking on the design of simultaneous multithreading and chip multiprocessing. Our analysis shows that the optimum number of cores varies across workload classes, namely SPEC2000, SPEC2006, and MiBench. The simulation study, which uses architectures with shorter pipeline depth, shows that (1) the optimum number of cores for power–performance is 8, (2) the optimum number of threads lies in the range [2, 4], and (3) beyond 32 cores, multi-core processors are no longer efficient in terms of performance benefits and overall power consumption.

6.
7.
The Journal of Supercomputing - Branch prediction is essential for improving the performance of pipelined processors. As the number of pipeline stages in modern processors increases, an accurate...
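For context, the classic bimodal scheme, a table of 2-bit saturating counters indexed by branch address, is sketched below. It is a textbook baseline, not the predictor proposed in the cited paper; the table size and the PC-indexing choice (dropping the low two bits, assuming 4-byte instructions) are assumptions.

```python
class TwoBitPredictor:
    """Bimodal predictor: a table of 2-bit saturating counters indexed by PC.

    Counter values 0-1 predict not-taken; 2-3 predict taken.
    """

    def __init__(self, index_bits: int = 10):
        self.mask = (1 << index_bits) - 1
        self.table = [1] * (1 << index_bits)  # every entry starts weakly not-taken

    def _index(self, pc: int) -> int:
        # Assumption: 4-byte instructions, so the low two PC bits carry no information.
        return (pc >> 2) & self.mask

    def predict(self, pc: int) -> bool:
        return self.table[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)  # saturate at strongly taken
        else:
            self.table[i] = max(0, self.table[i] - 1)  # saturate at strongly not-taken


def accuracy(predictor: TwoBitPredictor, trace) -> float:
    """Fraction of correct predictions over a trace of (pc, taken) pairs."""
    correct = 0
    for pc, taken in trace:
        correct += predictor.predict(pc) == taken
        predictor.update(pc, taken)
    return correct / len(trace)


if __name__ == "__main__":
    # A loop branch taken 9 times, then not taken once, repeated 10 times.
    loop_trace = ([(0x400, True)] * 9 + [(0x400, False)]) * 10
    print(accuracy(TwoBitPredictor(), loop_trace))
```

The hysteresis of the 2-bit counter means a single loop exit does not flip the prediction, so only the exit itself is mispredicted once the counter has warmed up.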

8.
In the last decade, many papers have presented sequential connected component labeling (CCL) algorithms. As modern processors are multi-core and trending toward many cores, CCL algorithm design should address parallelism and multithreading. After reviewing sequential CCL algorithms and studying their variations, this paper presents the parallel version of Light Speed Labeling (LSL) for connected component analysis (CCA) and compares it with our parallelized implementations of state-of-the-art sequential algorithms. We provide benchmarks that help clarify the intrinsic differences between these parallel algorithms. We show that, thanks to its run-based processing, LSL is intrinsically more efficient and faster than all pixel-based algorithms. We also show that all the pixel-based algorithms are memory-bound on multi-socket machines and are therefore inefficient and do not scale, whereas LSL, thanks to its RLE compression, can scale on such high-end machines. On a 4 × 15-core machine with 8192 × 8192 images, LSL outperforms its best competitor by a factor of 10.8 and achieves a throughput of 42.4 gigapixels labeled per second.
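The run-based idea credited for LSL's efficiency can be illustrated with a simple sequential sketch: foreground pixels are grouped into horizontal runs, runs that touch runs in the previous row are merged with a union-find structure, and final labels are resolved at the end. This is not the Light Speed Labeling algorithm itself (which adds its own RLE data structures and parallel merging); all function names here are hypothetical.

```python
def find(parent: list, x: int) -> int:
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(parent: list, a: int, b: int) -> None:
    """Merge two equivalence classes, keeping the smaller label as root."""
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[max(ra, rb)] = min(ra, rb)


def row_runs(row) -> list:
    """Return (start, end) column pairs (inclusive) of consecutive foreground pixels."""
    runs, x, w = [], 0, len(row)
    while x < w:
        if row[x]:
            start = x
            while x < w and row[x]:
                x += 1
            runs.append((start, x - 1))
        else:
            x += 1
    return runs


def label_runs(image, eight_connected: bool = True):
    """Label connected components run by run; returns (label_image, component_count)."""
    parent, prev, all_runs = [], [], []
    slack = 1 if eight_connected else 0  # diagonal contact counts only for 8-connectivity
    for y, row in enumerate(image):
        cur = []
        for start, end in row_runs(row):
            lbl = len(parent)
            parent.append(lbl)  # provisional label for this run
            for ps, pe, pl in prev:  # O(runs^2) per row; real codes use a two-pointer merge
                if ps <= end + slack and pe >= start - slack:
                    union(parent, lbl, pl)
            cur.append((start, end, lbl))
            all_runs.append((y, start, end, lbl))
        prev = cur
    final, out = {}, [[0] * len(r) for r in image]
    for y, start, end, lbl in all_runs:
        root = find(parent, lbl)
        final.setdefault(root, len(final) + 1)  # consecutive labels 1..N
        for x in range(start, end + 1):
            out[y][x] = final[root]
    return out, len(final)
```

Union operations happen once per touching pair of runs rather than once per pixel, which is why run-based labeling does less work and touches less memory than pixel-based approaches on images with long runs.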

9.
In this paper, an approach is developed to improve the power efficiency of Bluetooth. The improved efficiency is achieved by reducing unnecessary polling operations in Basic Rate/Enhanced Data Rate (BR/EDR) controllers. An analysis of the current low-power modes in the Bluetooth BR/EDR controller indicates that activating them requires a critical and challenging parameter negotiation phase. These parameters have a wide range of possible values, and as a result the associated low-power modes are typically ignored by Bluetooth application developers. The new approach is based on multiple polling intervals. It is shown that three polling intervals (small, medium, and large) are sufficient for a broad range of data traffic scenarios. As the core idea, each controller runs a common algorithm to choose among the three polling intervals and adaptively switches the link state between the active data transfer state and idle. The state-transition rules are derived, and a system model based on the Hidden Markov Model (HMM) is established and used to analyze and design the new Bluetooth link state-transition algorithm. Simulation and analysis demonstrate significant power savings and relatively low average end-to-end packet delay for this state-transition-based approach, compared with the conventional polling system and the low-power sniff mode. Moreover, the state-transition approach makes parameter setting easier, and the parameters can be further optimized for a specific Bluetooth scenario.
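A minimal sketch of the three-interval idea follows. The interval lengths (in slots) and the switching rule (step to the next longer interval after `idle_threshold` consecutive empty polls, return to the shortest interval as soon as data arrives) are hypothetical placeholders; the paper derives its own state-transition rules from an HMM-based model.

```python
from enum import Enum


class Interval(Enum):
    """Hypothetical polling intervals, in baseband slots."""
    SMALL = 2
    MEDIUM = 8
    LARGE = 32


ORDER = [Interval.SMALL, Interval.MEDIUM, Interval.LARGE]


class AdaptivePoller:
    """Hypothetical controller logic: back off while idle, snap back on data."""

    def __init__(self, idle_threshold: int = 3):
        self.idle_threshold = idle_threshold
        self.level = 0        # index into ORDER
        self.idle_polls = 0   # consecutive polls that carried no data

    @property
    def interval(self) -> Interval:
        return ORDER[self.level]

    def on_poll(self, had_data: bool) -> Interval:
        """Update state after a poll and return the interval to use next."""
        if had_data:
            # Active data transfer: poll as often as possible.
            self.level, self.idle_polls = 0, 0
        else:
            self.idle_polls += 1
            if self.idle_polls >= self.idle_threshold and self.level < len(ORDER) - 1:
                self.level += 1     # move toward the idle state
                self.idle_polls = 0
        return self.interval
```

Because both ends of the link can run the same deterministic rule on the same observed traffic, neither side needs the explicit parameter negotiation that sniff mode requires, which is the usability benefit the abstract emphasizes.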

10.
Performance measurement can only help identify the problems in the current supply chain; it cannot explore the root causes of these problems and thus choose corresponding actions to improve supply chain performance. The conflict between top-down strategy decomposition and the bottom-up implementation process is serious. To overcome these issues, it is necessary to link strategic objectives to operations, which helps managers, especially those operating at a strategic level, better understand the operational mechanisms of supply chains. In this study, an integrated approach that combines the analytic hierarchy process (AHP) and the technique for order preference by similarity to ideal solution (TOPSIS) is proposed for linking strategic objectives to operations. The Supply Chain Operations Reference model is used to model the linkage between strategic objectives and operational metrics hierarchically. AHP is used to analyze this metric hierarchy and determine the weights of the metrics, and TOPSIS is used to normalize metric values with different units so that they can be compared. The proposed approach is applied to a decision-making problem in a manufacturing company. The company's managers found the application and results satisfactory and implementable in their decisions.
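The AHP/TOPSIS combination can be sketched in a few lines. Here AHP weights are approximated with the row geometric-mean method (an assumption; the principal-eigenvector method is the classic alternative), and TOPSIS uses vector normalization, which is what makes metrics with different units comparable. The pairwise comparisons and metric matrix are illustrative, not the company data from the study.

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP priority weights via the normalized row geometric mean."""
    n = len(pairwise)
    gm = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(gm)
    return [g / total for g in gm]


def topsis(matrix, weights, benefit):
    """Closeness of each alternative (row) to the ideal solution, in [0, 1].

    benefit[j] is True if larger values of criterion j are better, False for cost criteria.
    """
    m, n = len(matrix), len(matrix[0])
    # Vector normalization removes units: each column is divided by its Euclidean norm.
    norms = [math.sqrt(sum(matrix[i][j] ** 2 for i in range(m))) for j in range(n)]
    v = [[weights[j] * matrix[i][j] / norms[j] for j in range(n)] for i in range(m)]
    cols = list(zip(*v))
    best = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    worst = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    scores = []
    for row in v:
        d_best, d_worst = math.dist(row, best), math.dist(row, worst)
        scores.append(d_worst / (d_best + d_worst))
    return scores


if __name__ == "__main__":
    # Hypothetical metrics: on-time delivery (%, benefit) and cost per unit (cost).
    w = ahp_weights([[1, 3], [1 / 3, 1]])  # delivery judged 3x as important as cost
    print(w, topsis([[95, 12.0], [88, 9.5], [91, 10.0]], w, [True, False]))
```

The closeness coefficient lies in [0, 1], and an alternative that is best on every weighted metric scores 1. The sketch assumes the alternatives are not all identical, since identical rows would make both distances zero.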

11.
12.
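Constrained by power consumption, general-purpose microprocessors stopped pursuing higher clock frequencies more than a decade ago and turned instead to integrating more processor cores. Meanwhile, as transistor density keeps increasing according to Moore's law, the number of processor cores that can be integrated on a single chip has multiplied, and on-chip multi-core and many-core processors have become the mainstream of high-performance microprocessor development. Support for the shared-memory programming model in future thousand-core general-purpose many-core processors is an inevitable trend, but traditional cache-coherence directory structures face problems such as high lookup latency, frequent directory-entry replacement, and limited scalability of hardware cost and power consumption. Sparse directories strike a trade-off between the hardware overhead of traditional directory structures and the efficiency of coherence maintenance, and are regarded as an energy-efficient, scalable structure for maintaining cache coherence in many-core processors. This paper surveys recent research and methods for improving sparse-directory performance, analyzes them in terms of area, access latency, power consumption, and implementation complexity, and summarizes the respective advantages and shortcomings of each method, offering a useful reference for the innovative design of shared-memory architectures in future high-performance many-core processors.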
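To illustrate why directory-entry replacement is costly, here is a minimal sketch of a set-associative sparse directory with LRU replacement (an assumption; the surveyed designs differ in replacement policy and sharer encoding). Evicting an entry discards its sharer information, so every cached copy of that block must be invalidated. Sharers are kept as Python sets here, where hardware would typically use a bit vector or a compressed pointer format.

```python
from collections import OrderedDict


class SparseDirectory:
    """Set-associative sparse directory: tracks sharers only for blocks currently cached."""

    def __init__(self, num_sets: int, ways: int):
        self.num_sets = num_sets
        self.ways = ways
        # Each set maps block -> set of sharer core ids; OrderedDict order = LRU order.
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def record_read(self, block: int, core: int):
        """Add `core` as a sharer of `block`.

        Returns (victim_block, cores_to_invalidate) if an entry had to be evicted, else None.
        """
        s = self.sets[block % self.num_sets]
        victim = None
        if block in s:
            s.move_to_end(block)  # mark as most recently used
        else:
            if len(s) >= self.ways:
                vblock, vsharers = s.popitem(last=False)  # evict the LRU entry
                victim = (vblock, sorted(vsharers))       # these copies must be invalidated
            s[block] = set()
        s[block].add(core)
        return victim

    def sharers(self, block: int):
        return sorted(self.sets[block % self.num_sets].get(block, ()))
```

Each eviction turns into invalidation messages to every sharer of the victim block, which is the cost that the surveyed techniques try to reduce by lowering eviction frequency or shrinking each entry.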

13.
The probabilistic evaluation of composite power system reliability is an important but computationally intensive task that requires sampling or searching a large search space. While multiple methods have been used to perform these computations, a remaining area of research is the impact that modern parallel computing platforms may have on this computation. Studies have been performed in the past, but they have been primarily limited to cluster-based computing. In addition, the most recent works in this area have used outdated technology or have been evaluated on smaller test systems. In the modern era, a wide variety of platforms are available for parallel computation, including multi-core processors, clusters, and accelerators. Each of these platforms provides unique opportunities for accelerating computation and exploiting scalability. To fill this gap in the research, this study implements and evaluates two methods of parallel computation, batch parallelism and pipeline parallelism, using a multi-core architecture in a cloud computing environment on Amazon Web Services with up to 36 virtual compute cores. Further, the two methodologies are compared in terms of computation time, speedup, efficiency, and scalability. Results are collected using IEEE reliability test systems, and speedups of more than 5× are demonstrated across multiple test systems.
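Batch parallelism can be sketched as independent Monte Carlo batches, each with its own seeded random generator, whose failure counts are summed at the end. The toy system below (generating units only, with hypothetical capacities, forced outage rates, and load) ignores the transmission network that a composite reliability study includes, and estimates only a loss-of-load probability.

```python
import random
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy system: (capacity in MW, forced outage rate) per generating unit.
UNITS = [(100, 0.02)] * 6 + [(50, 0.04)] * 4
LOAD = 650.0  # hypothetical constant load in MW


def run_batch(seed: int, samples: int, units=UNITS, load=LOAD) -> int:
    """Non-sequential Monte Carlo: sample each unit's up/down state, count loss-of-load states."""
    rng = random.Random(seed)  # independent generator per batch: reproducible, no shared state
    failures = 0
    for _ in range(samples):
        available = sum(cap for cap, forced_outage_rate in units if rng.random() >= forced_outage_rate)
        if available < load:
            failures += 1
    return failures


def lolp_batched(total_samples: int, n_batches: int, workers: int = 4) -> float:
    """Split the samples into independent batches, run them concurrently, pool the counts."""
    per_batch = total_samples // n_batches
    with ThreadPoolExecutor(max_workers=workers) as pool:
        total_failures = sum(pool.map(run_batch, range(n_batches), [per_batch] * n_batches))
    return total_failures / (per_batch * n_batches)


if __name__ == "__main__":
    print(lolp_batched(200_000, n_batches=8))
```

Threads keep the sketch portable, but CPython's GIL prevents real speedup for this CPU-bound loop; `concurrent.futures.ProcessPoolExecutor` exposes the same `map` interface for multi-core execution. Seeding each batch separately keeps the estimate reproducible regardless of how batches are scheduled.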

14.
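Based on an analysis of thermal noise sources and suppression techniques, external interference is isolated through the power-supply scheme, components are screened, and the control circuit is optimized to achieve multi-level, low-thermal-noise bias outputs. Test results show that the RMS output noise can be kept within 0.6 μV, meeting the testing requirements of a particular high-sensitivity electronic device.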
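The RMS figure in this result is the square root of the mean squared deviation of the sampled output. A small helper, assuming the samples are in microvolts and that the DC bias level is removed before computing the noise, shows the check against the 0.6 μV limit:

```python
import math


def rms_noise(samples_uv, remove_dc: bool = True) -> float:
    """RMS noise of sampled output, in microvolts.

    Assumption: the bias output itself is DC, so its mean is subtracted and only the
    fluctuation around it counts as noise.
    """
    n = len(samples_uv)
    mean = sum(samples_uv) / n if remove_dc else 0.0
    return math.sqrt(sum((x - mean) ** 2 for x in samples_uv) / n)


def meets_spec(samples_uv, limit_uv: float = 0.6) -> bool:
    """True if the RMS noise is within the limit (0.6 uV, as in the reported result)."""
    return rms_noise(samples_uv) <= limit_uv
```

In practice the samples would come from a low-noise acquisition chain whose own noise floor is well below the limit being verified.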

15.
1 Introduction Container supply chains (CSCs), with their many complex physical and information flows, have contributed to economic prosperity but have also become uniquely vulnerable to many risks. In the past decade, specific events closely related to these risks have included the Kobe earthquake in 1995, which affected supply chains across the globe; the Asian economic crisis in 1997; the Y2K-related IT problems at the end of the 20th century; the fuel protest of September 20…

16.
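Damage Modes and Analysis of Power MOSFETs in Switching Power Supplies   Total citations: 2, self-citations: 0, citations by others: 2
Based on the different failure morphologies of power MOSFETs, this paper discusses the damage modes of power MOSFETs under overcurrent and overvoltage conditions, explains the causes of these damage morphologies, and analyzes how failures occurring during turn-off differ from those occurring during turn-on, thereby providing a basis for determining whether a failure occurred during turn-off or turn-on. Circuit diagrams for overcurrent and overvoltage testing are given. The paper also analyzes the damage morphologies of power MOSFETs under slow turn-on in dynamic burn-in testing, slow turn-off in battery-protection circuits, and prolonged operation in the linear region. Finally, with reference to practical applications, it discusses the mechanism and process by which power MOSFETs are typically damaged in a mixed mode combining overcurrent and overvoltage.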

17.
Various statistical models have been constructed for analyzing the workload variables of a computer system, but most of these models fail to analyze each variable separately and identify job groups by hardware consumption patterns. In this paper we propose a compumetrical approach to analyze the computer system performance variables and to cluster the jobs into homogeneous groups. It involves using univariable and multivariable analysis and graphical methods for analyzing the variables. This approach enables us to explore data thoroughly, to look for patterns and clusters, to confirm or disprove the expected hardware consumption, and to discover new phenomena.
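One common way to form homogeneous job groups from hardware-consumption variables is k-means on standardized features, sketched below in plain Python. The paper describes its own compumetrical approach combining univariable, multivariable, and graphical analysis; k-means, the feature choice (CPU seconds, I/O operations, memory in MB), and the first-k initialization are assumptions for illustration.

```python
import math


def standardize(rows):
    """z-score each column so CPU time, I/O counts, and memory use carry equal weight."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # `or 1.0` avoids division by zero for a constant column.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0 for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def kmeans(points, k: int, max_iter: int = 100):
    """Lloyd's algorithm; the first k points seed the centroids (deterministic, for illustration)."""
    centroids = [list(p) for p in points[:k]]
    assign = None
    for _ in range(max_iter):
        new_assign = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_assign == assign:
            break  # converged: no job changed cluster
        assign = new_assign
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(v) / len(members) for v in zip(*members)]
    return assign, centroids


if __name__ == "__main__":
    # Hypothetical jobs: (CPU seconds, I/O operations, memory MB).
    jobs = [(1, 10, 50), (100, 500, 2000), (2, 12, 55), (110, 520, 2100), (1.5, 11, 52), (105, 510, 2050)]
    print(kmeans(standardize(jobs), k=2)[0])
```

Standardizing first matters because raw memory figures in megabytes would otherwise dominate the Euclidean distance and hide differences in CPU or I/O behavior.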

18.
As configurable processing advances, elements from the traditional approaches of both hardware and software development can be combined by incorporating customized, application-specific computational resources into the processor’s architecture, especially in the case of field-programmable-gate-array-based systems with soft-processors, so as to enhance the performance of embedded applications. This paper explores the use of several different microarchitectural alternatives to increase the performance of edge detection algorithms, which are of fundamental importance for the analysis of DNA microarray images. Optimized application-specific hardware modules are combined with efficient parallelized software in an embedded soft-core-based multi-processor. It is demonstrated that the performance of one common edge detection algorithm, namely Sobel, can be boosted remarkably. By exploiting the architectural extensions offered by the soft-processor, in conjunction with the execution of carefully selected application-specific instruction-set extensions on a custom-made accelerating co-processor connected to the processor core, we introduce a new approach that makes this methodology noticeably more efficient across various applications from the same domain, which are often similar in structure. With flexibility to update the processing algorithms, an improvement reaching one order of magnitude over all-software solutions could be obtained. In support of this flexibility, an effective adaptation of this approach is demonstrated which performs real-time analysis of extracted microarray data; the proposed reconfigurable multi-core prototype has been exploited with minor changes to achieve almost 5× speedup.
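For reference, the Sobel operator computes horizontal and vertical gradients with two 3×3 kernels and combines them into a gradient magnitude. The plain-Python sketch below shows the per-pixel multiply-accumulate work that custom instructions or a co-processor would target; it is a software baseline, not the paper's hardware design.

```python
import math

# Sobel kernels for the horizontal (GX) and vertical (GY) intensity gradients.
GX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
GY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel(image):
    """Gradient magnitude of a 2-D grayscale image; border pixels are left at 0."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            # 9 multiply-accumulates per kernel per pixel: the hot loop worth accelerating.
            for dy in range(3):
                for dx in range(3):
                    p = image[y + dy - 1][x + dx - 1]
                    gx += GX[dy][dx] * p
                    gy += GY[dy][dx] * p
            out[y][x] = math.hypot(gx, gy)
    return out
```

Hardware implementations often replace `hypot` with |gx| + |gy| to avoid the square root; that approximation changes magnitudes but usually preserves which pixels exceed an edge threshold.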

19.
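Loop Analysis and Design of a Flyback Switching Power Supply   Total citations: 1, self-citations: 1, citations by others: 0
A flyback switching power supply is designed, and the compensator parameters are calculated from theory. Experimental tuning shows that the calculated parameters stabilize the loop and are close to the optimized parameters.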

20.
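This paper studies an important phenomenon in switching converters under conventional pulse-width modulation (PWM) control: the output signal of the closed-loop regulator intersects the sawtooth comparison signal multiple times, which raises the switching frequency, prevents a constant control frequency, and may even prevent the system from producing a stable output. Taking the design of common Buck and Boost converters as examples, the large-signal stability conditions of switching converters are studied based on PWM quasi-sliding-mode control theory, and the resulting conclusions agree with the classical "ramp matching" theory. Simulation results verify the correctness of the proposed theory.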
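The multiple-crossing phenomenon can be reproduced numerically by comparing a hypothetical regulator output (a DC level plus sinusoidal ripple) with a trailing-edge sawtooth and counting the intersections within one period. The waveform parameters are illustrative. The intuition that multiple crossings appear once the control signal's slope exceeds the ramp's slope is the general idea behind ramp matching, not the paper's derived large-signal stability condition.

```python
import math


def count_crossings(control, v_ramp: float = 1.0, period: float = 1.0, n: int = 10_000) -> int:
    """Count intersections of control(t) with a sawtooth rising from 0 to v_ramp over one period."""
    crossings = 0
    prev = None
    for i in range(n):
        t = period * i / n
        above = control(t) - v_ramp * t / period > 0  # is the control signal above the ramp?
        if prev is not None and above != prev:
            crossings += 1  # each sign change is one comparator transition
        prev = above
    return crossings


def control_with_ripple(amplitude: float, dc: float = 0.52, ripple_cycles: int = 10, period: float = 1.0):
    """Hypothetical regulator output: DC level plus sinusoidal ripple, `ripple_cycles` per PWM period."""
    return lambda t: dc + amplitude * math.sin(2 * math.pi * ripple_cycles * t / period)
```

With `amplitude=0.001` the ripple slope (at most about 0.063 per period) stays below the ramp slope of 1 and the comparator switches once per period; with `amplitude=0.05` the ripple slope reaches about 3.1 and the two signals cross three times, producing extra switching edges.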


Copyright © Beijing Qinyun Technology Development Co., Ltd.    Beijing ICP Registration No. 09084417-23

Beijing Public Security Registration No. 11010802026262