Similar Documents
1.
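A SAR imaging processing scheme based on a multi-core DSP interconnect architecture is proposed. First, a real-time PFA imaging algorithm based on azimuth sub-block interpolation is introduced. Second, the processing performance of the TI multi-core DSP TMS320C6678 is studied, a typical RapidIO interconnect architecture is introduced, and a SAR imaging processing scheme based on this architecture is proposed. Finally, SAR imaging results are presented and compared with those of conventional solutions, demonstrating the effectiveness and advantages of the proposed scheme.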

2.
An algorithm with guaranteed convergence for finding a zero of a function   Total citations: 3, self-citations: 0, citations by others: 3
Brent R. P. Computer Journal, 1971, 14(4): 422-425
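Brent's zero-finder combines bisection with secant and inverse quadratic interpolation, and falls back to bisection whenever an interpolation step would leave the bracket or shrink it too slowly; that fallback is what guarantees convergence. The sketch below follows the widely reproduced textbook formulation of the method; the function name `brent_zero` and its `tol`/`max_iter` parameters are our own choices, not taken from the paper.

```python
import math


def brent_zero(f, a, b, tol=1e-12, max_iter=100):
    """Find a root of f in [a, b] with Brent's method.

    f(a) and f(b) must have opposite signs (the root must be bracketed).
    """
    fa, fb = f(a), f(b)
    if fa * fb > 0:
        raise ValueError("root is not bracketed by [a, b]")
    c, fc = b, fb
    d = e = b - a  # last step and the step before it
    eps = 2.220446049250313e-16  # double-precision machine epsilon
    for _ in range(max_iter):
        # Keep the root bracketed between b and c.
        if (fb > 0 and fc > 0) or (fb < 0 and fc < 0):
            c, fc = a, fa
            d = e = b - a
        # Make b the best estimate (smallest |f|).
        if abs(fc) < abs(fb):
            a, b, c = b, c, b
            fa, fb, fc = fb, fc, fb
        tol1 = 2.0 * eps * abs(b) + 0.5 * tol
        xm = 0.5 * (c - b)
        if abs(xm) <= tol1 or fb == 0:
            return b
        if abs(e) >= tol1 and abs(fa) > abs(fb):
            s = fb / fa
            if a == c:
                # Secant (linear interpolation) step.
                p = 2.0 * xm * s
                q = 1.0 - s
            else:
                # Inverse quadratic interpolation step.
                q = fa / fc
                r = fb / fc
                p = s * (2.0 * xm * q * (q - r) - (b - a) * (r - 1.0))
                q = (q - 1.0) * (r - 1.0) * (s - 1.0)
            if p > 0:
                q = -q
            p = abs(p)
            # Accept the interpolated step only if it stays well inside the bracket.
            if 2.0 * p < min(3.0 * xm * q - abs(tol1 * q), abs(e * q)):
                e, d = d, p / q
            else:
                d = e = xm  # fall back to bisection
        else:
            d = e = xm  # bisection
        a, fa = b, fb
        b += d if abs(d) > tol1 else math.copysign(tol1, xm)
        fb = f(b)
    raise RuntimeError("maximum number of iterations exceeded")
```

For instance, `brent_zero(lambda x: x * x - 2, 0.0, 2.0)` converges to an approximation of the square root of 2.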

3.
Programming heterogeneous multiprocessor architectures that combine multiple processor cores and hardware accelerators is a real challenge. Computer-aided design and development tools try to reduce the large design space by simplifying hardware-software mapping mechanisms. However, energy consumption is not well supported in most design space exploration methodologies, because it is difficult to estimate quickly and accurately. To this end, this paper proposes and validates an exploration method for partitioning tiling-based parallel applications between software cores and hardware accelerators under energy-efficiency constraints. The methodology is based on energy and performance measurements of a tiny subset of the design space and on an analytical formulation of the performance and energy of an application kernel mapped onto a heterogeneous architecture. This closed-form expression is captured and solved with Mixed Integer Linear Programming, which allows very fast exploration and yields the best hardware and software partitioning under the energy constraint. The approach is validated on two application kernels using a Zynq-based architecture, showing more than 12% speed-up and energy savings compared with standard approaches. Results also show that the most energy-efficient solution is application- and platform-dependent and hard to predict, which highlights the need for fast exploration tools such as the one presented in this paper.
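The abstract formulates tile-based HW/SW partitioning as an optimization under an energy constraint and solves it with MILP. As a much smaller stand-in, the sketch below enumerates every split of N tiles between software cores and one hardware accelerator using a hypothetical analytical model (per-tile time and energy for each resource, the two resources running concurrently, plus a static-power term) and keeps the fastest split that fits an energy budget. Exhaustive enumeration replaces the MILP solver only because this toy search space is tiny; the `Resource` class and all cost parameters are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """Hypothetical per-tile cost model for one kind of processing resource."""
    time_per_tile: float    # seconds to process one tile on one unit
    energy_per_tile: float  # joules to process one tile
    units: int              # number of identical units working in parallel


def makespan(tiles: int, r: Resource) -> float:
    # Tiles are spread over the units; the busiest unit sets the finishing time.
    per_unit = -(-tiles // r.units)  # ceiling division
    return per_unit * r.time_per_tile


def best_partition(n_tiles, sw, hw, energy_budget, static_power):
    """Return (time, energy, hw_tiles) with minimum time under the energy budget.

    Returns None if no split satisfies the budget.
    """
    best = None
    for hw_tiles in range(n_tiles + 1):
        sw_tiles = n_tiles - hw_tiles
        # Software cores and the accelerator process their shares concurrently.
        t = max(makespan(sw_tiles, sw), makespan(hw_tiles, hw))
        e = (sw_tiles * sw.energy_per_tile + hw_tiles * hw.energy_per_tile
             + static_power * t)
        if e <= energy_budget and (best is None or t < best[0]):
            best = (t, e, hw_tiles)
    return best
```

Tightening `energy_budget` pushes the chosen split toward configurations that are slower but cheaper, which is the trade-off the exploration method is meant to expose.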

4.
Several mesh-like coarse-grained reconfigurable architectures have been devised in the last few years, accompanied by their corresponding mapping flows. One of the major bottlenecks in mapping algorithms onto these architectures is the limited memory access bandwidth. Only a few mapping methodologies have addressed the problem of limited bandwidth, and none has explored how the performance improvements are affected by the architectural characteristics. In this paper we study the impact of architectural parameters on the performance speedups achieved when the PEs' local RAMs are used to store variables with data reuse opportunities. The reused values are transferred over the internal interconnection network instead of being fetched from external memories, in order to reduce the data transfer burden on the bus network. A novel mapping algorithm that uses a list scheduling technique is also proposed. The experimental results quantify the trade-offs between the performance improvements and the memory access latency, the interconnection network, and the size of each processing element's local RAM. For this reason, our mapping methodology targets a flexible architecture template that permits such an exploration. More specifically, the experiments showed that the improvements increase with the memory access latency, while a richer interconnection topology can improve operation parallelism by a factor of 1.4 on average. Finally, for the considered set of benchmarks, applying our methodology improved operation parallelism by 8.6% to 85.1%, with each PE's local RAM holding 8 words.
Costas E. Goutis
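List scheduling, the technique the proposed mapping algorithm builds on, assigns the operations of a data-flow graph to cycles in priority order, subject to data dependencies and a limit on how many processing elements (PEs) are available per cycle. The following is a generic sketch, not the paper's algorithm: the priority is the longest path to a sink (a common choice), every operation is assumed to take one cycle, and the memory-bandwidth and data-reuse aspects discussed above are not modeled.

```python
from functools import lru_cache


def list_schedule(succs, num_pes):
    """Schedule unit-latency operations of a DAG on num_pes identical PEs.

    succs maps every operation to the list of operations consuming its result
    (every successor must also appear as a key). Returns {operation: cycle}.
    """
    preds = {op: [] for op in succs}
    for op, outs in succs.items():
        for s in outs:
            preds[s].append(op)

    @lru_cache(maxsize=None)
    def priority(op):
        # Length of the longest path from op to a sink (critical-path priority).
        return 1 + max((priority(s) for s in succs[op]), default=0)

    schedule, cycle = {}, 0
    while len(schedule) < len(succs):
        # Ready: unscheduled operations whose predecessors finished earlier.
        ready = [op for op in succs if op not in schedule
                 and all(p in schedule and schedule[p] < cycle for p in preds[op])]
        ready.sort(key=priority, reverse=True)
        for op in ready[:num_pes]:  # at most num_pes operations per cycle
            schedule[op] = cycle
        cycle += 1
    return schedule
```

With two PEs, the graph `{"a": ["c"], "b": ["c"], "c": ["e"], "d": ["e"], "e": []}` finishes in three cycles; with one PE it needs five.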

5.
6.
Chip Multiprocessors (CMPs) have emerged over the last decades as a very attractive way to use the ever-increasing on-chip transistor count. However, classical parallelization techniques have failed to fully extract parallelism from existing sequential applications because of false data dependencies. This paper focuses on Thread-Level Speculation (TLS), an alternative way to exploit the transistor budget of a CMP. With TLS, even possibly data-dependent threads can run in parallel as long as the semantics of sequential execution is preserved. Special hardware support monitors the actual data dependencies between threads at run time and, if they are violated, undoes the effects of misspeculation, usually through replay. Such a system is known as a speculative CMP. However, the TLS mechanism requires complex protocols that integrate cache coherence and speculation to maintain program order among multiple versions of data. Current evaluations of TLS protocols are usually inadequate because they are not performed at a sufficiently low level: a realistic evaluation of speculative CMPs must be carried out either on real hardware or with very detailed cycle-accurate simulation models. In this paper we focus on a low-level evaluation of the write-invalidate TLS protocol Speculation Integrated with Snoopy Coherence (SISC) proposed in [1]. The evaluation relies on a cycle-level simulation environment with detailed models of the cache memories, cache controller, and system bus. On top of this, a speculative four-core architecture is simulated, and three new modules (Scheduler, Squash Arbiter, and Supplier Arbiter) are provided to support a low-level implementation of the SISC protocol. The overall cost of the SISC protocol is evaluated with the CACTI tool in three domains: access latency, area, and power.
The goal of the evaluation was to keep the cache access time below the cycle latency and the area and power overheads within an acceptable budget. The SISC protocol was compared against a regular MESI-based architecture in both 32-bit and 64-bit versions. The cache access time was kept below the cycle latency, and the data cache area and static power overheads were kept below 32% and 35%, respectively.
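The core dependence check in TLS can be illustrated with read sets and write buffers: a more speculative (later) thread must be squashed if it already read an address that a less speculative (earlier) thread then writes, because it consumed a stale value. The sketch below models only this check in software; it is not the SISC protocol, which implements the check inside snoopy cache coherence hardware, and the class and function names are hypothetical. For simplicity, it omits forwarding values between threads' buffers.

```python
class SpeculativeThread:
    """Hypothetical bookkeeping for one speculative thread (epoch)."""

    def __init__(self, order):
        self.order = order      # position in the original sequential order
        self.read_set = set()   # addresses read before being written locally
        self.write_buffer = {}  # speculative writes, not yet committed
        self.squashed = False

    def load(self, memory, addr):
        if addr in self.write_buffer:  # read own speculative value
            return self.write_buffer[addr]
        self.read_set.add(addr)
        return memory.get(addr, 0)

    def store(self, addr, value):
        self.write_buffer[addr] = value


def on_store(threads, writer, addr, value):
    """Buffer a store and squash later threads that already read addr."""
    writer.store(addr, value)
    for t in threads:
        if t.order > writer.order and addr in t.read_set:
            t.squashed = True  # read-after-write violation: thread must replay


def commit(memory, thread):
    """The non-speculative thread makes its buffered writes architectural."""
    memory.update(thread.write_buffer)
    thread.write_buffer.clear()
```

A later thread that wrote an address before reading it is not squashed by an earlier writer, since it never observed the older value.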

7.
This paper considers a dynamic control model for processing computationally intensive tasks that have unknown execution times and allow arbitrary data parallelization in a heterogeneous computing system. The guaranteed estimates obtained are used to form a scheduling rule under which the problems with the minimum proper processing time are solved first.
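A rule that serves the problem with the smallest processing time first is a shortest-processing-time (SPT) ordering. Because true execution times are unknown here, the sketch orders tasks by an estimate supplied by the caller; how the paper derives its guaranteed estimates is not shown, and the task names and numbers are made up. On a single resource, SPT ordering minimizes the total (and therefore mean) completion time.

```python
def spt_order(estimates):
    """Order task names by estimated processing time, smallest first."""
    return sorted(estimates, key=estimates.get)


def completion_times(order, estimates):
    """Completion time of each task when tasks run back to back on one resource."""
    clock, done = 0.0, {}
    for task in order:
        clock += estimates[task]
        done[task] = clock
    return done
```

With estimates `{"fft": 5.0, "filter": 2.0, "decode": 8.0}`, SPT runs `filter`, `fft`, `decode`, for a total completion time of 24 instead of 27 in the listed order.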

8.
ZHANG Guanghui, WANG Yaonan. Journal of Computer Applications (计算机应用), 2014, 34(10): 3059-3064
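To obtain space manipulator control software with superior performance and strong practicality, a client/server (C/S) software architecture for a space manipulator control system based on multithreading and circular queues is proposed, and the implementation of each thread and queue is described in detail. Based on an analysis of the characteristics and functional requirements of space manipulator control software, and following the principle of horizontal partitioning and vertical layering, the functions of the control software are allocated to four parallel threads, and two circular queues are used to build a buffering mechanism that improves the data processing capability of the control system and reduces unnecessary waiting time. The four threads and two circular queues communicate with one another and work cooperatively. Experimental results show that the architecture achieves motion control of the manipulator with small control latency and that its performance meets practical control requirements, demonstrating the effectiveness and feasibility of the scheme.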
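The buffering mechanism connects threads through circular (ring) queues. A minimal thread-safe bounded ring queue in Python might look like the sketch below; the class name, the capacity, and the one-producer/one-consumer demo are assumptions for illustration and do not reproduce the paper's four-thread design.

```python
import threading


class RingQueue:
    """Bounded circular queue shared between threads."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be positive")
        self._buf = [None] * capacity
        self._head = 0   # index of the oldest element
        self._count = 0  # number of stored elements
        self._cond = threading.Condition()

    def put(self, item):
        with self._cond:
            while self._count == len(self._buf):  # full: wait for the consumer
                self._cond.wait()
            tail = (self._head + self._count) % len(self._buf)
            self._buf[tail] = item
            self._count += 1
            self._cond.notify_all()

    def get(self):
        with self._cond:
            while self._count == 0:  # empty: wait for the producer
                self._cond.wait()
            item = self._buf[self._head]
            self._buf[self._head] = None  # release the reference
            self._head = (self._head + 1) % len(self._buf)
            self._count -= 1
            self._cond.notify_all()
            return item


def run_demo(n_items=10, capacity=4):
    """One producer thread (e.g. command input) feeding one consumer thread."""
    q = RingQueue(capacity)
    received = []

    def producer():
        for i in range(n_items):
            q.put(i)

    def consumer():
        for _ in range(n_items):
            received.append(q.get())

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received
```

The fixed-size buffer decouples the producer's and consumer's rates: the producer blocks only when the consumer falls a full buffer behind, which is the kind of waiting the buffering mechanism is meant to reduce.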

9.
The purpose of this paper is to study the determination of stability regions for discrete-time linear systems with saturating controls through anti-windup schemes. Considering that a linear dynamic output feedback has been designed to stabilize the linear discrete-time system (without saturation), a method is proposed for designing an anti-windup gain that maximizes an estimate of the basin of attraction of the closed-loop system in the presence of saturation. It is shown that the closed-loop system obtained from the controller plus the anti-windup gain can be locally modeled by a linear system with a deadzone nonlinearity. Then, based on the use of a new sector condition and quadratic Lyapunov functions, stability conditions in an LMI form are stated. These conditions are then considered in a convex optimization problem in order to compute an anti-windup gain that maximizes an estimate of the basin of attraction of the closed-loop system. Moreover, considering asymptotically stable open-loop systems, it is shown that the conditions can be slightly modified in order to determine an anti-windup gain that ensures global stability. An extension of the proposed results to the case of dynamic anti-windup synthesis is also presented in the paper.
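With saturation, the controller output can be written as sat(u) = u - psi(u), where psi is the deadzone nonlinearity, and a static anti-windup gain feeds psi(u) back into the controller state. The scalar simulation below is only a sketch of that structure: the first-order plant, the discrete PI controller, the saturation level `u0`, and the anti-windup gain `e_aw` are made-up values, and the gain is chosen by hand rather than by the LMI-based optimization the paper proposes.

```python
def sat(u, u0):
    """Symmetric saturation of u to the interval [-u0, u0]."""
    return max(-u0, min(u0, u))


def deadzone(u, u0):
    """Deadzone nonlinearity psi(u) = u - sat(u); zero whenever |u| <= u0."""
    return u - sat(u, u0)


def simulate(steps, r=1.0, a=0.9, b=0.1, kp=1.0, ki=0.5, u0=1.2, e_aw=0.0):
    """Simulate x+ = a*x + b*sat(u) in closed loop with a discrete PI controller.

    The controller state (an integrator) is corrected by a static anti-windup
    term: xc+ = xc + (r - y) - e_aw * psi(u). With e_aw = 0 there is no
    anti-windup. Returns the list of outputs y = x.
    """
    x = xc = 0.0
    outputs = []
    for _ in range(steps):
        y = x
        err = r - y
        u = kp * err + ki * xc      # controller output before saturation
        x = a * x + b * sat(u, u0)  # the plant only sees the saturated input
        xc = xc + err - e_aw * deadzone(u, u0)
        outputs.append(y)
    return outputs
```

Comparing `simulate(100)` with `simulate(100, e_aw=1.0)` shows how the anti-windup term alters the transient once `u` exceeds `u0`; the term is inactive whenever the input stays within the saturation limits.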

10.
Flight controllers for micro-air UAVs are generally designed using proportional-integral-derivative (PID) methods, where tuning the gains is difficult and time-consuming and performance is not guaranteed. In this paper, we develop a rigorous method based on sliding mode analysis and nonlinear backstepping to design a PID controller with guaranteed performance. This technique provides the structure and gains of the PID controller such that a robust and fast response of the UAV (unmanned aerial vehicle) is achieved for trajectory tracking. First, second-order sliding variable errors are used in a rigorous nonlinear backstepping design to obtain guaranteed performance for the nonlinear UAV dynamics. Then, using a small-angle approximation and rigorous geometric manipulations, this nonlinear design is converted into a PID controller whose structure is naturally determined by the backstepping procedure. PID gains that guarantee robust UAV performance are finally computed from the sliding mode gains and from stabilizing gains for the tracking error dynamics. We prove that the desired Euler angles of the inner attitude control loop are related to the dynamics of the outer backstepping tracking loop by inverse kinematics, which provides a seamless connection with existing built-in UAV attitude controllers. We implement the proposed method on an actual UAV, and experimental flight tests confirm the validity of these algorithms. Our PID design procedure yields tighter UAV performance than an existing popular PID control technique.
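For reference, the PID structure that the method ultimately produces can be written as a discrete-time update. The sketch below is a textbook positional PID with a fixed sample time and a derivative on the error; it does not derive gains from the sliding-mode/backstepping procedure described above, and any gain values used with it are arbitrary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PID:
    kp: float
    ki: float
    kd: float
    dt: float                       # sample time in seconds
    integral: float = 0.0
    prev_error: Optional[float] = None

    def update(self, setpoint: float, measurement: float) -> float:
        """Return the control output for one sample."""
        error = setpoint - measurement
        self.integral += error * self.dt  # rectangular integration
        # Derivative of the error; zero on the first call to avoid a kick.
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```

In a cascaded UAV controller, one such loop per axis would typically run in the outer position loop and feed desired angles to the inner attitude loop.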

11.
High-level synthesis comprises interdependent tasks such as scheduling, allocation, and module selection. For today's very large-scale integration (VLSI) designs, the cost of solving the combined scheduling, allocation, and module selection problem by exhaustive search is prohibitive. However, to meet design objectives, extensive design space exploration is often critical to obtaining superior designs. We present a framework for efficient design space exploration during high-level synthesis of datapaths for data-dominated applications. The framework uses a genetic algorithm (GA) to perform scheduling and allocation concurrently, with the aim of finding schedules and module combinations that lead to superior designs while considering user-specified latency and area constraints. The GA uses a multichromosome representation to encode datapath schedules and module allocations, and efficient heuristics to minimize functional and storage area costs while minimizing circuit latencies. The framework provides the flexibility to perform resource-constrained scheduling, time-constrained scheduling, or a combination of the two, using a simple and fast list-scheduling technique. A graded penalty function is used as the objective function for evaluating the quality of designs, enabling the GA to quickly reach areas of the search space where designs meeting user-specified criteria are most likely to be found. Since GAs are population-based search heuristics, a unique feature of our framework is its ability to offer a large number of alternative datapath designs, all of which meet the design specifications but differ in module, register, and interconnect configurations. Many experiments on well-known benchmarks show the effectiveness of our approach.
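A graded penalty function grows with how far a candidate violates each constraint, instead of simply rejecting infeasible designs, which lets a GA move through nearly feasible regions toward feasible ones. The sketch below shows one such function for area and latency limits; the linear form, the normalization, and the weights are illustrative assumptions, not the paper's formulation.

```python
def graded_fitness(area, latency, max_area, max_latency,
                   w_area=1.0, w_latency=1.0, penalty=10.0):
    """Cost to minimize: normalized area and latency plus graded penalties.

    Each constraint adds a penalty proportional to its relative violation,
    so a design that slightly misses a limit scores better than one that
    misses it badly.
    """
    cost = w_area * area / max_area + w_latency * latency / max_latency
    area_violation = max(0.0, (area - max_area) / max_area)
    latency_violation = max(0.0, (latency - max_latency) / max_latency)
    return cost + penalty * (area_violation + latency_violation)
```

For limits of 100 area units and 20 cycles, a feasible design with area 50 and latency 10 scores 1.0, while area 110 scores 2.6 and area 200 scores 12.5, so selection still favors the nearly feasible design.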

12.
Digital voting is used to support group decision-making in a variety of contexts ranging from politics to mundane everyday collaboration, and the rise in popularity of digital voting has provided an opportunity to re-envision voting as a social tool that better serves democracy. A key design goal for any group decision-making system is the promotion of participation, yet there is little research that explores how the features of digital voting systems themselves can be shaped to configure participation appropriately. In this paper we propose a framework that explores the design space of digital voting from the perspective of participation. We ground our discussion in the design of a social media polling tool called BallotShare, a first instantiation of our proposed framework designed to facilitate the study of decision-making practices in a workplace environment. Over five weeks, participants created and took part in non-standard polls relating to events and other spontaneous group decisions. Following interviews with participants, we identified significant drivers and limitations of individual and collective participation in the voting process: social visibility, social inclusion, commitment and delegation, accountability, influence, and privacy.

13.
The efficient design of computation-intensive multidimensional signal processing applications requires dealing with three kinds of constraints: those implied by data dependencies, the non-functional requirements (real-time behavior, power consumption), and the resource availability of the execution platform. The Modeling and Analysis of Real-time and Embedded systems (MARTE) UML profile, through its Repetitive Structure Modeling (RSM) package, is well suited to modeling the inherent parallelism of these applications, to representing parallel execution platforms compactly, and to expressing the distributed mapping of one onto the other. The execution of such a specification respects the whole set of constraints defined on it, while the quality of the scheduling is directly linked to the quality of the mapping of the multidimensional structures (data arrays or parallel loop nests) into time and space. We propose here a strategy for using a refactoring tool dedicated to this kind of application, which helps find good trade-offs in the usage of storage and computation resources and in the exploitation of parallelism (both task and data parallelism). This strategy is illustrated on an industrial radar application.

14.
Multi-objective evolutionary algorithms (MOEAs) have received increasing interest in industry because they have proved to be powerful optimizers. Despite this success, however, MOEAs have also encountered many challenges in real-world applications. One of the main difficulties in applying MOEAs is the large number of fitness evaluations (objective calculations) often needed before an acceptable solution can be found. In fact, in several industrial situations fitness evaluations are computationally expensive and the time available is very short. In these applications, efficient strategies to approximate the fitness function have to be adopted, seeking a trade-off between optimization performance and efficiency. This is the case in the design of a complex embedded system, where an optimal architecture must be defined with respect to certain performance indexes while respecting strict time-to-market constraints. This activity, known as design space exploration (DSE), is still a great challenge for the EDA (electronic design automation) community. One of the most important bottlenecks in the overall design flow of an embedded system is simulation. Simulation occurs at every phase of the design flow and is used to evaluate a system that is a candidate for implementation. In this paper we focus on system-level design and present an extensive comparison of state-of-the-art MOEA approaches with an approach based on fuzzy approximation to speed up the evaluation of a candidate system configuration. The comparison is performed on a real case study: optimization of the performance and power dissipation of embedded architectures based on a Very Long Instruction Word (VLIW) microprocessor in a mobile multimedia application domain.
The results of the comparison demonstrate that the fuzzy approach outperforms state-of-the-art MOEA strategies, in terms of both performance and efficiency, when applied to the DSE of a parameterized embedded system.
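Whatever strategy drives the exploration, plain MOEA or fuzzy-approximated evaluation, candidate configurations are ultimately compared by Pareto dominance over objectives such as execution time and power. A minimal non-dominated filter, assuming every objective is minimized, could look like this; the objective tuples in the usage note are made up.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points):
    """Return the points that no other point dominates, in input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

For (time, power) pairs `[(10, 5), (8, 7), (12, 4), (11, 6), (9, 9)]`, the front keeps `(10, 5)`, `(8, 7)` and `(12, 4)`; `(11, 6)` is dominated by `(10, 5)` and `(9, 9)` by `(8, 7)`. This quadratic filter is fine for small archives; large populations usually use faster non-dominated sorting.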

15.
The design of embedded systems is being challenged by their growing complexity and tight performance requirements. This paper presents the COMPLEX UML/MARTE Design Space Exploration methodology, an approach based on a novel combination of Model Driven Engineering (MDE), Electronic System Level (ESL), and design exploration technologies. The proposed framework enables capturing the set of possible design solutions, that is, the design space, in an abstract, standard, and graphical way by relying on UML and the standard MARTE profile. From that UML/MARTE-based model, the proposed automated generation framework produces an executable, configurable, and fast performance model that includes the functional code of the application components. This generated model integrates an XML-based interface for communication with the tool that steers the exploration. In this way, the DSE loop iterations are performed efficiently, without user intervention, avoiding slow manual edits or regeneration of the performance model. The novel DSE-oriented modelling features of the methodology are shown in detail. The paper also presents the performance model generation framework, including the enhancements with regard to the previous simulation and estimation technology, and the exploration technology. The paper uses an EFR vocoder system example to illustrate the methodology and to provide demonstrative results.

16.
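Traditional spacecraft operation and control relies heavily on ground command and control, incurs high labor costs, and responds slowly to tasks, which makes it ill-suited to deep-space exploration missions, where long-distance communication delays are pronounced; the intelligence of spacecraft therefore urgently needs to be improved. This paper systematically analyzes the problem of autonomous spacecraft operation, proposes a multi-agent-based architecture for autonomous spacecraft operation, and designs its autonomous system structure and operational workflow. Focusing on asteroid exploration missions, it proposes a two-stage autonomous mission planning algorithm that covers both platform and payload task management, enabling integrated platform-payload autonomous task management and producing complete command sequences from high-level mission goals. Autonomous mission planning simulations are used to visually verify and analyze the correctness and feasibility of the related algorithms and models, providing a useful approach and method for research on autonomous spacecraft operation technology.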

17.
18.
This paper presents a novel automated design space exploration (DSE) approach for a multi-cycle transient fault detectable datapath, based on multi-objective user constraints (power and delay), for application-specific computing. To the best of the authors' knowledge, this is the first work in the literature to solve this problem. The presented approach, driven by the bacterial foraging optimization (BFO) algorithm, provides the flexibility to change direction in the design space through tumble/swim actions if a search path is found to be ineffective. The approach is highly capable of reaching the true Pareto-optimal curve, as indicated by the closeness of our non-dominated solutions to the true Pareto front and their uniform distribution over the Pareto curve (implying diversity). The contributions of this paper are as follows: (a) a novel exploration approach for generating a high-quality fault detectable structure, based on user-provided power-delay requirements, that is capable of detecting transient errors in the datapath; (b) a novel fault detection algorithm for handling single- and multi-cycle transient faults. The results of the proposed approach indicate an average improvement in Quality of Results (QoR) of more than 9% and a reduction in hardware usage of more than 23% compared with recent approaches that address a similar objective.
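Bacterial foraging optimization explores by chemotaxis: a bacterium tumbles into a random direction and then keeps swimming in that direction for as long as the cost keeps improving. The single-bacterium sketch below shows only this tumble/swim loop on a toy cost function; reproduction, elimination-dispersal, and the paper's multi-objective power/delay handling are omitted, and the step size, iteration counts, and seed are arbitrary.

```python
import math
import random


def chemotaxis(cost, start, step=0.1, n_chemo=200, n_swim=4, seed=0):
    """Tumble/swim search of one bacterium; returns (best_position, best_cost)."""
    rng = random.Random(seed)
    best_pos, best_cost = list(start), cost(list(start))

    def evaluate(p):
        # Evaluate a position and remember the best one seen so far.
        nonlocal best_pos, best_cost
        c = cost(p)
        if c < best_cost:
            best_pos, best_cost = list(p), c
        return c

    pos, j = list(start), best_cost
    for _ in range(n_chemo):
        # Tumble: draw a random unit direction and take one step along it.
        delta = [rng.uniform(-1.0, 1.0) for _ in pos]
        norm = math.sqrt(sum(d * d for d in delta)) or 1.0
        direction = [d / norm for d in delta]
        j_last = j
        pos = [p + step * d for p, d in zip(pos, direction)]
        j = evaluate(pos)
        # Swim: keep stepping the same way while the cost keeps improving.
        for _ in range(n_swim):
            if j >= j_last:
                break
            j_last = j
            pos = [p + step * d for p, d in zip(pos, direction)]
            j = evaluate(pos)
    return best_pos, best_cost
```

The swim phase is what lets a search path that is paying off continue cheaply, while a tumble abandons an ineffective direction, matching the behavior the abstract attributes to BFO.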

19.
In this paper we analyze the properties of PID controller design based on the modulus-optimum criterion for an important class of non-oscillating linear plants with dead time, and we present some important properties of the resulting settings that have not been published so far. The results are used to design a suitable correction of the settings, which ensures that a sufficient stability margin is preserved. With these enhancements, a robust design method is obtained that provides good performance even for systems with long dead time and is easy to implement.

20.
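Design of a Synchronous Data Transmission Interface between an FPGA and a General-Purpose Processor   Total citations: 1, self-citations: 1, citations by others: 0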
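Addressing data communication between an FPGA and a general-purpose processor, this paper proposes a method for synchronous data transmission between an FPGA and a general-purpose processor that contains an SDRAM controller. The method builds a synchronous input/output interface, the Synchronous Transmission Interface (STI), inside the FPGA, so that the FPGA emulates an external SDRAM memory attached to the processor's SDRAM controller, thereby achieving synchronous data transmission between the FPGA and the processor. Theoretical analysis and verification on real circuits show that, for data communication between an FPGA and a general-purpose processor, the method significantly increases the transfer rate compared with the traditional asynchronous transmission method, without any additional hardware cost.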
