Similar Documents
20 similar documents found (search time: 46 ms)
1.
This paper develops a simplified optimized tracking control using a reinforcement learning (RL) strategy for a class of nonlinear systems. Since a nonlinear control gain function is considered in the system modeling, it is challenging to extend existing RL-based optimal methods to tracking control: their algorithms are very complex, and they also require some strict conditions to be met. Unlike existing RL-based optimal methods that derive the actor and critic training laws from the square of the Bellman residual error, which is a complex function consisting of multiple nonlinear terms, the proposed optimized scheme derives the two RL training laws from the negative gradient of a simple positive function, so that the algorithm can be significantly simplified. Moreover, the actor and critic in RL are constructed by employing neural networks (NNs) to approximate the solution of the Hamilton–Jacobi–Bellman (HJB) equation. Finally, the feasibility of the proposed method is demonstrated using both Lyapunov stability theory and a simulation example.
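For reference, the HJB equation mentioned above takes the following generic form for a control-affine system with quadratic control penalty (illustrative notation only; the paper's tracking formulation differs in detail):

% Generic HJB equation for \dot{x} = f(x) + g(x)u with cost rate Q(x) + u^{\top}Ru
\min_{u}\Big[\, Q(x) + u^{\top}R\,u + \nabla V^{*}(x)^{\top}\big(f(x) + g(x)\,u\big) \Big] = 0,
\qquad
u^{*}(x) = -\tfrac{1}{2}\,R^{-1}g(x)^{\top}\nabla V^{*}(x).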

2.
As an important approach to solving complex sequential decision problems, reinforcement learning (RL) has been widely studied in the artificial intelligence and machine learning communities. However, the generalization ability of RL is still an open problem, and it is difficult for existing RL algorithms to solve Markov decision problems (MDPs) with both continuous state and action spaces. In this paper, a novel RL approach with fast policy search and adaptive basis function selection, called Continuous-action Approximate Policy Iteration (CAPI), is proposed for MDPs with both continuous state and action spaces. In CAPI, based on the value functions estimated by temporal-difference learning, a fast policy search technique is suggested to search for optimal actions in continuous spaces, which is computationally efficient and easy to implement. To improve the generalization ability and learning efficiency of CAPI, two adaptive basis function selection methods are developed so that sparse approximations of value functions can be obtained efficiently both for linear function approximators and kernel machines. Simulation results on benchmark learning control tasks with continuous state and action spaces show that the proposed approach not only converges to a near-optimal policy in a few iterations but also obtains comparable or even better performance than Sarsa learning and previous approximate policy iteration methods such as LSPI and KLSPI.
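To make the "fast policy search over continuous actions" step concrete, below is a minimal sketch of how a greedy action could be extracted from a TD-estimated value function by a coarse-to-fine search. The function name, signature, and the grid-refinement scheme are illustrative assumptions, not the paper's actual algorithm.

import numpy as np

def greedy_continuous_action(q_func, state, a_low, a_high, coarse=21, refine_iters=2):
    """Pick an (approximately) greedy action in a continuous 1-D action space.

    q_func(state, action) -> scalar value estimate (e.g., from TD learning).
    A coarse grid search is followed by local refinement around the best point.
    Illustrative stand-in for CAPI's fast policy search, not the paper's method.
    """
    lo, hi = a_low, a_high
    best_a = lo
    for _ in range(refine_iters + 1):
        grid = np.linspace(lo, hi, coarse)
        values = np.array([q_func(state, a) for a in grid])
        best_a = grid[np.argmax(values)]
        step = (hi - lo) / (coarse - 1)
        lo, hi = max(a_low, best_a - step), min(a_high, best_a + step)
    return best_a

# Toy usage with a hypothetical quadratic value estimate peaking at a = 0.3
if __name__ == "__main__":
    q = lambda s, a: -(a - 0.3) ** 2
    print(greedy_continuous_action(q, state=None, a_low=-1.0, a_high=1.0))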

3.
This paper is concerned with the networked control of a class of uncertain nonlinear systems. To this end, Takagi–Sugeno (T-S) fuzzy modelling is used to extend the previously proposed variable selective control (VSC) methodology to nonlinear systems. The extension is based on decomposing the nonlinear system into a set of fuzzy-blended, locally linearised subsystems and then applying the VSC methodology to each subsystem. To increase the applicability of the T-S approach to uncertain nonlinear networked control systems, this study considers asynchronous premise variables in the plant and the controller, and then introduces a robust stability analysis and control synthesis. The resulting optimal switching-fuzzy controller provides a minimum guaranteed cost on an H2 performance index. Simulation studies on three nonlinear benchmark problems demonstrate the effectiveness of the proposed method.
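For context, the fuzzy-blended decomposition referred to above is usually written in the standard T-S form below (generic notation; the h_i are the normalised membership functions of the premise variables z):

% Standard T-S fuzzy blending of N locally linearised subsystems
\dot{x}(t) = \sum_{i=1}^{N} h_i\big(z(t)\big)\,\big(A_i\,x(t) + B_i\,u(t)\big),
\qquad h_i(z) \ge 0,\quad \sum_{i=1}^{N} h_i(z) = 1 .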

4.
《Automatica》2014,50(12):3281-3290
This paper addresses the model-free nonlinear optimal control problem based on data by introducing the reinforcement learning (RL) technique. It is known that the nonlinear optimal control problem relies on the solution of the Hamilton–Jacobi–Bellman (HJB) equation, a nonlinear partial differential equation that is generally impossible to solve analytically. Even worse, most practical systems are too complicated for an accurate mathematical model to be established. To overcome these difficulties, we propose a data-based approximate policy iteration (API) method that uses real system data rather than a system model. First, a model-free policy iteration algorithm is derived and its convergence is proved. The implementation of the algorithm is based on the actor–critic structure, where actor and critic neural networks (NNs) are employed to approximate the control policy and cost function, respectively. To update the weights of the actor and critic NNs, a least-squares approach is developed based on the method of weighted residuals. The data-based API is an off-policy RL method, where "exploration" is improved by arbitrarily sampling data over the state and input domain. Finally, we test the data-based API control design method on a simple nonlinear system, and further apply it to a rotational/translational actuator system. The simulation results demonstrate the effectiveness of the proposed method.
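As a rough illustration of the least-squares weight update mentioned above (a sketch only: the feature matrix, target construction, and any weighting from the method of weighted residuals are simplified away, and the names are hypothetical):

import numpy as np

def least_squares_weights(phi, targets):
    """Solve min_w ||phi @ w - targets||^2 for basis/output-layer weights.

    phi     : (num_samples, num_features) matrix of basis or hidden-layer outputs
              evaluated at sampled states (and inputs, for the actor).
    targets : (num_samples,) vector of target values from the policy-evaluation
              step of policy iteration.
    Mirrors the least-squares update only in spirit, not the paper's exact form.
    """
    w, *_ = np.linalg.lstsq(phi, targets, rcond=None)
    return w

# Toy usage: exact recovery of known weights from noiseless samples
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    phi = rng.normal(size=(200, 10))
    true_w = rng.normal(size=10)
    print(np.allclose(least_squares_weights(phi, phi @ true_w), true_w))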

5.
This paper compares reinforcement learning (RL) with model predictive control (MPC) in a unified framework and reports experimental results of their application to the synthesis of a controller for a nonlinear, deterministic electrical power oscillation damping problem. Both families of methods are based on formulating the control problem as a discrete-time optimal control problem. The considered MPC approach exploits an analytical model of the system dynamics and cost function, and computes open-loop policies by applying an interior-point solver to a minimization problem in which the system dynamics are represented by equality constraints. The considered RL approach infers closed-loop policies in a model-free way from a set of system trajectories and instantaneous cost values by solving a sequence of batch-mode supervised learning problems. The results obtained provide insight into the pros and cons of the two approaches and show that RL can be competitive with MPC even in contexts where a good deterministic system model is available.
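The "sequence of batch-mode supervised learning problems" can be illustrated by a fitted-Q-iteration-style loop such as the one below. The regressor choice, the discrete action set, and the cost-minimisation convention are assumptions made for the sketch, not details taken from the paper.

import numpy as np
from sklearn.ensemble import ExtraTreesRegressor  # any supervised regressor works

def fitted_q_iteration(transitions, actions, n_iters=20, gamma=0.98):
    """Batch-mode RL: each iteration is a supervised regression problem
    built from a fixed set of transitions.

    transitions : list of (state, action, cost, next_state) tuples,
                  states as 1-D arrays, actions as scalars from `actions`.
    Returns the final Q-function regressor (inputs are [state, action]).
    Illustrative sketch only.
    """
    s = np.array([t[0] for t in transitions])
    a = np.array([[t[1]] for t in transitions])
    c = np.array([t[2] for t in transitions])
    s_next = np.array([t[3] for t in transitions])
    x = np.hstack([s, a])

    q = None
    for _ in range(n_iters):
        if q is None:
            targets = c                        # first iteration: one-step cost
        else:
            # min over the finite action set at the next state (cost minimisation)
            next_q = np.column_stack([
                q.predict(np.hstack([s_next, np.full((len(s_next), 1), act)]))
                for act in actions
            ])
            targets = c + gamma * next_q.min(axis=1)
        q = ExtraTreesRegressor(n_estimators=50).fit(x, targets)
    return q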

6.
Robust motion control is fundamental to autonomous mobile robots. In the past few years, reinforcement learning (RL) has attracted considerable attention in the feedback control of wheeled mobile robots. However, it is still difficult for RL to solve problems with large or continuous state spaces, which are common in robotics. To improve the generalization ability of RL, this paper presents a novel hierarchical RL approach for optimal path tracking of wheeled mobile robots. In the proposed approach, a graph Laplacian-based hierarchical approximate policy iteration (GHAPI) algorithm is developed, in which the basis functions are constructed automatically using the graph Laplacian operator. In GHAPI, the state space of a Markov decision process is divided into several subspaces and approximate policy iteration is carried out on each subspace. A near-optimal path-tracking control strategy can then be obtained by combining GHAPI with proportional-derivative (PD) control. The performance of the proposed approach is evaluated on a P3-AT wheeled mobile robot. It is demonstrated that the GHAPI-based PD control can obtain better near-optimal control policies than previous approaches.
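A minimal sketch of how basis functions can be built from a graph Laplacian over sampled states (proto-value-function style) is given below. The adjacency construction and the hierarchy used in GHAPI are omitted, so this only indicates the general idea.

import numpy as np

def laplacian_basis(adjacency, num_basis):
    """Construct basis functions from a state-space graph.

    adjacency : (n, n) symmetric adjacency/weight matrix over sampled states.
    Returns an (n, num_basis) matrix whose columns are the eigenvectors of the
    normalised graph Laplacian with the smallest eigenvalues.
    Illustrative sketch; GHAPI additionally partitions the state space.
    """
    w = np.asarray(adjacency, dtype=float)
    d = w.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    lap = np.eye(len(w)) - d_inv_sqrt @ w @ d_inv_sqrt   # normalised Laplacian
    eigvals, eigvecs = np.linalg.eigh(lap)               # ascending eigenvalues
    return eigvecs[:, :num_basis]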

7.
Safety-critical control is often trained in a simulated environment to mitigate risk, and the subsequent migration of the biased controller requires further adjustment. In this paper, an experience-inference human-behavior learning approach is proposed to solve the migration problem of optimal controllers applied to real-world nonlinear systems. The approach is inspired by the complementary properties exhibited by the hippocampus, neocortex, and striatum learning systems in the brain. The hippocampus defines a physics-informed reference model of the real-world nonlinear system for experience inference, and the neocortex is the adaptive dynamic programming (ADP) or reinforcement learning (RL) algorithm that ensures optimal performance of the reference model. This optimal performance is transferred to the real-world nonlinear system by means of an adaptive neocortex/striatum control policy that forces the nonlinear system to behave as the reference model. Stability and convergence of the proposed approach are analyzed using Lyapunov stability theory. Simulation studies are carried out to verify the approach.

8.
Attitude control of operational satellites is still predominantly performed by standard controllers such as Proportional plus Derivative (PD) control laws, which remain preferred for implementation over computationally intensive nonlinear optimal control techniques of higher implementation complexity. In this paper, an inverse optimal control approach based on phase-space geometry is presented, which is easy to implement and free from numerical and computational issues. The optimal control objective is to minimize a norm of the control torque subject to a rapidity constraint on the convergence rate of a Lyapunov function, under the effect of a benchmark controller. The proposed optimization method is shown to significantly enhance the torque-rapidity trade-off compared to the benchmark controller, chosen first as a PD law and then as a sliding mode controller. The inverse optimal control scheme is implemented on an air-bearing-table experimental platform.
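The torque-rapidity trade-off described above can be illustrated by a generic pointwise min-norm formulation with a Lyapunov decay-rate constraint (illustrative notation; the paper's phase-space construction differs in detail):

% Minimise control effort subject to a prescribed Lyapunov convergence rate
\min_{u}\ \|u\|^{2}
\quad \text{s.t.} \quad
\dot{V}(x,u) \;=\; \nabla V(x)^{\top} f(x,u) \;\le\; -\alpha\,V(x),
\qquad \alpha > 0 .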

9.
The multilinear model approach is an ideal candidate for dealing with nonlinear control problems. However, identifying the optimal active state subspace of each linear subsystem is an open problem, because the closed-loop performance of the nonlinear system interacts with the ranges of these subspaces. In this paper, a new systematic method for integrated state-space partition and optimal multi-model control of nonlinear systems, based on hybrid systems, is proposed; it handles the state-space partition and the associated optimal control simultaneously and consequently guarantees the overall performance of the nonlinear system. The proposed method builds on the framework of hybrid systems, which synthesizes the multilinear models produced from the nonlinear system under a unified criterion, and adopts a two-level structure. At the upper level, the active state subspace of each linear subsystem is determined under the optimal control index of a hybrid system over an infinite horizon, which is executed off-line. At the lower level, the optimal control is implemented online by solving the optimal control of the hybrid system over a finite horizon. The finite-horizon optimal control problem is computed numerically by a simultaneous method to speed up computation, and the model mismatch introduced by the simultaneous method is avoided by using a receding-horizon strategy. Simulations on a CSTR (Continuous Stirred Tank Reactor) confirm that superior performance can be obtained with the presented method.

10.
This paper presents a Generalized Predictive Control (GPC) strategy based on an Artificial Neural Network (ANN) plant model. To obtain the step and free process responses needed in the generalized predictive control strategy, we iteratively use a multilayer feedforward ANN as a one-step-ahead predictor. A bioprocess was chosen as a realistic nonlinear SISO system to demonstrate the feasibility and performance of this control scheme. A comparison was made between our approach and adaptive GPC (AGPC).
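As a sketch of the iterated one-step-ahead prediction mentioned above, the loop below rolls a one-step predictor forward with the input held constant to obtain the free response. The function signature and regressor layout are assumptions for illustration; in the paper, the trained ANN would play the role of predict_one_step.

import numpy as np

def free_response(predict_one_step, y_hist, u_hist, u_hold, horizon):
    """Iterate a one-step-ahead predictor to obtain the multi-step free
    response used in GPC, i.e. the predicted output with the input frozen.

    predict_one_step(y_hist, u_hist) -> next output, given recent outputs/inputs
    y_hist, u_hist : lists of the most recent outputs and inputs (newest last)
    u_hold         : input value held constant over the prediction horizon
    """
    y_hist, u_hist = list(y_hist), list(u_hist)
    response = []
    for _ in range(horizon):
        y_next = predict_one_step(y_hist, u_hist)
        response.append(y_next)
        y_hist = y_hist[1:] + [y_next]        # shift the regressor windows
        u_hist = u_hist[1:] + [u_hold]
    return np.array(response)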

11.
In this paper, a data-driven control approach is developed using reinforcement learning (RL) to solve the global robust optimal output regulation problem (GROORP) of partially linear systems with both static uncertainties and nonlinear dynamic uncertainties. By developing a proper feedforward controller, the GROORP is converted into a global robust optimal stabilization problem. A robust optimal feedback controller is designed that is able to stabilize the system in the presence of dynamic uncertainties. The closed-loop system is ensured to be input-to-output stable, with the static uncertainty regarded as the external input. This robust optimal controller is numerically approximated via RL. Nonlinear small-gain theory is applied to show the input-to-output stability of the closed-loop system and thus solve the original GROORP. Simulation results validate the efficacy of the proposed methodology.

12.
To address the tendency of the standard grey wolf optimizer (GWO) to become trapped in local optima on complex optimization problems, this paper proposes COGWO, a grey wolf optimization algorithm based on the Cubic map and opposition-based learning, approached from two angles: chaotic initialization and a nonlinear control strategy. First, the population is initialized using the Cubic map together with an opposition-based learning strategy, and a nonlinear parameter control strategy is used to adjust the parameters during the search. Optimization experiments on six benchmark test functions show that COGWO achieves better convergence accuracy, convergence speed, and stability. Finally, COGWO is applied to practical engineering optimization problems.
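A minimal sketch of the chaotic-plus-opposition initialization idea is given below. It assumes one common form of the Cubic map, z ← ρ·z·(1 − z²) with ρ ≈ 2.595 on (0, 1), and a generic fitness function to be minimised; the paper's exact map parameters and selection rule may differ.

import numpy as np

def cubic_opposition_init(pop_size, dim, lb, ub, fitness, rho=2.595):
    """Population initialisation combining a Cubic chaotic map with
    opposition-based learning (sketch of a COGWO-style initialisation step).

    fitness maps a position vector to a scalar to be minimised.
    """
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    z = np.random.uniform(0.1, 0.9, size=(pop_size, dim))
    for _ in range(50):                        # iterate the map to spread points
        z = rho * z * (1.0 - z ** 2)
        z = np.clip(z, 1e-6, 1 - 1e-6)
    pop = lb + z * (ub - lb)                   # chaotic candidates
    opp = lb + ub - pop                        # opposition-based candidates
    both = np.vstack([pop, opp])
    scores = np.array([fitness(x) for x in both])
    best_idx = np.argsort(scores)[:pop_size]   # keep the better half
    return both[best_idx]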

13.
This paper presents the development of a new robust optimal decentralized PI controller based on nonlinear optimization for liquid-level control in a coupled-tank system. The proposed controller maximizes the closed-loop bandwidth for specified gain and phase margins, with constraints on the overshoot ratio, to achieve both closed-loop performance and robustness. In the proposed work, a frequency-response-fitting model reduction technique is first employed to obtain a first order plus dead time (FOPDT) model of each higher-order subsystem. Based on the reduced-order model, the proposed controller is then designed. The stability and performance of the proposed controller are verified by considering multiplicative input and output uncertainties. The performance of the proposed optimal robust decentralized control scheme is compared with that of a decentralized PI controller. The proposed controller is implemented in real time on a coupled-tank system. The obtained results show that the proposed optimal decentralized PI controller exhibits superior control performance in maintaining the desired level, for both the nominal and the perturbed cases, compared to a decentralized PI controller.
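For reference, the first order plus dead time (FOPDT) structure used for the reduced models has the standard transfer-function form below (standard notation, not taken from the paper):

% FOPDT model: K = steady-state gain, \tau = time constant, \theta = dead time
G(s) \;=\; \frac{K\,e^{-\theta s}}{\tau s + 1} .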

14.
邱兴兴  张珍珍  魏启明 《计算机应用》2014,34(10):2880-2885
In multi-objective evolutionary optimization, the decomposition-based multi-objective evolutionary algorithm (MOEA/D), which uses a decomposition strategy, has low time complexity, while the strength Pareto evolutionary algorithm 2 (SPEA2), which uses a strength Pareto strategy, can obtain a uniformly distributed solution set. Combining these two strategies, a new multi-objective evolutionary algorithm is proposed for solving multi-objective optimization problems (MOPs) with complex, discontinuous Pareto fronts. First, the decomposition strategy is used to approach the Pareto front quickly; then, the strength Pareto strategy is used to distribute the solution set uniformly along the Pareto front, and the solution set is used to reset the weight-vector set of the decomposition strategy so that it adapts to the specific Pareto front; finally, the decomposition strategy is used to approach the Pareto front further. Using the inverted generational distance (IGD) as the performance metric, the new algorithm is compared with MOEA/D, SPEA2 and paλ-MOEA/D on 12 benchmark problems. The experimental results show that the proposed algorithm performs best on 7 of the benchmark problems and close to best on the other 5, and that it generates uniformly distributed solution sets regardless of whether the Pareto front of the MOP is simple or complex, continuous or discontinuous.
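Since the comparison above is reported in terms of the inverted generational distance (IGD), here is a minimal sketch of how that indicator is typically computed (generic definition, not code from the paper):

import numpy as np

def inverted_generational_distance(reference_front, obtained_set):
    """Inverted generational distance (IGD); lower is better.

    reference_front : (m, k) points sampled from the true Pareto front
    obtained_set    : (n, k) objective vectors of the obtained solutions
    IGD is the average distance from each reference point to its nearest
    obtained solution.
    """
    ref = np.asarray(reference_front, float)
    obt = np.asarray(obtained_set, float)
    dists = np.linalg.norm(ref[:, None, :] - obt[None, :, :], axis=2)  # (m, n)
    return dists.min(axis=1).mean()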

15.
Optimal control of nonlinear interconnected large-scale systems: a successive approximation approach   (total citations: 3; self-citations: 0; citations by others: 3)
唐功友  孙亮 《自动化学报》2005,31(2):248-254
The optimal control problem for nonlinear interconnected large-scale dynamic systems is considered. A successive approximation approach for designing the optimal controller is proposed with respect to quadratic performance indexes. Using this approach, the high-order, coupled, nonlinear two-point boundary value (TPBV) problem is transformed into a sequence of linear, decoupled TPBV problems. It is proven that the TPBV problem sequence uniformly converges to the optimal control for nonlinear interconnected large-scale systems. A suboptimal control law is obtained by using a finite number of iterations of the optimal control sequence.
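The quadratic performance index referred to above is of the standard form shown below (illustrative notation; the paper's index is defined over the interconnected subsystems):

% Standard infinite-horizon quadratic performance index
J \;=\; \tfrac{1}{2}\int_{0}^{\infty}
  \Big( x(t)^{\top} Q\, x(t) \;+\; u(t)^{\top} R\, u(t) \Big)\, dt,
\qquad Q \succeq 0,\ \ R \succ 0 .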

16.
Direct optimal control algorithms first discretize the continuous-time optimal control problem and then solve the resulting finite-dimensional optimization problem. If Newton-type optimization algorithms are used for solving the discretized problem, accurate first- as well as second-order sensitivity information needs to be computed. This article develops a novel approach for computing Hessian matrices which is tailored to optimal control. Algorithmic-differentiation-based schemes are proposed for both discrete- and continuous-time sensitivity propagation, including explicit as well as implicit systems of equations. The presented method exploits the symmetry of Hessian matrices, which typically results in a computational speedup of about a factor of 2 over standard differentiation techniques. These symmetric sensitivity equations additionally allow for a three-sweep propagation technique that can significantly reduce the memory requirements by avoiding the need to store a trajectory of forward sensitivities. The performance of this symmetric sensitivity propagation is demonstrated for the benchmark case study of the economic optimal control of a nonlinear biochemical reactor, based on the open-source software implementation in the ACADO Toolkit.

17.
Reinforcement learning (RL) has now evolved into a major technique for adaptive optimal control of nonlinear systems. However, the majority of RL algorithms proposed so far impose a strong constraint on the structure of the environment dynamics by assuming that it operates as a Markov decision process (MDP). An MDP framework envisages a single agent operating in a stationary environment, thereby limiting the scope of application of RL to control problems. Recently, a new direction of research has focused on proposing Markov games as an alternative system model to enhance the generality and robustness of RL-based approaches. This paper presents this new direction, which seeks to synergize the broad areas of RL and game theory, as an interesting and challenging avenue for designing intelligent and reliable controllers. First, we briefly review some representative RL algorithms for the sake of completeness, and then describe the recent direction that seeks to integrate RL and game theory. Finally, open issues are identified and future research directions outlined.

18.
In this paper, we propose a Bernstein polynomial based global optimization algorithm for the optimal feedback control of nonlinear hybrid systems using a multiple-model approach. Specifically, we solve at every sampling instant a polynomial mixed-integer nonlinear programming problem arising in the model predictive control strategy. The proposed algorithm uses the Bernstein polynomial form in a branch-and-bound framework, with new ingredients such as branching for integer decision variables and fathoming for each subproblem in the branch-and-bound tree. The performance of the proposed algorithm is tested and compared with existing algorithms on a benchmark three-spherical tank system. The test results show the superior performance of the proposed algorithm.
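The basic property exploited in Bernstein branch-and-bound is that the Bernstein coefficients of a polynomial enclose its range on a box. Below is a minimal sketch for the univariate case on [0, 1]; the cited work handles multivariate polynomials with both integer and continuous variables, so this is illustrative only.

from math import comb

def bernstein_bounds(power_coeffs):
    """Range enclosure of a univariate polynomial on [0, 1] via its
    Bernstein coefficients.

    power_coeffs : [a_0, a_1, ..., a_n] with p(x) = sum_i a_i * x**i.
    Returns (lower, upper) such that lower <= p(x) <= upper for all x in [0, 1].
    """
    a = list(power_coeffs)
    n = len(a) - 1
    # b_j = sum_{i<=j} C(j, i) / C(n, i) * a_i  (power-to-Bernstein conversion)
    b = [sum(comb(j, i) / comb(n, i) * a[i] for i in range(j + 1))
         for j in range(n + 1)]
    return min(b), max(b)

# Example: p(x) = 1 - 3x + 2x^2 has range [-0.125, 1] on [0, 1];
# the Bernstein enclosure (-0.5, 1.0) is guaranteed to contain it.
if __name__ == "__main__":
    print(bernstein_bounds([1.0, -3.0, 2.0]))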

19.
PSO-based predictive control and its application to polypropylene production   (total citations: 1; self-citations: 0; citations by others: 1)
The predictive control problem for nonlinear systems with input and output constraints can be viewed as a constrained nonlinear optimization problem that is difficult to solve directly. To address the drawbacks of predictive control when solving such optimization problems, namely that it easily converges to local minima or infeasible solutions and is sensitive to initial values, a nonlinear predictive control algorithm based on particle swarm optimization is proposed. Particle swarm optimization (PSO) is adopted as the receding-horizon optimization method of model predictive control, solving for the optimal control law online in real time. PSO is compared with a sequential quadratic programming (SQP) algorithm in simulation experiments on two standard function-optimization problems; the results show that PSO can quickly and effectively find the global minimum, whereas SQP easily becomes trapped in local minima. The algorithm is applied to temperature control of a propylene polymerization process, and the simulation results demonstrate its effectiveness.
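Below is a plain PSO sketch of the kind of solver the abstract uses for the receding-horizon optimization. The cost function, bounds, and PSO parameters are placeholders; in MPC the cost would be the predicted tracking cost over the horizon with constraint violations penalised.

import numpy as np

def pso_minimise(cost, dim, lb, ub, n_particles=30, n_iters=100,
                 w=0.7, c1=1.5, c2=1.5):
    """Minimise cost(u) over a box [lb, ub]^dim with a basic particle swarm.

    cost : function mapping a candidate control-move vector (length dim)
           to a scalar. Illustrative sketch only.
    """
    lb, ub = np.full(dim, lb, dtype=float), np.full(dim, ub, dtype=float)
    x = np.random.uniform(lb, ub, size=(n_particles, dim))
    v = np.zeros_like(x)
    pbest, pbest_val = x.copy(), np.array([cost(p) for p in x])
    g = pbest[np.argmin(pbest_val)].copy()
    for _ in range(n_iters):
        r1, r2 = np.random.rand(*x.shape), np.random.rand(*x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lb, ub)             # input constraints via clipping
        vals = np.array([cost(p) for p in x])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        g = pbest[np.argmin(pbest_val)].copy()
    return g, pbest_val.min()

# Toy usage: minimise a shifted quadratic (global minimum at the all-ones vector)
if __name__ == "__main__":
    u_opt, j_opt = pso_minimise(lambda u: np.sum((u - 1.0) ** 2), dim=5, lb=-5, ub=5)
    print(u_opt.round(2), round(j_opt, 4))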

20.
A novel approach is presented for the analysis and design of a controller for a bioreactor. It is based on model reference control theory, assisted by a neural network identifier. The control objectives specified in the paper require the controller to be nonlinear; however, it is shown to be stable in the bounded-input bounded-output sense and locally stabilizing in the sense of Lyapunov. The feasibility and efficacy of the proposed approach are tested on the benchmark problem. Copyright © 1999 John Wiley & Sons, Ltd.
