Similar literature
 20 similar documents found (search time: 15 ms)
1.
Reinforcement learning is one of the fastest growing areas in machine learning, and has achieved great success in biomedicine, the Internet of Things (IoT), logistics, robotic control, etc. However, many challenges remain for engineering applications, such as how to speed up the learning process and how to balance the trade-off between exploration and exploitation. Quantum technology, which can solve complex problems faster than classical methods, especially in supercomputers, provides a new paradigm to overcome these challenges in reinforcement learning. In this paper, a quantum-enhanced reinforcement learning algorithm is proposed for optimal control. In this algorithm, the states and actions of reinforcement learning are quantized by quantum technology. Then, a probability amplification method, which can effectively avoid the trade-off between exploration and exploitation via quantized technology, is presented. Finally, the optimal control policy is learnt during the process of reinforcement learning. The performance of this quantized algorithm is demonstrated in both the MountainCar and CartPole reinforcement learning environments, which are classical control environments in the OpenAI Gym. The preliminary results indicate that, compared with Q-learning, this quantized reinforcement learning method has better control performance without considering the trade-off between exploration and exploitation. The learning performance of this new algorithm is stable with different learning rates from 0.01 to 0.10, which means it is promising for systems with unknown dynamics.
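For context, the Q-learning baseline that this abstract compares against can be sketched as a tabular update with epsilon-greedy action selection. This is a minimal illustration of the classical method only, not the paper's quantum-enhanced algorithm; the function names, the toy Q-table, and the parameter values are assumptions made for the example.

```python
import random

def epsilon_greedy(q_row, epsilon, rng):
    """Explore with probability epsilon, otherwise pick the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])

def q_update(q, s, a, r, s_next, alpha, gamma, done):
    """One tabular Q-learning step: Q(s,a) += alpha * (target - Q(s,a))."""
    target = r if done else r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])
    return q[s][a]

# Toy table (hypothetical): two states, two actions.
q = [[0.0, 0.0], [0.0, 1.0]]
rng = random.Random(0)
action = epsilon_greedy(q[0], epsilon=0.1, rng=rng)
new_value = q_update(q, s=0, a=action, r=1.0, s_next=1,
                     alpha=0.1, gamma=0.9, done=False)
```

The `epsilon` parameter is the knob that sets the exploration and exploitation trade-off in this baseline; the abstract's claim is that its probability amplification method avoids having to tune such a trade-off.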

2.
This article presents an event-triggered H∞ consensus control scheme using reinforcement learning (RL) for nonlinear second-order multi-agent systems (MASs) with control constraints. First, considering control constraints, the constrained H∞ consensus problem is transformed into a multi-player zero-sum game with non-quadratic performance functions. Then, an event-triggered control method is presented to conserve communication resources, and a new triggering condition is developed for each agent to make the triggering threshold independent of the disturbance attenuation level. To derive the optimal controller that minimizes the cost function under the worst-case disturbance, a constrained Hamilton–Jacobi–Bellman (HJB) equation is defined. Since this equation is difficult to solve analytically due to its strong nonlinearity, RL is implemented to obtain the optimal controller. Specifically, the optimal performance function and the worst-case disturbance are approximated by a time-triggered critic network, while the optimal controller is approximated by an event-triggered actor network. After that, Lyapunov analysis is used to prove the uniformly ultimately bounded (UUB) stability of the system and that the network weight errors are UUB. Finally, a simulation example demonstrates the effectiveness of the proposed control strategy.

3.
In this study, an adaptive neuro-observer-based optimal control (ANOPC) policy is introduced for unknown nonaffine nonlinear systems with control input constraints. The Hamilton–Jacobi–Bellman (HJB) framework is employed to minimize a non-quadratic cost function corresponding to the constrained control input. ANOPC consists of both analytical and algebraic parts. In the analytical part, an observer-based neural network (NN) first approximates the uncertain system dynamics, and then another NN structure solves the HJB equation. In the algebraic part, the optimal control input that does not exceed the saturation bounds is generated. The weights of the two NNs associated with the observer and the controller are updated simultaneously in an online manner. The uniform ultimate boundedness (UUB) of all signals of the whole closed-loop system is ensured through Lyapunov's direct method. Finally, two numerical examples are provided to confirm the effectiveness of the proposed control strategy.

4.
    
This paper presents a novel distributed multi-agent temporal-difference learning framework for value function approximation, which allows agents to use the information from all neighbors instead of from only one neighbor. With full neighbor information, the proposed framework (1) has a faster convergence rate, and (2) is more robust than state-of-the-art approaches. Based on this framework, we then propose a distributed multi-agent discounted temporal-difference algorithm and a distributed multi-agent average-cost temporal-difference learning algorithm. Moreover, theoretical convergence proofs are provided for the two proposed algorithms. Numerical simulation results show that the proposed algorithms are superior to the gossip-based algorithm in convergence speed and in robustness to noise and time-varying network topology.
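The idea of using all neighbors rather than a single one can be illustrated with a generic consensus-plus-TD(0) step under linear value function approximation. The abstract does not give the update rule, so the form below (average every neighbor's parameters with a mixing matrix, then add a local TD correction) is an assumption for illustration and not the paper's algorithm; `distributed_td_step`, `weights`, and the sample layout are hypothetical names.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def distributed_td_step(thetas, weights, samples, alpha, gamma):
    """One synchronous round for all agents.

    thetas[i]  : parameter vector of agent i
    weights[i] : mixing weights of agent i over all agents
                 (zero for non-neighbors, each row sums to 1)
    samples[i] : (phi, phi_next, reward) observed locally by agent i
    """
    n, d = len(thetas), len(thetas[0])
    updated = []
    for i in range(n):
        # Consensus part: weighted average over ALL neighbors of agent i.
        mixed = [sum(weights[i][j] * thetas[j][k] for j in range(n))
                 for k in range(d)]
        # Local part: TD(0) error evaluated at the agent's own parameters.
        phi, phi_next, reward = samples[i]
        delta = reward + gamma * dot(thetas[i], phi_next) - dot(thetas[i], phi)
        updated.append([m + alpha * delta * p for m, p in zip(mixed, phi)])
    return updated

# Toy usage (hypothetical data): two fully connected agents, one feature.
thetas = [[0.0], [2.0]]
weights = [[0.5, 0.5], [0.5, 0.5]]
samples = [([1.0], [0.0], 1.0), ([1.0], [0.0], 1.0)]
thetas = distributed_td_step(thetas, weights, samples, alpha=0.1, gamma=0.9)
```

A gossip-style scheme would correspond to a row of `weights` that is nonzero for only one neighbor per round; the sketch makes the contrast with full-neighbor averaging visible in a single line.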

5.
In this paper, to solve the consensus control problem of multi-manipulator systems under Markov switching topologies, we propose a distributed consensus control strategy based on a disturbance observer. In the multi-manipulator systems, external disturbances described by heterogeneous exogenous systems are considered, and all communication topologies are directed. First, a disturbance observer is presented to suppress the influence of the unknown external disturbance, and the equivalent compensation is introduced into the control protocol of the multi-manipulator systems. Then, a novel control protocol based on neighbor information is designed, which guarantees that the multi-manipulator systems reach consensus under Markov switching topologies. Finally, two simulation examples verify the validity of the theoretical result.

6.
    
This paper studies distributed policy evaluation in multi-agent reinforcement learning. Under cooperative settings, each agent only obtains a local reward, while all agents share a common environmental state. To optimize the global return, defined as the sum of the local returns, the agents exchange information with their neighbors through a communication network. The mean squared projected Bellman error minimization problem is reformulated as a constrained convex optimization problem with a consensus constraint; then, a distributed alternating direction method of multipliers (ADMM) algorithm is proposed to solve it. Furthermore, an inexact step for ADMM is used to achieve efficient computation at each iteration. The convergence of the proposed algorithm is established.

7.
    
Distributed matrix-scaled consensus is a kind of generalized cooperative control problem and has broad applications in the fields of social networks and engineering. This paper addresses the robust distributed matrix-scaled consensus of perturbed multi-agent systems suffering from unknown disturbances. Distributed discontinuous protocols are first proposed to drive agents to achieve cluster consensus and suppress the effect of disturbances. Adaptive protocols with time-varying gains obeying differential equations are also designed, which are completely distributed and rely on no global information. Using the boundary layer technique, smooth protocols are proposed to avoid the undesired chattering effect due to discontinuous functions. As a cost, under the designed smooth protocols, the defined matrix-scaled consensus error tends to a residual set rather than zero, in which the residual bound can be made arbitrarily small by choosing proper parameters. Moreover, distributed dynamic event-based matrix-scaled consensus controllers are also proposed to avoid continuous communications. Simulation examples are provided to further verify the designed algorithms.

8.
    
This paper presents a novel model-free method to solve linear quadratic (LQ) mean-field control problems with one-dimensional state space and multiplicative noise. The focus is on the infinite horizon LQ setting, where the conditions for either stabilization or optimization can be formulated as two algebraic Riccati equations (AREs). The proposed approach leverages the integral reinforcement learning technique to iteratively solve the drift-coefficient-dependent stochastic ARE (SARE) and the other indefinite ARE, without requiring knowledge of the system dynamics. A numerical example is given to demonstrate the effectiveness of the proposed algorithm.

9.
    
In this paper, we present an output regulation method for unknown cyber-physical systems (CPSs) under time-delay attacks in both the sensor-to-controller (S-C) channel and the controller-to-actuator (C-A) channel. The proposed approach is designed using only the control inputs and tracking errors, which are accessible data. Reinforcement learning is leveraged to update the control gains in real time using policy or value iterations. A thorough stability analysis is conducted, and it is found that the proposed controller can maintain convergence and asymptotic stability even when both channels are attacked. Finally, comparison results with a simulated CPS verify the effectiveness of the proposed output regulation method.

10.
This study concentrates on solving the output consensus problem for a class of heterogeneous uncertain nonstrict-feedback nonlinear multi-agent systems under switching-directed communication topologies, in which all followers are subjected to multi-type input constraints such as unknown asymmetric saturation, unknown dead-zone and their integration. A unified representation is presented to overcome the difficulties originating from multi-agent input constraints. Moreover, the uncertain system functions in a non-lower triangular form and the interaction terms among agents are dealt with by exploiting the fuzzy logic systems and their special property. Furthermore, by introducing a nonlinear filter to alleviate the problem of "explosion of complexity" during the backstepping design, a distributed common adaptive control protocol is proposed to ensure that the synchronization errors converge to a small neighborhood of the origin despite the existence of multiple input constraints and arbitrary switching communication topologies. Both stability analysis and simulations are conducted to show the effectiveness and performance of the proposed control methodology.

11.
    
This paper reviews recent developments in learning-based adaptive optimal output regulation, which aims to solve the problem of adaptive and optimal asymptotic tracking with disturbance rejection. The proposed framework brings together two separate topics, output regulation and adaptive dynamic programming, that have been under extensive investigation due to their broad applications in modern control engineering. Under this framework, one can solve optimal output regulation problems of linear, partially linear, nonlinear, and multi-agent systems in a data-driven manner. We also review some practical applications based on this framework, such as semi-autonomous vehicles, connected and autonomous vehicles, and nonlinear oscillators.

12.
This paper considers the problem of distributed online regularized optimization over a network that consists of multiple interacting nodes. Each node is endowed with a sequence of time-varying loss functions and a regularization function that is fixed over time. A distributed forward–backward splitting algorithm is proposed for solving this problem, and both fixed and adaptive learning rates are adopted. For both cases, we show that the regret upper bounds scale as O(√T), where T is the time horizon. In particular, those rates match the centralized counterpart. Finally, we show the effectiveness of the proposed algorithms on an online distributed regularized linear regression problem.
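As a rough single-node illustration of forward–backward splitting on a regularized linear regression loss, the sketch below takes a gradient (forward) step on a squared loss and then applies the proximal (backward) step of an L1 regularizer. The choice of the L1 regularizer, the squared loss, and all names and values are assumptions for the example; the abstract does not specify them, and the network communication step of the distributed algorithm is omitted.

```python
def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1, applied elementwise."""
    return [max(abs(x) - tau, 0.0) * (1.0 if x > 0 else -1.0) for x in v]

def forward_backward_step(w, x, y, step, lam):
    """One online step on the loss 0.5 * (w.x - y)^2 + lam * ||w||_1."""
    residual = sum(wi * xi for wi, xi in zip(w, x)) - y
    # Forward step: gradient descent on the smooth, time-varying loss.
    half = [wi - step * residual * xi for wi, xi in zip(w, x)]
    # Backward step: proximal map of the fixed regularizer.
    return soft_threshold(half, step * lam)

# Toy usage (hypothetical data): one sample, two features.
w = forward_backward_step([0.0, 0.0], x=[1.0, 0.0], y=2.0, step=0.5, lam=0.1)
```

With a fixed learning rate, `step` stays constant across rounds; an adaptive variant would recompute `step` each round from past gradients before calling the same function.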

13.
In this paper, the asymptotic stability of Port-Hamiltonian (PH) systems with constant inputs is studied. Constant inputs are useful for stabilizing systems at their nonzero equilibria and can be realized by step signals. To achieve this goal, two methods based on integral action and the comparison principle are presented in this paper. These methods change the convex Hamiltonian function and the restricted damping matrix of the previous results into a Hamiltonian function with a local minimum and a positive semidefinite matrix, respectively. Due to common conditions of Hamiltonian function and damping matrix, the proposed methods asymptotically stabilize more classes of PH systems with constant inputs than the existing methods. Finally, the validity and advantages of the presented methods are shown in an example.

14.
    
In this paper, we propose a model predictive control (MPC) strategy for accelerated offset-free tracking of piecewise constant reference signals for nonlinear systems subject to state and control constraints. Some special contractive constraints on tracking errors and terminal constraints are embedded into the tracking nonlinear MPC formulation. Then, by deriving some sufficient conditions, recursive feasibility and closed-loop convergence of the tracking MPC are guaranteed in the presence of piecewise constant references and constraints. Moreover, the local optimality of the tracking MPC is achieved for unreachable output reference signals. A simulation experiment on a thermal system, with a comparison to traditional tracking MPC, demonstrates the acceleration ability and the effectiveness of the proposed tracking MPC scheme.

15.
In this survey, we present single-photon states of electromagnetic fields, discuss discrete measurements of a single-photon field, show how a linear quantum system responds to a single-photon input, investigate how a coherent feedback network can be used to manipulate the temporal pulse shape of a single-photon state, present single-photon filter and master equations, and finally discuss the generation of Schrödinger cat states by means of photon addition and subtraction.

16.
This paper presents an in-depth analytical and empirical assessment of the performance of DoubleBee, a novel hybrid aerial–ground robot. In particular, the dynamic model of the robot with ground contact is analyzed, and the unknown parameters in the model are identified. We apply an unscented Kalman filter-based approach and a least squares-based approach to estimate the parameters with given measurements and inputs at every time step. Real data are collected and used to estimate the parameters; test data verify that the values obtained are able to model the rotation of the robot accurately. A gain-scheduled feedback controller is proposed, which leverages the identified model to generate accurate control inputs to drive the system to the desired states. The system is proven to track a constant-velocity reference signal with bounded error. Simulations and real-world experiments using the proposed controller show better performance than the PID-based controller in tracking step commands and maintaining attitude under robot movement.

17.
In this paper, a data-driven method for disturbance estimation and rejection is presented. The proposed approach is divided into two stages: an inner stabilization loop, to set the desired reference model, together with an outer loop for disturbance estimation and compensation. Inspired by the active disturbance rejection control framework, the exogenous and endogenous disturbances are lumped into a total disturbance signal. This signal is estimated using an on-line algorithm based on a data-driven predictor scheme, whose parameters are chosen to satisfy high robustness-performance criteria. The above process is presented as a novel enhancement to design a disturbance observer, which constitutes the main contribution of the paper. In addition, the control strategy is presented entirely in discrete time, avoiding the use of discretization methods for its digital implementation. As a case study, the voltage control of a DC-DC synchronous buck converter affected by disturbances in the input voltage and the load is considered. Finally, experimental results that validate the proposed strategy and some comparisons with the classical disturbance observer-based control are presented.

18.
Actuator faults usually cause security problems in practice. This paper is concerned with the security control of positive semi-Markovian jump systems with actuator faults. The considered systems have mode transition-dependent sojourn-time distributions, which may also lead to actuator faults. First, the time-varying and bounded transition rate that satisfies the mode transition-dependent sojourn-time distribution is considered. Then, a stochastic co-positive Lyapunov function is constructed. Using a matrix decomposition technique, a set of state-feedback controllers for positive semi-Markovian jump systems with actuator faults is designed in terms of linear programming. Under the designed controllers, stochastic stabilization of the systems with actuator faults is achieved and the security of the systems can be guaranteed. Furthermore, the proposed results are extended to positive semi-Markovian jump systems with interval and polytopic uncertainties. By virtue of a segmentation technique of the transition rates, a less conservative security control design is also proposed. Finally, numerical examples are provided to demonstrate the validity of the presented results.

19.
Eco-driving has long been an active topic. In urban driving conditions, traffic regulations, other vehicle behaviors, and special driving scenarios have a major impact on the energy consumption of autonomous vehicles. As a representative algorithm of artificial intelligence, reinforcement learning has the ability to perform well on complex tasks. This paper uses deep reinforcement learning algorithms to design economical driving strategies for autonomous vehicles in three driving scenarios: driving at a signalized intersection under free traffic flow, car-following on ramps, and driving at a signalized intersection considering queue effects. In these three driving scenarios, the driving strategy proposed in this paper achieves economical driving performance while satisfying the driving scenario requirements.

20.
An extended state observer (ESO)-based loop filter (LF) is designed for the phase-locked loop (PLL) involved in a disturbed grid-connected converter (GcC). This ESO-based design enhances the performance and robustness of the PLL and, therefore, improves the control performance of the disturbed GcCs. In addition, the ESO-based LF can be applied to PLLs with extra filters for abnormal grid conditions. The unbalanced grid is particularly taken into account in the performance analysis. A tuning approach based on the well-designed PI controller is discussed, which results in a fair comparison with conventional PI-type PLLs. The frequency domain properties are quantitatively analysed with respect to the control stability and the noise rejection. The frequency domain analysis and simulation results suggest that the performance of the generated ESO-based controllers is comparable to that of the PI control at low frequency, while having a better ability to attenuate high-frequency measurement noise. The phase margin decreases slightly, but remains acceptable. Finally, experimental tests are conducted with a hybrid power hardware-in-the-loop benchmark, in which both balanced and unbalanced cases are explored. The obtained results prove the effectiveness of ESO-based PLLs when applied to the disturbed GcC.
