
1.

殷苌茗, 王汉兴, 陈焕文, 谢丽娟. Journal of Electric Power Science and Technology, 2003, 18(4): 12-16

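An agent solves its decision problem by learning optimal decisions. In reinforcement learning, an agent improves its own behavior by interacting with its environment. The Markov decision process (MDP) model is a general framework for solving reinforcement learning problems, and temporal-difference learning, TD(λ), is an algorithm for learning the policy-dependent value function under the MDP model. In general, an agent must remember every value of its value function, and when the state space is very large, the amount of memory this requires is enormous. To address this problem, a forgetting algorithm is presented that introduces the forgetting criterion from psychology into reinforcement learning. With the forgetting algorithm, an agent can solve reinforcement learning problems in large state spaces.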
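The abstract does not state the forgetting criterion, so the following is only a minimal sketch of tabular TD(λ) with accumulating eligibility traces, stored in a dictionary so that unvisited states use no memory, plus a hypothetical `forget` rule that simply evicts states not visited within a fixed `horizon`. The names `td_lambda_episode`, `forget`, and `horizon` are illustrative assumptions, not the authors' algorithm.

```python
from collections import defaultdict


def td_lambda_episode(transitions, V, alpha=0.1, gamma=0.9, lam=0.8):
    """Tabular TD(lambda) policy evaluation with accumulating traces.

    transitions: list of (state, reward, next_state, done) for one episode.
    V: dict state -> value; missing states are treated as 0.0.
    """
    traces = defaultdict(float)
    for s, r, s_next, done in transitions:
        v_next = 0.0 if done else V.get(s_next, 0.0)
        delta = r + gamma * v_next - V.get(s, 0.0)  # one-step TD error
        traces[s] += 1.0  # accumulating eligibility trace
        for state, e in list(traces.items()):
            V[state] = V.get(state, 0.0) + alpha * delta * e
            traces[state] = gamma * lam * e  # decay every trace
    return V


def forget(V, last_visit, now, horizon):
    """Hypothetical forgetting rule: evict states unvisited for > horizon steps."""
    for s in [s for s, t in last_visit.items() if now - t > horizon]:
        V.pop(s, None)
        del last_visit[s]
    return V
```

An evicted state simply reads as 0.0 the next time it is visited, so this kind of rule trades value accuracy for a bounded table size.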

2.

Alternating Deep Q-Networks Based on the UCB Algorithm


吴卿源, 谭晓阳. Journal of Nanjing Normal University, 2022, (1): 24-29

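In deep reinforcement learning, an agent learns by interacting with its environment, which requires it to balance exploitation and exploration well. Improving the sample efficiency and exploration ability of algorithms has therefore long been an important research direction in deep reinforcement learning. Building on existing work, this paper proposes a method that alternates among multiple differently initialized deep Q-networks, exploiting the exploration provided by random network initialization. First, a policy for alternately selecting among the deep Q-networks is constructed from the upper confidence bound (UCB) algorithm. This scheduling policy is then combined with multiple randomly initialized deep Q-networks to obtain the UCB-based alternating deep Q-network algorithm. Experiments on several standard reinforcement learning environments show that the algorithm achieves higher sample efficiency and learning efficiency than the baseline algorithms.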
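The abstract names UCB as the scheduler but gives no formula. A common choice is UCB1, sketched below under two assumptions: each network's "reward" is the return of an episode it controlled, and the exploration constant `c` defaults to sqrt(2) as in classic UCB1. `UCBScheduler`, `select`, and `update` are hypothetical names.

```python
import math


class UCBScheduler:
    """UCB1 choice of which of k Q-networks acts in the next episode (sketch)."""

    def __init__(self, k, c=math.sqrt(2)):
        self.counts = [0] * k    # episodes run by each network
        self.means = [0.0] * k   # running mean return of each network
        self.c = c
        self.t = 0

    def select(self):
        self.t += 1
        for i, n in enumerate(self.counts):
            if n == 0:
                return i  # try every network once before using the bound
        scores = [m + self.c * math.sqrt(math.log(self.t) / n)
                  for m, n in zip(self.means, self.counts)]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, i, episode_return):
        """Incrementally update network i's mean return."""
        self.counts[i] += 1
        self.means[i] += (episode_return - self.means[i]) / self.counts[i]
```

In use, each episode calls `select()`, lets the chosen network act (and typically all networks train on the shared data), then reports the episode return with `update()`.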

3.

Implementing Multi-Agent Cooperation Based on Reinforcement Learning

陈雪江, 杨东勇. Journal of Zhejiang University of Technology, 2004, 32(5): 516-520

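As an online learning method, reinforcement learning based on Markov processes works well in single-agent environments. However, because of limitations of reinforcement learning theory, the Markov process model no longer applies in multi-agent systems, so reinforcement learning cannot be applied directly to cooperative learning among multiple agents. This paper proposes a two-layer reinforcement learning method for multi-agent cooperation, implemented mainly by building two layers of reinforcement learning units inside each agent. The first-layer unit learns the agents' joint task-cooperation policy, and the second-layer unit learns the action policy that appears most effective from the agent's own point of view. The method was applied to a computer simulation of three agents cooperatively lifting a circular object, and the results show that agents using it cooperate better than agents using traditional reinforcement learning.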

4.

A new accelerating algorithm for multi-agent reinforcement learning

张汝波, 仲宇, 顾国昌. Journal of Harbin Institute of Technology (English Edition), 2005, 12(1): 48-51

In multi-agent systems, joint actions must be employed to achieve cooperation because the evaluation of an agent's behavior often depends on the other agents' behaviors. However, joint-action reinforcement learning algorithms suffer from slow convergence because of the enormous learning space produced by joint actions. In this article, a prediction-based reinforcement learning algorithm is presented for multi-agent cooperation tasks, which requires every agent to learn to predict the probabilities of the actions that other agents may execute. A multi-robot cooperation experiment is run to test the efficacy of the new algorithm, and the results show that it learns the cooperation policy much faster than the primitive reinforcement learning algorithm.
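One simple way to realize "learning to predict the probabilities of other agents' actions" is to keep empirical action frequencies per state and choose the own action that maximizes the expected joint-action value. The sketch below assumes two agents, discrete states and actions, and a joint Q-table learned elsewhere; `PredictiveAgent` and its methods are hypothetical, and the paper's own predictor may differ.

```python
from collections import defaultdict


class PredictiveAgent:
    """Best-responds to an empirical model of the other agent's action probabilities."""

    def __init__(self, my_actions, other_actions):
        self.my_actions = list(my_actions)
        self.other_actions = list(other_actions)
        self.Q = defaultdict(float)  # (state, my_action, other_action) -> joint value
        self.counts = defaultdict(lambda: defaultdict(int))  # state -> other_action -> count

    def observe(self, state, other_action):
        """Record the action the other agent was seen taking in `state`."""
        self.counts[state][other_action] += 1

    def predict(self, state):
        """Estimated probability of each of the other agent's actions in `state`."""
        c = self.counts[state]
        total = sum(c.values())
        if total == 0:
            return {b: 1.0 / len(self.other_actions) for b in self.other_actions}
        return {b: c[b] / total for b in self.other_actions}

    def act(self, state):
        """Pick the own action with the highest expected joint-action value."""
        p = self.predict(state)

        def expected(a):
            return sum(p[b] * self.Q[(state, a, b)] for b in self.other_actions)

        return max(self.my_actions, key=expected)
```

Predicting a distribution over the other agent's actions lets each agent choose among its own actions only, instead of searching the full joint-action space.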

5.

Reinforcement learning with partitioning function system

李伟, 叶庆泰, 朱昌明. Journal of Harbin Institute of Technology (English Edition), 2004, 11(4): 377-381

The size of the state space is the limiting factor in applying reinforcement learning algorithms to practical cases. A reinforcement learning system with partitioning function (RLWPF) is established, in which the state space is partitioned into several regions. The underlying performance principle of RLWPF is based on a semi-Markov decision process and has general significance: it can be applied to any reinforcement learning problem with a large state space. In RLWPF, the partitioning module dispatches agents into different regions in order to decrease the state space of each agent. This article proves the convergence of the SARSA algorithm for a semi-Markov decision process and ensures the convergence of RLWPF by analyzing the equivalence of the value functions of the two semi-Markov decision processes before and after partitioning. It also shows that the optimal policy learned by RLWPF is consistent with prior domain knowledge. An elevator group system is devised to decrease the average waiting time of passengers, with four agents controlling four elevator cars respectively. Based on RLWPF, a partitioning module is developed that uses a uniform round-trip time as the partitioning criterion, so that most passengers wait roughly the same time; each elevator car then answers only the hall calls in its own region. Compared with ordinary elevator systems and with reinforcement learning systems without a partitioning module, the performance results show the advantage of RLWPF.
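In a semi-Markov decision process an action may last several time steps, so the SARSA target discounts both the rewards collected during the action and the bootstrapped value by the action's duration τ. The sketch below assumes discrete time steps and a dictionary Q-table; `smdp_sarsa_update` is a hypothetical helper showing the standard SMDP-style update, not the exact formulation proved convergent in the paper.

```python
def smdp_sarsa_update(Q, s, a, rewards, s_next, a_next,
                      alpha=0.1, gamma=0.95, terminal=False):
    """One SARSA update for an action that ran for tau = len(rewards) steps.

    Q: dict (state, action) -> value; missing entries are treated as 0.0.
    rewards: per-step rewards r_1..r_tau received while action `a` was running.
    """
    tau = len(rewards)
    # Discounted reward accumulated inside the action: sum_k gamma^k * r_{k+1}
    target = sum(gamma ** k * r for k, r in enumerate(rewards))
    if not terminal:
        # Bootstrap from the next decision point, discounted by the elapsed time
        target += gamma ** tau * Q.get((s_next, a_next), 0.0)
    q = Q.get((s, a), 0.0)
    Q[(s, a)] = q + alpha * (target - q)
    return Q[(s, a)]
```

For an elevator car, one "action" would run from one decision point to the next (for example, until the car reaches a stop), which is why the elapsed time τ varies from step to step.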

6.

Multi-Agent Reinforcement Learning Algorithm Based on Action Prediction

童亮, 陆际联. Journal of Beijing Institute of Technology (English Edition), 2006, 15(2): 133-137

Multi-agent systems composed of concurrent reinforcement learners have attracted increasing attention in recent years. Multi-agent reinforcement learning [1] is much harder than the single-agent case. The hardness mainly comes from the fact that the environment is not stationary from the view of an agent because of the existence of other learning agents. Based on stochastic games, a multi-agent reinforcement learning algorithm for zero-sum stochastic games was proposed by Littman [2] and it was extend…

7.

Multi-agent reinforcement learning with cooperation based on eligibility traces

杨玉君, 程君实, 陈佳品. Journal of Harbin Institute of Technology (English Edition), 2004, 11(5): 564-568

Reinforcement learning has been widely applied to multi-agent systems in recent years. In a multi-agent system, an agent cooperates with other agents to accomplish a given task, and one agent's behavior usually affects the others' behaviors. In traditional reinforcement learning, an agent takes into account only the other agents' locations, so it is difficult to consider their behavior, which decreases learning efficiency. This paper proposes multi-agent reinforcement learning with cooperation based on eligibility traces, in which one agent estimates another agent's behavior from that agent's eligibility traces. Simulation results demonstrate the validity of the proposed learning method.

8.

A multiagent reinforcement learning approach based on different states

李珺, 潘启树. Journal of Harbin Institute of Technology (English Edition), 2010, 17(3): 419-423

In this paper we describe a new reinforcement learning approach based on different states. When the agents are in a coordination state, we treat all coordinating agents as players and choose a learning approach based on game theory. When the agents are in an independent state, each agent uses independent learning. We demonstrate on the pursuit-evasion problem that the proposed method overcomes the dimensionality problem that arises because both the state space and the action space scale exponentially with the number of agents, and that it has no convergence problems; we also compare it with other related multi-agent learning methods. Simulation results show the feasibility of the algorithm.

9.

An Improved Dueling-Network Method for Robot Obstacle Avoidance

周翼, 陈渤. Journal of Xidian University (Natural Science Edition), 2019, 46(1): 46-50

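Traditional reinforcement learning methods in motion planning, and especially in robot obstacle avoidance, tend to overestimate values and adapt poorly to complex environments. To address these shortcomings, a new algorithm model based on deep reinforcement learning is proposed to improve robot obstacle-avoidance performance. The model combines the dueling neural network architecture with the traditional reinforcement learning algorithm Q-learning, and uses two independently trained dueling networks to process environment data and predict action values: the output layer produces a state value and action advantages separately, and the two are combined into the final action values. The model can handle higher-dimensional data to adapt to complex, changing environments, and outputs advantageous actions for the robot to choose so as to obtain higher cumulative reward. Experimental results show that the new model effectively improves robot obstacle-avoidance performance.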
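The combination step described above is usually the mean-subtracted dueling head, Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a'). The abstract does not say exactly how the two networks interact, so the second function assumes a Double-DQN-style target (one network picks the action, the other scores it), a common way to curb overestimation. Both function names are hypothetical and NumPy stands in for the network outputs.

```python
import numpy as np


def dueling_q(value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a) with a mean-subtracted dueling head.

    Works for a single state (scalar value, 1-D advantages) or a batch
    (value of shape (B, 1), advantages of shape (B, n_actions)).
    """
    advantages = np.asarray(advantages, dtype=float)
    return value + advantages - advantages.mean(axis=-1, keepdims=True)


def double_q_target(reward, q_next_online, q_next_target, gamma=0.99, done=False):
    """Assumed Double-DQN target: the online net chooses a*, the target net evaluates it."""
    if done:
        return float(reward)
    a_star = int(np.argmax(q_next_online))
    return float(reward + gamma * q_next_target[a_star])
```

Subtracting the mean advantage makes the V/A split identifiable: adding a constant to every advantage no longer changes Q.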

10.

A Training Mode for Improving the Development Efficiency of Robot Reinforcement Learning


叶伟杰, 高军礼, 蒋丰, 郭靖. Journal of Guangdong University of Technology, 2020, 37(5): 46-50

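Deep reinforcement learning (DRL) models, which combine reinforcement learning with deep learning, are currently widely used in robot control. Robot reinforcement learning requires training models in a 3D simulation environment; however, without prior knowledge of the environment, trial-and-error learning in 3D leads to long training cycles and high development costs. This paper therefore proposes a robot reinforcement learning training mode that runs from 2D through to 3D: computationally heavy, time-consuming work is deployed in a 2D environment, and the resulting algorithm is then transferred to a 3D environment for testing. Experiments show that this training mode increases the development efficiency of PC-based robot reinforcement learning by a factor of about five.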

11.

A Reinforcement Learning Model for Arrival Flight Sequencing

武喜萍, 杨红雨, 杨波. Journal of Sichuan University (Engineering Science Edition), 2017, 49(Z2): 173-178

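To address the practical problem that arrival flight sequencing is not very intelligent, a reinforcement learning model for arrival flight sequencing is proposed. First, the model's state, action, agent, environment, reward function, constraints, and Q-learning procedure are defined. The state is the arrival time of each arriving flight, and an action is an adjustment to a flight's arrival time; the agent adjusts the flights' arrival times, the environment responds to the action, and a new arrival time and a reward value are passed back to the agent. The reward function considers delay time, economic cost, and the impact on subsequent flights. The model includes the constraints that flights cannot land early (an assigned arrival time is never earlier than the scheduled arrival time) and that the arrival flow cannot exceed the airport's arrival capacity. The model was validated with arrival flight data from Shuangliu Airport. The sequences, delay times, delay costs, delay costs of subsequent flights, and reward values of first-come-first-served (FCFS) sequencing and of the reinforcement learning model were compared. The reward function value was 3164 for the FCFS algorithm and 2880 for the reinforcement learning algorithm, so the reinforcement learning model performed better. The evaluation indicators, weights, and constraints of the reward function can be set according to actual air traffic control practice, and the model can provide decision support for air traffic controllers sequencing arrival flights.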
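The FCFS baseline and the two constraints above can be sketched as follows. This assumes arrival times in minutes and models the capacity constraint as a fixed minimum gap of 60 / capacity minutes between consecutive landings; the paper's actual capacity model, cost weights, and reward function are not given, so `fcfs_schedule` and `total_delay` are hypothetical helpers.

```python
def fcfs_schedule(scheduled, capacity_per_hour):
    """Assign landing times first-come-first-served.

    scheduled: dict flight_id -> scheduled arrival time (minutes).
    Assumption: capacity is enforced as a minimum gap of 60 / capacity
    minutes between consecutive landings.
    """
    gap = 60.0 / capacity_per_hour
    assigned = {}
    last = None
    for flight, eta in sorted(scheduled.items(), key=lambda kv: kv[1]):
        # Never earlier than scheduled, and at least `gap` after the previous landing
        t = eta if last is None else max(eta, last + gap)
        assigned[flight] = t
        last = t
    return assigned


def total_delay(scheduled, assigned):
    """Sum of (assigned - scheduled) arrival times over all flights."""
    return sum(assigned[f] - scheduled[f] for f in scheduled)
```

An RL sequencer would start from such a feasible schedule and learn adjustments that lower a weighted delay/cost objective while respecting the same two constraints.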

12.

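Automatic Hierarchical Reinforcement Learning Based on Immune Clustering (total citations: 1; self-citations: 0; citations by others: 1)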

沈晶, 顾国昌, 刘海波. Journal of Harbin Engineering University, 2007, 28(4): 423-428

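Existing automatic hierarchy-construction methods in hierarchical reinforcement learning depend too heavily on the characteristics of the environment and the state space. To address this problem, an automatic hierarchical reinforcement learning method based on immune clustering is proposed. The method builds on the Option framework proposed by Sutton. In the initial learning phase, each Option contains only a single entry state and executes a flat policy. After several learning episodes have explored the environment sufficiently, immune clustering is applied to cluster the state space and generate an Option for each cluster, whose internal policy is learned during the learning process, thereby achieving automatic hierarchy construction. Simulation experiments on path planning in a two-dimensional grid with obstacles show that the method is not affected by the structure or divisibility of the state space or by delayed reinforcement signals.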

13.

Dynamic Prediction of Spatio-Temporal Big Data Based on Relation Transfer and Reinforcement Learning

郑子君, 冯翔, 虞慧群, 李修全. Journal of Shandong University (Engineering Science), 2021, 51(2): 105-114

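To address the problem that dynamic prediction over large spatio-temporal ranges cannot obtain exact solutions, a dynamic prediction algorithm based on relation transfer and reinforcement learning is proposed for optimization problems in spatio-temporal data, using a crowd-intelligence computing approach that supports relatively complex workflow patterns. A relation transfer block is designed to learn relation transfer probabilities by extracting features from the spatio-temporal data. A prediction reinforcement learning block is built to process the relation transfer probabilities in parallel along the time series and to rank the spatio-temporal data by feature preference, thereby predicting the trend of the problem state. A deep multi-step iterative policy optimization method is used to obtain reasonable solutions. The convergence and convergence rate of the proposed algorithm are analyzed and discussed theoretically in detail. Experiments on patent transfer data verify the advantages of the method and show that applying the relation transfer block and the prediction reinforcement learning block significantly improves ranking accuracy.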

14.

Joint Intelligent Optimization of Computation Offloading and Resource Allocation in MEC

杜梅, 周军华, 李敦桥, 陈士钊, 魏翼飞. Journal of Beijing University of Posts and Telecommunications, 2022, 45(2): 65-71

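In mobile edge computing (MEC), distributed base-station deployment, limited server resources, and dynamically changing end users make the design of computation offloading schemes highly challenging. Given the strength of deep reinforcement learning in handling dynamic, complex problems, an optimal computation offloading and resource allocation policy is designed with the goal of minimizing system energy consumption. First, a cloud-edge-device collaborative network framework is considered; then the joint computation offloading and resource allocation problem is formulated as a Markov decision process, and a learning algorithm based on multi-agent deep deterministic policy gradient is proposed to minimize system energy consumption. Simulation results show that the algorithm clearly outperforms the deep deterministic policy gradient algorithm and the full-offloading policy in reducing system energy consumption.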

15.

Study and application of reinforcement learning based on DAI in cooperative strategy of robot soccer

郭琦, 张达志, 杨永田. Journal of Harbin Institute of Technology (English Edition), 2009, 16(4): 513-519

A dynamic multi-agent cooperation model is established by combining reinforcement learning with distributed artificial intelligence (DAI), in which the concept of individual optimization loses its meaning because each agent's reward depends both on the agent itself and on the choices of the other agents. Using the idea of DAI, together with each robot's intelligent unit and the changes in task and environment, each agent can make decisions independently and accomplish various complicated tasks through communication and interaction with the others. The method is superior to other reinforcement learning methods commonly used in multi-agent systems: it can improve the convergence speed of reinforcement learning, decrease memory requirements, and enhance the agents' computing and logical reasoning capabilities. The result of a simulated robot soccer match shows that the proposed cooperative strategy is valid.

16.

A special hierarchical fuzzy neural-networks based reinforcement learning for multi-variables system

张文志, 吕恬生. Journal of Harbin Institute of Technology (English Edition), 2005, 12(6): 661-666

This paper proposes a reinforcement learning scheme based on a special hierarchical fuzzy neural network (HFNN) for solving complicated learning tasks in a continuous multi-variable environment. The output of the previous layer in the HFNN is no longer used in the if-part of the next layer, but only in the then-part. Thus it can handle cases where the output of the previous layer is meaningless or its meaning is uncertain. The proposed HFNN has a minimal number of fuzzy rules, avoids the combinatorial explosion of rules, and decreases computation and memory requirements. In the learning process, two HFNNs with the same structure perform fuzzy action composition and evaluation-function approximation simultaneously, and the network parameters are tuned online by gradient descent. Simulation of a double inverted pendulum system shows that the reinforcement learning method is correct and feasible.

17.

Optimal Robot Path Planning with an Enhanced Ant Colony Algorithm (total citations: 2; self-citations: 0; citations by others: 2)

齐勇, 魏志强, 殷波, 费云瑞, 于忠达, 庄晓东. Journal of Harbin Institute of Technology, 2009, 41(3): 130-133

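To solve the problem of optimal robot path planning in complex environments, this paper combines the principles of reinforcement learning and the artificial potential field method and proposes a robot path planning method based on reinforcement-enhanced potential field optimization, in which reinforcement learning is introduced into the artificial potential field method for adaptive path planning. The planning result is then used as prior knowledge to initialize the ant colony algorithm, which improves the optimization efficiency of the ant colony algorithm and overcomes the local-minimum problem of the traditional artificial potential field method. Simulation results show that the method plans robot paths satisfactorily in complex environments.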
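For reference, the classic artificial potential field combines an attractive force toward the goal with repulsive forces from obstacles inside an influence radius ρ0. The sketch below uses the textbook attractive/repulsive gradients for a point robot in 2D; the gains `k_att`, `k_rep`, and `rho0` are arbitrary assumptions, and the reinforcement-learning adaptation and ant-colony stages of the paper are not modeled.

```python
import math


def apf_force(pos, goal, obstacles, k_att=1.0, k_rep=100.0, rho0=2.0):
    """Total artificial-potential-field force on a point robot in 2D.

    Attractive: k_att * (goal - pos).
    Repulsive (only when distance rho < rho0):
        k_rep * (1/rho - 1/rho0) / rho**2, directed away from the obstacle.
    """
    fx = k_att * (goal[0] - pos[0])
    fy = k_att * (goal[1] - pos[1])
    for ox, oy in obstacles:
        dx, dy = pos[0] - ox, pos[1] - oy
        rho = math.hypot(dx, dy)
        if 0.0 < rho < rho0:
            mag = k_rep * (1.0 / rho - 1.0 / rho0) / rho ** 2
            fx += mag * dx / rho  # unit vector pointing away from the obstacle
            fy += mag * dy / rho
    return fx, fy
```

A planner steps the robot a small distance along the normalized force each iteration; where the attractive and repulsive terms cancel, the robot stalls, which is the local-minimum problem the abstract says the method overcomes.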

18.

Cooperative Control of Variable Lanes at Multiple Intersections Based on Reinforcement Learning

徐小高, 夏莹杰, 朱思雨, 邝砾. Journal of Zhejiang University (Engineering Science), 2022, 56(5): 987

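Traditional variable guide-lane control methods cannot adapt to the complex traffic flows of multi-intersection scenarios. To address this, a cooperative control method for variable guide lanes at multiple intersections based on multi-agent reinforcement learning is proposed to relieve traffic congestion. The method improves the multi-agent reinforcement learning algorithm QMIX: to handle global reward allocation in the variable guide-lane scenario, the global reward is decomposed into a basic reward and a performance reward, which improves the accuracy of lane-direction-change decisions in congested scenarios. A prioritized experience replay algorithm is introduced to use the transitions in the replay buffer more efficiently and to accelerate convergence. Experimental results show that the proposed method outperforms other control methods in queue length, delay time, and waiting time, effectively coordinates the policy switching of the variable guide lanes, and increases the capacity of the road network across multiple intersections.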
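The abstract does not say which prioritized replay variant was used. The sketch below assumes the common proportional form: transition i is sampled with probability P(i) = p_i^α / Σ_k p_k^α, where p_i is its latest absolute TD error plus a small ε, and importance-sampling weights (N · P(i))^(-β) are normalized by the batch maximum. It uses an O(N) list instead of a sum tree for clarity; `PrioritizedReplay` is a hypothetical class name.

```python
import numpy as np


class PrioritizedReplay:
    """Proportional prioritized experience replay (list-based, O(N) sampling)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6, seed=None):
        self.capacity, self.alpha, self.eps = capacity, alpha, eps
        self.data, self.priorities = [], []
        self.pos = 0  # next slot to overwrite once the buffer is full
        self.rng = np.random.default_rng(seed)

    def add(self, transition):
        p = max(self.priorities, default=1.0)  # new transitions get the max priority
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def probabilities(self):
        scaled = np.asarray(self.priorities, dtype=float) ** self.alpha
        return scaled / scaled.sum()

    def sample(self, batch_size, beta=0.4):
        probs = self.probabilities()
        idx = self.rng.choice(len(self.data), size=batch_size, p=probs)
        weights = (len(self.data) * probs[idx]) ** (-beta)  # importance-sampling correction
        weights /= weights.max()
        return idx, [self.data[i] for i in idx], weights

    def update_priorities(self, idx, td_errors):
        for i, err in zip(idx, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

The weights multiply each sample's loss so that the bias introduced by non-uniform sampling is corrected as β is annealed toward 1.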

19.

Optimal Control of Blank Holder Force Using Deep Reinforcement Learning

张新艳, 郭鹏, 余建波. Journal of Harbin Institute of Technology, 2020, 52(7): 20-28

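To improve the quality of finished parts in sheet-metal deep drawing, deep reinforcement learning is used to optimize blank holder force (BHF) control during drawing. A BHF control model integrating deep reinforcement learning with finite element simulation is proposed; it combines the perception ability of deep neural networks with the decision-making ability of reinforcement learning to learn and optimize the BHF control policy. The deep-reinforcement-learning-based BHF optimization algorithm uses a deep neural network to handle the huge state space, avoiding fitting the system dynamics, and builds the policy network with a new structure that divides the BHF policy into a global part and a local part, improving the control effect of the BHF policy. Theoretical BHF knowledge is used to initialize the replay buffer, improving the learning efficiency of deep reinforcement learning in the BHF control task. Experimental results show that, compared with traditional deep reinforcement learning algorithms, the proposed model optimizes the BHF control policy more effectively, and the finished parts perform better overall on three quality metrics: internal stress, part thickness, and material utilization. Dividing the policy network into a linear part and a nonlinear part, and initializing the replay buffer with theoretical BHF knowledge, improves both the control effect and the learning efficiency of deep reinforcement learning in BHF optimal control.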

20.

Continuous-Action Reinforcement Learning and Its Application to Robots

张健沛, 王醒策, 张岩, 张汝波, 温丽华. Journal of Harbin Engineering University, 2000, 21(3): 78-81

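This paper discusses how continuous-action reinforcement learning systems are implemented and how they learn. It first introduces the construction principles of a continuous-action reinforcement learning system and discusses how to implement such a system with neural networks. It then describes the application of the reinforcement learning mechanism to a collision-avoidance behavior learning system for intelligent robots and presents simulation results, which show that the robot has good collision-avoidance ability.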