期刊界 (All Journals): professional journal search and academic search

Full-text access type

Paid full text	213
Free	41
Free within China	60

Subject classification

Industrial technology	314

Publication year (number of results)

2024	4
2023	11
2022	28
2021	25
2020	25
2019	11
2018	7
2017	10
2016	6
2015	8
2014	14
2013	13
2012	14
2011	21
2010	15
2009	17
2008	18
2007	12
2006	11
2005	7
2004	4
2003	6
2002	7
2001	4
2000	1
1999	4
1998	5
1997	2
1996	2
1994	2

314 results found (search time: 15 ms)

Research on Q-learning-Based Optimal Policy Planning for BDI Agents in Uncertain Environments

万谦, 刘玮, 徐龙龙, 郭竞知. Computer Engineering & Science (计算机工程与科学), 2019, 41(1): 166-172.

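The BDI model handles an agent's reasoning and decision-making well in specific environments, but it lacks the ability to make decisions and learn in dynamic and uncertain environments. Reinforcement learning solves the agent's decision-making problem in unknown environments, but it lacks the rule description and logical reasoning of the BDI model. To address BDI policy planning in unknown and dynamic environments, this paper proposes a method that uses the Q-learning reinforcement learning algorithm to let BDI agents learn and plan, and it improves the decision mechanism of ASL, an implementation model of BDI. Finally, a maze simulation is built on Jason, the ASL simulation platform. The simulation experiments show that, in the new ASL system with the Q-learning mechanism added, the agent can still complete its task in an uncertain environment.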
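The learning side of this approach can be illustrated, independently of Jason and of the paper's modified ASL decision mechanism, by a minimal tabular Q-learning sketch on a small grid maze. The maze layout, the rewards (+1 at the goal, -0.01 per step), and all hyperparameters are assumptions chosen for illustration, not values from the paper.

```python
import random

# Hypothetical 4x4 maze (not the paper's): S = start, G = goal, # = wall.
MAZE = [
    "S..#",
    ".#..",
    "...#",
    "#..G",
]
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Move one cell; walls and borders leave the agent where it is."""
    r, c = state
    dr, dc = ACTIONS[action]
    nr, nc = r + dr, c + dc
    if not (0 <= nr < len(MAZE) and 0 <= nc < len(MAZE[0])) or MAZE[nr][nc] == "#":
        nr, nc = r, c
    if MAZE[nr][nc] == "G":
        return (nr, nc), 1.0, True   # assumed goal reward
    return (nr, nc), -0.01, False    # assumed small step cost


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {}  # (state, action) -> value; missing entries count as 0
    for _ in range(episodes):
        state = (0, 0)
        for _ in range(100):  # cap the episode length
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))  # explore
            else:
                a = max(range(len(ACTIONS)), key=lambda x: q.get((state, x), 0.0))
            nxt, reward, done = step(state, a)
            best_next = 0.0 if done else max(q.get((nxt, x), 0.0) for x in range(len(ACTIONS)))
            old = q.get((state, a), 0.0)
            # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
            q[(state, a)] = old + alpha * (reward + gamma * best_next - old)
            state = nxt
            if done:
                break
    return q


def greedy_path(q, max_steps=20):
    """Follow the learned greedy policy from the start cell."""
    state, path = (0, 0), [(0, 0)]
    for _ in range(max_steps):
        a = max(range(len(ACTIONS)), key=lambda x: q.get((state, x), 0.0))
        state, _, done = step(state, a)
        path.append(state)
        if done:
            break
    return path
```

In a BDI setting, one could imagine the learned greedy action being used to pick among applicable plans, but how the paper wires Q-values into ASL is not described here.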

A Cooperative Multi-Agent Pursuit Algorithm Based on Game Theory and Q-learning

郑延斌, 樊文鑫, 韩梦云, 陶雪丽. Journal of Computer Applications (计算机应用), 2020, 40(6): 1613-1620.

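Cooperative multi-agent pursuit is a typical problem in research on multi-agent coordination and cooperation. For the problem of pursuing a single evader that is able to learn, a cooperative multi-agent pursuit algorithm based on game theory and Q-learning is proposed. First, a cooperative pursuit team is formed and a game model of cooperative pursuit is constructed. Second, by learning the evader's strategy selection, the evader's motion trajectory with a finite Step-T cumulative reward is established, and this trajectory is incorporated into the pursuers' strategy sets. Finally, the cooperative pursuit game is solved for a Nash equilibrium, and each agent executes its equilibrium strategy to complete the pursuit task. In addition, because the solution may contain multiple equilibria, a fictitious-play action-selection algorithm is added to select the optimal equilibrium strategy. C# simulation experiments show that the proposed algorithm effectively solves the problem of pursuing a single learning-capable evader in an environment with obstacles, and comparative analysis of the experimental data shows that, under the same conditions, its pursuit efficiency is better than that of pursuit algorithms based purely on game theory or purely on learning.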
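The equilibrium-selection step can be sketched with textbook two-player fictitious play, in which each player repeatedly best-responds to the empirical frequencies of the other player's past actions. This is a minimal illustration under assumptions: the paper's exact fictitious-play variant is not given, and the two-pursuer payoff matrix below (strategy 0 = block the left exit, strategy 1 = block the right exit) is hypothetical.

```python
def fictitious_play(payoff_row, payoff_col, rounds=200):
    """Two-player fictitious play on a bimatrix game.

    payoff_row[i][j] and payoff_col[i][j] are the payoffs when the row player
    picks i and the column player picks j. Each round, both players
    best-respond to the empirical mixture of the other's past actions.
    Returns both players' empirical action frequencies.
    """
    n, m = len(payoff_row), len(payoff_row[0])
    counts_row = [1.0] * n  # uniform prior: pretend every action was seen once
    counts_col = [1.0] * m
    for _ in range(rounds):
        total_row, total_col = sum(counts_row), sum(counts_col)
        # Expected payoff of each own action against the opponent's empirical mixture
        row_values = [sum(payoff_row[i][j] * counts_col[j] / total_col for j in range(m))
                      for i in range(n)]
        col_values = [sum(payoff_col[i][j] * counts_row[i] / total_row for i in range(n))
                      for j in range(m)]
        best_row = max(range(n), key=row_values.__getitem__)
        best_col = max(range(m), key=col_values.__getitem__)
        counts_row[best_row] += 1
        counts_col[best_col] += 1
    return ([c / sum(counts_row) for c in counts_row],
            [c / sum(counts_col) for c in counts_col])


# Hypothetical common-payoff game for two pursuers with two pure Nash equilibria:
# (0, 0) pays 4 to both, (1, 1) pays 2 to both.
TEAM_PAYOFF = [[4, 0], [0, 2]]
```

With this matrix and a uniform prior, both players keep choosing strategy 0, so play settles on the higher-payoff equilibrium; other priors or payoffs can lead fictitious play elsewhere.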

Self-learning energy management for plug-in hybrid electric bus considering expert experience and generalization performance

Hongqiang Guo, Fengrui Zhao, Hongliang Guo, Qinghu Cui, Erlei Du, Kun Zhang. International Journal of Energy Research, 2020, 44(7): 5659-5674.

A self-learning energy management strategy is proposed for plug-in hybrid electric buses by combining the Q-learning (QL) and Pontryagin's minimum principle algorithms. Unlike existing strategies, the proposed strategy focuses on expert experience and generalization performance. The expert experience is designed as approximately optimal reference state-of-charge (SOC) trajectories, and the generalization performance is enhanced by a training method that uses multiple driving cycles. Specifically, an efficient SOC zone is first designed based on the approximately optimal reference SOC trajectories. Then, the QL agent is trained off-line, taking the expert experience as the reference SOC trajectories. Finally, an adaptive strategy is proposed based on the well-trained agent. Two different reward functions are defined: the reward function for off-line training mainly considers how well the SOC tracks the expert experience, whereas the one for the adaptive strategy mainly considers penalties. Simulation results show that the proposed strategy generalizes well and improves fuel economy by 22.49% compared with a charge-depleting/charge-sustaining (CDCS) strategy.
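The distinction between the efficient SOC zone and the two reward functions can be sketched as follows. The functional forms are assumptions for illustration only: the zone is taken as the per-step band spanned by several reference trajectories plus a margin, the off-line reward is a weighted squared tracking error, and the adaptive reward is fuel use plus a penalty for leaving the zone. The weights, margin, and function names are hypothetical, and the Q-learning and Pontryagin components are not shown.

```python
def efficient_zone(reference_trajectories, margin=0.02):
    """Per-step SOC band covered by several reference trajectories, widened by `margin`."""
    steps = list(zip(*reference_trajectories))  # group SOC values by time step
    lows = [min(values) - margin for values in steps]
    highs = [max(values) + margin for values in steps]
    return lows, highs


def offline_reward(soc, soc_ref, weight=100.0):
    """Off-line training (assumed form): reward close tracking of the expert SOC."""
    return -weight * (soc - soc_ref) ** 2


def adaptive_reward(fuel_rate, soc, low, high, penalty=10.0):
    """Adaptive strategy (assumed form): fuel cost plus a penalty outside the SOC zone."""
    violation = max(0.0, low - soc) + max(0.0, soc - high)
    return -fuel_rate - penalty * violation
```

For example, with reference trajectories `[[0.9, 0.8, 0.7], [0.9, 0.75, 0.6]]`, the zone at the second step is roughly 0.73 to 0.82, so an SOC of 0.75 incurs only the fuel cost while 0.60 is also penalized.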

Path Selection in Disaster Response Management Based on Q-learning

Zhao-Pin Su, Jian-Guo Jiang, Chang-Yong Liang, Guo-Fu Zhang. Canadian Metallurgical Quarterly, 2011, 8(1).

Selecting a suitable rescue path is very important for saving lives and reducing disaster losses, and it has been a key issue in disaster response management. In this paper, we present a Q-learning-based path selection algorithm for disaster response applications. We assume that a rescue team is an agent operating in a dynamic and dangerous environment that needs to find a safe and short path in the least time. We first propose a path selection model for disaster response management and show that path selection under this model is a Markov decision process. Then, we introduce Q-learning and design strategies for action selection and for avoiding cyclic paths. Finally, experimental results show that our algorithm can find a safe and short path in a dynamic and dangerous environment, providing a concrete and useful reference for practical disaster response management.
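One simple way to combine Q-learning with cycle avoidance is to mask nodes already visited in the current episode, both when choosing the next move and when bootstrapping. The sketch below does that on a hypothetical four-node road network whose edge costs stand in for a combination of length and danger. The graph, costs, and hyperparameters are assumptions, and the Q-table is keyed by node only, which approximates the true state (node plus visited set); the paper's own action-selection strategy may differ.

```python
import random

# Hypothetical road network: node -> {neighbour: cost}; a cost could combine length and danger.
GRAPH = {
    "A": {"B": 2.0, "C": 5.0},
    "B": {"A": 2.0, "C": 1.0, "D": 6.0},
    "C": {"A": 5.0, "B": 1.0, "D": 2.0},
    "D": {},
}


def choose_action(q, node, visited, epsilon, rng):
    """Epsilon-greedy choice among neighbours not yet visited in this episode (no cycles)."""
    options = [n for n in GRAPH[node] if n not in visited]
    if not options:
        return None
    if rng.random() < epsilon:
        return rng.choice(options)
    return max(options, key=lambda n: q.get((node, n), 0.0))


def train(start="A", goal="D", episodes=300, alpha=0.3, gamma=1.0, epsilon=0.2, seed=1):
    rng = random.Random(seed)
    q = {}  # (node, next_node) -> estimated return (negative accumulated cost)
    for _ in range(episodes):
        node, visited = start, {start}
        while node != goal:
            nxt = choose_action(q, node, visited, epsilon, rng)
            if nxt is None:  # dead end: abandon the episode
                break
            visited.add(nxt)
            # Bootstrap only over moves that remain allowed, consistent with the cycle mask
            future = 0.0 if nxt == goal else max(
                (q.get((nxt, n), 0.0) for n in GRAPH[nxt] if n not in visited), default=0.0)
            old = q.get((node, nxt), 0.0)
            q[(node, nxt)] = old + alpha * (-GRAPH[node][nxt] + gamma * future - old)
            node = nxt
    return q


def greedy_path(q, start="A", goal="D"):
    rng = random.Random(0)  # exploration never fires because epsilon is 0 below
    node, visited, path = start, {start}, [start]
    while node != goal:
        node = choose_action(q, node, visited, 0.0, rng)
        if node is None:
            break
        visited.add(node)
        path.append(node)
    return path
```

On this graph the cheapest route is A, B, C, D (total cost 5), which the greedy policy recovers after training.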

Q-learning-Based Dynamic Optimal CPS Control of Interconnected Power Grids (cited 3 times: 1 self-citation, 2 by others)

余涛, 周斌, 陈家荣. Proceedings of the CSEE (中国电机工程学报), 2009, 29(19): 13-19.

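Under the control performance standard (CPS), the automatic generation control (AGC) system of an interconnected power grid is a typical uncertain stochastic system. The Q-learning algorithm, based on Markov decision process (MDP) theory, can effectively achieve online learning and dynamic optimal decision-making for the control policy. The CPS value is treated as the "reward" given by the power-system "environment" that contains the AGC, and learning proceeds interactively through the closed-loop feedback structure formed by the Q-value function and the CPS control actions; the learning objective is to maximize the long-term cumulative reward that the CPS actions obtain from the environment. A practical semi-supervised group pre-learning method is proposed, which solves the problems of system stabilization and fast convergence for the Q-learning controller during its trial-and-error pre-learning phase. Simulation studies show that introducing Q-learning-based CPS control significantly enhances the robustness and adaptability of the whole AGC system and effectively raises the CPS compliance rate.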

Data-Based Optimal Tracking of Autonomous Nonlinear Switching Systems

Xiaofeng Li, Lu Dong, Changyin Sun. IEEE/CAA Journal of Automatica Sinica, 2021, 8(1): 227-238.

In this paper, a data-based scheme is proposed to solve the optimal tracking problem of autonomous nonlinear switching systems. The system state is forced to track the reference signal by minimizing a performance function. First, the problem is transformed into solving the corresponding Bellman optimality equation in terms of the Q-function (also known as the action-value function). Then, an iterative algorithm based on adaptive dynamic programming (ADP) is developed to find the optimal solution using only sampled data. A linear-in-parameter (LIP) neural network is used as the value function approximator. Taking into account the approximation error at each iteration step, the generated sequence of approximate value functions is proved to be bounded around the exact optimal solution under some verifiable assumptions. Moreover, the effect of terminating the learning process after a finite number of iterations is investigated. A sufficient condition for the asymptotic stability of the tracking error is derived. Finally, the effectiveness of the algorithm is demonstrated with three simulation examples.
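The idea of iterating on the Q-function with a linear-in-parameter approximator, using only sampled data, can be sketched as fitted Q iteration on a toy problem. Everything concrete here is an assumption: a scalar system with two hypothetical modes, a constant reference, a discount factor of 0.8, a quadratic basis [1, x, x^2], and grid samples. The switching decision is the mode index, and each iteration fits one weight vector per mode by least squares to the Bellman target. This illustrates the general technique, not the paper's algorithm or its error bounds.

```python
import numpy as np

# Hypothetical scalar switching system x_{k+1} = f_i(x_k) with two modes (not from the paper).
MODES = [lambda x: 0.8 * x, lambda x: 0.8 * x + 0.5]
REF = 1.0     # constant reference signal (assumed)
GAMMA = 0.8   # discount factor (assumed; keeps the sketch's values bounded)


def features(x):
    """Linear-in-parameter basis phi(x) = [1, x, x^2] (an assumed choice of basis)."""
    x = np.asarray(x, dtype=float)
    return np.stack([np.ones_like(x), x, x ** 2], axis=-1)


def fitted_q_iteration(samples, iterations=50):
    """Iterate Q_{j+1}(x, i) = U(x) + GAMMA * min_l Q_j(f_i(x), l) by least squares."""
    samples = np.asarray(samples, dtype=float)
    phi = features(samples)                    # (N, 3) features of the sampled states
    cost = (samples - REF) ** 2                # tracking cost U(x_k) = (x_k - REF)^2
    weights = np.zeros((len(MODES), 3))        # Q_0 = 0: one weight vector per mode
    for _ in range(iterations):
        new_weights = np.empty_like(weights)
        for i, f in enumerate(MODES):
            next_q = features(f(samples)) @ weights.T   # (N, modes): Q_j(f_i(x_k), l)
            target = cost + GAMMA * next_q.min(axis=1)  # Bellman target for mode i
            new_weights[i] = np.linalg.lstsq(phi, target, rcond=None)[0]
        weights = new_weights
    return weights


def best_mode(weights, x):
    """Greedy switching rule: pick the mode with the smallest approximate Q-value."""
    return int(np.argmin(features(x) @ weights.T))
```

Because the quadratic basis cannot represent the min over modes exactly, each iteration carries an approximation error, which is the situation the abstract's boundedness result addresses.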

Mobile Robot Path Planning Based on the RDC-Q Learning Algorithm

王子强, 武继刚. Computer Engineering (计算机工程), 2014, (6): 211-214.

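The traditional Q-learning algorithm defines the robot's reward function rather coarsely, which makes the robot's learning inefficient. To solve this problem, a Reward Detailed Classification Q (RDC-Q) learning algorithm is proposed. Combining the readings of all the robot's sensors, the robot's states are divided, according to its distance from obstacles, into 20 reward states and 15 penalty states. The reward the robot receives at each time step is classified by the safety level of its state, so that the robot tends toward states with higher safety levels and therefore learns faster and better. Simulation experiments in an environment densely populated with obstacles show that the algorithm converges noticeably faster than Q-learning with a traditional reward.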
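The graded-reward idea can be sketched by mapping the nearest obstacle distance onto 20 reward levels and 15 penalty levels (the counts come from the abstract) and scaling the reward by level. The safety threshold, the linear banding, and the reward scaling are hypothetical choices; the paper's actual state boundaries are not given here.

```python
N_REWARD, N_PENALTY = 20, 15  # numbers of reward and penalty states stated in the abstract


def classify_state(distances, safe_distance=1.0):
    """Map sensor distances to ("reward", level) or ("penalty", level).

    Hypothetical discretisation: the nearest obstacle distance d is compared with
    safe_distance. Reward levels 1..N_REWARD grow with d (safer); penalty levels
    1..N_PENALTY grow as d shrinks (more dangerous).
    """
    d = min(distances)  # closest obstacle seen by any sensor
    if d >= safe_distance:
        level = 1 + int((d - safe_distance) / safe_distance * N_REWARD)
        return "reward", min(level, N_REWARD)
    level = 1 + int((safe_distance - d) / safe_distance * N_PENALTY)
    return "penalty", min(level, N_PENALTY)


def graded_reward(distances):
    """Scale the reward by safety level instead of using one coarse reward value."""
    kind, level = classify_state(distances)
    return level / N_REWARD if kind == "reward" else -level / N_PENALTY
```

A state 0.5 units from the nearest obstacle then falls in penalty level 8, while one 1.5 units away falls in reward level 11, so Q-learning receives a finer signal than a single "safe/unsafe" reward.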

Research on a Q-learning-Based DDoS Attack-Defense Game Model (cited once: 0 self-citations, 1 by others)

史云放, 武东英, 刘胜利, 高翔. Computer Science (计算机科学), 2014, 41(11): 203-207, 226.

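The DDoS attack-defense game under current conditions differs from that of the past, so existing methods cannot effectively evaluate and quantify the payoffs of attacker and defender, nor dynamically adjust game strategies to maximize payoff. To address this problem, a Q-learning-based DDoS attack-defense game model is designed, and a model algorithm is proposed on that basis. First, the payoffs of both sides are computed with a network-entropy evaluation and quantification method; second, a matrix game is used to study the attack-defense game within a single DDoS attack stage; finally, Q-learning is introduced into the game process, yielding the model algorithm, which dynamically adjusts attack and defense strategies according to what has been learned so as to maximize payoff. Experimental results show that a defender using the model algorithm obtains higher payoffs, demonstrating the usability and effectiveness of the algorithm.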
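Two of the building blocks mentioned above can be sketched generically: Shannon entropy of a traffic feature (one plausible reading of "network entropy"; the paper's exact quantification is not given) and the equilibrium of a single-stage 2x2 zero-sum matrix game. The choice of packet counts per source address as the feature, and the 2x2 restriction, are assumptions; the Q-learning layer across stages is not shown.

```python
import math


def network_entropy(counts):
    """Shannon entropy (bits) of a traffic feature, e.g. packet counts per source address."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)


def solve_2x2_zero_sum(matrix):
    """Equilibrium of a 2x2 zero-sum game; the row player (defender) maximises matrix[i][j].

    Returns (probability the defender plays row 0, game value).
    """
    (a, b), (c, d) = matrix
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:  # saddle point: a pure strategy is optimal
        return (1.0 if min(a, b) >= min(c, d) else 0.0), float(maximin)
    denom = a - b - c + d  # nonzero when there is no saddle point
    return (d - c) / denom, (a * d - b * c) / denom
```

For instance, uniform traffic over four sources has an entropy of 2 bits, and the defender payoff matrix `[[2, -1], [-1, 1]]` has no saddle point, so the defender mixes, playing row 0 with probability 0.4 for a game value of 0.2.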

A State-Clustering-Based Cooperative Control Method for Multi-Station CSPS Systems

唐昊, 裴荣, 周雷, 谭琦. Acta Automatica Sinica (自动化学报), 2014, 40(5): 901-908.

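In a single-station conveyor-serviced production station (CSPS) system, reinforcement learning can explore the state-action space effectively in search of a near-optimal look-ahead distance control policy. In the cooperative control of multi-station CSPS systems, however, the size of the state space grows exponentially (geometrically) as the number of stations and the buffer capacities increase, which leads to the curse of dimensionality and degrades both the convergence speed and the optimization quality of the learning algorithm. This paper therefore introduces state clustering on top of a local information-exchange mechanism between stations, to reduce the size and complexity of each station's learning space. First, the stations are treated as relatively independent learners, each of which considers only the buffer state of its adjacent downstream station and incorporates it into its own performance-value learning. Second, the original state space is partitioned into several disjoint subsets, each represented by one abstract state. Then, a state-clustering-based multi-station feedback Q-learning algorithm is established. With this method, each station's look-ahead distance policy can be optimized on the abstract state space so as to maximize the productivity of the whole system. Simulation results show that, compared with ordinary multi-station feedback Q-learning, the state-clustering-based method not only converges faster but also improves system productivity to some extent.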
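The effect of state clustering on table size can be sketched by banding each buffer level into a few disjoint intervals, so that a station's raw state (own buffer, downstream buffer) maps to a much smaller set of abstract states on which Q-values are stored. The banding rule, the buffer capacity, and the use of a plain discounted Q-learning update are assumptions; the paper's "feedback Q-learning" may use a different performance criterion.

```python
from collections import defaultdict


def abstract_state(own_buffer, downstream_buffer, capacity, n_bins=3):
    """Cluster raw buffer levels (0..capacity) into n_bins disjoint bands per buffer.

    The station sees only its own buffer and its immediate downstream neighbour's
    buffer; each (band, band) pair is one abstract state. The banding rule is a
    hypothetical choice.
    """
    def band(level):
        return min(n_bins - 1, level * n_bins // (capacity + 1))
    return band(own_buffer), band(downstream_buffer)


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.95):
    """Standard discounted Q-learning update applied on abstract states."""
    best_next = max(q[(next_state, b)] for b in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
```

With a capacity of 8, the 81 raw (own, downstream) buffer combinations collapse to 9 abstract states, and the actions could be candidate look-ahead distances.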

A Truck Scheduling Model and Q-learning Algorithm for Container Terminals (cited once: 0 self-citations, 1 by others)

曾庆成, 杨忠振. Journal of Harbin Engineering University (哈尔滨工程大学学报), 2008, 29(1): 1-4.

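This paper studies container truck scheduling during loading and unloading at container terminals and builds a dynamic truck-scheduling model whose objective is to minimize quay-crane waiting time. A Q-learning-based solution method is designed to obtain truck-scheduling policies for different states. Methods are proposed for choosing the system states, action rules, learning step size, and discount factor when Q-learning is used to find the optimal truck schedule. The results show that, as the number of trucks increases, the Q-learning algorithm outperforms scheduling strategies such as longest waiting time, farthest distance, and fixed truck assignment.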
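One common way to choose the learning step size is to let it decay with the number of visits to each state-action pair, alpha = 1/n(s, a), which makes each Q-value a running average of its targets. The sketch below uses that schedule as an assumption; it is not necessarily the rule proposed in the paper, and the state label and the actions (indices of trucks to dispatch) are hypothetical.

```python
from collections import defaultdict


class VisitCountQ:
    """Tabular Q-learning whose step size decays as 1 / n(s, a) (an assumed schedule)."""

    def __init__(self, actions, gamma=0.9):
        self.actions = list(actions)   # e.g. which truck to dispatch next (hypothetical)
        self.gamma = gamma             # discount factor: weight given to future waiting costs
        self.q = defaultdict(float)
        self.visits = defaultdict(int)

    def update(self, state, action, reward, next_state, terminal=False):
        key = (state, action)
        self.visits[key] += 1
        alpha = 1.0 / self.visits[key]  # first update overwrites, later ones average
        future = 0.0 if terminal else max(self.q[(next_state, b)] for b in self.actions)
        self.q[key] += alpha * (reward + self.gamma * future - self.q[key])
        return self.q[key]

    def greedy(self, state):
        """Action with the highest current Q-value in `state`."""
        return max(self.actions, key=lambda a: self.q[(state, a)])
```

In a scheduling setting the reward would typically be the negative crane waiting time; with three terminal updates of rewards 1, 2 and 3, the Q-value equals their mean, 2.0.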
