Similar literature
1.
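Multi-agent systems are widely used in autonomous driving, intelligent logistics, collaborative healthcare, and other fields. As technology advances and system requirements grow, however, these systems face challenges such as large scale and high complexity, and often suffer from low training efficiency and poor adaptability. To address these problems, gradient-based meta-learning is extended to multi-agent deep reinforcement learning, and a method named multi-agent first-order meta proximal policy optimization (MAMPPO) is proposed to learn the initial model parameters of a multi-agent system, offering a new perspective on improving the performance of multi-agent deep reinforcement learning. The method makes full use of the experience data gathered during multi-agent reinforcement learning: through repeated adaptation it finds the parameters that are most sensitive along the gradient-descent direction and learns the initial parameters, so that model training starts from the best starting point. This effectively improves the decision efficiency of the joint policy and markedly speeds up both policy changes and adaptation to new situations. Experimental results on StarCraft II show that MAMPPO significantly improves training speed and adaptability, providing a new approach to improving the training efficiency and adaptability of multi-agent reinforcement learning.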
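
As a rough illustration of what learning initial parameters with a first-order meta-update can look like, the sketch below applies a Reptile-style update to toy one-dimensional quadratic tasks. It is not MAMPPO: the abstract does not state the update rule, and the task losses, the names `inner_adapt` and `meta_train`, and all step sizes are assumptions made for illustration.

```python
def inner_adapt(theta, target, lr=0.1, steps=5):
    """Adapt theta to one toy task by gradient descent on (theta - target)^2."""
    for _ in range(steps):
        grad = 2.0 * (theta - target)
        theta -= lr * grad
    return theta


def meta_train(task_targets, theta=0.0, meta_lr=0.5, iterations=100):
    """First-order meta-update (Reptile-style, an assumption): move the
    initial parameter toward the mean of the task-adapted parameters,
    without computing any second derivatives."""
    for _ in range(iterations):
        adapted = [inner_adapt(theta, t) for t in task_targets]
        mean_adapted = sum(adapted) / len(adapted)
        theta += meta_lr * (mean_adapted - theta)
    return theta


# Two hypothetical tasks whose optima are 1.0 and 3.0.
theta0 = meta_train([1.0, 3.0])
```

With these two symmetric toy tasks the learned starting point settles midway between the task optima, so a few inner steps reach either task quickly.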

2.
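Several key scientific issues in multi-agent deep reinforcement learning   Total citations: 6 (self-citations: 0, citations by others: 6)
孙长银, 穆朝絮. 《自动化学报》 (Acta Automatica Sinica), 2020, 46(7): 1301-1312
Reinforcement learning has been used for decades to solve model-free sequential decision-making problems, but it often faces great challenges when dealing with high-dimensional variables. In recent years, the rapid development of deep learning has made it possible for reinforcement learning methods to provide optimized decision policies for complex, high-dimensional multi-agent systems and to carry out target tasks efficiently in challenging environments. This paper reviews the principles of reinforcement learning and deep reinforcement learning, proposes a closed-loop control framework for learning systems, and analyzes several important problems in multi-agent deep reinforcement learning and their solutions, including algorithm structure, environment non-stationarity, and partial observability. The advantages, disadvantages, and applications of the surveyed methods are analyzed and discussed. Finally, future research directions for multi-agent deep reinforcement learning are given, offering some ideas for developing more powerful and more easily applied multi-agent reinforcement learning control systems.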

3.
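To address the low sample efficiency and poor robustness of current reinforcement learning algorithms in path planning for UAV airborne platforms, a model-based reinforcement learning algorithm with intrinsic rewards is proposed. A parallel architecture fully decouples data collection from policy updates to improve learning efficiency, and intrinsic rewards improve the agent's exploration of the environment, avoiding convergence to a suboptimal policy. During policy learning, the agent learns a dynamics model of the simulated environment so that it can better predict states, rewards, and other information within a finite number of steps. On this basis, finite-step planning is combined with neural network predictions to improve the accuracy of value function estimates, so that the agent can be trained with less experience data. Experimental results show that, compared with a model-free reinforcement learning algorithm of the same architecture, the proposed algorithm needs nearly 600 fewer episodes of experience data to reach the same training level, with large improvements in sample efficiency and robustness. Compared with traditional non-reinforcement-learning heuristic algorithms, its score is higher by nearly 8,000 points; compared with mainstream model-based reinforcement learning algorithms such as MVE, its average score is higher by nearly 2,000 points, with clear improvements in sample efficiency and stability.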

4.
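Existing multi-channel access strategies adapt poorly to dynamic channel environments. To address this, a multi-channel intelligent access method based on deep reinforcement learning is proposed. First, the multi-channel access model is formulated as a Markov decision process, and a Q-learning method is proposed to achieve intelligent multi-channel access. Then, because Q-learning suffers from a large state space and slow convergence, a deep neural network is designed to obtain a near-optimal multi-channel intelligent access policy. Finally, an NS3 simulation platform is built to verify the performance of the proposed method. Simulation results show that, compared with existing reinforcement learning methods, the proposed deep-reinforcement-learning-based method achieves better access performance with faster convergence in dynamic multi-channel environments.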
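
The tabular Q-learning step that the abstract starts from can be sketched as follows. This is a generic illustration, not the paper's implementation: the state encoding, the reward, and the names `choose_channel` and `q_update` are assumptions, and the paper's deep network would replace the table used here.

```python
import random


def choose_channel(q_table, state, n_channels, epsilon, rng=random):
    """Epsilon-greedy choice among channels 0..n_channels-1."""
    if rng.random() < epsilon:
        return rng.randrange(n_channels)
    values = [q_table.get((state, a), 0.0) for a in range(n_channels)]
    return values.index(max(values))


def q_update(q_table, state, action, reward, next_state, n_channels,
             alpha=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q_table.get((next_state, a), 0.0)
                    for a in range(n_channels))
    old = q_table.get((state, action), 0.0)
    q_table[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q_table[(state, action)]
```

The table grows with the number of distinct (state, channel) pairs, which is the state-space problem that motivates the move to a neural network approximator.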

5.
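The portfolio problem is a hot topic in quantitative trading. Existing portfolio models based on deep reinforcement learning cannot realize adaptive trading strategies or make effective use of supervised information. To address these shortcomings, an integrated deep reinforcement learning portfolio model (IDRLPM) is proposed. First, a multi-agent approach is used to construct multiple base agents, with reward functions designed for different trading styles to represent different trading strategies. Second, ensemble learning is used to fuse the features of the base agents' policy networks, yielding an ensemble agent that adapts to the market environment. Third, a trend prediction network based on the convolutional block attention module (CBAM) is embedded in the ensemble agent, and its output guides the ensemble policy network to adaptively select trading proportions. Finally, under alternating iterative training of supervised deep learning and reinforcement learning, IDRLPM makes effective use of the supervised information in the training data to strengthen the model's profitability. On the constituent-stock datasets of the SSE 50 and CSI 500 indices, IDRLPM achieves Sharpe ratios (SR) of 1.87 and 1.88 and cumulative returns (CR) of 2.02 and 1.34; compared with the ensemble deep reinforcement learning (EDRL) trading model, SR improves by 105% and 55%, and CR by 124% and 79%. The experimental results show that IDRLPM can effectively solve the portfolio problem.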
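
For readers unfamiliar with the two reported metrics, the sketch below computes a cumulative return and an annualized Sharpe ratio from a list of per-period simple returns. The abstract does not state the exact definitions used, so the compounding convention, the zero risk-free rate, and the 252-period annualization are assumptions.

```python
import math
import statistics


def cumulative_return(returns):
    """Compound per-period simple returns into a total return."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualized Sharpe ratio: mean excess return divided by its sample
    standard deviation, scaled by sqrt(periods_per_year)."""
    excess = [r - risk_free for r in returns]
    mean_excess = statistics.mean(excess)
    return mean_excess / statistics.stdev(excess) * math.sqrt(periods_per_year)
```

Under this convention a cumulative return of 2.02 would mean the portfolio value roughly tripled over the test period.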

6.
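A survey of multi-agent deep reinforcement learning   Total citations: 1 (self-citations: 0, citations by others: 1)
Multi-agent deep reinforcement learning is an emerging research hotspot and application direction in machine learning. It covers many algorithms, rules, and frameworks, and is widely applied in real-world areas such as autonomous driving, energy allocation, formation control, flight path planning, route planning, and social dilemmas, giving it very high research value and significance. This survey briefly introduces the basic theory and development history of multi-agent deep reinforcement learning; describes existing classic algorithms under four categories (independent, communication-rule-based, mutually cooperative, and modeling-and-learning-based); reviews practical applications of multi-agent deep reinforcement learning algorithms and briefly lists existing test platforms; and summarizes the challenges and future directions in theory, algorithms, and applications.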

7.
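Multi-agent deep reinforcement learning can be applied to real-world scenarios that require multi-party cooperation and is a research hotspot within reinforcement learning. In multi-goal multi-agent cooperative scenarios, agents have complex mixed relationships in which cooperation and competition coexist. When multi-agent reinforcement learning is applied in such scenarios, its performance depends on whether the method can adequately measure the relationships among agents and distinguish cooperative from competitive actions; it must also handle high-dimensional data and algorithm efficiency. For multi-goal multi-agent cooperative scenarios, a goal-based value decomposition deep reinforcement learning method is proposed on the basis of the QMIX model. It uses an attention mechanism to measure the group influence among agents and uses the agents' goal information to perform a two-stage value decomposition, improving its ability to characterize complex agent relationships and thereby the performance of reinforcement learning in these scenarios. Experimental results show that, compared with QMIX, the method scores on par on the StarCraft II micromanagement platform, an average of 4.9 points higher in board games, and an average of 25 and 280.4 points higher in the multi-particle environments merge and cross, respectively. It also achieves higher scores and better performance than mainstream deep reinforcement learning methods.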

8.
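Reinforcement learning (RL), the third learning paradigm in machine learning alongside supervised and unsupervised learning, learns by interacting with the environment and ultimately maximizes cumulative reward. Common RL algorithms are divided into model-based reinforcement learning and model-free reinforcement learning. Model-based RL needs to predefine an environment dynamics model from state-transition data of the real environment; afterwards, policy learning through the dynamics model requires no further interaction with the environment. In model-free RL, the agent learns the optimal policy through real-time interaction with the environment; this approach generalizes better in practical tasks and is therefore more widely applied. This paper surveys the latest research progress and developments in model-free RL. It first introduces the basic theory of RL, model-based RL, and model-free RL; then summarizes the classic model-free algorithms based on value functions and policy functions, with their respective strengths and weaknesses; and finally outlines the latest research on model-free RL in game AI, chemical materials design, natural language processing, and robot control, and looks ahead to future trends.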

9.
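As an important branch of machine learning and artificial intelligence, multi-agent hierarchical reinforcement learning combines, in a general form, the cooperative ability of multiple agents with the decision-making ability of reinforcement learning. By decomposing a complex reinforcement learning problem into several subproblems and solving them separately, it can effectively address the curse of dimensionality. This makes it a potential way to solve intelligent decision-making problems in large-scale, complex settings. The paper first describes the main techniques involved, including reinforcement learning, semi-Markov decision processes, and multi-agent reinforcement learning. Then, from a hierarchical perspective, it reviews the algorithmic principles and research status of four kinds of multi-agent hierarchical reinforcement learning methods: option-based, hierarchical-abstract-machine-based, value-function-decomposition-based, and end-to-end. Finally, it introduces applications in robot control, game-theoretic decision-making, and task planning.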

10.
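闫超, 相晓嘉, 徐昕, 王菖, 周晗, 沈林成. 《控制与决策》 (Control and Decision), 2022, 37(12): 3083-3102
Thanks to the powerful feature representation of deep learning and the effective policy learning of reinforcement learning, deep reinforcement learning has achieved remarkable success in a series of complex sequential decision-making problems. Following its successful application to many single-agent tasks, research on deep reinforcement learning in multi-agent systems is flourishing. In recent years, multi-agent deep reinforcement learning has attracted much attention in artificial intelligence, and scalability and transferability have become one of its core research topics. In view of this, the paper first explains the development and typical algorithms of deep reinforcement learning, introduces three learning paradigms of multi-agent deep reinforcement learning, and analyzes two classes of typical multi-agent reinforcement learning algorithms, namely decomposed value function methods and centralized value function methods. It then summarizes six classes of scalable multi-agent deep reinforcement learning models, such as attention mechanisms and graph neural networks, and reviews progress on transfer learning and curriculum learning for transferability. Finally, it discusses application prospects and research directions, providing a reference for the further development of multi-agent deep reinforcement learning.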

11.
Forecasting the direction of daily changes in stock indices is an important yet difficult task for market participants. Advances in data mining and machine learning make it possible to develop more accurate predictions to assist investment decision making. This paper develops a learning architecture, LR2GBDT, for forecasting and trading stock indices, mainly by cascading the logistic regression (LR) model onto the gradient boosted decision trees (GBDT) model. Without any assumption about the underlying data generating process, raw price data and twelve technical indicators are used to extract the information contained in the stock indices. The proposed architecture is evaluated against the LR, GBDT, SVM (support vector machine), NN (neural network), and TPOT (tree-based pipeline optimization tool) models on three stock indices from two types of stock market: an emerging market (Shanghai Stock Exchange Composite Index) and a mature market (Nasdaq Composite Index and S&P 500 Composite Stock Price Index). Under the same test conditions, the cascaded model not only outperforms the other models but also shows statistically and economically significant improvements when used with simple trading strategies, even when transaction costs are taken into account.

12.

Predicting the direction of stock price movement is significant in both financial circles and academia. Stock prices contain complex, incomplete, and fuzzy information, which makes predicting their trend extremely difficult. Predicting and analysing financial data is a nonlinear, time-dependent problem. With the rapid development of machine learning and deep learning, this task can be performed more effectively by a purpose-designed network. This paper aims to improve prediction accuracy and minimize forecasting error through a deep learning architecture based on Generative Adversarial Networks. A generic model is proposed that consists of the Phase-space Reconstruction (PSR) method for reconstructing the price series and a Generative Adversarial Network (GAN) that combines two neural networks, a Long Short-Term Memory (LSTM) network as the generative model and a Convolutional Neural Network (CNN) as the discriminative model, trained adversarially to forecast the stock market. The LSTM generates new instances from historical basic indicator information, and the CNN then estimates whether the data was predicted by the LSTM or is real. Compared with the LSTM alone, the GAN was 4.35% more accurate in predicting the direction and reduced processing time and RMSE by 78 s and 0.029, respectively. The proposed system thus reduces the root mean square error and processing time while improving direction prediction accuracy for the stock index.
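
Phase-space reconstruction is commonly done by time-delay embedding, which turns a scalar series into overlapping delay vectors. The sketch below shows that generic construction; the abstract does not give the embedding dimension or delay, so `dim`, `tau`, and the function name `delay_embed` are assumptions.

```python
def delay_embed(series, dim, tau):
    """Return delay vectors [x[t], x[t+tau], ..., x[t+(dim-1)*tau]]
    for every t at which the whole vector fits inside the series."""
    span = (dim - 1) * tau
    if dim < 1 or tau < 1 or len(series) <= span:
        raise ValueError("invalid dim/tau, or series too short")
    return [[series[t + k * tau] for k in range(dim)]
            for t in range(len(series) - span)]


# Hypothetical closing prices, embedded with dimension 3 and delay 1.
vectors = delay_embed([10.0, 10.5, 10.2, 10.8, 11.0], dim=3, tau=1)
```

Each delay vector can then serve as one input sample for a sequence model such as the LSTM generator described above.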

13.
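Because the stock market is subject to human-induced disturbance, sentiment-based stock market prediction algorithms perform poorly. To address bull traps and bear traps, a Markov chain stock market trend prediction algorithm based on a rational indicator (RI_MCA) is proposed. The main rational features of the stock market are extracted and quantified; principal component analysis fuses these features into a rational indicator, which is used to obtain buy and sell points; the market states corresponding to the buy and sell points are then introduced into a Markov chain to predict the market trend. Because the reliability of buy and sell points decreases when the rational indicator diverges from the market state, feature divergence is introduced into RI_MCA to give the RICD_MCA algorithm, which adjusts and optimizes the RI_MCA results according to the degree of feature divergence. Comparative experiments and analysis on the SSE Composite Index show that RICD_MCA achieves higher prediction accuracy.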
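
The Markov chain step can be illustrated by estimating first-order transition probabilities from a sequence of discrete market states. This is a generic sketch, not RI_MCA itself: the state labels, the first-order assumption, and the function names are hypothetical, and the rational indicator and divergence adjustment are not modeled.

```python
from collections import Counter, defaultdict


def transition_probabilities(states):
    """Estimate P(next | current) from consecutive pairs in a state sequence."""
    counts = defaultdict(Counter)
    for current, nxt in zip(states, states[1:]):
        counts[current][nxt] += 1
    return {s: {n: c / sum(row.values()) for n, c in row.items()}
            for s, row in counts.items()}


def most_likely_next(probs, current):
    """Return the most probable next state, or None if `current` was never seen."""
    row = probs.get(current)
    return max(row, key=row.get) if row else None


# Hypothetical state sequence observed at successive buy/sell points.
history = ["up", "up", "down", "up", "down", "down"]
probs = transition_probabilities(history)
```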

14.
Ma Chi, Liang Yan, Wang Shaofan, Lu Shengliang. 《Multimedia Tools and Applications》, 2022, 81(9): 12599-12617

Stock linkage refers to the correlation or similar performance of two or more stocks in the stock market. Quantifying stock linkage relationships has been both a focus and a difficulty of research in recent years, and studying stock linkage can uncover deeper potential relationships between stocks. Existing research often studies linkage only from the perspective of the correlation or similarity of stock movements, and there is no unified, standard numerical index that effectively describes the degree of linkage, which greatly hinders progress. To address the difficulty of quantifying stock linkage, we analyze the correlation and morphological similarity of time series and propose, for the first time, the combination of the correlation coefficient and a time-weighted distance as the numerical expression of stock linkage. In addition, a parallel network structure for the LSTM model is designed, and a denoising autoencoder and a wavelet transform module are added as the noise-reduction layer, which effectively improves the LSTM model's prediction performance on numerical time series of stock market linkage. Three different types of comparative experiments based on 2.309 million stock market sequences show that the proposed optimized LSTM model predicts more accurately: its RMSE is 18.68% lower than that of the compared DB-LSTM model and 46.38% lower than that of the SDAE-LSTM model.
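
The two ingredients named in the abstract can be sketched as below. The Pearson coefficient is standard; the time-weighted distance is only a guess at the idea, since the abstract gives no formula. The linear weighting that favors recent points, and the choice not to combine the two values into a single index, are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def time_weighted_distance(x, y):
    """Hypothetical weighting: weights grow linearly with the time index
    (recent points count more) and sum to 1."""
    n = len(x)
    total = n * (n + 1) / 2
    return math.sqrt(sum((i + 1) / total * (a - b) ** 2
                         for i, (a, b) in enumerate(zip(x, y))))
```

A high correlation together with a small distance would indicate two series that both move together and stay close in shape.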


15.
In the stock market, technical analysis is a useful method for predicting stock prices. Although professional stock analysts and fund managers usually make subjective judgments based on objective technical indicators, it is difficult for non-professionals to apply this forecasting technique because there are too many complex technical indicators to consider. Moreover, two drawbacks have been found in many past forecasting models: (1) time series models, such as the autoregressive moving average model (ARMA) and the autoregressive conditional heteroscedasticity model (ARCH), require statistical assumptions about variables to produce forecasting models in the form of mathematical equations, and these are not easily understood by stock investors; and (2) the rules mined by some artificial intelligence (AI) algorithms, such as neural networks (NN), are not easily realized. To overcome these drawbacks, this paper proposes a hybrid forecasting model that uses multiple technical indicators to predict stock price trends. The hybrid model includes four procedures that provide efficient forecasting rules, evolved from extracted rules with high support values using a toolset based on rough sets theory (RST): (1) select the essential technical indicators, which are highly related to the future stock price, from the popular indicators based on a correlation matrix; (2) use the cumulative probability distribution approach (CDPA) and the minimize entropy principle approach (MEPA) to partition technical indicator values and daily price fluctuations into linguistic values, based on the characteristics of the data distribution; (3) employ an RST algorithm to extract linguistic rules from the linguistic technical indicator dataset; and (4) use genetic algorithms (GAs) to refine the extracted rules for better forecasting accuracy and stock return.
The effectiveness of the proposed model is verified with two types of performance evaluation, accuracy and stock return, using a six-year period of the TAIEX (Taiwan Stock Exchange Capitalization Weighted Stock Index) as the experimental dataset. The experimental results show that the proposed model is superior to the two listed forecasting models (RST and GAs) in terms of accuracy, and the stock return evaluations show that the profits produced by the proposed model are higher than those of the three listed models (Buy-and-Hold, RST, and GAs).

16.
The stock market is considered chaotic, complex, volatile, and dynamic, and its prediction is undoubtedly one of the most challenging tasks in time series forecasting. Moreover, existing Artificial Neural Network (ANN) approaches fail to provide encouraging results. Meanwhile, advances in machine learning have produced favourable results in speech recognition, image classification, and language processing. Methods applied in digital signal processing can be applied to stock data, as both are time series; similarly, the learning outcome of this paper can be applied to speech time series data. This paper introduces deep learning for stock prediction and evaluates its performance on Google stock price multimedia data (chart) from NASDAQ. The objective is to demonstrate that deep learning can improve stock market forecasting accuracy. To this end, a (2D)²PCA + Deep Neural Network (DNN) method is compared with the state-of-the-art method, 2-Directional 2-Dimensional Principal Component Analysis (2D)²PCA + Radial Basis Function Neural Network (RBFNN). The proposed method performs better than the existing RBFNN method, improving Hit Rate accuracy by 4.8% with a window size of 20. Compared with the Recurrent Neural Network (RNN), Hit Rate accuracy improves by 15.6%. The correlation coefficient between the actual and predicted return for the DNN is 17.1% higher than for the RBFNN and 43.4% higher than for the RNN.

17.
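The stock market is not only an important financing channel for listed companies but also an important investment market, and stock prediction has long attracted attention. To make full use of information from different stock prices and improve prediction, a multi-scale stock price prediction model, TL-EMD-LSTM-MA (TELM), is proposed. TELM uses empirical mode decomposition to decompose the closing price into components at multiple time scales; components at different scales oscillate at different frequencies and reflect different periodic information. A prediction method is chosen for each component according to its oscillation frequency: high-frequency components are predicted by a stacked LSTM trained with deep transfer learning, and low-frequency components by a moving average. The predictions of all components are summed to give the final closing-price prediction. The stacked LSTM trained with deep transfer learning contains information from different stocks and has more knowledge of the industry or market, which effectively reduces prediction error. Predicting the low-frequency components with a moving average captures the overall trend of the stock more effectively. Predictions for 500 stocks in China's A-share market and for indices such as the SSE Composite Index and the SZSE Component Index show that TELM has the lowest prediction error and the highest goodness of fit among the compared models. Simulated trading based on the closing prices predicted by TELM shows that TELM offers low investment risk and high returns.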
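
The low-frequency branch and the final summation can be sketched as follows. Only the moving-average forecast and the sum of component forecasts come from the abstract; the window length, the function names, and the example numbers are assumptions, and the decomposition and LSTM branch are not shown.

```python
def moving_average_forecast(values, window):
    """One-step-ahead forecast: mean of the last `window` observations."""
    if window < 1 or len(values) < window:
        raise ValueError("need at least `window` observations")
    return sum(values[-window:]) / window


def combine_component_forecasts(component_forecasts):
    """The final forecast is the sum of the per-component forecasts."""
    return sum(component_forecasts)


# Hypothetical values: one low-frequency component forecast by moving
# average, plus a high-frequency forecast supplied by some other model.
low = moving_average_forecast([1.0, 2.0, 3.0, 4.0], window=2)
final = combine_component_forecasts([0.5, low])
```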

18.
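Stock price prediction has always been a focus of investors in the stock market. In recent years, deep learning has been widely applied in this field. Building on a CNN-LSTM model that combines a convolutional neural network (CNN) with a long short-term memory network (LSTM), the tensor processing technique of multiway delay embedding (MDT) is introduced to reconstruct the daily stock factor vectors and generate Hankel matrices, by time...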

19.
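The stock market is an important part of the financial market, and predicting stock prices is of great significance. Deep learning has powerful data-processing capability and can handle the problems caused by the complexity of financial time series. This paper therefore proposes a hybrid neural network model with a self-attention mechanism (ATLG). The model is built from a long short-term memory network (LSTM), a gated recurrent unit (GRU), and a self-attention mechanism, and is used to predict stock prices. The experimental results show that: (1) ATLG is more accurate than models such as LSTM, GRU, RNN-LSTM, and RNN-GRU; (2) introducing self-attention lets the model focus on stock feature information at important time points; (3) by comparison, the two-layer neural network has a more pronounced effect; (4) a backtest using the MACD (moving average convergence and divergence) indicator yields a return of 53%, higher than that of the CSI 300 over the same period. The results demonstrate the effectiveness and practicality of the model for stock price prediction.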
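
The MACD indicator used in the backtest can be computed as sketched below. The 12/26/9 spans are the conventional defaults and are assumed here, as is seeding each exponential moving average with the first value; the abstract does not describe the backtest rules themselves.

```python
def ema(values, span):
    """Exponential moving average with smoothing 2 / (span + 1),
    seeded with the first value."""
    alpha = 2.0 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def macd(closes, fast=12, slow=26, signal=9):
    """Return (macd_line, signal_line, histogram) as equal-length lists."""
    fast_ema, slow_ema = ema(closes, fast), ema(closes, slow)
    macd_line = [f - s for f, s in zip(fast_ema, slow_ema)]
    signal_line = ema(macd_line, signal)
    histogram = [m - s for m, s in zip(macd_line, signal_line)]
    return macd_line, signal_line, histogram
```

A common reading, not confirmed by the abstract, treats the MACD line crossing above the signal line as a buy signal and crossing below as a sell signal.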

20.
Stock market forecasting has been a challenging financial research topic for decades. The literature contains numerous results based on point methods, but poor forecasting quality has been a persistent problem. Motivated by the fact that financial data vary within intervals, we apply interval methods to a well-known stock pricing model [3] to predict stock market variability as intervals. Empirical results obtained with several different approaches consistently suggest that interval forecasts have better overall quality than traditional point forecasts.
