A stochastic policy search model for matching behavior (满足匹配律的策略参数搜索决策模型)


Cite this article:	程振波, 张宇, 邓志东. 满足匹配律的策略参数搜索决策模型[J]. 中国科学：信息科学, 2012(1): 83-98.

Authors:	程振波 (CHENG ZhenBo), 张宇 (ZHANG Yu), 邓志东 (DENG ZhiDong)

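Affiliation:	Tsinghua National Laboratory for Information Science and Technology (preparatory), State Key Laboratory on Intelligent Technology and Systems, Department of Computer Science, Tsinghua University; College of Computer Science and Technology, Zhejiang University of Technology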

Funding:	Supported by the National Natural Science Foundation of China (Grant Nos. 61005085, 60775040, 90820305)

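Abstract:	The matching law is one of the basic laws of decision theory; it establishes a correspondence between the preference for alternative targets and the rewards obtained from them. By constructing a policy model that produces the matching law, we study possible mechanisms by which the law holds. Based on reinforcement learning theory, we propose a policy search model that adjusts policy parameters to meet the decision objective. On the basis of this policy model, a policy algorithm that satisfies the matching law is derived under simple assumptions. Theoretical analysis and numerical simulation both verify the correctness of the algorithm. In addition, the algorithm is used to simulate classical matching-behavior experiments from psychology and neurophysiology. The results not only give a reasonable explanation of matching behavior but also provide an effective theoretical modeling approach for building reward-based decision models.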
Keywords:	policy model; matching law; reinforcement learning; decision-making model; neural circuit
A stochastic policy search model for matching behavior

CHENG ZhenBo, ZHANG Yu, & DENG ZhiDong. A stochastic policy search model for matching behavior[J]. Scientia Sinica Informationis, 2012(1): 83-98.

Authors:	CHENG ZhenBo, ZHANG Yu, & DENG ZhiDong

Affiliation:	1 State Key Laboratory on Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology, Department of Computer Science, Tsinghua University, Beijing 100084, China; 2 Department of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310014, China

Abstract:	The matching law is one of the basic empirical laws in decision theory; it states that a subject's preference among alternative targets depends on how those choices are reinforced. In this paper, we study possible mechanisms that explain why subjects' decisions often obey this law. On the basis of reinforcement learning theory, we put forward a decision-making model in which the policy is updated by adjusting a policy parameter; the model might be implemented in the brain through the neural circuit formed by the prefrontal cortex and the basal ganglia. Based on this model, an algorithm that satisfies the matching law is derived under some simple assumptions. Theoretical analysis and simulation results show that the decision behavior produced by the algorithm obeys the matching law. In addition, the matching behaviors observed in two classical experiments are reproduced using the algorithm. Our results provide a reasonable explanation for the matching law and a useful computational tool for reward-based decision-making tasks.

Keywords:	policy model; matching law; reinforcement learning; decision-making model; neural circuit
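
Note:	For readers unfamiliar with the matching law named in the abstract, it is commonly written in Herrnstein's form, shown here as a sketch. The symbols are assumptions introduced for illustration and do not come from this record: $C_i$ is the number of times option $i$ is chosen and $R_i$ is the total reward obtained from option $i$ over the same period.

```latex
% Matching law (Herrnstein's form); notation is illustrative, not taken from the paper.
% C_i: number of choices of option i;  R_i: total reward earned from option i.
\[
  \frac{C_i}{\sum_{j} C_j} \;=\; \frac{R_i}{\sum_{j} R_j}
  \qquad \text{for every option } i .
\]
% Equivalent statement: the average reward per choice is equal across all chosen options,
\[
  \frac{R_i}{C_i} \;=\; \frac{R_k}{C_k}
  \qquad \text{for all } i, k \text{ with } C_i, C_k > 0 .
\]
```

In words, the fraction of choices allocated to an option equals the fraction of total reward earned from it. How the paper's algorithm reaches this condition is not stated in this record.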
This article is indexed in CNKI and other databases.
