Similar Documents
20 similar documents found (search time: 31 ms)
1.
Technical Update: Least-Squares Temporal Difference Learning   (Cited by: 2; self-citations: 0; other citations: 2)
Boyan  Justin A. 《Machine Learning》2002,49(2-3):233-246
TD(λ) is a popular family of algorithms for approximate policy evaluation in large MDPs. TD(λ) works by incrementally updating the value function after each observed transition. It has two major drawbacks: it may make inefficient use of data, and it requires the user to manually tune a stepsize schedule for good performance. For the case of linear value function approximations and λ = 0, the Least-Squares TD (LSTD) algorithm of Bradtke and Barto (1996, Machine Learning, 22:1-3, 33-57) eliminates all stepsize parameters and improves data efficiency. This paper updates Bradtke and Barto's work in three significant ways. First, it presents a simpler derivation of the LSTD algorithm. Second, it generalizes from λ = 0 to arbitrary values of λ; at the extreme of λ = 1, the resulting new algorithm is shown to be a practical, incremental formulation of supervised linear regression. Third, it presents a novel and intuitive interpretation of LSTD as a model-based reinforcement learning technique.
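For readers who want the mechanics at a glance, here is a minimal sketch of batch LSTD(0) under a linear value approximation; the feature map phi, the transition data, and the small ridge term are illustrative assumptions of this sketch, not details from the paper.

    import numpy as np

    def lstd(transitions, phi, gamma=0.9, reg=1e-6):
        # Least-Squares TD(0): accumulate A and b over the observed
        # transitions, then solve A w = b for the value-function weights.
        # transitions: iterable of (state, reward, next_state) tuples
        # phi: feature map, state -> 1-D numpy array
        k = phi(transitions[0][0]).size
        A = reg * np.eye(k)              # small ridge term keeps A invertible
        b = np.zeros(k)
        for s, r, s_next in transitions:
            x, x_next = phi(s), phi(s_next)
            A += np.outer(x, x - gamma * x_next)
            b += r * x
        return np.linalg.solve(A, b)     # no stepsize schedule needed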

2.
This paper presents aut, a modern Automath checker. It is a straightforward re-implementation of the Zandleven Automath checker from the seventies. It was implemented about five years ago, in the programming language C. It accepts both the AUT-68 and AUT-QE dialects of Automath. This program was written to restore a damaged version of Jutting's translation of Landau's Grundlagen. Some notable features: It is fast. On a 1 GHz machine it will check the full Jutting formalization (736 K of non-whitespace Automath source) in 0.6 seconds. Its implementation of λ-terms does not use named variables or de Bruijn indices (the two common approaches) but instead uses a graph representation. In this representation variables are represented by pointers to a binder. The program can compile an Automath text into one big Automath single-line-style λ-term. It outputs such a term using de Bruijn indices. (These λ-terms cannot be checked by modern systems like Coq or Agda, because the λ-typed λ-calculi of de Bruijn are different from the Π-typed λ-calculi of modern type theory.) The source of aut is freely available on the Web.

3.
A text is a triple τ = (λ, ρ1, ρ2) such that λ is a labeling function, and ρ1 and ρ2 are linear orders on the domain of λ; hence a text may be seen as a word (λ, ρ1) together with an additional linear order ρ2 on the domain of λ. The order ρ2 is used to give to the word (λ, ρ1) its individual hierarchical representation (syntactic structure), which may be a tree but may also be more general than a tree. In this paper we introduce context-free grammars for texts and investigate their basic properties. Since each text has its own individual structure, the role of such a grammar should be that of a definition of a pattern common to all individual texts. This leads to the notion of a shapely context-free text grammar, also investigated in this paper.
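As a concrete illustration (constructed here, not taken from the paper), a three-letter text in LaTeX notation:

    % A text over the domain {1,2,3}: \lambda labels the positions,
    % \rho_1 is the word order, \rho_2 encodes the hierarchical structure.
    \tau = (\lambda, \rho_1, \rho_2), \qquad
    \lambda(1) = a,\; \lambda(2) = b,\; \lambda(3) = c,
    \qquad \rho_1 : 1 < 2 < 3 \quad (\text{the word } abc),
    \qquad \rho_2 : 2 < 1 < 3.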

4.
We present a new definition of optimality intervals for the parametric right-hand side linear programming (parametric RHS LP) problem φ(λ) = min{cᵀx : Ax = b + λb̄, x ≥ 0}. We then show that an optimality interval consists either of a breakpoint or of the open interval between two consecutive breakpoints of the continuous piecewise linear convex function φ(λ). As a consequence, the optimality intervals form a partition of the closed interval {λ : |φ(λ)| < ∞}. Based on these optimality intervals, we also introduce an algorithm for solving the parametric RHS LP problem which requires an LP solver as a subroutine. If a polynomial-time LP solver is used to implement this subroutine, we obtain a substantial improvement on the complexity of those parametric RHS LP instances which exhibit degeneracy. When the number of breakpoints of φ(λ) is polynomial in terms of the size of the parametric problem, we show that the latter can be solved in polynomial time. This research was partially funded by the United States Navy Office of Naval Research under Contract N00014-87-K-0202. Its financial support is gratefully acknowledged.
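A quick numeric sketch of the object being studied (not the paper's breakpoint algorithm): sampling φ(λ) with an off-the-shelf LP solver traces out the piecewise linear convex function, whose kinks are the breakpoints. The use of scipy here is an assumption of this sketch.

    import numpy as np
    from scipy.optimize import linprog

    def phi(lam, c, A, b, b_bar):
        # phi(lambda) = min{ c^T x : A x = b + lambda * b_bar, x >= 0 }
        res = linprog(c, A_eq=A, b_eq=b + lam * b_bar,
                      bounds=[(0, None)] * len(c), method="highs")
        return res.fun if res.success else np.inf   # infeasible => +infinity

    # Evaluating phi on a grid of lambda values and plotting the results
    # reveals the piecewise linear convex graph; the kinks are breakpoints.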

5.
When interpolating incomplete data, one can choose a parametric model, or opt for a more general approach and use a non-parametric model which allows a very large class of interpolants. A popular non-parametric model for interpolating various types of data is based on regularization, which looks for an interpolant that is both close to the data and also smooth in some sense. Formally, this interpolant is obtained by minimizing an error functional which is the weighted sum of a fidelity term and a smoothness term. The classical approach to regularization is: select optimal weights (also called hyperparameters) that should be assigned to these two terms, and minimize the resulting error functional. However, using only the optimal weights does not guarantee that the chosen function will be optimal in some sense, such as the maximum likelihood criterion, or the minimal square error criterion. For that, we have to consider all possible weights. The approach suggested here is to use the full probability distribution on the space of admissible functions, as opposed to the probability induced by using a single combination of weights. The reason is as follows: the weight actually determines the probability space in which we are working. For a given weight λ, the probability of a function f is proportional to exp(−λ ∫ (f_uu)² du) (for the case of a function of one variable). For each different λ, there is a different solution to the restoration problem; denote it by f_λ. Now, if we had known λ, it would not be necessary to use all the weights; however, all we are given are some noisy measurements of f, and we do not know the correct λ. Therefore, the mathematically correct solution is to calculate, for every λ, the probability that f was sampled from a space whose probability is determined by λ, and average the different f_λ's weighted by these probabilities. The same argument holds for the noise variance, which is also unknown. Three basic problems are addressed in this work: (1) computing the MAP estimate, that is, the function f maximizing Pr(f|D) when the data D is given; this problem is reduced to a one-dimensional optimization problem. (2) Computing the MSE estimate, defined at each point x as ∫ f(x) Pr(f|D) df; this problem is reduced to computing a one-dimensional integral. In the general setting, the MAP estimate is not equal to the MSE estimate. (3) Computing the pointwise uncertainty associated with the MSE solution; this problem is reduced to computing three one-dimensional integrals.
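The averaging step can be phrased in a few lines. The sketch below assumes λ has already been discretized on a grid and that, for each value, a restoration f_λ and an unnormalized posterior weight have been computed; both inputs are hypothetical stand-ins for the paper's one-dimensional integrals.

    import numpy as np

    def mse_and_map(f_by_lam, p_by_lam):
        # f_by_lam: lam -> restored signal f_lam (np.ndarray), one per weight
        # p_by_lam: lam -> unnormalized posterior weight Pr(D | lam) Pr(lam)
        lams = sorted(f_by_lam)
        w = np.array([p_by_lam[l] for l in lams], dtype=float)
        w /= w.sum()                               # normalize the posterior
        stack = np.stack([f_by_lam[l] for l in lams])
        f_mse = (w[:, None] * stack).sum(axis=0)   # average over all weights
        f_single = f_by_lam[lams[int(np.argmax(w))]]  # best single weight,
        return f_mse, f_single                        # shown for contrast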

6.
The postal network is an interconnection network that possesses many desirable properties in networking applications. It includes hypercubes and Fibonacci cubes as special cases. Basically, the postal network forms a series (with series number γ) based on the recurrence N_γ(n) = N_γ(n−1) + N_γ(n−γ), where n is the dimension and N_γ(n) represents the number of nodes in an n-dimensional postal network in series γ. In this paper, we study topological properties of postal networks and relationships between different postal networks. One application of postal networks is also shown in implementing barrier synchronization using a special spanning tree called a postal tree. The postal network can also be considered a flexible version of the hypercube obtained by relaxing the restriction on the number of nodes, which makes it possible to construct multicomputers of arbitrary size.
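The recurrence is easy to tabulate. In this sketch the seed values for n < γ are an illustrative assumption (they reproduce hypercube node counts for γ = 1 and Fibonacci numbers for γ = 2); the paper's own base cases may differ.

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def postal_nodes(n, gamma):
        # N_gamma(n) = N_gamma(n-1) + N_gamma(n-gamma)
        if n < gamma:
            return n + 1    # assumed seed values, for illustration only
        return postal_nodes(n - 1, gamma) + postal_nodes(n - gamma, gamma)

    # gamma = 1 gives 1, 2, 4, 8, ...  (hypercubes);
    # gamma = 2 gives 1, 2, 3, 5, 8, ... (Fibonacci cubes).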

7.
Learning to Play Chess Using Temporal Differences   (Cited by: 4; self-citations: 0; other citations: 4)
Baxter  Jonathan  Tridgell  Andrew  Weaver  Lex 《Machine Learning》2000,40(3):243-263
In this paper we present TDLEAF(λ), a variation on the TD(λ) algorithm that enables it to be used in conjunction with game-tree search. We present some experiments in which our chess program KnightCap used TDLEAF(λ) to learn its evaluation function while playing on Internet chess servers. The main success we report is that KnightCap improved from a 1650 rating to a 2150 rating in just 308 games and 3 days of play. As a reference, a rating of 1650 corresponds to about level B human play (on a scale from E (1000) to A (1800)), while 2150 is human master level. We discuss some of the reasons for this success, principal among them being the use of on-line play rather than self-play. We also investigate whether TDLEAF(λ) can yield better results in the domain of backgammon, where TD(λ) has previously yielded striking success.
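The core update is compact. Below is a minimal sketch of one TDLEAF(λ) gradient step over a completed game, assuming the search has already returned, for each position, the evaluation of the principal-variation leaf and its gradient with respect to the weights; all names and the scalar-evaluation setup are illustrative.

    import numpy as np

    def tdleaf_update(w, leaf_vals, leaf_grads, alpha=0.01, lam=0.7):
        # leaf_vals:  J(s_t, w), evaluation of the principal-variation leaf
        #             reached from position t (final entry reflects outcome)
        # leaf_grads: gradient of J(s_t, w) with respect to w, one per t
        d = np.diff(leaf_vals)               # temporal differences d_t
        for t in range(len(d)):
            decay = lam ** np.arange(len(d) - t)
            w = w + alpha * leaf_grads[t] * np.dot(decay, d[t:])
        return w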

8.
A loss queueing system GI/G/m/0 is considered. Let a(x) be the p.d.f. of interarrival intervals. Assume that this function behaves like cx^(α−1) for small x. Further, let B(x) be the d.f. of service time and 1/μ be the mean service time. Conditions are derived for the light-traffic insensitivity of the loss probability to the form of B(x) as (λ/μ) → 0. In particular, the condition α = 1 is necessary. Estimates for the loss probability are obtained.

9.
We introduce a calculus which is a direct extension of both the λ- and the π-calculi. We give a simple type system for it that encompasses both Curry's type inference for the λ-calculus and Milner's sorting for the π-calculus as particular cases of typing. We observe that the various continuation-passing-style transformations for λ-terms, written in our calculus, actually correspond to encodings already given by Milner and others for evaluation strategies of λ-terms into the π-calculus. Furthermore, the associated sortings correspond to well-known double-negation translations on types. Finally we provide an adequate CPS transform from our calculus to the π-calculus. This shows that the latter may be regarded as an assembly language, while our calculus seems to provide a better programming notation for higher-order concurrency. We conclude by discussing some alternative design decisions.

10.
For compact Euclidean bodies P, Q, we define λ(P, Q) to be the smallest ratio r/s where r > 0, s > 0 satisfy sQ′ ⊆ P ⊆ rQ″. Here sQ denotes a scaling of Q by the factor s, and Q′, Q″ are some translates of Q. This function gives us a new distance function between bodies which, unlike previously studied measures, is invariant under affine transformations. If homothetic bodies are identified, the logarithm of this function is a metric. (Two bodies are homothetic if one can be obtained from the other by scaling and translation.) For integer k ≥ 3, define λ(k) to be the minimum value such that for each convex polygon P there exists a convex k-gon Q with λ(P, Q) ≤ λ(k). Among other results, we prove that 2.118… ≤ λ(3) ≤ 2.25 and λ(k) = 1 + Θ(k⁻²). We give an O(n² log² n)-time algorithm which, for any input convex n-gon P, finds a triangle T that minimizes λ(T, P) among triangles. However, in linear time we can find a triangle t with λ(t, P) ≤ 2.25. Our study is motivated by the attempt to reduce the complexity of the polygon containment problem, and also the motion-planning problem. In each case we describe algorithms which run faster when certain implicit slackness parameters of the input are bounded away from 1. These algorithms illustrate a new algorithmic paradigm in computational geometry for coping with complexity. Work of all authors was partially supported by the ESPRIT II Basic Research Actions Program of the EC under Contract No. 3075 (project ALCOM). Rudolf Fleischer and Kurt Mehlhorn acknowledge also DFG (Grant SPP Me 620/6). Chee Yap acknowledges also DFG (Grant Be 142/46-1) and NSF (Grants DCR-84-01898 and CCR-87-03458). This research was performed when Günter Rote and Chee Yap were at the Freie Universität Berlin.

11.
Kolmogorov introduced the concept of ε-entropy to analyze information in classical continuous systems. The fractal dimension of a geometric set was introduced by Mandelbrot as a new criterion to analyze the geometric complexity of the set. The ε-entropy and the fractal dimension of a state in a general quantum system were introduced by one of the present authors (MO) in order to characterize chaotic properties of general states. In this paper, we show that the ε-entropy of a state includes Kolmogorov's ε-entropy, and that the fractal dimension of a state describes the fractal structure of Gaussian measures.

12.
Q(λ)-learning uses TD(λ)-methods to accelerate Q-learning. The update complexity of previous online Q(λ) implementations based on lookup tables is bounded by the size of the state/action space. Our faster algorithm's update complexity is bounded by the number of actions. The method is based on the observation that Q-value updates may be postponed until they are needed.
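To make the observation concrete, here is a minimal tabular sketch of the lazy-update idea with replacing traces: a global prefix sum of decayed TD errors is maintained, and each state/action pair settles its pending updates only when it is touched again (a real implementation would also settle a pair before reading its Q-value for action selection). The class and its bookkeeping are illustrative, and the periodic renormalization needed to keep decay**(-t0) from overflowing on long runs is omitted.

    class LazyQLambda:
        def __init__(self, alpha=0.1, gamma=0.95, lam=0.8):
            self.alpha, self.decay = alpha, gamma * lam
            self.t = 0           # global step counter
            self.S = [0.0]       # S[t] = sum_{k < t} delta_k * decay**k
            self.q = {}          # (s, a) -> settled Q-value
            self.visit = {}      # (s, a) -> time of last visit

        def _settle(self, sa):
            # apply all updates pending since sa's last visit, in O(1)
            if sa in self.visit:
                t0 = self.visit[sa]
                pending = (self.S[self.t] - self.S[t0]) / self.decay ** t0
                self.q[sa] = self.q.get(sa, 0.0) + self.alpha * pending

        def step(self, s, a, delta):
            # one time step: (s, a) was taken and produced TD error delta
            sa = (s, a)
            self._settle(sa)               # touch only this pair, not all
            self.visit[sa] = self.t
            self.S.append(self.S[-1] + delta * self.decay ** self.t)
            self.t += 1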

13.
Computing with Contexts   (Cited by: 1; self-citations: 0; other citations: 1)
We investigate a representation of contexts, expressions with holes in them, that enables them to be meaningfully transformed, in particular α-converted and β-reduced. In particular, we generalize the set of λ-expressions to include holes, and on these generalized entities define β-reduction (i.e., substitution) and filling so that these operations preserve α-congruence and commute. The theory is then applied to allow the encoding of reduction systems and operational semantics of call-by-value calculi enriched with control, imperative, and concurrent features.

14.
Average-Case Competitive Analyses for Ski-Rental Problems   (Cited by: 7; self-citations: 0; other citations: 7)
Let s be the ratio of the cost of purchasing skis over the cost of renting them. Then the famous result for the ski-rental problem shows that skiers should buy their skis after renting them (s - 1) times, which gives an optimal competitive ratio of 2 - 1/s. In practice, however, it appears that many skiers buy their skis before this optimal point of time, and also many skiers keep renting them forever. In this paper we show that this behavior of skiers is quite reasonable by using an average-case competitive ratio. For an exponential input distribution f(t) = λe^(−λt), the optimal strategies are: (i) if 1/λ ≤ s, then skiers should rent their skis forever, and (ii) otherwise they should purchase them after renting approximately s² (…
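A short numeric sketch of the quantity being optimized, with illustrative parameters: the average-case competitive ratio of the strategy "rent until time T, then buy" against the offline optimum min(t, s), for exponentially distributed ski seasons. Taking the expectation of the ratio, E[ALG/OPT], is one natural reading of the paper's measure and is an assumption of this sketch.

    import numpy as np

    def avg_competitive_ratio(T, s, lam, tmax=400.0, n=400001):
        # E_t[ ALG_T(t) / OPT(t) ] under the density f(t) = lam * exp(-lam*t);
        # ALG_T pays t if the season ends by time T, else T + s;
        # OPT pays min(t, s).
        t = np.linspace(1e-9, tmax, n)
        alg = np.where(t <= T, t, T + s)
        ratio = alg / np.minimum(t, s)
        return np.trapz(ratio * lam * np.exp(-lam * t), t)

    # Example: s = 10, mean season 1/lam = 20 > s, so buying eventually pays.
    s, lam = 10.0, 0.05
    Ts = np.linspace(0.0, 60.0, 601)
    best_T = min(Ts, key=lambda T: avg_competitive_ratio(T, s, lam))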

15.
We consider a single-server queueing system with a finite buffer, K input Poisson flows of intensities λᵢ, and distribution functions Bᵢ(x) of service times for calls of the ith type, i = 1, …, K. If the buffer is overflowed, an arriving call is sent to the orbit and becomes a repeat call. After a random time, which has an exponential distribution, the call makes an attempt to reenter the buffer or the server, if the latter is free. The maximum number of calls in the orbit is limited; if the orbit is overflowed, an arriving call is lost. We find the relation between steady-state distributions of this system and of a system with one Poisson flow of intensity λ = λ₁ + … + λ_K, in which type i of a call is chosen with probability λᵢ/λ at the beginning of its service. A numerical example is given.

16.
We provide a very simple model of a reflective facility based on the pure λ-calculus, and we show that its theory of contextual equivalence is trivial: two terms in the language are contextually equivalent iff they are α-congruent.

17.
In this paper, we investigate the numerical solution of a model equation u_xx = exp(−u/ε) (and several slightly more general problems) when ε ≪ 1, using the standard central difference scheme on nonuniform grids. In particular, we are interested in the error behaviour in two limiting cases: (i) the total mesh point number N is fixed while the regularization parameter ε → 0, and (ii) ε is fixed while N → ∞. Using a formal analysis, we show that a generalized version of a special piecewise uniform mesh [12] and an adaptive grid based on the equidistribution principle share some common features. The optimal meshes give rates of convergence bounded by |log(ε)| as ε → 0 for given N, which is shown to be sharp by numerical tests.
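For the equidistribution principle specifically, a generic sketch: grid points are placed so that a positive monitor function has equal integral over every cell. The monitor in the example is an illustrative boundary-layer choice, not the one from the paper.

    import numpy as np

    def equidistribute(monitor, N, a=0.0, b=1.0, fine=10001):
        # Return N+1 grid points x_0 < ... < x_N in [a, b] such that the
        # integral of the monitor function over each cell is equal.
        x = np.linspace(a, b, fine)
        m = monitor(x)
        cells = 0.5 * (m[1:] + m[:-1]) * np.diff(x)    # trapezoid areas
        c = np.concatenate([[0.0], np.cumsum(cells)])  # cumulative integral
        targets = np.linspace(0.0, c[-1], N + 1)
        return np.interp(targets, c, x)                # invert the map

    # Example: concentrate points in a layer of width ~1e-2 near x = 0.
    grid = equidistribute(lambda x: 1.0 / np.sqrt(x**2 + 1e-4), 32)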

18.
Reasoning about programs in continuation-passing style   (Cited by: 6; self-citations: 0; other citations: 6)
Plotkin's λv-calculus for call-by-value programs is weaker than the λβη-calculus for the same programs in continuation-passing style (CPS). To identify the call-by-value axioms that correspond to βη on CPS terms, we define a new CPS transformation and an inverse mapping, both of which are interesting in their own right. Using the new CPS transformation, we determine the precise language of CPS terms closed under βη-transformations, as well as the call-by-value axioms that correspond to the so-called administrative β-reductions on CPS terms. Using the inverse mapping, we map the remaining β- and η-equalities on CPS terms to axioms on call-by-value terms. On the pure (constant-free) set of λ-terms, the resulting set of axioms is equivalent to Moggi's computational λ-calculus. If the call-by-value language includes the control operators abort and call-with-current-continuation, the axioms are equivalent to an extension of Felleisen et al.'s λv-C-calculus and to the equational subtheory of Talcott's logic IOCC. This article is a revised and extended version of the conference paper with the same title [42]. The technical report of the same title contains additional material. The authors were supported in part by NSF grant CCR 89-17022 and by Texas ATP grant 91-003604014.
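As a reminder of what a CPS transformation does operationally (a constructed illustration, not the paper's transformation), here is a direct-style program and its continuation-passing image, in which every intermediate value is named and handed to an explicit continuation k:

    def fact(n):                       # direct style
        return 1 if n == 0 else n * fact(n - 1)

    def fact_cps(n, k):                # CPS image of the same program
        if n == 0:
            return k(1)                # pass the result to the continuation
        return fact_cps(n - 1, lambda v: k(n * v))

    print(fact(5), fact_cps(5, lambda v: v))   # 120 120

    # Control operators such as abort and call/cc become directly
    # expressible once continuations are explicit: aborting is simply
    # ignoring k and invoking an outer continuation instead.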

19.
This paper is an informal introduction to the theory of types which uses a connective ∧ for the intersection of two types and a constant ω for a universal type, besides the usual connective → for function types. This theory was first devised in about 1977 by Coppo, Dezani and Sallé in the context of λ-calculus, and its main development has been by Coppo and Dezani and their collaborators in Turin. With suitable axioms and rules to assign types to λ-calculus terms, they obtained a system in which (i) the set of types given to a term does not change under β-conversion, (ii) some interesting sets of terms, for example the solvable terms and the terms with normal form, can be characterised exactly by the types of their members, and (iii) the type apparatus is not so complex as polymorphic systems with quantifier-containing types, and therefore probably not so expensive to implement mechanically as those systems. There are in fact several variant systems with different detailed properties. This paper defines and motivates the simplest one, from which the others are derived, and describes its most basic properties. No proofs are given but the motivation is shown by examples. A comprehensive bibliography is included.
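A small worked example of what intersection types buy, standard in this literature though not quoted from this paper: self-application, untypable with Curry's simple types, receives a type once the bound variable may be assumed to have both halves of an intersection.

    % Assume x : \alpha \wedge (\alpha \to \beta); then x serves both as a
    % function and as its own argument, so the application x\,x has type
    % \beta and self-application is typable:
    \lambda x.\, x\,x \;:\; (\alpha \wedge (\alpha \to \beta)) \to \beta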

20.
A dialectical model of assessing conflicting arguments in legal reasoning   (Cited by: 2; self-citations: 2; other citations: 0)
Inspired by legal reasoning, this paper presents a formal framework for assessing conflicting arguments. Its use is illustrated with applications to realistic legal examples, and the potential for implementation is discussed. The framework has the form of a logical system for defeasible argumentation. Its language, which is of a logic-programming-like nature, has both weak and explicit negation, and conflicts between arguments are decided with the help of priorities on the rules. An important feature of the system is that these priorities are not fixed, but are themselves defeasibly derived as conclusions within the system. Thus debates on the choice between conflicting arguments can also be modelled. The proof theory of the system is stated in dialectical style, where a proof takes the form of a dialogue between a proponent and an opponent of an argument. An argument is shown to be justified if the proponent can make the opponent run out of moves in whatever way the opponent attacks. Despite this dialectical form, the system reflects a declarative, or relational, approach to modelling legal argument. A basic assumption of this paper is that this approach complements two other lines of research in AI and Law: investigations of precedent-based reasoning and the development of procedural, or dialectical, models of legal argument. Supported by a research fellowship of the Royal Netherlands Academy of Arts and Sciences, and by Esprit WG 8319 Modelage.
