期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Quantifying Prior Determination Knowledge Using the PAC Learning Model

Mahadevan Sridhar Tadepalli Prasad 《Machine Learning》1994,17(1):69-105

Prior knowledge, or bias, regarding a concept can reduce the number of examples needed to learn it. Probably Approximately Correct (PAC) learning is a mathematical model of concept learning that can be used to quantify the reduction in the number of examples due to different forms of bias. Thus far, PAC learning has mostly been used to analyzesyntactic bias, such as limiting concepts to conjunctions of boolean prepositions. This paper demonstrates that PAC learning can also be used to analyzesemantic bias, such as a domain theory about the concept being learned. The key idea is to view the hypothesis space in PAC learning as that consistent withall prior knowledge, syntactic and semantic. In particular, the paper presents an analysis ofdeterminations, a type of relevance knowledge. The results of the analysis reveal crisp distinctions and relations among different determinations, and illustrate the usefulness of an analysis based on the PAC learning model. 相似文献

2.

On Restricted-Focus-of-Attention Learnability of Boolean Functions

Birkendorf Andreas Dichterman Eli Jackson Jeffrey Klasner Norbert Simon Hans Ulrich 《Machine Learning》1998,30(1):89-123

In the k-Restricted-Focus-of-Attention (k-RFA) model, only k of the n attributes of each example are revealed to the learner, although the set of visible attributes in each example is determined by the learner. While thek -RFA model is a natural extension of the PAC model, there are also significant differences. For example, it was previously known that learnability in this model is not characterized by the VC-dimension and that many PAC learning algorithms are not applicable in the k-RFA setting.In this paper we further explore the relationship between the PAC and k -RFA models, with several interesting results. First, we develop an information-theoretic characterization of k-RFA learnability upon which we build a general tool for proving hardness results. We then apply this and other new techniques for studying RFA learning to two particularly expressive function classes,k -decision-lists (k-DL) and k-TOP, the class of thresholds of parity functions in which each parity function takes at most k inputs. Among other results, we prove a hardness result for k-RFA learnability of k-DL,k n-2 . In sharp contrast, an (n-1)-RFA algorithm for learning (n-1)-DL is presented. Similarly, we prove that 1-DL is learnable if and only if at least half of the inputs are visible in each instance. In addition, we show that there is a uniform-distribution k-RFA learning algorithm for the class of k -DL. For k-TOP we show weak learnability by ak -RFA algorithm (with efficient time and sample complexity for constant k) and strong uniform-distribution k-RFA learnability of k-TOP with efficient sample complexity for constant k. Finally, by combining some of our k-DL and k-TOP results, we show that, unlike the PAC model, weak learning does not imply strong learning in the k -RFA model. 相似文献

3.

On the Efficiency of Noise-Tolerant PAC Algorithms Derived from Statistical Queries

Jeffrey Jackson 《Annals of Mathematics and Artificial Intelligence》2003,39(3):291-313

The Statistical Query (SQ) model provides an elegant means for generating noise-tolerant PAC learning algorithms that run in time inverse polynomial in the noise rate. Whether or not there is an SQ algorithm for every noise-tolerant PAC algorithm that is efficient in this sense remains an open question. However, we show that PAC algorithms derived from the Statistical Query model are not always the most efficient possible. Specifically, we give a general definition of SQ-based algorithm and show that there is a subclass of parity functions for which there is an efficient PAC algorithm requiring asymptotically less running time than any SQ-based algorithm. While it turns out that this result can be derived fairly easily by combining a recent algorithm of Blum, Kalai, and Wasserman with an older lower bound, we also provide alternate, Fourier-based approaches to both the upper and lower bounds that strengthen the results in various ways. The lower bound in particular is stronger than might be expected, and the amortized technique used in deriving this bound may be of independent interest. 相似文献

4.

On the Limits of Proper Learnability of Subclasses of DNF Formulas

Pillaipakkamnatt Krishnan Raghavan Vijay 《Machine Learning》1996,25(2-3):237-263

Bshouty, Goldman, Hancock and Matar have shown that up to term DNF formulas can be properly learned in the exact model with equivalence and membership queries. Given standard complexity-theoretical assumptions, we show that this positive result for proper learning cannot be significantly improved in the exact model or the PAC model extended to allow membership queries. Our negative results are derived from two general techniques for proving such results in the exact model and the extended PAC model. As a further application of these techniques, we consider read-thrice DNF formulas. Here we improve on Aizenstein, Hellerstein, and Pitt's negative result for proper learning in the exact model in two ways. First, we show that their assumption of NP co-NP can be replaced with the weaker assumption of P NP. Second, we show that read-thrice DNF formulas are not properly learnable in the extended PAC model, assuming RP NP. 相似文献

5.

Exploring learnability between exact and PAC

Nader H. Bshouty Jeffrey C. Jackson Christino Tamon 《Journal of Computer and System Sciences》2005,70(4):471-484

We study a model of probably exactly correct (PExact) learning that can be viewed either as the Exact model (learning from equivalence queries only) relaxed so that counterexamples to equivalence queries are distributionally drawn rather than adversarially chosen or as the probably approximately correct (PAC) model strengthened to require a perfect hypothesis. We also introduce a model of probably almost exactly correct (PAExact) learning that requires a hypothesis with negligible error and thus lies between the PExact and PAC models. Unlike the Exact and PExact models, PAExact learning is applicable to classes of functions defined over infinite instance spaces. We obtain a number of separation results between these models. Of particular note are some positive results for efficient parallel learning in the PAExact model, which stand in stark contrast to earlier negative results for efficient parallel Exact learning. 相似文献

6.

Learning Nonoverlapping Perceptron Networks from Examples and Membership Queries

Hancock Thomas R. Golea Mostefa Marchand Mario 《Machine Learning》1994,16(3):161-183

We investigate, within the PAC learning model, the problem of learning nonoverlapping perceptron networks (also known as read-once formulas over a weighted threshold basis). These are loop-free neural nets in which each node has only one outgoing weight. We give a polynomial time algorithm that PAC learns any nonoverlapping perceptron network using examples and membership queries. The algorithm is able to identify both the architecture and the weight values necessary to represent the function to be learned. Our results shed some light on the effect of the overlap on the complexity of learning in neural networks. 相似文献

7.

On the Learnability of Rich Function Classes

《Journal of Computer and System Sciences》1999,58(1):183-192

The probably approximately correct (PAC) model of learning and its extension to real-valued function classes sets a rigorous framework based upon which the complexity of learning a target from a function class using a finite sample can be computed. There is one main restriction, however, that the function class have a finite VC-dimension or scale-sensitive pseudo-dimension. In this paper we present an extension of the PAC framework with which rich function classes with possibly infinite pseudo-dimension may be learned with a finite number of examples and a finite amount of partial information. As an example we consider learning a family of infinite dimensional Sobolev classes. 相似文献

8.

The Power of Self-Directed Learning

Goldman Sally A. Sloan Robert H. 《Machine Learning》1994,14(3):271-294

This article studies self-directed learning, a variant of the on-line (or incremental) learning model in which the learner selects the presentation order for the instances. Alternatively, one can view this model as a variation of learning with membership queries in which the learner is only charged for membership queries for which it could not predict the outcome. We give tight bounds on the complexity of self-directed learning for the concept classes of monomials, monotone DNF formulas, and axis-parallel rectangles in {0, 1, , n – 1}^d. These results demonstrate that the number of mistakes under self-directed learning can be surprisingly small. We then show that learning complexity in the model of self-directed learning is less than that of all other commonly studied on-line and query learning models. Next we explore the relationship between the complexity of self-directed learning and the Vapnik-Chervonenkis (VC-)dimension. We show that, in general, the VC-dimension and the self-directed learning complexity are incomparable. However, for some special cases, we show that the VC-dimension gives a lower bound for the self-directed learning complexity. Finally, we explore a relationship between Mitchell's version space algorithm and the existence of self-directed learning algorithms that make few mistakes. 相似文献

9.

On the Complexity of Function Learning

Auer Peter Long Philip M. Maass Wolfgang Woeginger Gerhard J. 《Machine Learning》1995,18(2-3):187-230

The majority of results in computational learning theory are concerned with concept learning, i.e. with the special case of function learning for classes of functions with range {0, 1}. Much less is known about the theory of learning functions with a larger range such as or . In particular relatively few results exist about the general structure of common models for function learning, and there are only very few nontrivial function classes for which positive learning results have been exhibited in any of these models.We introduce in this paper the notion of a binary branching adversary tree for function learning, which allows us to give a somewhat surprising equivalent characterization of the optimal learning cost for learning a class of real-valued functions (in terms of a max-min definition which does not involve any learning model).Another general structural result of this paper relates the cost for learning a union of function classes to the learning costs for the individual function classes.Furthermore, we exhibit an efficient learning algorithm for learning convex piecewise linear functions from ^d into . Previously, the class of linear functions from ^d into was the only class of functions with multidimensional domain that was known to be learnable within the rigorous framework of a formal model for online learning.Finally we give a sufficient condition for an arbitrary class of functions from into that allows us to learn the class of all functions that can be written as the pointwise maximum ofk functions from . This allows us to exhibit a number of further nontrivial classes of functions from into for which there exist efficient learning algorithms. 相似文献

10.

Model-based transductive learning of the kernel matrix

Zhihua Zhang James T. Kwok Dit-Yan Yeung 《Machine Learning》2006,63(1):69-101

This paper addresses the problem of transductive learning of the kernel matrix from a probabilistic perspective. We define the kernel matrix as a Wishart process prior and construct a hierarchical generative model for kernel matrix learning. Specifically, we consider the target kernel matrix as a random matrix following the Wishart distribution with a positive definite parameter matrix and a degree of freedom. This parameter matrix, in turn, has the inverted Wishart distribution (with a positive definite hyperparameter matrix) as its conjugate prior and the degree of freedom is equal to the dimensionality of the feature space induced by the target kernel. Resorting to a missing data problem, we devise an expectation-maximization (EM) algorithm to infer the missing data, parameter matrix and feature dimensionality in a maximum a posteriori (MAP) manner. Using different settings for the target kernel and hyperparameter matrices, our model can be applied to different types of learning problems. In particular, we consider its application in a semi-supervised learning setting and present two classification methods. Classification experiments are reported on some benchmark data sets with encouraging results. In addition, we also devise the EM algorithm for kernel matrix completion. Editor: Philip M. Long 相似文献

11.

Asymptotic analysis of temporal-difference learning algorithms with constant step-sizes

Vladislav B. Tadić 《Machine Learning》2006,63(2):107-133

The mean-square asymptotic behavior of temporal-difference learning algorithms with constant step-sizes and linear function approximation is analyzed in this paper. The analysis is carried out for the case of discounted cost function associated with a Markov chain with a finite dimensional state-space. Under mild conditions, an upper bound for the asymptotic mean-square error of these algorithms is determined as a function of the step-size. Moreover, under the same assumptions, it is also shown that this bound is linear in the step size. The main results of the paper are illustrated with examples related to M/G/1 queues and nonlinear AR models with Markov switching. Editor: Robert Schapire 相似文献

12.

Relations between Gold-style learning and query learning

Steffen Lange Sandra Zilles 《Information and Computation》2005,203(2):2562

Different formal learning models address different aspects of human learning. Below we compare Gold-style learning—modelling learning as a limiting process in which the learner may change its mind arbitrarily often before converging to a correct hypothesis—to learning via queries—modelling learning as a one-shot process in which the learner is required to identify the target concept with just one hypothesis. In the Gold-style model considered below, the information presented to the learner consists of positive examples for the target concept, whereas in query learning, the learner may pose a certain kind of queries about the target concept, which will be answered correctly by an oracle (called teacher). Although these two approaches seem rather unrelated at first glance, we provide characterisations of different models of Gold-style learning (learning in the limit, conservative inference, and behaviourally correct learning) in terms of query learning. Thus we describe the circumstances which are necessary to replace limit learners by equally powerful one-shot learners. Our results are valid in the general context of learning indexable classes of recursive languages. This analysis leads to an important observation, namely that there is a natural query learning type hierarchically in-between Gold-style learning in the limit and behaviourally correct learning. Astonishingly, this query learning type can then again be characterised in terms of Gold-style inference. 相似文献

13.

Reliable agnostic learning

Adam Tauman Kalai Varun Kanade Yishay Mansour 《Journal of Computer and System Sciences》2012,78(5):1481-1495

It is well known that in many applications erroneous predictions of one type or another must be avoided. In some applications, like spam detection, false positive errors are serious problems. In other applications, like medical diagnosis, abstaining from making a prediction may be more desirable than making an incorrect prediction. In this paper we consider different types of reliable classifiers suited for such situations. We formalize the notion and study properties of reliable classifiers in the spirit of agnostic learning (Haussler, 1992; Kearns, Schapire, and Sellie, 1994), a PAC-like model where no assumption is made on the function being learned. We then give two algorithms for reliable agnostic learning under natural distributions. The first reliably learns DNFs with no false positives using membership queries. The second reliably learns halfspaces from random examples with no false positives or false negatives, but the classifier sometimes abstains from making predictions. 相似文献

14.

Can PAC learning algorithms tolerate random attribute noise? 总被引：2，自引：0，他引：2

S. A. Goldman R. H. Sloan 《Algorithmica》1995,14(1):70-84

This paper studies the robustness of PAC learning algorithms when the instance space is {0,1}ⁿ, and the examples are corrupted by purely random noise affecting only the attributes (and not the labels). Foruniform attribute noise, in which each attribute is flipped independently at random with the same probability, we present an algorithm that PAC learns monomials for any (unknown) noise rate less than ₂ ¹ . Contrasting this positive result, we show thatproduct random attribute noise, where each attributei is flipped randomly and independently with its own probability p_i, is nearly as harmful as malicious noise-no algorithm can tolerate more than a very small amount of such noise.The research of S. A. Goldman was supported in part by a GE Foundation Junior Faculty grant and NSF Grant CCR-9110108. Part of this research was conducted while the author was at the M.I.T. Laboratory for Computer Science and supported by NSF Grant DCR-8607494 and a grant from the Siemens Corporation. The research of R. H. Sloan was supported in part by NSF Grant CCR-9108753. Part of this research was conducted while the author was at Harvard and supported by ARO Grant DAAL 03-86-K-0171. 相似文献

15.

On the non-efficient PAC learnability of conjunctive queries

《Information Processing Letters》2024

This note serves three purposes: (i) we provide a self-contained exposition of the fact that conjunctive queries are not efficiently learnable in the Probably-Approximately-Correct (PAC) model, paying clear attention to the complicating fact that this concept class lacks the polynomial-size fitting property, a property that is tacitly assumed in much of the computational learning theory literature; (ii) we establish a strong negative PAC learnability result that applies to many restricted classes of conjunctive queries (CQs), including acyclic CQs for a wide range of notions of acyclicity; (iii) we show that CQs (and UCQs) are efficiently PAC learnable with membership queries. 相似文献

16.

Compositional CSP Traces Refinement Checking

Heike Wehrheim Daniel Wonisch 《Electronic Notes in Theoretical Computer Science》2009,250(2):135

Compositional verification using assume-guarantee reasoning has recently seen an uprise due to the introduction of automatic techniques for learning assumptions. In this paper, we transfer this technique to a setting with CSP as modelling and property specification language, and present an approach to compositional traces refinement checking. The approach has been implemented using the CSP model checker FDR as teacher during learning. The implementation shows that the compositional approach can both drastically outperform as well as underperform FDR's performance, depending on the example at hand. 相似文献

17.

Learning DFA from Simple Examples

Rajesh Parekh Vasant Honavar 《Machine Learning》2001,44(1-2):9-35

相似文献

18.

Some contributions to fixed-distribution learning theory

Vidyasagar M. Kulkarni S.R. 《Automatic Control, IEEE Transactions on》2000,45(2):217-234

We consider some problems in learning with respect to a fixed distribution. We introduce two new notions of learnability; these are probably uniformly approximately correct (PUAC) learnability which is a stronger requirement than the widely studied PAC learnability, and minimal empirical risk (MER) learnability, which is a stronger requirement than the previously defined notions of “solid” or “potential” learnability. It is shown that, although the motivations for defining these two notions of learnability are entirely different, these two notions are in fact equivalent to each other and, in turn, equivalent to a property introduced here, referred to as the shrinking width property. It is further shown that if the function class to be learned has the property that empirical means converge uniformly to their true values, then all of these learnability properties hold. In the course of proving conditions for these forms of learnability, we also obtain a new estimate for the VC-dimension of a collection of sets obtained by performing Boolean operations on a given collection; this result is of independent interest. We consider both the case in which there is an underlying target function, as well as the case of “model-free” (or agnostic) learning. Finally, we consider the issue of representation of a collection of sets by its subcollection of equivalence classes. It is shown by example that, by suitably choosing representatives of each equivalence class, it is possible to affect the property of uniform convergence of empirical probabilities 相似文献

19.

基于k个标记样本的弱监督学习框架

付治王红军李天瑞滕飞张继《软件学报》2020,31(4):981-990

聚类是机器学习领域中的一个研究热点,弱监督学习是半监督学习中一个重要的研究方向,有广泛的应用场景.在对聚类与弱监督学习的研究中,提出了一种基于k个标记样本的弱监督学习框架.该框架首先用聚类及聚类置信度实现了标记样本的扩展.其次,对受限玻尔兹曼机的能量函数进行改进,提出了基于k个标记样本的受限玻尔兹曼机学习模型.最后,完成了对该模型的推理并设计相关算法.为了完成对该框架和模型的检验,选择公开的数据集进行对比实验,实验结果表明,基于k个标记样本的弱监督学习框架实验效果较好. 相似文献

20.

Induction Over the Unexplained: Using Overly-General Domain Theories to Aid Concept Learning

Mooney Raymond J. 《Machine Learning》1993,10(1):79-110

This paper describes and evaluates an approach to combining empirical and explanation-based learning called Induction Over the Unexplained (IOU). IOU is intended for learning concepts that can be partially explained by an overly-general domain theory. An eclectic evaluation of the method is presented which includes results from all three major approaches: empirical, theoretical, and psychological. Empirical results show that IOU is effective at refining overly-general domain theories and that it learns more accurate concepts from fewer examples than a purely empirical approach. The application of theoretical results from PAC learnability theory explains why IOU requires fewer examples. IOU is also shown to be able to model psychological data demonstrating the effect of background knowledge on human learning. 相似文献