Similar Documents
20 similar documents found; search took 78 ms
1.
Jiang W. Neural Computation, 2006, 18(11): 2762-2776.
Modern data mining and bioinformatics have presented an important playground for statistical learning techniques, where the number of input variables may be much larger than the sample size of the training data. In supervised learning, logistic regression or probit regression can be used to model a binary output and form perceptron classification rules based on Bayesian inference. We use a prior to select a limited number of candidate variables to enter the model, applying a popular method with selection indicators. We show that this approach can induce posterior estimates of the regression functions that consistently estimate the truth, provided the true regression model is sparse in the sense that the aggregated size of the regression coefficients is bounded. The estimated regression functions can therefore also produce consistent classifiers that are asymptotically optimal for predicting future binary outputs. These results provide theoretical justification for some recent empirical successes in microarray data analysis.
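As an illustration of the selection-indicator idea, the sketch below runs a Metropolis walk over binary inclusion vectors for logistic regression, scoring each candidate model with a crude plug-in Newton fit. The toy data, the per-variable prior penalty log(p), and the ridge-stabilized inner fit are all assumptions for illustration; the paper's analysis concerns the exact posterior, not this approximation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 50
X = rng.standard_normal((n, p))
beta_true = np.zeros(p); beta_true[:3] = [2.0, -1.5, 1.0]
y = (X @ beta_true + rng.logistic(size=n) > 0).astype(float)

def plugin_loglik(gamma):
    """Log-likelihood using only the selected columns, maximized by a few
    ridge-stabilized Newton steps (a plug-in approximation, not exact)."""
    cols = np.flatnonzero(gamma)
    if cols.size == 0:
        return n * np.log(0.5)
    Xs, b = X[:, cols], np.zeros(cols.size)
    for _ in range(25):
        pr = 1 / (1 + np.exp(-(Xs @ b)))
        H = Xs.T * (pr * (1 - pr)) @ Xs + 0.1 * np.eye(cols.size)
        b += np.linalg.solve(H, Xs.T @ (y - pr) - 0.1 * b)
    m = Xs @ b
    return np.sum(y * m - np.logaddexp(0, m))

pen = np.log(p)                          # prior penalty per included variable
gamma, cur = np.zeros(p, dtype=int), n * np.log(0.5)
freq = np.zeros(p)
for _ in range(2000):
    prop = gamma.copy()
    prop[rng.integers(p)] ^= 1           # flip one selection indicator
    new = plugin_loglik(prop)
    if np.log(rng.random()) < (new - cur) - pen * (prop.sum() - gamma.sum()):
        gamma, cur = prop, new
    freq += gamma
print("inclusion frequencies (first 6):", np.round(freq / 2000, 2)[:6])
```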

2.
Recently developed methods for learning sparse classifiers are among the state-of-the-art in supervised learning. These methods learn classifiers that incorporate weighted sums of basis functions with sparsity-promoting priors encouraging the weight estimates to be either significantly large or exactly zero. From a learning-theoretic perspective, these methods control the capacity of the learned classifier by minimizing the number of basis functions used, resulting in better generalization. This paper presents three contributions related to learning sparse classifiers. First, we introduce a true multiclass formulation based on multinomial logistic regression. Second, by combining a bound optimization approach with a component-wise update procedure, we derive fast exact algorithms for learning sparse multiclass classifiers that scale favorably in both the number of training samples and the feature dimensionality, making them applicable even to large data sets in high-dimensional feature spaces. To the best of our knowledge, these are the first algorithms to perform exact multinomial logistic regression with a sparsity-promoting prior. Third, we show how nontrivial generalization bounds can be derived for our classifier in the binary case. Experimental results on standard benchmark data sets attest to the accuracy, sparsity, and efficiency of the proposed methods.
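For reference, an l1-penalized multinomial logistic regression is exactly the MAP estimate under a Laplacian (sparsity-promoting) prior. The sketch below uses scikit-learn's generic saga solver as a stand-in for the paper's bound-optimization algorithm; the dataset and penalty strength are arbitrary choices.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

# MAP multinomial logistic regression under a Laplacian prior (l1 penalty);
# saga is a generic solver, not the paper's component-wise algorithm.
clf = LogisticRegression(penalty="l1", solver="saga", C=0.05, max_iter=5000)
clf.fit(Xtr / 16.0, ytr)                 # scale pixel values to [0, 1]
print("test accuracy:", round(clf.score(Xte / 16.0, yte), 3))
print("fraction of zero weights:", round(float(np.mean(clf.coef_ == 0)), 3))
```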

3.
Software defect prediction is an important decision support activity in software quality assurance. The limitation of the labelled modules usually makes the prediction difficult, and the class-imbalance characteristic of software defect data leads to negative influence on decision of classifiers. Semi-supervised learning can build high-performance classifiers by using large amount of unlabelled modules together with the labelled modules. Ensemble learning achieves a better prediction capability for class-imbalance data by using a series of weak classifiers to reduce the bias generated by the majority class. In this paper, we propose a new semi-supervised software defect prediction approach, non-negative sparse-based SemiBoost learning. The approach is capable of exploiting both labelled and unlabelled data and is formulated in a boosting framework. In order to enhance the prediction ability, we design a flexible non-negative sparse similarity matrix, which can fully exploit the similarity of historical data by incorporating the non-negativity constraint into sparse learning for better learning the latent clustering relationship among software modules. The widely used datasets from NASA projects are employed as test data to evaluate the performance of all compared methods. Experimental results show that non-negative sparse-based SemiBoost learning outperforms several representative state-of-the-art semi-supervised software defect prediction methods.
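A minimal sketch of the non-negative sparse similarity idea: each module is encoded as a non-negative sparse combination of the other modules, and the coefficients are read as similarities. scikit-learn's Lasso with positive=True stands in for the paper's formulation, and the random data and alpha value are placeholders; the SemiBoost stage itself is omitted.

```python
import numpy as np
from sklearn.linear_model import Lasso

def nonneg_sparse_similarity(X, alpha=0.05):
    """Encode each sample as a non-negative sparse combination of the
    others; the coefficients serve as a similarity matrix (a sketch of
    the idea, not the paper's exact formulation)."""
    n = X.shape[0]
    S = np.zeros((n, n))
    for i in range(n):
        others = np.delete(np.arange(n), i)
        # positive=True enforces the non-negativity constraint
        lasso = Lasso(alpha=alpha, positive=True, max_iter=10000)
        lasso.fit(X[others].T, X[i])
        S[i, others] = lasso.coef_
    return (S + S.T) / 2  # symmetrize for use in a boosting framework

X = np.random.default_rng(0).standard_normal((30, 10))
S = nonneg_sparse_similarity(X)
print("mean nonzero similarities per sample:", (S > 0).sum(axis=1).mean())
```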

4.
A Bayesian approach to joint feature selection and classifier design (cited 5 times: 0 self-citations, 5 by others)
This paper adopts a Bayesian approach to simultaneously learn both an optimal nonlinear classifier and a subset of predictor variables (or features) that are most relevant to the classification task. The approach uses heavy-tailed priors to promote sparsity in the utilization of both basis functions and features; these priors act as regularizers for the likelihood function that rewards good classification on the training data. We derive an expectation-maximization (EM) algorithm to efficiently compute a maximum a posteriori (MAP) point estimate of the various parameters. The algorithm is an extension of recent state-of-the-art sparse Bayesian classifiers, which in turn can be seen as Bayesian counterparts of support vector machines. Experimental comparisons using kernel classifiers demonstrate both parsimonious feature selection and excellent classification accuracy on a range of synthetic and benchmark data sets.
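The sketch below illustrates the EM route to a sparse MAP estimate in the simplest linear-logistic case: the scale-mixture form of a heavy-tailed (Laplacian) prior yields per-weight variances in the E-step and a weighted Newton/IRLS update in the M-step. The toy data, prior strength, and fixed iteration counts are assumptions; the paper works with kernel basis functions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 20
X = rng.standard_normal((n, d))
w_true = np.zeros(d); w_true[:3] = [1.5, -2.0, 1.0]
y = (X @ w_true + 0.5 * rng.standard_normal(n) > 0).astype(float)

# EM for the MAP estimate under a Laplacian prior (sketch):
# E-step: per-weight prior variances tau_j proportional to |w_j|;
# M-step: one Newton step on the Gaussian-penalized log-likelihood.
lam = 1.0
w = np.full(d, 0.1)
for _ in range(50):
    tau = np.abs(w) / lam + 1e-8
    p = 1 / (1 + np.exp(-(X @ w)))
    H = X.T * (p * (1 - p)) @ X + np.diag(1.0 / tau)
    w = w + np.linalg.solve(H, X.T @ (y - p) - w / tau)
print("estimated weights (first 6):", np.round(w[:6], 2))
```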

5.
Sparse representation is a mathematical model for data representation that has proved to be a powerful tool for solving problems in various fields such as pattern recognition, machine learning, and computer vision. As one of the building blocks of the sparse representation method, dictionary learning plays an important role in minimizing the reconstruction error between the original signal and its sparse representation in the space of the learned dictionary. Although using training samples directly as dictionary bases can achieve good performance, the main drawback of this method is that it may result in a very large and inefficient dictionary owing to noisy training instances. To obtain a smaller and more representative dictionary, in this paper we propose an approach called Laplacian sparse dictionary (LSD) learning. Our method is based on manifold learning and double sparsity. We incorporate the Laplacian weighted graph in the sparse representation model and impose l1-norm sparsity on the dictionary. An LSD is a sparse overcomplete dictionary that can preserve the intrinsic structure of the data and learn a smaller dictionary for each class. The learned LSD can be easily integrated into a classification framework based on sparse representation. We compare the proposed method with other methods on three controlled benchmark face image databases (Extended Yale B, ORL, and AR) and one uncontrolled person image dataset (i-LIDS-MA). Results show the advantages of the proposed LSD algorithm over state-of-the-art sparse-representation-based classification methods.
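A small sketch of the manifold ingredient: build the Laplacian weighted graph from k nearest neighbors and evaluate the smoothness penalty tr(A L A^T) on a code matrix A. The heat-kernel weighting, neighbor count, and random data are assumptions; the dictionary update with l1-penalized atoms is omitted.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import laplacian

# Build the Laplacian weighted graph used to regularize the sparse codes
# (a sketch of the manifold term only; dictionary learning is omitted).
X = np.random.default_rng(0).standard_normal((100, 64))
W = kneighbors_graph(X, n_neighbors=5, mode="distance")
W.data = np.exp(-W.data**2 / W.data.var())  # heat-kernel weights (assumption)
W = (W + W.T) / 2                            # symmetrize
L = laplacian(W)                             # L = D - W

# For a code matrix A (atoms x samples), the manifold penalty is tr(A L A^T):
A = np.random.default_rng(1).standard_normal((32, 100))
penalty = np.trace(A @ L.toarray() @ A.T)
print("Laplacian smoothness penalty:", round(penalty, 2))
```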

6.
Bayesian networks (BNs) have gained increasing attention in recent years. One key issue in Bayesian networks is parameter learning. When training data is incomplete or sparse or when multiple hidden nodes exist, learning parameters in Bayesian networks becomes extremely difficult. Under these circumstances, the learning algorithms are required to operate in a high-dimensional search space and they could easily get trapped among copious local maxima. This paper presents a learning algorithm to incorporate domain knowledge into the learning to regularize the otherwise ill-posed problem, to limit the search space, and to avoid local optima. Unlike the conventional approaches that typically exploit the quantitative domain knowledge such as prior probability distribution, our method systematically incorporates qualitative constraints on some of the parameters into the learning process. Specifically, the problem is formulated as a constrained optimization problem, where an objective function is defined as a combination of the likelihood function and penalty functions constructed from the qualitative domain knowledge. Then, a gradient-descent procedure is systematically integrated with the E-step and M-step of the EM algorithm, to estimate the parameters iteratively until it converges. The experiments with both synthetic data and real data for facial action recognition show our algorithm improves the accuracy of the learned BN parameters significantly over the conventional EM algorithm.
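A toy version of the penalized scheme, without hidden nodes: projected gradient ascent on the log-likelihood of a single conditional probability table, plus a quadratic penalty for violating the qualitative constraint P(A=1|B=1) >= P(A=1|B=0). The counts, penalty weight, and step size are illustrative assumptions, and the data is chosen to contradict the constraint so the penalty is active.

```python
import numpy as np

# Toy CPT learning with a qualitative constraint p1 >= p0, where
# p0 = P(A=1|B=0) and p1 = P(A=1|B=1). The raw counts (assumption)
# contradict the constraint, as sparse data often does.
counts = np.array([[10, 30],   # B=0: [n(A=0), n(A=1)]
                   [28, 12]])  # B=1: [n(A=0), n(A=1)]
p = np.array([0.5, 0.5])       # [p0, p1], initial guess
lr, lam = 0.01, 50.0           # step size and penalty weight (assumptions)
for _ in range(500):
    grad = counts[:, 1] / p - counts[:, 0] / (1 - p)  # d loglik / dp
    viol = max(0.0, p[0] - p[1])                      # constraint violation
    grad += lam * np.array([-viol, viol])             # penalty gradient
    p = np.clip(p + lr * grad / counts.sum(), 1e-3, 1 - 1e-3)
print("P(A=1|B=0), P(A=1|B=1):", np.round(p, 3))  # pulled toward p1 >= p0
```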

7.
Bayesian learning, Bayesian networks, and data mining (cited 16 times: 1 self-citation, 15 by others)
Since the Bayesian school took shape in the 1950s and 1960s, research on Bayesian analysis has flourished without interruption. As early as the 1980s, Bayesian networks were successfully applied to expert systems, becoming a popular method for representing uncertain expert knowledge and for reasoning under uncertainty. Since the 1990s, Bayesian learning has been an important direction in machine learning research. Because probability and statistics, together with data mining, …

8.
One-shot learning of object categories (cited 6 times: 0 self-citations, 6 by others)
Learning visual models of object categories notoriously requires hundreds or thousands of training examples. We show that it is possible to learn much information about a category from just one, or a handful, of images. The key insight is that, rather than learning from scratch, one can take advantage of knowledge coming from previously learned categories, no matter how different these categories might be. We explore a Bayesian implementation of this idea. Object categories are represented by probabilistic models. Prior knowledge is represented as a probability density function on the parameters of these models. The posterior model for an object category is obtained by updating the prior in the light of one or more observations. We test a simple implementation of our algorithm on a database of 101 diverse object categories. We compare category models learned by an implementation of our Bayesian approach to models learned by maximum likelihood (ML) and maximum a posteriori (MAP) methods. We find that on a database of more than 100 categories, the Bayesian approach produces informative models when the number of training examples is too small for other methods to operate successfully.
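The core Bayesian mechanics can be shown with a conjugate normal-mean update: a prior distilled from previously learned categories is revised by a single example. The paper's constellation models are far richer; the prior strength and known variance below are assumptions.

```python
import numpy as np

# One-shot update sketch: prior over a category's mean parameter,
# learned from old categories, revised by a single observation
# (conjugate normal-mean model with known variance; an assumption).
mu0, kappa0 = np.zeros(2), 5.0   # prior mean and prior strength
sigma2 = 1.0                     # known observation variance (assumption)
x = np.array([2.0, -1.0])        # the single training example

n = 1
mu_post = (kappa0 * mu0 + n * x) / (kappa0 + n)  # posterior mean
var_post = sigma2 / (kappa0 + n)                 # posterior variance of mean
print("posterior mean:", mu_post, "posterior variance:", var_post)
```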

9.
In this paper, we present a novel approach to voice activity detection (VAD) based on the sparse representation of noisy input speech over a learned dictionary. First, we investigate the relationship between signal detection and sparse representation within a Bayesian framework. Second, we derive the decision rule and an adaptive threshold based on a likelihood ratio test, by modeling the non-zero elements in the sparse representation as Gaussian distributed. The experimental results show that the proposed approach outperforms current statistical model-based methods, such as Gaussian, Laplacian, and Gamma, under white, babble, and vehicle noise conditions.
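A schematic of the detection step only: sparse-code a frame over a dictionary with OMP and threshold the coefficient energy, in the spirit of the likelihood-ratio statistic. The random dictionary, sparsity level, and fixed threshold are placeholders; the paper learns the dictionary and adapts the threshold from a noise estimate.

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

# Random unit-norm dictionary as a stand-in for a learned one (assumption).
rng = np.random.default_rng(0)
D = rng.standard_normal((64, 256))
D /= np.linalg.norm(D, axis=0)

def vad_statistic(frame, k=8):
    """Energy of the k-sparse OMP representation of a frame."""
    coef = orthogonal_mp(D, frame, n_nonzero_coefs=k)
    return float(np.sum(coef**2))

# Synthetic "speech" frame (sparse in D) vs. a pure-noise frame.
speech = D[:, :8] @ rng.standard_normal(8) + 0.05 * rng.standard_normal(64)
noise = 0.05 * rng.standard_normal(64)
threshold = 0.5   # fixed here; would be adapted from a noise estimate
print("speech frame detected:", vad_statistic(speech) > threshold)
print("noise frame detected:", vad_statistic(noise) > threshold)
```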

10.
A multi-species approach to fisheries management requires taking into account the interactions between species in order to improve recruitment forecasting for the fish species. Recent advances in Bayesian networks allow models to be learned in which several interrelated variables are forecast simultaneously. These models are known as multi-dimensional Bayesian network classifiers (MDBNs). Pre-processing steps are critical for the subsequent learning of the model in these kinds of domains. Therefore, in the present study, a set of state-of-the-art uni-dimensional pre-processing methods, within the categories of missing data imputation, feature discretization, and feature subset selection, are adapted for use with MDBNs. A framework that includes the proposed multi-dimensional supervised pre-processing methods, coupled with an MDBN classifier, is tested with synthetic datasets and the real domain of fish recruitment forecasting. The rate of correctly forecasting three fish species (anchovy, sardine, and hake) simultaneously nearly doubles (from 17.3% to 29.5%) using the multi-dimensional approach in comparison to mono-species models. The probability assessments also improve substantially, reducing the average error (estimated by means of the Brier score) from 0.35 to 0.27. Finally, these gains also exceed those obtained by forecasting the species in pairs.
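As a rough stand-in for an MDBN, the sketch below predicts three binary species-like outputs from shared features with independent naive Bayes models and reports per-output Brier scores and the all-outputs-correct rate. A true MDBN additionally models dependences among the outputs; the synthetic data is an assumption.

```python
import numpy as np
from sklearn.multioutput import MultiOutputClassifier
from sklearn.naive_bayes import GaussianNB

# Synthetic multi-dimensional task: three correlated binary outputs
# (stand-ins for anchovy/sardine/hake recruitment) from shared features.
rng = np.random.default_rng(0)
X = rng.standard_normal((300, 6))
Y = np.column_stack([(X[:, 0] + X[:, 1] > 0),
                     (X[:, 1] - X[:, 2] > 0),
                     (X[:, 0] - X[:, 2] > 0)]).astype(int)

# Independent per-output naive Bayes models approximate the MDBN.
clf = MultiOutputClassifier(GaussianNB()).fit(X[:200], Y[:200])
probs = clf.predict_proba(X[200:])   # list of (n, 2) arrays, one per output
brier = [np.mean((p[:, 1] - Y[200:, j])**2) for j, p in enumerate(probs)]
print("Brier scores per output:", np.round(brier, 3))
joint_acc = np.mean(np.all(clf.predict(X[200:]) == Y[200:], axis=1))
print("all-three-correct rate:", round(float(joint_acc), 3))
```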

11.
Boosted Bayesian network classifiers (cited 2 times: 0 self-citations, 2 by others)
The use of Bayesian networks for classification problems has received a significant amount of recent attention. Although computationally efficient, the standard maximum likelihood learning method tends to be suboptimal due to the mismatch between its optimization criterion (data likelihood) and the actual goal of classification (label prediction accuracy). Recent approaches to optimizing classification performance during parameter or structure learning show promise, but lack the favorable computational properties of maximum likelihood learning. In this paper we present boosted Bayesian network classifiers, a framework that combines discriminative data-weighting with generative training of intermediate models. We show that boosted Bayesian network classifiers encompass the basic generative models in isolation, but improve their classification performance when the model structure is suboptimal. We also demonstrate that structure learning is beneficial in the construction of boosted Bayesian network classifiers. On a large suite of benchmark data sets, this approach outperforms generative graphical models such as naive Bayes and TAN in classification accuracy. Boosted Bayesian network classifiers have comparable or better performance than other discriminatively trained graphical models, including ELR and BNC. Furthermore, boosted Bayesian networks require significantly less training time than the ELR and BNC algorithms.
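The combination the paper studies, discriminative boosting around a generative base model, can be tried directly with AdaBoost over naive Bayes; scikit-learn's generic AdaBoost is a stand-in for the paper's framework, and the dataset is arbitrary.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

# Boosting (discriminative re-weighting) wrapped around a generative
# naive Bayes model, compared against the plain generative model.
X, y = load_breast_cancer(return_X_y=True)
boosted = AdaBoostClassifier(estimator=GaussianNB(), n_estimators=25)
print("naive Bayes:", cross_val_score(GaussianNB(), X, y).mean().round(3))
print("boosted NB: ", cross_val_score(boosted, X, y).mean().round(3))
```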

12.

Image fusion is the process that aims to integrate the relevant and complementary information from a set of images into a single comprehensive image. Sparse representation (SR) is a powerful technique used in a wide variety of applications such as denoising, compression, and fusion. Building a compact and informative dictionary is the principal challenge in these applications. Hence, we propose a supervised-classification-based learning technique for the fusion algorithm. As an initial step, each patch of the training data set is pre-classified based on its gradient dominant direction. Then, a dictionary is learned using the K-SVD algorithm. With this universal dictionary, sparse coefficients are estimated using the greedy OMP algorithm to represent the given set of source images in the dominant direction. Finally, the Euclidean norm is used as a distance measure to reconstruct the fused image. Experimental results on different types of source images demonstrate the effectiveness of the proposed algorithm compared with conventional methods in terms of visual and quantitative evaluations.
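A sketch of the pre-classification step: assign each patch to a class by its dominant gradient direction, via an orientation histogram weighted by gradient magnitude. The bin count and the synthetic patch are assumptions.

```python
import numpy as np

def dominant_direction(patch, n_bins=4):
    """Classify a patch by its dominant gradient direction (a sketch of
    the pre-classification step; the bin count is an assumption)."""
    gy, gx = np.gradient(patch.astype(float))
    angles = np.arctan2(gy, gx) % np.pi            # orientation in [0, pi)
    mags = np.hypot(gx, gy)
    bins = (angles / np.pi * n_bins).astype(int) % n_bins
    hist = np.bincount(bins.ravel(), weights=mags.ravel(), minlength=n_bins)
    return int(np.argmax(hist))

patch = np.tile(np.arange(8), (8, 1))              # strong horizontal gradient
print("dominant direction bin:", dominant_direction(patch))
```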


13.
This paper presents a method for improved ensemble learning, by treating the optimization of an ensemble of classifiers as a compressed sensing problem. Ensemble learning methods improve the performance of a learned predictor by integrating a weighted combination of multiple predictive models. Ideally, the number of models in the ensemble should be minimized while the weights associated with each included model are optimized. We solve this problem by treating it as an instance of compressed sensing, in which a sparse solution must be reconstructed from an under-determined linear system. Compressed sensing techniques are then employed to find an ensemble that is both small and effective. An additional contribution of this paper is a new performance evaluation method (a new pairwise diversity measure) called the roulette-wheel kappa-error. This method takes into account the different weightings of the classifiers and also reduces the total number of pairs of classifiers needed in the kappa-error diagram, by selecting pairs through roulette-wheel selection according to the classifier weightings. This approach can greatly improve the clarity and informativeness of the kappa-error diagram, especially when the number of classifiers in the ensemble is large. We use 25 public data sets to evaluate and compare the performance of compressed sensing ensembles using four different sparse reconstruction algorithms, combined with two classifier learning algorithms and two training data manipulation techniques. We also compare our method against five state-of-the-art pruning methods. These experiments show that our method produces comparable or better accuracy, while being significantly faster than the compared methods.
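A minimal rendering of the idea: stack base-classifier predictions into a matrix P and recover sparse non-negative ensemble weights with an l1 solver, so that only a few models survive. Lasso stands in for the paper's sparse reconstruction algorithms, and bagged depth-3 trees are an arbitrary base ensemble.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
ens = BaggingClassifier(DecisionTreeClassifier(max_depth=3),
                        n_estimators=50, random_state=0).fit(Xtr, ytr)

# Prediction matrix P in {-1, +1}; sparse weights w via an l1 solver.
P = np.column_stack([est.predict(Xtr) * 2 - 1 for est in ens.estimators_])
w = Lasso(alpha=0.01, positive=True).fit(P, ytr * 2 - 1).coef_
print("models kept:", int((w > 0).sum()), "of", len(w))

Pte = np.column_stack([est.predict(Xte) * 2 - 1 for est in ens.estimators_])
pred = (Pte @ w > 0).astype(int)
print("pruned-ensemble accuracy:", round(float(np.mean(pred == yte)), 3))
```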

14.
The main purpose of a gene interaction network is to map the relationships between genes that are out of sight when a genomic study is tackled. DNA microarrays allow the expression of thousands of genes to be measured at the same time. These data constitute the numeric seed for the induction of gene networks. In this paper, we propose a new approach to building gene networks by means of Bayesian classifiers, variable selection, and bootstrap resampling. The interactions induced by the Bayesian classifiers are based both on the expression levels and on the phenotype information of the supervised variable. Feature selection and bootstrap resampling add reliability and robustness to the overall process, removing false-positive findings. The consensus among all the induced models produces a hierarchy of dependences and, thus, of variables. Biologists can define the depth level of the model hierarchy, so the set of interactions and genes involved can vary from a sparse to a dense set. Experimental results show that these networks perform well on classification tasks. The biological validation matches previous biological findings and opens new hypotheses for future studies.
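The reliability mechanism can be sketched as bootstrap consensus over a univariate filter: repeat feature selection on bootstrap resamples and keep only the genes selected in a large fraction of them. SelectKBest with an F-test and the 0.8 consensus threshold are assumptions standing in for the paper's Bayesian-classifier-based selection.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic microarray-like data: many features, few informative ones.
X, y = make_classification(n_samples=120, n_features=200, n_informative=8,
                           random_state=0)
rng = np.random.default_rng(0)
B, k = 100, 10
votes = np.zeros(X.shape[1])
for _ in range(B):
    idx = rng.integers(0, len(y), len(y))           # bootstrap resample
    sel = SelectKBest(f_classif, k=k).fit(X[idx], y[idx])
    votes[sel.get_support()] += 1
stable = np.flatnonzero(votes / B >= 0.8)           # consensus threshold
print("features stable across bootstraps:", stable)
```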

15.
In this work, a variational Bayesian framework for efficient training of echo state networks (ESNs) with automatic regularization and delay&sum (D&S) readout adaptation is proposed. The algorithm uses a classical batch learning of ESNs. By treating the network echo states as fixed basis functions parameterized with delay parameters, we propose a variational Bayesian ESN training scheme. The variational approach allows for a seamless combination of sparse Bayesian learning ideas and a variational Bayesian space-alternating generalized expectation-maximization (VB-SAGE) algorithm for estimating parameters of superimposed signals. While the former method realizes automatic regularization of ESNs, which also determines which echo states and input signals are relevant for "explaining" the desired signal, the latter method provides a basis for joint estimation of D&S readout parameters. The proposed training algorithm can naturally be extended to ESNs with fixed filter neurons. It also generalizes the recently proposed expectation-maximization-based D&S readout adaptation method. The proposed algorithm was tested on synthetic data prediction tasks as well as on dynamic handwritten character recognition.

16.
Husmeier D. Neural Computation, 2000, 12(11): 2685-2717.
Training probability-density estimating neural networks with the expectation-maximization (EM) algorithm aims to maximize the likelihood of the training set and therefore leads to overfitting for sparse data. In this article, a regularization method for mixture models with generalized linear kernel centers is proposed, which adopts the Bayesian evidence approach and optimizes the hyperparameters of the prior by type II maximum likelihood. This includes a marginalization over the parameters, which is done by Laplace approximation and requires the derivation of the Hessian of the log-likelihood function. The incorporation of this approach into the standard training scheme leads to a modified form of the EM algorithm, which includes a regularization term and adapts the hyperparameters on-line after each EM cycle. The article presents applications of this scheme to classification problems, the prediction of stochastic time series, and latent space models.

17.
There is great interest in dimensionality reduction techniques for tackling the problem of high-dimensional pattern classification. This paper addresses supervised learning of a linear dimension reduction mapping suitable for classification problems. The proposed optimization procedure is based on minimizing an estimate of the nearest neighbor classifier's error probability, and it learns a linear projection and a small set of prototypes that support the class boundaries. The learned classifier is very computationally efficient, making classification much faster than state-of-the-art classifiers such as SVMs, while having competitive recognition accuracy. The approach has been assessed through a series of experiments, showing uniformly good behavior and competitive results compared with some recently proposed supervised dimensionality reduction techniques.
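Neighborhood Components Analysis follows the same recipe, learning a linear projection by optimizing a (stochastic) nearest-neighbor objective, so it serves as a readily available approximation; the prototype-learning part is replaced here by plain k-NN in the projected space, and the dataset and dimensions are arbitrary.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import (NeighborhoodComponentsAnalysis,
                               KNeighborsClassifier)
from sklearn.pipeline import make_pipeline

# Learn a linear projection optimized for nearest-neighbor classification,
# then classify with k-NN in the reduced space.
X, y = load_digits(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
model = make_pipeline(
    NeighborhoodComponentsAnalysis(n_components=16, random_state=0),
    KNeighborsClassifier(n_neighbors=3))
model.fit(Xtr, ytr)
print("accuracy in 16-d projected space:", round(model.score(Xte, yte), 3))
```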

18.
There is an ongoing debate over the capabilities of hierarchical neural feedforward architectures for performing real-world invariant object recognition. Although a variety of hierarchical models exists, appropriate supervised and unsupervised learning methods are still an issue of intense research. We propose a feedforward model for recognition that shares components like weight sharing, pooling stages, and competitive nonlinearities with earlier approaches but focuses on new methods for learning optimal feature-detecting cells in intermediate stages of the hierarchical network. We show that principles of sparse coding, which were previously mostly applied to the initial feature detection stages, can also be employed to obtain optimized intermediate complex features. We suggest a new approach to optimize the learning of sparse features under the constraints of a weight-sharing or convolutional architecture that uses pooling operations to achieve gradual invariance in the feature hierarchy. The approach explicitly enforces symmetry constraints like translation invariance on the feature set. This leads to a dimension reduction in the search space of optimal features and allows determining more efficiently the basis representatives, which achieve a sparse decomposition of the input. We analyze the quality of the learned feature representation by investigating the recognition performance of the resulting hierarchical network on object and face databases. We show that a hierarchy with features learned on a single object data set can also be applied to face recognition without parameter changes and is competitive with other recent machine learning recognition approaches. To investigate the effect of the interplay between sparse coding and processing nonlinearities, we also consider alternative feedforward pooling nonlinearities such as presynaptic maximum selection and sum-of-squares integration. The comparison shows that a combination of strong competitive nonlinearities with sparse coding offers the best recognition performance in the difficult scenario of segmentation-free recognition in cluttered surroundings. We demonstrate that for both learning and recognition, a precise segmentation of the objects is not necessary.

19.
A technique for evaluating the learning capability and optimizing the feature space of a class of higher-order neural networks is presented. It is shown that supervised learning can be posed as an optimization problem in which inequality constraints are used to encode the information contained in the training patterns and to specify the degree of accuracy expected from the neural network. The approach establishes: (a) whether the structure of the network can effectively learn the training patterns and, if it can, a connectivity that achieves satisfactory learning; (b) which features can be suppressed from the definition of the feature space without degrading performance; and (c) if the structure is not appropriate for learning the training patterns, the minimum set of patterns which cannot be learned. The technique is tested on two examples and the results are discussed.
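The constraint-coding view can be checked with a linear program: encode each pattern as the inequality y_i (w . x_i) >= 1 and test feasibility. A single linear unit replaces the paper's higher-order network here, and the synthetic patterns are an assumption; infeasibility would flag patterns the structure cannot learn.

```python
import numpy as np
from scipy.optimize import linprog

# Synthetic linearly-separable patterns (assumption).
rng = np.random.default_rng(0)
X = rng.standard_normal((40, 3))
y = np.sign(X @ np.array([1.0, -2.0, 0.5]))

# Code each pattern as y_i (w . x_i) >= 1, i.e. -(y_i x_i) . w <= -1,
# and ask the LP solver only for feasibility (zero objective).
A_ub = -(y[:, None] * X)
b_ub = -np.ones(len(y))
res = linprog(c=np.zeros(3), A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * 3)
print("patterns learnable by this structure:", res.success)
```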

20.
To address the problems of traditional dictionary learning based on the K-SVD (K-order singular value decomposition) algorithm, namely high time complexity, a learned dictionary with unsatisfactory ability to represent the source images, and poor results when applied to medical image fusion, a new dictionary learning method is proposed. Before dictionary learning, the feature information of the medical images is screened, and image patches rich in energy and detail are selected as the training set for learning the dictionary. A sparse representation model of the source images is then built on the learned dictionary; the orthogonal matching pursuit (OMP) algorithm is used to solve for the sparse coefficients of each image patch, and a "maximum absolute value" strategy is used to construct the sparse representation coefficients of the fused image, from which the fused image is finally obtained. Experimental results show that the proposed method is effective for different kinds of medical images.
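A compact sketch of the proposed pipeline under stand-in components: patches are filtered by a variance (detail) criterion before training, MiniBatchDictionaryLearning replaces K-SVD, OMP computes the sparse coefficients, and the max-absolute-value rule fuses them. Random patches replace real medical images.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning, sparse_encode

rng = np.random.default_rng(0)
patches_a = rng.standard_normal((300, 64))   # flattened 8x8 patches (stand-in)
patches_b = rng.standard_normal((300, 64))
train = np.vstack([patches_a, patches_b])

# Screen the training set: keep only high-energy/high-detail patches,
# using patch variance as the detail proxy (assumption).
detail = train.var(axis=1)
train = train[detail > np.median(detail)]

# MiniBatchDictionaryLearning stands in for K-SVD (assumption).
D = MiniBatchDictionaryLearning(n_components=128, alpha=1.0,
                                random_state=0).fit(train).components_
ca = sparse_encode(patches_a, D, algorithm="omp", n_nonzero_coefs=5)
cb = sparse_encode(patches_b, D, algorithm="omp", n_nonzero_coefs=5)
fused = np.where(np.abs(ca) >= np.abs(cb), ca, cb) @ D   # max-abs fusion rule
print("fused patch matrix shape:", fused.shape)
```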
