Related Articles
20 related articles found (search time: 328 ms)
1.
Cédric Nicolas Michel 《Neurocomputing》2008,71(7-9):1274-1282
Mixtures of probabilistic principal component analyzers model high-dimensional nonlinear data by combining local linear models. Each mixture component is specifically designed to extract the local principal orientations in the data. An important issue with this generative model is its sensitivity to data lying off the low-dimensional manifold. To address this problem, mixtures of robust probabilistic principal component analyzers are introduced. They handle atypical points by means of a heavy-tailed distribution, the Student-t. It is shown that the resulting mixture model is an extension of the mixture of Gaussians, suitable for both robust clustering and dimensionality reduction. Finally, we briefly discuss how to construct a robust version of the closely related mixture of factor analyzers.
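A minimal sketch of the mechanism this abstract relies on: in the E-step of a Student-t model, each point receives a weight E[u | x] = (nu + d)/(nu + delta(x)), where delta is the squared Mahalanobis distance, so atypical points are automatically downweighted. The data and parameters below are made up for illustration; this is not the paper's full mixture algorithm.

import numpy as np

def student_t_weights(X, mu, Sigma, nu):
    """E-step weights E[u | x] = (nu + d) / (nu + delta(x)) for a
    multivariate Student-t; small weights flag likely outliers."""
    d = X.shape[1]
    diff = X - mu
    delta = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(Sigma), diff)
    return (nu + d) / (nu + delta)

# Points far from mu get weights well below 1 and barely influence
# the weighted mean/covariance updates of the M-step.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), [[8.0, 8.0]]])  # one outlier
w = student_t_weights(X, X.mean(0), np.cov(X.T), nu=3.0)
print(w[-1], w[:-1].mean())  # outlier weight is far below the typical weight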

2.
SMEM algorithm for mixture models
We present a split-and-merge expectation-maximization (SMEM) algorithm to overcome the local maxima problem in parameter estimation of finite mixture models. In the case of mixture models, local maxima often involve having too many components of a mixture model in one part of the space and too few in another, widely separated part of the space. To escape from such configurations, we repeatedly perform simultaneous split-and-merge operations using a new criterion for efficiently selecting the split-and-merge candidates. We apply the proposed algorithm to the training of Gaussian mixtures and mixtures of factor analyzers using synthetic and real data, and show the effectiveness of the split-and-merge operations in improving the likelihood of both the training data and held-out test data. We also show the practical usefulness of the proposed algorithm by applying it to image compression and pattern recognition problems.
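As a rough illustration of the candidate-selection idea, the merge half of an SMEM-style criterion can be scored from the responsibility matrix of a fitted mixture: components whose posteriors overlap across the data are good merge candidates. The sketch below assumes responsibilities R are already available; the split criterion and the partial-EM re-estimation steps are omitted.

import numpy as np

def merge_candidates(R):
    """Rank component pairs for merging by the overlap criterion
    J(i, j) = r_i . r_j, where r_i is component i's responsibility
    vector over all n points (R has shape [n, k]).  Pairs whose
    posteriors overlap heavily score high."""
    k = R.shape[1]
    scores = R.T @ R                     # [k, k] pairwise overlaps
    pairs = [(scores[i, j], i, j) for i in range(k) for j in range(i + 1, k)]
    return sorted(pairs, reverse=True)

# Example: three components where 0 and 1 overlap strongly.
R = np.array([[0.5, 0.5, 0.0],
              [0.4, 0.6, 0.0],
              [0.0, 0.0, 1.0]])
print(merge_candidates(R)[0])  # highest-overlap pair: components 0 and 1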

3.
Dimensionality reduction (DR) has been a central research topic in information theory, pattern recognition, and machine learning. The performance of many learning models relies significantly on dimensionality reduction: successful DR can greatly improve clustering and classification approaches, while inappropriate DR may degrade them. When applied to high-dimensional data, some existing approaches first reduce the dimensionality and then feed the reduced features to another model, e.g., a Gaussian mixture model (GMM). Such decoupled learning can, however, significantly limit performance, since the optimal subspace given by a particular DR approach may not suit the subsequent model. In this paper, we investigate how unsupervised dimensionality reduction can be performed jointly with a GMM, and whether such joint learning improves on the traditional unsupervised method. In particular, we adopt the mixture of factor analyzers under the assumption that a common factor loading is shared by all components. Based on this, we derive an EM algorithm that converges to a locally optimal solution. This setting optimizes the dimensionality reduction jointly with the parameters of the GMM. We describe the framework, detail the algorithm, and conduct a series of experiments to validate the effectiveness of the proposed approach. Specifically, we compare the proposed joint learning approach with two competitive algorithms on one synthetic and six real data sets. Experimental results show that the joint learning significantly outperforms the comparison methods in terms of three criteria.
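In commonly used notation (assumed here rather than taken from the paper), the shared-loading constraint the authors describe can be written as a Gaussian mixture whose component means and covariances all live in the same q-dimensional subspace:

p(\mathbf{x}) = \sum_{i=1}^{g} \pi_i\,
  \mathcal{N}\!\left(\mathbf{x} \,\middle|\, A\boldsymbol{\xi}_i,\;
  A\,\Omega_i A^{\top} + \Psi\right),
\qquad A \in \mathbb{R}^{p \times q},\; q \ll p,

with the loading matrix A common to all g components, so fitting A by EM performs the dimensionality reduction jointly with the GMM parameters.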

4.
Mixtures of factor analyzers enable model-based density estimation to be undertaken for high-dimensional data, where the number of observations n is small relative to their dimension p. However, this approach is sensitive to outliers as it is based on a mixture model in which the multivariate normal family of distributions is assumed for the component error and factor distributions. An extension to mixtures of t-factor analyzers is considered, whereby the multivariate t-family is adopted for the component error and factor distributions. An EM-based algorithm is developed for the fitting of mixtures of t-factor analyzers. Its application is demonstrated in the clustering of some microarray gene-expression data.
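In one standard formulation of this model (notation assumed here, not taken from the paper), component i keeps the factor decomposition of the covariance but replaces the normal family with the t-family:

p(\mathbf{x}) = \sum_{i=1}^{g} \pi_i\,
  t_p\!\left(\mathbf{x};\; \boldsymbol{\mu}_i,\;
  B_i B_i^{\top} + D_i,\; \nu_i\right),

with the degrees of freedom \nu_i controlling the tail weight and estimated along with the other parameters.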

5.
There has been growing interest in subspace data modeling over the past few years. Methods such as principal component analysis, factor analysis, and independent component analysis have gained in popularity and have found many applications in image modeling, signal processing, and data compression, to name just a few. As applications and computing power grow, more and more sophisticated analyses and meaningful representations are sought. Mixture modeling methods have been proposed for principal and factor analyzers that exploit local gaussian features in the subspace manifolds. Meaningful representations may be lost, however, if these local features are nongaussian or discontinuous. In this article, we propose extending the gaussian analyzers mixture model to an independent component analyzers mixture model. We employ recent developments in variational Bayesian inference and structure determination to construct a novel approach for modeling nongaussian, discontinuous manifolds. We automatically determine the local dimensionality of each manifold and use variational inference to calculate the optimum number of ICA components needed in our mixture model. We demonstrate our framework on complex synthetic data and illustrate its application to real data by decomposing functional magnetic resonance images into meaningful, medically useful features.

6.
The combination of static and dynamic software analysis, such as data flow analysis (DFA) and model checking, provides benefits for both disciplines. On the one hand, the information extracted by DFAs about program data may be utilized by model checkers to optimize the state space representation. On the other hand, the expressiveness of logic formulas allows us to consider model checkers as generic data flow analyzers. Following this second approach, we propose in this paper an algorithm to calculate DFAs using on-the-fly resolution of boolean equation systems (BESs). The overall framework includes the abstraction of the input program into an implicit labeled transition system (LTS), independent of the program specification language. Moreover, using BESs as an intermediate representation allowed us to reformulate classical DFAs encountered in the literature, which were previously encoded in terms of μ-calculus formulas with forward and backward modalities. Our work was implemented and integrated into the widespread verification platform CADP, and experimented on real examples.
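To make the connection concrete, the fixpoint that such a boolean equation system encodes is the same one a classical iterative DFA computes. The sketch below solves live-variable analysis on a hypothetical four-node CFG by round-robin iteration; it illustrates the style of equation being solved, not CADP's actual BES encoding.

# Minimal round-robin fixpoint for live-variable analysis, the kind of
# backward dataflow problem the paper re-encodes as a boolean equation
# system.  CFG, use and def sets, and variable names are illustrative.
cfg_succ = {0: [1], 1: [2, 3], 2: [1], 3: []}
use = {0: set(), 1: {'x'}, 2: {'x', 'y'}, 3: {'y'}}
defs = {0: {'x', 'y'}, 1: set(), 2: {'x'}, 3: set()}

live_out = {n: set() for n in cfg_succ}
live_in = {n: set() for n in cfg_succ}
changed = True
while changed:                                  # iterate to fixpoint
    changed = False
    for n in cfg_succ:
        out = set().union(*(live_in[s] for s in cfg_succ[n])) if cfg_succ[n] else set()
        inn = use[n] | (out - defs[n])
        if out != live_out[n] or inn != live_in[n]:
            live_out[n], live_in[n], changed = out, inn, True

print(live_in)   # e.g. node 1 has {'x', 'y'} live on entry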

7.
Handling of incomplete data sets using ICA and SOM in data mining
Based on independent component analysis (ICA) and self-organizing maps (SOM), this paper proposes an ISOM-DH model for handling incomplete data in data mining. When the data are dependent and non-Gaussian, this model can make full use of the information in the given data to estimate the missing values, and it can visualize the handled high-dimensional data. Compared with the mixture of principal component analyzers (MPCA), the mean method, and a standard SOM-based fuzzy map model, the ISOM-DH model can be applied to more cases, demonstrating its superiority. The correctness and reasonableness of the ISOM-DH model are also validated by the experiments carried out in this paper.

8.
We develop a new biologically motivated algorithm for representing natural images using successive projections into complementary subspaces. An image is first projected into an edge subspace spanned by an ICA basis adapted to natural images, which captures the sharp features of an image such as edges and curves. The residual image obtained after extraction of the sharp image features is approximated using a mixture of probabilistic principal component analyzers (MPPCA) model. The model is consistent with cellular, functional, information theoretic, and learning paradigms in visual pathway modeling. We demonstrate the efficiency of our model for representing different attributes of natural images such as color and luminance. We compare the quality of representation against commonly used bases, such as the discrete cosine transform (DCT), independent component analysis (ICA), and principal component analysis (PCA), based on their entropies. Chrominance and luminance components of images are represented using codes of lower entropy than DCT, ICA, or PCA for similar visual quality. The model attains considerable simplification for learning from images by using a sparse independent code to represent edges and by explicitly evaluating probabilities in the residual subspace.

9.
An estimation-of-distribution algorithm based on a mixture of factor analyzers is proposed. The selected set of elite individuals is first clustered with the rival-penalized competitive learning algorithm, and a factor analysis model then estimates the distribution information of each cluster. To preserve population diversity, the algorithm retains individuals that have good fitness values yet lie far from the selected elite set, and it reuses the clustering parameters to reduce computation. Experimental results confirm the performance of the algorithm.

10.
This paper presents a new procedure for learning mixtures of independent component analyzers. The procedure includes non-parametric estimation of the source densities, supervised-unsupervised learning of the model parameters, incorporation of any independent component analysis (ICA) algorithm into the learning of the ICA mixtures, and estimation of residual dependencies after training for correcting the posterior probability of each class given the test observation vector. We demonstrate the performance of the procedure in the classification of ICA mixtures of two, three, and four classes of synthetic data, and in the classification of defective materials, consisting of 3D finite element models and lab specimens, in non-destructive testing using the impact-echo technique. The proposed posterior probability correction demonstrably improves classification accuracy. Semi-supervised learning shows that unlabeled data can degrade the performance of the classifier when they do not fit the generative model. Comparative results of the proposed method and standard ICA algorithms for blind source separation in one and multiple ICA data mixtures show the suitability of the non-parametric ICA mixture-based method for data modeling.

11.
One of the simplest, and yet most consistently well-performing, families of classifiers is the naïve Bayes model. These models rely on two assumptions: (i) all the attributes used to describe an instance are conditionally independent given the class of that instance, and (ii) all attributes follow a specific parametric family of distributions. In this paper we propose a new set of models for classification in continuous domains, termed latent classification models. The latent classification model can roughly be seen as combining the naïve Bayes model with a mixture of factor analyzers, thereby relaxing the assumptions of the naïve Bayes classifier. In the proposed model the continuous attributes are described by a mixture of multivariate Gaussians, where the conditional dependencies among the attributes are encoded using latent variables. We present algorithms for learning both the parameters and the structure of a latent classification model, and we demonstrate empirically that the accuracy of the proposed model is significantly higher than the accuracy of other probabilistic classifiers.
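A coarse stand-in for this idea, under the assumption that a full-covariance Gaussian mixture per class approximates the mixture-of-factor-analyzers structure: fit one mixture to each class and classify by class-conditional log-likelihood plus log-prior. This drops the latent-variable factor structure that gives the paper's model its parsimony.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.mixture import GaussianMixture

# Model each class's continuous attributes with its own Gaussian mixture,
# then classify by class-conditional likelihood plus the class log-prior.
X, y = load_iris(return_X_y=True)
classes = np.unique(y)
gmms = [GaussianMixture(n_components=2, random_state=0).fit(X[y == c]) for c in classes]
log_prior = np.log([np.mean(y == c) for c in classes])
scores = np.column_stack([g.score_samples(X) for g in gmms]) + log_prior
print((scores.argmax(1) == y).mean())  # training accuracy of the sketch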

12.
13.
To improve the stability and separation efficiency of the traditional FastICA algorithm, a new nonlinear function is constructed based on the Tukey M-estimator, yielding the MTICA algorithm; on this basis it is combined with the SVR algorithm to build a new MTICA-AEO-SVR stock-price forecasting model. The MTICA algorithm decomposes the raw stock data into independent components, which are sorted and denoised, and separate SVR models are selected to predict the individual components and the stock price. The artificial ecosystem optimization (AEO) algorithm is introduced into SVR for parameter selection, improving the model's prediction accuracy. An empirical analysis of the Shanghai B-share index shows that the MTICA-AEO-SVR model is more accurate and efficient than the ICA-AEO-SVR and ICA-SVR models.
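A hedged sketch of the decompose-then-predict pipeline using stock library pieces: plain FastICA stands in for the paper's MTICA variant, SVR hyperparameters are left at scikit-learn defaults instead of being tuned by AEO, and the series is synthetic.

import numpy as np
from sklearn.decomposition import FastICA
from sklearn.svm import SVR

# Build 5-lag inputs from a toy "price" series, decompose the lag matrix
# into independent components, and regress the next value on them.
rng = np.random.default_rng(0)
t = np.arange(300, dtype=float)
series = np.sin(0.05 * t) + 0.1 * rng.normal(size=300)
lagged = np.column_stack([series[i:i - 5] for i in range(5)])   # [295, 5]
target = series[5:]

components = FastICA(n_components=3, random_state=0).fit_transform(lagged)
model = SVR().fit(components[:-20], target[:-20])
print(model.score(components[-20:], target[-20:]))  # held-out R^2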

14.
For Bayesian inference on the mixture of factor analyzers, natural conjugate priors on the parameters are introduced, and then a Gibbs sampler that generates parameter samples following the posterior is constructed. In addition, a deterministic estimation algorithm is derived by taking modes instead of samples from the conditional posteriors used in the Gibbs sampler. This is regarded as a maximum a posteriori estimation algorithm with hyperparameter search. The behaviors of the Gibbs sampler and the deterministic algorithm are compared on a simulation experiment.
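One conditional from such a Gibbs sampler is easy to state for a single factor analyzer: given the loadings and noise variances, the latent factors have a closed-form Gaussian posterior. The sketch below draws from that conditional only; the paper's full sampler also draws the loadings, noise variances, means, and component labels, and all values here are illustrative.

import numpy as np

def sample_factors(x, mu, Lam, Psi_diag, rng):
    """One Gibbs conditional for a single factor analyzer: draw the
    latent factors z | x ~ N(M Lam' Psi^-1 (x - mu), M) with
    M = (I + Lam' Psi^-1 Lam)^-1, Psi diagonal."""
    q = Lam.shape[1]
    LtPi = Lam.T / Psi_diag                  # Lam' Psi^-1
    M = np.linalg.inv(np.eye(q) + LtPi @ Lam)
    mean = M @ LtPi @ (x - mu)
    return rng.multivariate_normal(mean, M)

rng = np.random.default_rng(0)
Lam = rng.normal(size=(6, 2))                         # p=6 observed, q=2 latent
x = Lam @ rng.normal(size=2) + 0.1 * rng.normal(size=6)
print(sample_factors(x, np.zeros(6), Lam, 0.01 * np.ones(6), rng))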

15.
Probabilistic sequential independent components analysis
Under-complete models, which derive lower dimensional representations of input data, are valuable in domains in which the number of input dimensions is very large, such as data consisting of a temporal sequence of images. This paper presents the under-complete product of experts (UPoE), where each expert models a one-dimensional projection of the data. Maximum-likelihood learning rules for this model constitute a tractable and exact algorithm for learning under-complete independent components. The learning rules for this model coincide with approximate learning rules proposed earlier for under-complete independent component analysis (UICA) models. This paper also derives an efficient sequential learning algorithm from this model and discusses its relationship to sequential independent component analysis (ICA), projection pursuit density estimation, and feature induction algorithms for additive random field models. This paper demonstrates the efficacy of these novel algorithms on high-dimensional continuous datasets.

16.
To address the problems of excessive data volume and data sparsity in collaborative filtering recommendation (CFR), factor analysis is used to reduce the dimensionality of the data and regression analysis to predict the unknown ratings, reducing the data volume while retaining as much information as possible. The algorithm first uses factor analysis to reduce users and items to a small number of user factors and item factors. It then builds one regression model with the target user as the dependent variable and the user factors as independent variables, and another with the item to be rated as the dependent variable and the item factors as independent variables, obtaining two predictions of the target user's rating for the item; the final prediction is a weighted combination of the two. Simulation experiments confirm the feasibility and effectiveness of the algorithm, and the results show that it improves accuracy over item-based collaborative filtering.
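A minimal sketch of the user-factor half of this scheme, with a made-up dense ratings matrix: compress the remaining users into a few factors via factor analysis, then regress the target user's known ratings on those item-wise factor scores to predict a held-out item. The symmetric item-factor regression and the final weighted blend are omitted.

import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
R = rng.integers(1, 6, size=(40, 25)).astype(float)   # users x items
target_user, held_out_item = R[0], 24

# One row of user-factor scores per item, derived from the other users.
user_factors = FactorAnalysis(n_components=4, random_state=0).fit_transform(R[1:].T)
reg = LinearRegression().fit(user_factors[:held_out_item],
                             target_user[:held_out_item])
print(reg.predict(user_factors[held_out_item:held_out_item + 1]))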

17.
Chinese information retrieval based on probabilistic latent semantic analysis
罗景  涂新辉 《计算机工程》2008,34(2):199-201
Traditional information retrieval models treat words as isolated units, ignoring the many synonyms and polysemous words in natural language, which hurts both recall and precision. The probabilistic latent semantic model uses statistical methods to establish the probability distributions linking documents, latent semantics, and words, and exploits these relations for retrieval. This paper applies the probabilistic latent semantic model to Chinese information retrieval; experimental results show that, compared with the traditional vector space model, it significantly improves the mean average precision of retrieval.
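For reference, a tiny EM implementation of the "document - latent semantic - word" decomposition the abstract describes, on a toy count matrix; it omits the tempering, smoothing, and fold-in used in practical retrieval systems.

import numpy as np

def plsa(N, k, iters=50, seed=0):
    """Tiny EM for PLSA on a doc-word count matrix N [D, W]: learns
    p(z|d) and p(w|z) so that p(w|d) = sum_z p(z|d) p(w|z)."""
    rng = np.random.default_rng(seed)
    p_z_d = rng.dirichlet(np.ones(k), size=N.shape[0])   # [D, k]
    p_w_z = rng.dirichlet(np.ones(N.shape[1]), size=k)   # [k, W]
    for _ in range(iters):
        # E-step: responsibilities p(z | d, w), shape [D, k, W]
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]
        post = joint / joint.sum(axis=1, keepdims=True)
        # M-step: re-estimate both distributions from expected counts
        counts = N[:, None, :] * post                    # [D, k, W]
        p_w_z = counts.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True)
        p_z_d = counts.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    return p_z_d, p_w_z

N = np.array([[4, 3, 0, 0], [3, 4, 1, 0], [0, 0, 5, 4]], float)
p_z_d, p_w_z = plsa(N, k=2)
print(np.round(p_z_d, 2))   # docs 0-1 vs doc 2 load on different topics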

18.
The curse of dimensionality hinders the effectiveness of density estimation in high dimensional spaces. Many techniques have been proposed in the past to discover embedded, locally linear manifolds of lower dimensionality, including the mixture of principal component analyzers, the mixture of probabilistic principal component analyzers, and the mixture of factor analyzers. In this paper, we propose a novel mixture model for reducing dimensionality based on a linear transformation that is required to be neither orthogonal nor aligned with the principal directions. For experimental validation, we have used the proposed model for classification of five "hard" data sets and compared its accuracy with that of other popular classifiers. The proposed method outperformed the mixture of probabilistic principal component analyzers on four of the five data sets, with improvements ranging from 0.5 to 3.2%. Moreover, on all data sets its accuracy exceeded that of the Gaussian mixture model, with improvements ranging from 0.2 to 3.4%.

19.
Mixtures of factor analyzers have been receiving wide interest in statistics as a tool for performing clustering and dimension reduction simultaneously. In this model it is assumed that, within each component, the data are generated according to a factor model. Therefore, the number of parameters on which the covariance matrices depend is reduced. Several estimation methods have been proposed for this model, both in the classical and in the Bayesian framework. However, so far, a direct maximum likelihood procedure has not been developed. This direct estimation problem, which simultaneously allows one to derive the information matrix for the mixtures of factor analyzers, is solved. The effectiveness of the proposed procedure is shown on a simulation study and on a toy example.
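The objective being maximized directly is (in commonly used notation, assumed here rather than taken from the paper) the observed-data log-likelihood of the mixture, with each component covariance constrained by its factor decomposition:

\log L(\theta) = \sum_{j=1}^{n} \log \sum_{i=1}^{g} \pi_i\,
  \mathcal{N}\!\left(\mathbf{x}_j \,\middle|\, \boldsymbol{\mu}_i,\;
  \Lambda_i \Lambda_i^{\top} + \Psi_i\right).

Differentiating this objective directly, rather than going through the EM surrogate, is what yields the information matrix as a by-product.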

20.
This paper extends the finite mixture of factor analyzers to accommodate common factors and categorical variables, based on the multivariate generalized linear model and the principle of maximum random utility from probabilistic choice theory. The EM and Newton-Raphson algorithms are used to estimate the model parameters, and the approach is illustrated with a simulation study and a real example.
