Similar Documents
20 similar documents found.
1.
Objective Classical clustering algorithms suffer from problems such as the curse of dimensionality when handling high-dimensional data, which greatly increases computational cost and yields poor results. Clustering networks built on autoencoders or variational autoencoders improve clustering, but the features extracted by autoencoders are often weak, and variational autoencoders suffer from problems such as posterior collapse, both of which degrade the clustering results. This paper therefore proposes a clustering network based on a Gaussian-mixture variational autoencoder. Method A variational autoencoder is constructed with a Gaussian mixture as the prior distribution of the latent variables, and the autoencoder is trained with an objective function composed of the reconstruction error and the Kullback-Leibler (KL) divergence between the prior and posterior distributions of the latent variables. The trained encoder extracts features from the input data and is combined with a clustering layer to build the clustering network, which is trained with an objective function built from the KL divergence between the soft-assignment distribution of the encoder's latent features and an auxiliary target distribution derived from the soft-assignment probabilities. The variational autoencoder is implemented with convolutional neural networks. Results To verify the effectiveness of the proposed algorithm, the network was evaluated on the benchmark datasets MNIST (Modified National Institute of Standards and Technology Database) and Fashion-MNIST. Clustering accuracy (ACC) and normalized mutual information (NMI) reach 95.86% and 91% on MNIST and 61.34% and 62.5% on Fashion-MNIST, improvements of varying degrees over existing methods. Conclusion The experimental results show that the proposed network clusters well and outperforms several popular clustering methods.
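As an illustration of the clustering-layer objective this abstract describes — the KL divergence between a soft-assignment distribution and an auxiliary target distribution — here is a minimal numpy sketch following the formulation popularized by deep embedded clustering. It is a hedged stand-in, not the authors' implementation; all names and the Student's t kernel are assumptions.

```python
import numpy as np

def soft_assignment(z, centroids, alpha=1.0):
    """Student's t soft assignment q_ij between latent vectors z and centroids."""
    d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Auxiliary target p_ij that sharpens q and normalizes by cluster frequency."""
    weight = q ** 2 / q.sum(axis=0)
    return weight / weight.sum(axis=1, keepdims=True)

def clustering_kl(q, p):
    """KL(P || Q), the clustering-layer objective minimized during training."""
    return float((p * np.log(p / q)).sum())

# toy usage: 100 latent vectors, 10 clusters
rng = np.random.default_rng(0)
z = rng.normal(size=(100, 16))
centroids = rng.normal(size=(10, 16))
q = soft_assignment(z, centroids)
p = target_distribution(q)
print(clustering_kl(q, p))
```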

2.
To address the problem that DNA methylation sequencing data produced by high-throughput sequencing contain missing values for various reasons, this paper proposes VAE-MethImp, a DNA methylation missing-data imputation model based on a variational autoencoder. VAE-MethImp is a deep latent-space generative model composed of an encoding layer, a latent layer, and a decoding layer, with a strong ability to reconstruct its input. The encoding layer infers means and variances; the latent layer is the input-specific normal distribution computed from the means and variances output by the encoding layer; and the decoding layer decodes the features contained in the latent layer to generate the reconstructed data. Imputation experiments on lung cancer and breast cancer data show that the features extracted by VAE-MethImp are more informative. In imputation accuracy, VAE-MethImp improves by 4.8% on SVD, the best of the baselines (mean imputation (Mean), k-nearest neighbors (KNN), principal component analysis (PCA), and singular value decomposition (SVD)). Survival analysis shows that the data imputed by VAE-MethImp are more predictive, and also confirms that DNA methylation is directly associated with cancer survival.
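The encode-infer-decode pipeline in this abstract is the standard VAE recipe. The sketch below is a generic PyTorch VAE with a masked reconstruction loss for imputation, written as a hedged illustration: VAE-MethImp's actual architecture, layer sizes, and loss are not reproduced here, and the `mask` convention (1 = observed, 0 = missing) is an assumption.

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    """Generic VAE: the encoder infers mean/log-variance, the decoder reconstructs."""
    def __init__(self, d_in, d_latent=32, d_hidden=128):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU())
        self.mu = nn.Linear(d_hidden, d_latent)
        self.logvar = nn.Linear(d_hidden, d_latent)
        self.dec = nn.Sequential(nn.Linear(d_latent, d_hidden), nn.ReLU(),
                                 nn.Linear(d_hidden, d_in), nn.Sigmoid())

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return self.dec(z), mu, logvar

def elbo_loss(x, x_hat, mu, logvar, mask):
    # reconstruction error counted only over observed entries (mask == 1)
    rec = ((x_hat - x) ** 2 * mask).sum()
    kl = -0.5 * (1 + logvar - mu ** 2 - logvar.exp()).sum()
    return rec + kl

# imputation: after training, fill missing entries with the reconstruction
# x_filled = x * mask + x_hat * (1 - mask)
```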

3.
Traditional variational autoencoders flatten each sample and feed it in directly as the input; when the samples are images, learning in this way gives poor results. This paper proposes a convolution-optimized variational autoencoder that preprocesses the image data with several convolutional networks of variable depth. Each convolutional network processes the input with different parameters, and the outputs of the different convolutional stages are concatenated to form the input of the variational autoencoder. A category encoder is added to the variational autoencoder model to compute the difference between each sample's category distribution and the category distribution of the original sample set, which enables clustering. Experiments show that, compared with an unoptimized variational autoencoder, the proposed convolutional optimization considerably improves clustering accuracy and the quality of generated images, and the generated samples of each category become more diverse in edges, shapes, and other respects.

4.
To handle the complex and imbalanced distribution of tweet data on the Twitter social media platform in rumor detection tasks, this paper proposes VAE-LSTM, a rumor stance classification algorithm based on a variational autoencoder (VAE). After preprocessing, tweet word vectors are extracted with a word2vec model and fed into the VAE for training, yielding deep feature sequences that follow a simple probability distribution; effective features are then sampled from them, so that tweet classes with large amounts of data do not dominate the feature vectors. On this basis, a long short-term memory (LSTM) network processes the vector sequences to perform the classification. Theoretical analysis and experimental results show that VAE-LSTM requires no manual feature extraction or augmentation and trains simply and efficiently, while also mitigating the class imbalance problem; applied to a real-world setting, it achieves an accuracy of 0.800 and an F1 score of 0.494, classifying better than the temporal attention algorithm, the Turing algorithm, and the Hawkes process algorithm, and saving substantial data preprocessing time compared with early machine learning methods such as SVM.

5.
Traditional outlier detection algorithms struggle to learn the distribution pattern of outliers in extremely class-imbalanced, high-dimensional datasets, which leads to low detection rates. To address this, GAN-VAE, an algorithm combining a generative adversarial network (GAN) with a variational autoencoder (VAE), is proposed. The algorithm first feeds the outliers into the VAE for training so that it learns their distribution pattern; it then trains the VAE together with the GAN to generate more potential outliers while learning the classification boundary between normal points and outliers; finally, the test data are fed into the trained GAN-VAE, an outlier score is computed for each object from the difference in relative density between normal points and outliers, and objects with high scores are judged to be outliers. Comparative experiments against six outlier detection algorithms on four real datasets show that GAN-VAE improves AUC, accuracy, and F1 score by 5.64%, 5.99%, and 13.30% on average, demonstrating that the GAN-VAE algorithm is effective and practical.

6.
The changing economic conditions have challenged many financial institutions to search for more efficient and effective ways to assess emerging markets. Data envelopment analysis (DEA) is a widely used mathematical programming technique that compares the inputs and outputs of a set of homogenous decision making units (DMUs) by evaluating their relative efficiency. In the conventional DEA model, all the data are known precisely or given as crisp values. However, the observed values of the input and output data in real-world problems are sometimes imprecise or vague. In addition, performance measurement in the conventional DEA method is based on the assumption that inputs should be minimized and outputs should be maximized. However, there are circumstances in real-world problems where some input variables should be maximized and/or some output variables should be minimized. Moreover, real-world problems often involve high-dimensional data with missing values. In this paper we present a comprehensive fuzzy DEA framework for solving performance evaluation problems with coexisting desirable input and undesirable output data in the presence of simultaneous input–output projection. The proposed framework is designed to handle high-dimensional data and missing values. A dimension-reduction method is used to improve the discrimination power of the DEA model and a preference ratio (PR) method is used to rank the interval efficiency scores in the resulting fuzzy environment. A real-life pilot study is presented to demonstrate the applicability of the proposed model and exhibit the efficacy of the procedures and algorithms in assessing emerging markets for international banking.
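As background for the DEA machinery this abstract builds on, here is a hedged SciPy sketch of the crisp input-oriented CCR multiplier model as a linear program; the paper's fuzzy data, undesirable inputs/outputs, dimension reduction, and preference-ratio ranking are not reproduced, and the toy data are invented.

```python
import numpy as np
from scipy.optimize import linprog

def ccr_efficiency(X, Y, j0):
    """Input-oriented CCR (multiplier form) efficiency of DMU j0.

    X: (n_dmus, n_inputs), Y: (n_dmus, n_outputs); crisp data only.
    Maximize u^T y0 subject to v^T x0 = 1 and u^T yj - v^T xj <= 0 for all j.
    """
    n, m = X.shape
    s = Y.shape[1]
    c = np.concatenate([-Y[j0], np.zeros(m)])          # linprog minimizes, so negate
    A_eq = np.concatenate([np.zeros(s), X[j0]])[None]  # v^T x0 = 1
    A_ub = np.hstack([Y, -X])                          # u^T yj - v^T xj <= 0
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n),
                  A_eq=A_eq, b_eq=[1.0], bounds=[(0, None)] * (s + m))
    return -res.fun                                    # efficiency score in (0, 1]

X = np.array([[2., 3.], [4., 2.], [3., 5.]])  # toy inputs of three DMUs
Y = np.array([[5.], [4.], [6.]])              # toy outputs
print([round(ccr_efficiency(X, Y, j), 3) for j in range(3)])
```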

7.
Nonnegative matrix factorization (NMF) is an unsupervised learning method for decomposing high-dimensional nonnegative data matrices and extracting basic and intrinsic features. Since image data are described and stored as nonnegative matrices, the mining and analysis process usually involves the use of various NMF strategies. NMF methods have well-known applications in face recognition, image reconstruction, handwritten digit recognition, image denoising and feature extraction. Recently, several projective NMF (P-NMF) methods based on positively constrained projections have been proposed and were found to perform better than the standard NMF approach in some aspects. However, some drawbacks still affect the existing NMF and P-NMF algorithms; these include dense factors, slow convergence, poor learning of local features, and low reconstruction accuracy. The aim of this paper is to design algorithms that address the aforementioned issues. In particular, we propose two embedded P-NMF algorithms: the first method combines the alternating least squares (ALS) algorithm with the P-NMF update rules of the Frobenius norm, and the second one embeds ALS with the P-NMF update rule of the Kullback–Leibler divergence. To assess the performances of the proposed methods, we conducted various experiments on four well-known data sets of faces. The experimental results reveal that the proposed algorithms outperform other related methods by providing very sparse factors and extracting better localized features. In addition, the empirical studies show that the new methods provide highly orthogonal factors that possess small entropy values.
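For orientation, the projective NMF model approximates X ≈ W Wᵀ X. Below is a hedged numpy sketch of the standard multiplicative update for the Frobenius objective, following the classic P-NMF rule as best recalled (the spectral-norm rescaling is a common stabilization, an assumption here); the paper's ALS-embedded variants are not reproduced.

```python
import numpy as np

def pnmf_frobenius(X, r, n_iter=200, eps=1e-9):
    """Projective NMF: minimize ||X - W W^T X||_F^2 with multiplicative updates."""
    m = X.shape[0]
    rng = np.random.default_rng(0)
    W = rng.random((m, r))
    XX = X @ X.T                       # (m, m), computed once
    for _ in range(n_iter):
        XXW = XX @ W
        # numerator 2(XX^T)W; denominator W W^T XX^T W + XX^T W W^T W
        W *= 2 * XXW / (W @ (W.T @ XXW) + XX @ (W @ (W.T @ W)) + eps)
        W /= np.linalg.norm(W, 2)      # keep the scale of W bounded
    return W

X = np.abs(np.random.default_rng(1).normal(size=(50, 200)))  # toy nonnegative data
W = pnmf_frobenius(X, r=5)
print(np.linalg.norm(X - W @ (W.T @ X)) / np.linalg.norm(X))  # relative error
```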

8.
Few-shot learning imitates the human cognitive process of learning from a small number of samples and has become a hot topic in machine learning research. To address the heavy per-iteration workload and severe overfitting of current few-shot learning, this paper proposes a fast few-shot learning algorithm based on deep networks. First, kernel density estimation and image filtering are used to add several types of random noise to the training set, generating the support and query sets. A prototypical network then extracts features of the support and query images, and, following the Bregman divergence, the centroid of each class's support samples is taken as the class prototype. The L2 norm is used to measure the distance between the support set and the query images, and the cross-entropy feedback loss is used to build multiple heterogeneous base classifiers. Finally, a voting mechanism fuses the nonlinear classification results of the base classifiers. Experiments show that the algorithm speeds up the convergence of few-shot learning, with high classification accuracy and strong robustness.
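The prototype-and-nearest-distance step this abstract describes can be sketched in a few lines of numpy. The feature extractor, noise generation, and ensemble voting are omitted, and the embeddings below are random stand-ins for the outputs of a trained prototypical network.

```python
import numpy as np

def prototypes(support_feats, support_labels, n_classes):
    """Class prototype = mean of each class's support embeddings
    (the centroid under the squared-Euclidean Bregman divergence)."""
    return np.stack([support_feats[support_labels == c].mean(axis=0)
                     for c in range(n_classes)])

def classify(query_feats, protos):
    """Assign each query to the nearest prototype under the L2 distance."""
    d2 = ((query_feats[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)

rng = np.random.default_rng(0)
feats = rng.normal(size=(25, 64))        # 5-way 5-shot support embeddings (toy)
labels = np.repeat(np.arange(5), 5)
protos = prototypes(feats, labels, 5)
print(classify(rng.normal(size=(10, 64)), protos))
```

An ensemble in the abstract's spirit would train several such base classifiers on differently noised data and take a majority vote over their predictions.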

9.
Time-varying behavior and state shifting are two of the main process factors that cause poor prediction performance of soft sensors. An adaptive soft sensor is a common practice to ensure high predictive accuracy. However, the large scale of process data often leads to inefficiency of model updating. In this paper, a streaming variational Bayesian supervised factor analysis (S-VBSFA) model is first proposed to capture the time-varying and state-shifting features of the process through online updating of the posterior of the model parameters. During the updating process, the symmetric Kullback–Leibler (SKL) divergence is utilized to determine the priors of the next variational Bayesian inference. To improve the modeling efficiency for large-scale process data, a parallel computing strategy is further applied to the streaming model. As a result, the proposed streaming parallel VBSFA (SP-VBSFA) algorithm not only relieves the computing pressure of modeling big process data, but also improves prediction accuracy and further reduces the tracking delay for process variations. Two case studies demonstrate the superiority of the proposed method compared to conventional methods.
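The symmetric KL divergence used here to set the next priors has a closed form for Gaussians. The sketch below shows only that divergence; how S-VBSFA feeds it back into the prior update is specific to the paper and not reproduced.

```python
import numpy as np

def kl_gauss(mu0, S0, mu1, S1):
    """KL(N(mu0, S0) || N(mu1, S1)) for multivariate Gaussians."""
    k = len(mu0)
    S1_inv = np.linalg.inv(S1)
    d = mu1 - mu0
    return 0.5 * (np.trace(S1_inv @ S0) + d @ S1_inv @ d - k
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

def skl(mu0, S0, mu1, S1):
    """Symmetric KL: 0.5 * (KL(p||q) + KL(q||p))."""
    return 0.5 * (kl_gauss(mu0, S0, mu1, S1) + kl_gauss(mu1, S1, mu0, S0))

mu0, S0 = np.zeros(3), np.eye(3)
mu1, S1 = np.ones(3), 2 * np.eye(3)
print(skl(mu0, S0, mu1, S1))
```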

10.
To match motion-capture data efficiently, a new human motion retrieval algorithm based on quaternion descriptors and the Earth Mover's Distance (EMD) is proposed. The algorithm consists of two parts: feature extraction and motion matching. In feature extraction, to overcome the low retrieval efficiency of high-dimensional data, quaternion descriptors are introduced to describe the data of the joints; the raw data of the pose distribution are mapped, and K-means clustering is applied to reduce the dimensionality of, and group, the feature data of the query motion and of the motion database. In motion matching, a distance matrix is built for each feature dataset according to the clustering results, turning the matching problem into a transportation optimization problem; the EMD algorithm then measures the similarity between the query motion and the motions in the database. Simulation results demonstrate that the proposed algorithm is effective.
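The matching step casts retrieval as a transportation problem. Below is a hedged SciPy sketch of the EMD between two weighted signatures (for instance, K-means centroids with cluster weights), solved as a linear program; the quaternion feature extraction is not shown and all data are toy values.

```python
import numpy as np
from scipy.optimize import linprog

def emd(w1, w2, D):
    """Earth Mover's Distance between two signatures with ground-cost matrix D.

    w1 (m,) and w2 (n,) are nonnegative weights summing to 1;
    D (m, n) holds ground distances, e.g. between cluster centroids.
    """
    m, n = D.shape
    A_eq = []
    for i in range(m):                 # row sums: outflow of source i equals w1[i]
        row = np.zeros(m * n); row[i * n:(i + 1) * n] = 1; A_eq.append(row)
    for j in range(n):                 # column sums: inflow of sink j equals w2[j]
        col = np.zeros(m * n); col[j::n] = 1; A_eq.append(col)
    res = linprog(D.ravel(), A_eq=np.array(A_eq),
                  b_eq=np.concatenate([w1, w2]), bounds=(0, None))
    return res.fun

c1 = np.array([[0., 0.], [1., 0.]])    # toy cluster centroids of two motions
c2 = np.array([[0., 1.], [1., 1.]])
D = np.linalg.norm(c1[:, None] - c2[None, :], axis=-1)
print(emd(np.array([.5, .5]), np.array([.5, .5]), D))
```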

11.
In cross-modal retrieval, the user supplies an arbitrary sample as the query and the system retrieves samples related to it in each modality. Multimodal fine-grained retrieval builds on cross-modal retrieval but requires more than two modalities and classifies the retrieved samples into fine-grained subcategories, so it faces difficulties such as the heterogeneity gap between multimodal data and the small feature differences between fine-grained samples. Introducing the notions of modality-specific and modality-shared features, this paper proposes MS2Net, a multimodal fine-grained retrieval framework. Branch networks and a backbone network extract the modality-specific and modality-shared features of each modality's data, and the two kinds of features are fully fused by a multimodal feature fusion module, so that both the information particular to each modality and the commonalities and connections across modalities enrich the semantic information carried by the high-dimensional embedding vectors. For the multimodal fine-grained retrieval setting, a multi-center loss is proposed on top of the center loss: intra-class centers are introduced to gather samples of the same class and the same modality, and gathering these intra-class centers in turn indirectly gathers samples of the same class across modalities, simultaneously shrinking the heterogeneity gap and the semantic gap between samples and strengthening the model's ability to cluster the high-dimensional embeddings. One-to-one and one-to-many modal retrieval experiments on the public FG-Xmedia dataset show that, compared with the FGCrossNet method, MS2Net improves mAP by 65% and 48%, respectively.

12.
The uncertainty of the data streams collected in real systems poses practical challenges to outlier detection and correction. Accordingly, drawing on the properties of the sliding basic windows sampling (SBWS) algorithm and the Gaussian process regression (GPR) model, an anomaly detection method for uncertain multiple data streams based on an SBWS_GPR prediction model is proposed. Index numbers are introduced into the historical dataset collected as time series, the historical data are clustered, and the mapping between the dataset and the index numbers is analyzed; the input data stream obtained in real time is then matched through sliding windows to detect and correct outliers in a single data stream. Next, exploiting the correlation between the input and output data, a prediction model is built with GPR, and the output data stream observed in real time is compared with the prediction model's output stream, so that anomalies in multiple data streams are finally detected and corrected through the two channels of input and output.
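The output-channel check described here — compare the observed output stream with a GPR prediction — can be sketched with scikit-learn. The sliding-window sampling, indexing, and clustering steps are omitted, and the 3-sigma rule below is an illustrative threshold, not necessarily the paper's.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X_hist = rng.uniform(0, 10, size=(200, 1))                 # historical input stream
y_hist = np.sin(X_hist).ravel() + rng.normal(0, .1, 200)   # historical output stream

gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gpr.fit(X_hist, y_hist)

# flag an observed output as anomalous when it falls outside the model's
# 3-sigma predictive band; flagged points are replaced by the prediction
X_new = np.array([[2.0], [5.0]])
y_obs = np.array([np.sin(2.0) + 0.05, 3.0])                # second value is corrupted
mu, sd = gpr.predict(X_new, return_std=True)
anomalous = np.abs(y_obs - mu) > 3 * sd
y_fixed = np.where(anomalous, mu, y_obs)
print(anomalous, y_fixed)
```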

13.
Development of image dot-matrix data extraction and conversion software based on C#
To meet the need for dot-matrix data of color pictures and icons in embedded-system GUI (Graphical User Interface) design, a piece of software was designed that can extract dot-matrix data from many common image file formats and convert its format. Development was done in C# on the .Net Framework 2.0 using a fully object-oriented approach, encapsulating file handling, data extraction, format conversion, and input/output into a single image-processing class. Its use in the development of an embroidery machine controller shows that the software is convenient to use, practical, and performs well, fully achieving its design goal and meeting the needs of GUI design.
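The paper's tool is written in C#; as a language-neutral illustration of the same extraction idea, here is a hedged Python/Pillow sketch that converts an image to the RGB565 dot-matrix format common in embedded GUIs and emits it as a C array. The file name, target size, and output format are assumptions, not the paper's specification.

```python
from PIL import Image

def to_rgb565(path, size=(64, 64)):
    """Convert an image file to an RGB565 dot matrix:
    5 bits red, 6 bits green, 5 bits blue per pixel."""
    img = Image.open(path).convert("RGB").resize(size)
    return [((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)
            for r, g, b in img.getdata()]

# emit a C array for inclusion in firmware, as such tools typically do
pixels = to_rgb565("icon.png")  # hypothetical input file
body = ",".join(f"0x{p:04X}" for p in pixels)
print(f"const unsigned short icon[{len(pixels)}] = {{{body}}};")
```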

14.
《Real》1996,2(3):139-152
Images are being generated at an ever-increasing rate by diverse military and civilian sources. A content-based image retrieval system is required to utilize information from the image repositories effectively. Content-based retrieval is characterized by several generic query classes. With the existence of the information superhighway, image repositories are evolving in a decentralized fashion on the Internet. This necessitates network-transparent distributed access in addition to the content-based retrieval capability. Images stored in low-level formats such as vector and raster are referred to as physical images. Constructing interactive responses to user queries using physical images is not practical and robust. To overcome this problem, we introduce the notion of logical features and describe various features to enable content-based query processing in a distributed environment. We describe a tool named SemCap for extracting the logical features semi-automatically. We also propose an architecture and an application-level communication protocol for distributed content-based retrieval. We describe the prototype implementation of the architecture and demonstrate its versatility on two distributed image collections.

15.
In this study, a unified scheme using divergence analysis and genetic search is proposed to determine significant components of feature vectors in high-dimensional spaces, without having to deal with singular matrix problems. In the literature, three main problems are observed in the feature selection process performed in a high-dimensional space: high computational load, local minima, and singular matrices. In this study, feature selection is realized by increasing the dimension one by one, rather than reducing the dimension. In this sense, recursive covariance matrices are formulated to decrease the computational load. The use of genetic algorithms is proposed to avoid local optima and singular matrix problems in high-dimensional feature spaces. Candidate strings in the genetic pool represent the new features formed by increasing the dimension. The genetic algorithms investigate the combination of features which gives the highest divergence value. Two methods are proposed for the selection of features. In the first method, features in a high-dimensional space are determined by using divergence analysis and genetic search (DAGS) together. If the dimension is not high, the second method is offered, which uses only recursive divergence analysis (RDA) without any genetic search. In Section 3, two experiments are presented: feature determination in a two-dimensional phantom feature space, and feature determination for ECG beat classification in a real data space.
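To make the genetic-search idea concrete, here is a hedged toy sketch that searches k-feature subsets for the one maximizing a simple two-class separability criterion; the criterion, genetic operators, and parameters are stand-ins, not the paper's divergence analysis or its recursive formulation.

```python
import numpy as np

def separability(X0, X1, idx):
    """Simple Fisher-style two-class criterion over a feature subset idx
    (a stand-in for the paper's divergence measure)."""
    m0, m1 = X0[:, idx].mean(0), X1[:, idx].mean(0)
    v0, v1 = X0[:, idx].var(0), X1[:, idx].var(0)
    return float(np.sum((m0 - m1) ** 2 / (v0 + v1 + 1e-9)))

def ga_select(X0, X1, k, pop=30, gens=50, seed=0):
    """Tiny genetic search: truncation selection, union-based crossover,
    point mutation over subsets of k distinct feature indices."""
    rng = np.random.default_rng(seed)
    d = X0.shape[1]
    popn = [rng.choice(d, k, replace=False) for _ in range(pop)]
    for _ in range(gens):
        order = np.argsort([separability(X0, X1, s) for s in popn])[::-1]
        elite = [popn[i] for i in order[: pop // 2]]
        children = []
        while len(children) < pop - len(elite):
            a, b = rng.choice(len(elite), 2, replace=False)
            child = rng.choice(np.union1d(elite[a], elite[b]), k, replace=False)
            if rng.random() < 0.2:   # mutation: swap in one unused feature
                unused = np.setdiff1d(np.arange(d), child)
                child[rng.integers(k)] = rng.choice(unused)
            children.append(child)
        popn = elite + children
    return np.sort(max(popn, key=lambda s: separability(X0, X1, s)))

rng = np.random.default_rng(1)
X0 = rng.normal(0, 1, (100, 20)); X1 = rng.normal(0, 1, (100, 20))
X1[:, [3, 7, 12]] += 2.0             # plant three informative features
print(ga_select(X0, X1, k=3))        # should recover roughly [3, 7, 12]
```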

16.
谭桥宇  余国先  王峻  郭茂祖 《软件学报》2017,28(11):2851-2864
Weak-label learning is an important branch of multi-label learning; in recent years it has been widely studied and applied to problems such as completing and predicting the missing labels of multi-label samples. However, for high-dimensional data, which have large feature sets and are more likely to carry multiple semantic labels and to exhibit missing labels, existing weak-label learning methods are easily disturbed by the noisy and redundant features such data contain. To classify high-dimensional multi-label data accurately, this paper proposes EnWL, an ensemble weak-label classification method based on maximizing the dependence between labels and features. EnWL first applies affinity propagation clustering repeatedly in the feature space of the high-dimensional data, each time selecting the cluster centers to form a representative feature subset and so reduce the interference of noisy and redundant features; it then trains, on each feature subset, a semi-supervised multi-label classifier based on maximizing label-feature dependence; finally, it combines these classifiers by voting to perform the multi-label classification. Experimental results on a variety of high-dimensional datasets show that EnWL outperforms existing related methods on multiple evaluation measures.
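The representative-feature step described here maps naturally onto affinity propagation over feature columns. The sketch below shows only that selection step; EnWL repeats it several times and trains a semi-supervised multi-label classifier on each subset, which is not reproduced, and the data are random stand-ins.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 300))          # 100 samples, 300 (noisy) features

# cluster the *features* (columns) and keep each cluster's exemplar,
# mirroring the use of cluster centers as a representative feature subset
ap = AffinityPropagation(random_state=0).fit(X.T)
selected = ap.cluster_centers_indices_   # indices of exemplar features
X_reduced = X[:, selected]
print(len(selected), X_reduced.shape)
```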

17.
Jiann-Ming Wu  Pei-Hsun Hsu 《Neurocomputing》2011,74(12-13):2228-2240
This work explores learning LCGM (lattice-connected Gaussian mixture) models by annealed Kullback–Leibler (KL) divergence minimization for a hybrid of topological and statistical pattern analysis. The KL divergence measures the general criteria of learning an LCGM model that is composed of a lattice of multivariate Gaussian units. A planar lattice emulates the topological order of cortex-like neighboring relations, and the built-in parameters of the connected Gaussian units represent statistical features of unsupervised data. Learning an LCGM model involves the collateral optimization tasks of resolving mixture combinatorics and extracting geometric features from high-dimensional patterns. Under the assumption that mixture combinatorics encoded by Potts variables obey the Boltzmann distribution, approximating their joint probability by the product of individual probabilities is qualified by the KL divergence, whose minimization under physical-like deterministic annealing faithfully optimizes the involved mixture combinatorics and geometric features. Numerical simulations show the proposed annealed KL divergence minimization is effective and reliable for solving the generalized TSP, spot identification, self-organization, and the visualization and sorting of yeast gene expressions.
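For reference, one standard way to write the annealed KL criterion this abstract alludes to — an assumption about the general mean-field form, not the paper's exact objective: with Potts assignments δ following a Boltzmann distribution p(δ) = exp(−E(δ)/T)/Z(T) and a factorized approximation q(δ) = ∏ᵢ qᵢ(δᵢ),

```latex
\mathrm{KL}(q\,\|\,p)
  = \sum_{\delta} q(\delta)\,\ln\frac{q(\delta)}{p(\delta)}
  = \frac{1}{T}\,\mathbb{E}_{q}\!\left[E(\delta)\right] - H(q) + \ln Z(T)
```

so minimizing the KL at temperature T trades the expected energy against the entropy H(q); annealing gradually lowers T, moving the optimization from an entropy-dominated regime toward the energy-dominated one.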

18.
Ground penetrating radar (GPR) can detect and deliver the response signal from any kind of buried object, such as plastic or metallic landmines, stones, and wood sticks. It delivers three kinds of data: A-scan, B-scan, and C-scan. However, it cannot discriminate between landmines and inoffensive objects, or 'clutter.' One-class classification is an alternative way to detect landmines, especially as landmine feature data are unbalanced. In this article, we investigate the effectiveness of the Covariance-guided One-Class Support Vector Machine (COSVM) to detect, discriminate, and locate landmines efficiently. In fact, compared to existing one-class classifiers, the COSVM has the advantage of emphasizing low-variance directions. Moreover, we compare one-class classification to multiclass classification to tease out the advantage of the former over the latter when data are unbalanced. Our method consists of extracting A-scan GPR data. Extracted features are used as an input for COSVM to discriminate between landmines and clutter. We provide an extensive evaluation of our detection method compared to other methods based on relevant state-of-the-art one-class and multiclass classifiers, on the well-known MACADAM database. Our experimental results clearly show the superiority of using COSVM in landmine detection and localization.
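COSVM itself is not available in common libraries; as a hedged stand-in, the sketch below shows the one-class workflow the abstract argues for, using scikit-learn's standard one-class SVM on synthetic feature vectors (the real pipeline would use A-scan-derived features).

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# toy stand-ins for A-scan-derived feature vectors
clutter = rng.normal(0, 1, size=(200, 8))   # abundant "normal" class
mines = rng.normal(3, 1, size=(10, 8))      # rare target class

# train on the single well-sampled class only, as one-class classification does
clf = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(clutter)
pred = clf.predict(np.vstack([clutter[:5], mines[:5]]))
print(pred)   # +1 = inlier (clutter-like), -1 = outlier (possible landmine)
```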

19.
This paper presents OS-Guard (On-Site Guard), a novel on-site signature based framework for multimedia surveillance data management. One of the major concerns in widespread deployment of multimedia surveillance systems is the enormous amount of data collected from multiple media streams that need to be communicated, observed and stored for crime alerts and forensic analysis. This necessitates investigating efficient data management techniques to solve this problem. This work aims to tackle this problem, motivated by the following observation: more data does not mean more information. OS-Guard is a novel framework that attempts to collect informative data and filter out non-informative data on-site, thus taking a step towards solving the data management problem. In the framework, both audio and video cues are utilized by extracting features from the incoming data stream, and the resultant real-valued feature data are binarized for efficient storage and processing. A feature selection process based on association rule mining selects discriminant features. A short representative sample of the whole database is generated using a novel reservoir sampling algorithm that is stored on-site and used with a support vector machine to classify an important event. Initial experiments for a bank ATM monitoring scenario demonstrate promising results.
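The paper proposes a novel reservoir sampling algorithm; for reference, here is the textbook baseline (Algorithm R), which keeps a uniform random sample of size k from a stream of unknown length in one pass and O(k) memory — the property that makes reservoir sampling attractive for on-site summarization.

```python
import random

def reservoir_sample(stream, k, seed=0):
    """Algorithm R: uniform size-k sample of a stream in a single pass."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)       # fill the reservoir first
        else:
            j = rng.randint(0, i)        # inclusive; item kept with prob k/(i+1)
            if j < k:
                reservoir[j] = item
    return reservoir

print(reservoir_sample(range(10_000), k=5))
```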

20.
This paper proposes a new method to weight subspaces in feature groups and individual features for clustering high-dimensional data. In this method, the features of high-dimensional data are divided into feature groups, based on their natural characteristics. Two types of weights are introduced to the clustering process to simultaneously identify the importance of feature groups and individual features in each cluster. A new optimization model is given to define the optimization process, and a new clustering algorithm, FG-k-means, is proposed to solve it. The new algorithm is an extension to k-means that adds two additional steps to automatically calculate the two types of subspace weights. A new data generation method is presented to generate high-dimensional data with clusters in subspaces of both feature groups and individual features. Experimental results on synthetic and real-life data have shown that the FG-k-means algorithm significantly outperformed four k-means type algorithms, i.e., k-means, W-k-means, LAC and EWKM, in almost all experiments. The new algorithm is robust to noise and missing values, which commonly exist in high-dimensional data.
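To make the two weight types concrete, here is a hedged numpy sketch of an FG-k-means-style weighted distance with fixed group and feature weights; the algorithm's automatic per-cluster weight updates are not reproduced, and all values are illustrative.

```python
import numpy as np

def fg_distance(x, c, groups, group_w, feat_w):
    """Subspace distance with a weight per feature group and per feature.

    groups: list of index arrays partitioning the features;
    group_w: one weight per group; feat_w: one weight per feature.
    (FG-k-means updates both weight types per cluster; here they are
    fixed inputs, so this sketches only the assignment step.)"""
    d = 0.0
    for t, idx in enumerate(groups):
        d += group_w[t] * np.sum(feat_w[idx] * (x[idx] - c[idx]) ** 2)
    return d

groups = [np.arange(0, 3), np.arange(3, 6)]   # two feature groups
group_w = np.array([0.7, 0.3])
feat_w = np.full(6, 1 / 3)                    # weights sum to 1 within each group
x, c = np.ones(6), np.zeros(6)
print(fg_distance(x, c, groups, group_w, feat_w))
```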
