Similar literature
 18 similar documents found
1.
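Gene chip (microarray) technology is widely used in research on tumor typing and classification. To process tumor gene expression profile data and build a tumor classification prediction model, this paper adopts a multi-step dimensionality reduction and classification method that combines significance analysis of differential gene expression, support vector machines (SVM), and a genetic algorithm. Applied to colorectal cancer and leukemia datasets, the method selects feature gene subsets that contain few genes yet yield high classification accuracy. Experimental results show that the method screens tumor feature genes quickly and effectively and achieves better classification performance.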
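
The abstract does not say which significance test is used for the differential expression step. As a minimal sketch, assuming a two-class problem and Welch's t-statistic as the score, genes can be ranked as below. The function names and the toy data are hypothetical, and the SVM and genetic algorithm stages are not shown.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two groups of expression values."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def rank_genes(expr, labels):
    """Rank gene names by |t| between class 0 and class 1, largest first.

    expr maps a gene name to one expression value per sample;
    labels holds the class (0 or 1) of each sample.
    """
    scores = {}
    for gene, values in expr.items():
        group0 = [v for v, y in zip(values, labels) if y == 0]
        group1 = [v for v, y in zip(values, labels) if y == 1]
        scores[gene] = abs(welch_t(group0, group1))
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (made up): gene_a separates the classes, gene_b does not.
labels = [0, 0, 0, 1, 1, 1]
expr = {
    "gene_a": [1.0, 1.2, 0.9, 3.1, 2.9, 3.3],
    "gene_b": [2.0, 2.5, 1.5, 2.1, 2.4, 1.6],
}
ranking = rank_genes(expr, labels)
```

Keeping only the top of such a ranking is one common way to shrink the gene set before a slower wrapper search, which is the role the filtering step plays in the multi-step method described above.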

2.
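Data dimensionality reduction maps a dataset from a high-dimensional feature space to a low-dimensional one. Traditional principal component analysis (PCA) is a commonly used linear dimensionality reduction algorithm, but it is time-consuming, its reduction results are not good enough, and it cannot meet practical classification requirements. To address this, a PCA dimensionality reduction algorithm based on mutual information comprehensive credibility (MIS-PCA) is proposed by introducing mutual information credibility. The paper first introduces the ideas of mutual information (MI), relative mutual information credibility (MIR), and absolute mutual information credibility (MIA); it then derives the mutual information comprehensive credibility (MIS) from MIA and MIR and uses it for feature screening; finally, it applies PCA to the screened data and classifies the reduced data with the KNN and SVM algorithms. Experiments comparing against PCA and E-PCA show that the method gives better dimensionality reduction results and higher classification accuracy.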
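
The abstract does not define MIR, MIA, or MIS, so they are not reproduced here. The sketch below only shows the plain mutual information that those scores build on, assuming features have already been discretized; the function name and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """I(X;Y) in bits for two equal-length sequences of discrete values."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


labels = [0, 0, 1, 1]
informative = ["low", "low", "high", "high"]  # determines the label
uninformative = ["a", "b", "a", "b"]          # independent of the label
mi_informative = mutual_information(informative, labels)
mi_uninformative = mutual_information(uninformative, labels)
```

A screening step would keep the features with the largest scores and pass only those to PCA.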

3.
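Gene chip (microarray) data are analyzed mainly with partial least squares (PLS) and linear discriminant analysis (LDA), a supervised classification method. PCA and PLS are effective methods for extracting useful features from massive data, and the components they extract stay close to the original gene chip data. The effects of PCA-based and PLS-based dimensionality reduction on LDA classification are compared. The conclusions can provide a scientific basis for industrial applications.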

4.
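Condition monitoring and identification are studied for the reactive ion etching process. Principal component analysis (PCA) is used to reduce the dimensionality of the raw data and extract the most influential feature subset from the many monitored variables; a diagnostic model for identifying failure states is then built with a support vector machine (SVM), and the effect of the model parameters on failure-state classification is analyzed. The results verify the effectiveness of the SVM-based method and show that the model has efficient pattern recognition capability and can be applied to state classification and identification in other semiconductor processes that face small-sample problems.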

5.
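To address the low classification accuracy caused by the high dimensionality and many redundant genes in tumor gene data, a tumor feature gene selection method based on PCA and information gain is proposed. The method first uses PCA to remove redundant genes and obtain a preselected feature gene subset; it then applies information gain to refine the preselected subset into the final feature gene subset; finally, different classification models are used in simulation experiments on the feature gene subset. The experimental results show that the proposed method improves the classification accuracy of gene expression profiles, which indicates that disease-causing genes are selected effectively.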

6.
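Feature selection is a key technique in automatic text classification. Targeting the characteristics of information filtering, this paper improves feature selection and proposes a text feature selection method based on a semantic neural network. The original feature set is first screened to remove redundant features and noise; a semantic neural network then performs intelligent feature selection on the resulting subset, the core of which is the computation of relevance degrees and activation variables. This yields the optimal feature subset representing the problem space, reducing dimensionality and improving classification accuracy. Experiments show that the method greatly reduces text dimensionality and improves the quality of text filtering.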

7.
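Research on a dimensionality reduction method based on the principle of multivariate graphical feature fusion   Total citations: 1, self-citations: 0, citations by others: 1
Dimensionality reduction is the process of mapping high-dimensional patterns to a low-dimensional subspace. Classification in the reduced subspace often gives better results. Taking high-dimensional data as the object of study, this paper represents the data visually with multivariate description graphs, reduces dimensionality by fusing the graphical features of those graphs, and evaluates classification performance with a K-nearest-neighbor classifier. Compared with Fisher linear discriminant analysis and several other commonly used nonlinear dimensionality reduction methods, the proposed method performs well in both data visualization and classification accuracy.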

8.
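The advent of gene chip technology has changed the outlook of biomedical research, but the massive data it produces is a bottleneck limiting its development. Given that gene chip data are large in volume, low in sample count, and high in gene dimensionality, this paper proposes DRPSVM, a dimensionality-reduced proximal support vector machine classifier for gene chip data. DRPSVM uses a dimensionality-reduced quadratic programming algorithm, so its time and space complexity are both lower than those of the traditional PSVM algorithm. Compared with the BP, Nearest, RBF, and SVM classifiers on gene chip datasets including CAMDA2000, colon 1 dataset, and colon 2 dataset, DRPSVM shows stable classification performance, a unique optimal solution, and short training time when samples are few and dimensionality rises sharply, which makes it suitable for gene chip data classification.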

9.
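With the development of DNA microarray technology, effectively mining biological information from gene expression profile data has become a research focus. This paper therefore proposes classifying gene expression data by combining the algebraic connectivity strength of points with non-negative matrix factorization (NMF). First, the algebraic connectivity strength of points is used to remove gene data overly affected by external factors, and a modified feature scoring criterion is used to score and rank the genes, from which a high-scoring gene subset is selected; next, NMF maps this gene subset to a very low-dimensional feature space; finally, an SVM classifier performs the classification experiments. Results on several public gene expression datasets, together with comparisons against other methods, verify that the method is effective and feasible.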

10.
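To address the high feature dimensionality and low computational efficiency of electromyographic (EMG) signal processing, an EMG feature selection method combining the ReliefF algorithm with a genetic algorithm (GA) is proposed. The characteristics of EMG signals are analyzed and features are extracted by wavelet analysis; ReliefF evaluates the weights of the extracted high-dimensional features to select the subset with a significant effect on classification (larger weights); GA then further selects the feature subset with the best classification performance. The ReliefFGA-Wrapper algorithm is compared with a global search algorithm in processing time and classification performance on EMG signals. The results show that the proposed method improves computational efficiency and achieves good classification performance.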
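
To illustrate the weighting idea, here is a minimal sketch of the basic Relief update with one nearest hit and one nearest miss per sample. This is a simplification, not the ReliefF variant named in the abstract, which averages over k neighbors and handles multiple classes. It assumes features scaled to [0, 1]; the function name and the toy data are hypothetical, and the GA stage is not shown.

```python
def relief_weights(X, y):
    """Basic Relief feature weights with one nearest hit and one nearest miss.

    X is a list of feature vectors scaled to [0, 1]; y holds class labels.
    Every sample is used once as the reference instance.
    """
    n, d = len(X), len(X[0])

    def dist(a, b):
        return sum(abs(p - q) for p, q in zip(a, b))

    weights = [0.0] * d
    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        hit = min(hits, key=lambda j: dist(X[i], X[j]))
        miss = min(misses, key=lambda j: dist(X[i], X[j]))
        for f in range(d):
            # Reward a feature that differs on the nearest miss,
            # penalize one that differs on the nearest hit.
            weights[f] += (abs(X[i][f] - X[miss][f]) - abs(X[i][f] - X[hit][f])) / n
    return weights


# Toy data (made up): feature 0 tracks the class, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.9, 0.2], [1.0, 0.8]]
y = [0, 0, 1, 1]
w = relief_weights(X, y)
```

Features with larger weights would form the candidate subset handed to the second-stage search.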

11.
Gene association study is one of the major challenges of biochip technology, both for gene diagnosis, where only a gene subset is responsible for some diseases, and for the treatment of the curse of dimensionality, which occurs especially in DNA microarray datasets where there are thousands of genes and only a small number of experiments (samples). This paper presents a gene selection method that trains linear support vector machine (SVM)/nonlinear MLP (multi-layer perceptron) classifiers and tests them with cross validation to find a gene subset that is optimal/suboptimal for the diagnosis of binary/multiple disease types. Genes are selected with a linear SVM classifier for the diagnosis of each pair of disease types and tested by leave-one-out cross validation; then, starting from the union of the selected genes, genes are deleted one by one by removing the gene which brings the greatest decrease of the generalization power, for samples, on the gene subset after removal, where generalization is measured by training MLPs with leave-one-out and leave-4-out cross validations. The proposed method was tested with experiments on real DNA microarray MIT data and NCI data. The result shows that it outperforms the conventional SNR method in separability of the data with expression levels on selected genes. For the real DNA microarray MIT/NCI data, which are composed of 7129/2308 effective genes with only 72/64 labeled samples belonging to 2/4 disease classes, only 11/6 genes are selected as diagnostic genes. The selected genes are tested by classification of samples on these genes with SVM/MLP with leave-one-out/both leave-one-out and leave-4-out cross validations. The absence of misclassification indicates that the selected genes can indeed be considered diagnostic genes for the corresponding diseases.
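
The leave-one-out test mentioned in the abstract can be sketched as follows. To stay dependency-free, a nearest-centroid classifier is swapped in for the SVM and MLP classifiers the paper uses; the function names, class labels, and toy data are hypothetical.

```python
def nearest_centroid_predict(train_X, train_y, x):
    """Predict the class whose mean vector is closest to x (squared Euclidean)."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, t in zip(train_X, train_y) if t == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        d = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


def leave_one_out_errors(X, y):
    """Count misclassifications when each sample is held out once."""
    errors = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_X, train_y, X[i]) != y[i]:
            errors += 1
    return errors


# Toy data (made up): two well-separated classes measured on two genes.
X = [[1.0, 5.0], [1.2, 4.8], [0.9, 5.1], [4.0, 1.0], [4.2, 1.1], [3.9, 0.8]]
y = ["A", "A", "A", "B", "B", "B"]
errors = leave_one_out_errors(X, y)
```

With only a few dozen samples, holding out one sample at a time uses nearly all the data for training in each round, which is presumably why the abstract relies on it.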

12.
Gene association study is one of the major challenges of biochip technology, both for gene diagnosis, where only a gene subset is responsible for some diseases, and for the treatment of the curse of dimensionality, which occurs especially in DNA microarray datasets where there are thousands of genes and only a small number of experiments (samples). This paper presents a gene selection method that trains linear support vector machine (SVM)/nonlinear MLP (multilayer perceptron) classifiers and tests them with cross-validation to find a gene subset that is optimal/suboptimal for the diagnosis of binary/multiple disease types. Genes are selected with a linear SVM classifier for the diagnosis of each pair of disease types and tested by leave-one-out cross-validation; then, starting from the union of the selected genes, genes are deleted one by one by removing the gene which brings the greatest decrease of the generalization power, for samples, on the gene subset after removal, where generalization is measured by training MLPs with leave-one-out and leave-four-out cross-validations. The proposed method was tested with experiments on real DNA microarray MIT data and NCI data. The result shows that it outperforms the conventional SNR method in the separability of the data with expression levels on selected genes. For the real DNA microarray MIT/NCI data, which are composed of 7129/2308 effective genes with only 72/64 labeled samples belonging to 2/4 disease classes, only 11/6 genes are selected as diagnostic genes. The selected genes are tested by the classification of samples on these genes with SVM/MLP with leave-one-out/both leave-one-out and leave-four-out cross-validations. The absence of misclassification indicates that the selected genes can indeed be considered diagnostic genes for the corresponding diseases.

13.
Gene association study is one of the major challenges of biochip technology, both for gene diagnosis, where only a gene subset is responsible for some diseases, and for the treatment of the curse of dimensionality, which occurs especially in DNA microarray datasets where there are thousands of genes and only a small number of experiments (samples). This paper presents a gene selection method that trains linear support vector machine (SVM)/nonlinear MLP (multilayer perceptron) classifiers and tests them with cross-validation to find a gene subset that is optimal/suboptimal for the diagnosis of binary/multiple disease types. Genes are selected with a linear SVM classifier for the diagnosis of each pair of disease types and tested by leave-one-out cross-validation; then, starting from the union of the selected genes, genes are deleted one by one by removing the gene which brings the greatest decrease of the generalization power, for samples, on the gene subset after removal, where generalization is measured by training MLPs with leave-one-out and leave-four-out cross-validations. The proposed method was tested with experiments on real DNA microarray MIT data and NCI data. The result shows that it outperforms the conventional SNR method in the separability of the data with expression levels on selected genes. For the real DNA microarray MIT/NCI data, which are composed of 7129/2308 effective genes with only 72/64 labeled samples belonging to 2/4 disease classes, only 11/6 genes are selected as diagnostic genes. The selected genes are tested by the classification of samples on these genes with SVM/MLP with leave-one-out/both leave-one-out and leave-four-out cross-validations. The absence of misclassification indicates that the selected genes can indeed be considered diagnostic genes for the corresponding diseases.

14.
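The semi-structured or even unstructured nature of Chinese text data makes its classification suffer from high feature dimensionality, and traditional single-method feature dimensionality reduction can hardly meet the text classification needs of the big data era. On this basis, a hybrid feature dimensionality reduction method based on chi-square statistics (CHI) and principal component analysis (PCA) is proposed (CHI-...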
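
The abstract is cut off before it describes how CHI and PCA are combined, so only the chi-square score itself is sketched here, using the usual 2x2 term-by-class contingency table from text classification. The function name and the counts are hypothetical.

```python
def chi_square(a, b, c, d):
    """Chi-square score of a term for a class from a 2x2 contingency table.

    a: in-class documents containing the term
    b: out-of-class documents containing the term
    c: in-class documents without the term
    d: out-of-class documents without the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


strong = chi_square(40, 10, 10, 40)  # term concentrated in the class
weak = chi_square(25, 25, 25, 25)    # term spread evenly across classes
```

Terms with low scores carry little class information and are the natural candidates to drop before any further projection.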

15.
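In Internet of Things (IoT) technology, research on improving data classification accuracy by combining complementary information from multiple data sources has attracted growing attention. Given the multi-source heterogeneous nature of data collected by IoT wireless sensors, an IoT data classification method based on an improved multi-kernel learning support vector machine (IMKL-SVM) is presented. Traditional multi-kernel learning mainly selects kernel types and parameters empirically; the improved method determines them in two steps: first, cross-validation is used to make a preliminary choice of kernel types and parameters; second, starting from the first-step result, a support vector machine (SVM) is used to train on the samples and optimize the types and parameters of the multiple kernels at the same time. In the experiments, two datasets were built from four kinds of data (temperature, humidity, illumination, and atmospheric pressure): the first is labeled with four classes (morning, midday and afternoon, evening, night) and the second with three classes (daytime, evening, night). The classification accuracy of the IMKL-SVM method, the single-kernel SVM method, and the traditional MKL-SVM method is compared on both datasets. In addition, classification experiments were run on the public UCI dataset AReM; the results show that IMKL-SVM achieves high classification accuracy on IoT data with multi-source heterogeneous characteristics.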

16.
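In spatial hearing reconstruction, the large data volume of head-related transfer functions (HRTFs) is one of the main factors limiting the efficiency of virtual sound source synthesis. To reduce HRTF data storage, a spatial hearing reconstruction method based on locally linear embedding (LLE) is proposed. LLE reduces the dimensionality of the high-dimensional HRTF data, features related to direction perception are extracted in the low-dimensional space, and a clustering algorithm then groups them to obtain characteristic HRTFs; the remaining non-characteristic HRTFs can be reconstructed from the characteristic HRTFs with an improved interpolation algorithm. Compared with the existing principal component analysis (PCA) approach, data reduced by LLE retain more perceptual information, and correcting the interpolated data using the intrinsic relationships among HRTF data reduces reconstruction error. Simulation results show that the method effectively reduces the amount of stored HRTF data and helps improve the efficiency of virtual sound source synthesis.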

17.
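To identify key genes in the development of nephroblastoma and to screen potential therapeutic targets and biomarkers, gene chips GSE11151 and GSE53224 were obtained from the GEO database. After normalization, differentially expressed genes were screened through GO and KEGG analysis, and the key genes among them were obtained by constructing a protein-protein interaction network. A total of 404 differentially expressed genes were obtained, of which 385 were upregulated and 19 were downregulated. The key genes PMCH, CCR5, CCR7, RGS1, and KNG1 are involved in the chemokine signaling pathway and the G protein-coupled signaling pathway and take part in forming the tumor microenvironment. These five key genes play an important role in the development of nephroblastoma and may serve as potential therapeutic targets and biomarkers.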

18.
Kamran Ullah Khan, Yang Jian. 《清华大学学报》 (Journal of Tsinghua University), 2007, 12(1): 97-104
The methods proposed so far for accurate classification of land cover types in polarimetric synthetic aperture radar (SAR) images are data-specific, and no general method is available. A novel hybrid framework for this classification was developed in this work. A set of effective features derived from the coherence matrix of polarimetric SAR data was proposed. The feature set consists of wavelet, texture, and nonlinear features and has strong discrimination power. A neural network was used as the classification engine in a unique way. By exploiting the speed of the conjugate gradient method and the convergence rate of the Levenberg-Marquardt method (near the optimal point), an overall speedup of the classification procedure was achieved. Principal component analysis (PCA) was used to shrink the dimension of the feature vector without sacrificing much of the classification accuracy. The proposed approach is compared with the maximum likelihood estimator (MLE) based on the complex Wishart distribution, and the results show the superiority of the proposed method, whose average classification accuracy (95.4%) is higher than that of the MLE (93.77%). Using PCA to reduce the dimensionality of the feature vector helps reduce the memory requirements and computational cost, thereby speeding up the process.
