Similar Documents
19 similar documents found
1.
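Gene expression data are large-scale data matrices produced by DNA microarray experiments. Biclustering algorithms mine submatrices with high internal correlation from such a matrix and can effectively extract biological information. Current multi-objective biclustering optimization algorithms tend to converge prematurely and become trapped in local optima. To address this, the paper proposes a logic-operation-based discrete artificial bee colony biclustering algorithm (LOABCB). On the one hand, the artificial bee colony algorithm is introduced to strengthen the global search ability of biclustering; on the other hand, a logic-operation neighborhood search strategy is used to find the optimal biclusters and improve search efficiency. Experiments on the yeast cell gene expression dataset show that the proposed algorithm obtains biologically meaningful biclusters with good experimental results.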
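The abstract does not spell out LOABCB's logic operations. A minimal sketch, assuming a bicluster is encoded as one bitset over all rows and columns (bit set = row or column included): XOR toggles membership, and `x ^ ((x ^ partner) & mask)` acts as a discrete analogue of the artificial bee colony move toward a partner food source. The names `flip_neighbor`, `crossover_neighbor` and `employed_bee_step` are hypothetical, and the fitness function is left to the caller.

```python
import random
from typing import Callable, List


def flip_neighbor(x: int, n_bits: int, k: int, rng: random.Random) -> int:
    """Toggle k distinct random bits of x with XOR (move k rows/columns in or out)."""
    mask = 0
    for pos in rng.sample(range(n_bits), k):
        mask |= 1 << pos
    return x ^ mask


def crossover_neighbor(x: int, partner: int, n_bits: int, rng: random.Random) -> int:
    """Copy a random subset of the bits where x and partner disagree.

    x ^ partner marks the disagreeing positions, AND with a random mask keeps a
    subset of them, and the final XOR flips x at exactly those positions."""
    mask = rng.getrandbits(n_bits)
    return x ^ ((x ^ partner) & mask)


def employed_bee_step(sources: List[int], fitness: Callable[[int], float],
                      n_bits: int, rng: random.Random) -> List[int]:
    """One employed-bee phase: build a neighbour per source, keep it only if fitter."""
    updated = list(sources)
    for i, x in enumerate(sources):
        j = rng.choice([t for t in range(len(sources)) if t != i])
        candidate = crossover_neighbor(x, sources[j], n_bits, rng)
        if candidate == x:  # nothing copied: fall back to a single bit flip
            candidate = flip_neighbor(x, n_bits, 1, rng)
        if fitness(candidate) > fitness(x):  # greedy selection, as in ABC
            updated[i] = candidate
    return updated
```

In a biclustering setting the fitness would score the submatrix selected by the bits, for example combining a low mean squared residue with a large size; the onlooker and scout phases of ABC are omitted here.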

2.
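A bicluster in gene data is a submatrix of the gene expression matrix whose entries have similar expression levels; its rows and columns represent a gene subset and a condition subset, respectively. A biclustering algorithm clusters the gene data matrix simultaneously along the row and column directions to find such submatrices. This paper proposes a hybrid optimization algorithm based on simulated annealing and particle swarm optimization, which avoids the drawback of random probabilistic jumps in pure simulated annealing. The algorithm uses a bottom-up search strategy: it first generates bicluster seeds and then uses the hybrid optimization algorithm to add rows and columns to the seeds, finding the optimal clustering result. In experiments on a yeast gene dataset, the resulting biclusters reach high quality on every metric, which verifies the effectiveness of the method.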
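The abstract names a simulated annealing and particle swarm hybrid without giving its update rules. One possible arrangement, sketched under that assumption on a continuous test function rather than on bicluster seeds: a standard PSO step, followed by a simulated annealing walker that perturbs a point and accepts it with the Metropolis rule, feeding any improvement back into the global best. Function names and parameter values are illustrative.

```python
import math
import random
from typing import Callable, List, Tuple


def metropolis_accept(delta: float, temperature: float, rng: random.Random) -> bool:
    """Always accept an improvement; accept a worse move with probability exp(-delta / T)."""
    if delta <= 0:
        return True
    return rng.random() < math.exp(-delta / temperature)


def sa_pso(f: Callable[[List[float]], float], dim: int, n_particles: int = 20,
           iters: int = 200, t0: float = 1.0, cooling: float = 0.95,
           w: float = 0.7, c1: float = 1.5, c2: float = 1.5,
           bound: float = 5.0, seed: int = 0) -> Tuple[List[float], float]:
    rng = random.Random(seed)
    xs = [[rng.uniform(-bound, bound) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    pbest = [x[:] for x in xs]
    pval = [f(x) for x in xs]
    g = min(range(n_particles), key=lambda i: pval[i])
    gbest, gval = pbest[g][:], pval[g]
    walker, wval = gbest[:], gval  # SA state, started at the swarm's best point
    temp = t0
    for _ in range(iters):
        # PSO phase: standard velocity/position update toward pbest and gbest
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vs[i][d] = (w * vs[i][d] + c1 * r1 * (pbest[i][d] - xs[i][d])
                            + c2 * r2 * (gbest[d] - xs[i][d]))
                xs[i][d] += vs[i][d]
            val = f(xs[i])
            if val < pval[i]:
                pbest[i], pval[i] = xs[i][:], val
            if val < gval:
                gbest, gval = xs[i][:], val
        # SA phase: Gaussian perturbation of the walker, Metropolis acceptance
        cand = [c + rng.gauss(0.0, temp) for c in walker]
        cand_val = f(cand)
        if metropolis_accept(cand_val - wval, temp, rng):
            walker, wval = cand, cand_val
        if wval < gval:
            gbest, gval = walker[:], wval
        temp *= cooling  # geometric cooling schedule
    return gbest, gval
```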

3.
A Biclustering Algorithm Based on Hierarchical Clustering   Total citations: 1; self-citations: 0; citations by others: 1
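Biclustering is a clustering approach proposed to discover biologically meaningful submatrices in gene expression data matrices. By permuting rows and columns, it groups similar data together into submatrices that carry biological meaning. This paper optimizes biclusters globally based on mean squared residue theory: a hierarchical clustering algorithm first generates initial data matrices, rows and columns are then added to these matrices, and the result is optimized to produce the final biclusters. Experiments show that the algorithm efficiently generates biclusters with consistent expression levels, with satisfactory results.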
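The mean squared residue (Cheng and Church) scores how well a submatrix fits an additive row-plus-column model; 0 means perfectly coherent. The sketch computes it and then grows a seed bicluster row by row while the residue stays at or below a threshold `delta`. The greedy `grow_rows` rule is a simplification assumed for illustration, not the paper's optimization step, and the hierarchical-clustering seeding is not shown.

```python
from typing import List, Sequence


def mean_squared_residue(matrix: Sequence[Sequence[float]],
                         rows: Sequence[int], cols: Sequence[int]) -> float:
    """Mean squared residue H(I, J) of the submatrix matrix[rows][cols].

    residue(i, j) = a_ij - a_iJ - a_Ij + a_IJ, where a_iJ, a_Ij and a_IJ are the
    row mean, column mean and overall mean of the submatrix."""
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_r, n_c = len(rows), len(cols)
    row_mean = [sum(r) / n_c for r in sub]
    col_mean = [sum(sub[a][b] for a in range(n_r)) / n_r for b in range(n_c)]
    all_mean = sum(row_mean) / n_r
    total = 0.0
    for a in range(n_r):
        for b in range(n_c):
            res = sub[a][b] - row_mean[a] - col_mean[b] + all_mean
            total += res * res
    return total / (n_r * n_c)


def grow_rows(matrix: Sequence[Sequence[float]], rows: List[int],
              cols: List[int], delta: float) -> List[int]:
    """Greedily add each remaining row if the enlarged bicluster keeps H <= delta."""
    rows = list(rows)
    for r in range(len(matrix)):
        if r not in rows and mean_squared_residue(matrix, rows + [r], cols) <= delta:
            rows.append(r)
    return rows
```

Column growth works symmetrically by swapping the roles of rows and columns.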

4.
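While searching bottom-up for clusters in maximal interesting subspaces, the high-dimensional subspace clustering algorithm SUBCLU iteratively generates intermediate clusters, and producing them consumes a great deal of time. To address this problem, an improved algorithm, BDFS-SUBCLU, is proposed. It uses a depth-first search strategy with backtracking to mine the clusters in maximal interesting subspaces, which avoids generating intermediate clusters and reduces the time complexity of the algorithm. BDFS-SUBCLU also adds a constraint on core points in each subspace, which to some extent prevents adjacent clusters from being merged into one because of particular data points. Experimental results on synthetic and real datasets show that BDFS-SUBCLU improves both efficiency and accuracy compared with SUBCLU.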
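A depth-first search over subspaces can prune because of monotonicity: adding attributes can only increase projected distances, so a subspace with no DBSCAN-style core point has no superset with one, and the search can backtrack there. A minimal sketch under that assumption: it enumerates attribute subsets depth-first and keeps the maximal ones that still contain a core point. The actual cluster extraction inside each subspace and BDFS-SUBCLU's extra core-point constraint are not reproduced, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def projected_distance(p: Point, q: Point, dims: Sequence[int]) -> float:
    """Euclidean distance restricted to the attributes in dims."""
    return math.sqrt(sum((p[d] - q[d]) ** 2 for d in dims))


def has_core_point(data: Sequence[Point], dims: Sequence[int],
                   eps: float, min_pts: int) -> bool:
    """True if some point has at least min_pts points (itself included) within eps."""
    for p in data:
        if sum(1 for q in data if projected_distance(p, q, dims) <= eps) >= min_pts:
            return True
    return False


def maximal_dense_subspaces(data: Sequence[Point], n_dims: int,
                            eps: float, min_pts: int) -> List[Tuple[int, ...]]:
    found: List[Tuple[int, ...]] = []

    def dfs(dims: List[int], start: int) -> None:
        extended = False
        for d in range(start, n_dims):
            candidate = dims + [d]
            if has_core_point(data, candidate, eps, min_pts):
                extended = True
                dfs(candidate, d + 1)  # go deeper
            # otherwise prune: no superset of candidate can contain a core point
        if dims and not extended:
            found.append(tuple(dims))  # cannot be extended further on this branch

    dfs([], 0)
    # drop subspaces contained in a larger dense subspace found on another branch
    return [s for s in found if not any(set(s) < set(t) for t in found)]
```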

5.
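Existing collaborative filtering algorithms commonly suffer from data sparsity, poor scalability, and heavy computation. To address these problems, a collaborative filtering recommendation algorithm based on BC-AW is proposed, introducing co-clustering (BlockClust, BC) and alternating least squares with weighted regularization (AW). First, the original rating matrix is co-clustered along both the user and the item dimension, producing multiple submatrices, each a rating block with a common pattern. Analysis shows that these submatrices are much smaller than the original rating matrix, which effectively reduces the computation in the prediction stage. Then alternating least squares with weighted regularization is applied to each submatrix separately to predict its unknown ratings, from which recommendations are made. Simulation experiments show that, compared with traditional collaborative filtering algorithms, the proposed algorithm effectively alleviates the problems of sparsity, scalability, and computational cost.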
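Alternating least squares with weighted-λ regularization fixes one factor matrix and solves a small ridge problem for each row of the other, scaling the penalty by the number of ratings that user or item has. A minimal NumPy sketch for a single rating submatrix, assuming a dense matrix `R` plus a 0/1 `mask` of observed ratings; the BlockClust co-clustering that would produce the submatrices is not shown, and `als_wr` is a hypothetical name.

```python
import numpy as np


def als_wr(R, mask, k=2, lam=0.05, iters=50, seed=0):
    """ALS with weighted-lambda regularisation on the observed entries of R.

    Each user (item) vector is penalised by lam * (its number of observed ratings)."""
    rng = np.random.default_rng(seed)
    n_u, n_i = R.shape
    U = rng.normal(scale=0.1, size=(n_u, k))
    V = rng.normal(scale=0.1, size=(n_i, k))
    for _ in range(iters):
        for u in range(n_u):  # fix V, solve a ridge problem per user
            idx = mask[u] > 0
            if not idx.any():
                continue
            Vu = V[idx]
            A = Vu.T @ Vu + lam * idx.sum() * np.eye(k)
            U[u] = np.linalg.solve(A, Vu.T @ R[u, idx])
        for i in range(n_i):  # fix U, solve a ridge problem per item
            idx = mask[:, i] > 0
            if not idx.any():
                continue
            Ui = U[idx]
            A = Ui.T @ Ui + lam * idx.sum() * np.eye(k)
            V[i] = np.linalg.solve(A, Ui.T @ R[idx, i])
    return U, V
```

Predicted ratings for the unobserved cells of the block are read off `U @ V.T`.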

6.
Zhao Jun, Xu Xiaoyan. Journal of Computer Applications, 2016, 36(10): 2710-2714
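Parallel implementations of the power iteration clustering (PIC) algorithm suffer from cumbersome programming and low efficiency. To address this, a method for implementing PIC in a distributed environment is proposed, based on Spark, a general-purpose engine for large-scale data processing, and its GraphX component. First, a similarity measure is used to convert the raw data into an affinity matrix that can be viewed as a graph. Next, the row-normalized affinity matrix is split by vertex cut into several small graphs stored on different machines. Finally, exploiting Spark's in-memory computation, the graph stored across the cluster is iterated over repeatedly to obtain a cut of the graph, in which each partitioned subgraph corresponds to one cluster. Experiments on datasets of different sizes and with different numbers of executors show that the GraphX-based distributed PIC algorithm scales well: running time decreases linearly as the number of executors increases, and with 6 executors the speedup over a single executor reaches 2.09 to 3.77. Compared with Hadoop-based PIC, running time on 40,000 news articles is reduced by 61%.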
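Power iteration clustering repeatedly multiplies a vector by the row-normalized affinity matrix and stops early, while the vector is still roughly constant within each cluster but not yet globally constant. A single-machine NumPy sketch of those steps (affinity matrix, row normalization, truncated power iteration, then 1-D k-means), assuming a Gaussian affinity with bandwidth `sigma` and an acceleration-based stopping rule; it does not model the Spark/GraphX distribution described above, and the function name is hypothetical.

```python
import numpy as np


def power_iteration_clustering(X, k, sigma=1.0, max_iter=1000, tol=1e-5):
    """Single-machine PIC sketch returning one cluster label per row of X."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    A = np.exp(-d2 / (2.0 * sigma ** 2))   # Gaussian affinity matrix
    np.fill_diagonal(A, 0.0)
    degree = A.sum(axis=1)                 # assumes every point has a non-zero affinity
    W = A / degree[:, None]                # row-normalised affinity, W = D^-1 A
    v = degree / degree.sum()              # start from the normalised degree vector
    delta_prev = np.full_like(v, np.inf)
    for _ in range(max_iter):
        v_new = W @ v
        v_new /= np.abs(v_new).sum()
        delta = np.abs(v_new - v)
        v = v_new
        # stop when the change in "velocity" (the acceleration) is negligible
        if np.abs(delta - delta_prev).max() < tol / len(v):
            break
        delta_prev = delta
    # 1-D k-means (Lloyd) on the embedding, centres initialised at quantiles
    centres = np.quantile(v, np.linspace(0.0, 1.0, k))
    for _ in range(100):
        labels = np.argmin(np.abs(v[:, None] - centres[None, :]), axis=1)
        new = np.array([v[labels == c].mean() if np.any(labels == c) else centres[c]
                        for c in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return labels
```

In the distributed version described above, the product `W @ v` is the step that GraphX would evaluate as message passing over the vertex-cut partitions.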

7.
Qiao Yongjian, Liu Xiaolin, Bai Liang. Journal of Computer Applications, 2022, 42(11): 3322-3329
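Clustering high-dimensional data with missing features faces two problems: the curse of dimensionality caused by the high dimensionality, and the breakdown of valid inter-sample distance computation caused by the missing features. To address them, a K-nearest-neighbor (KNN) imputation subspace clustering algorithm for high-dimensional data with missing features, KISC, is proposed. First, the neighbor relations of the incomplete data in a subspace are used to perform KNN imputation of the missing features in the original space. Then, repeated iterations of matrix factorization and KNN imputation yield a final, reliable subspace structure of the data, on which cluster analysis is performed. Clustering results in the original space of 6 image datasets show that KISC outperforms comparison algorithms that cluster directly after imputation, indicating that the subspace structure identifies the latent cluster structure of the data more easily and effectively. Clustering results in the subspaces of 6 high-dimensional datasets show that KISC outperforms the comparison algorithms on every dataset and achieves the best clustering accuracy (ACC) and normalized mutual information (NMI) on most of them. KISC thus handles high-dimensional data with missing features more effectively and improves clustering performance.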
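A minimal sketch of the KNN imputation step, assuming missing values are stored as NaN and that distances are computed only over the features both rows observe. KISC finds neighbors in a learned subspace and alternates imputation with matrix factorization; this sketch works in the original feature space in a single pass, and `knn_impute` is a hypothetical name.

```python
import numpy as np


def knn_impute(X, k=2):
    """Fill each NaN with the mean of that feature over the k nearest rows.

    Distance = mean squared difference over the features observed in both rows,
    so missing values never enter a distance computation."""
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    observed = ~np.isnan(X)
    n = X.shape[0]
    for i in range(n):
        missing = np.where(~observed[i])[0]
        if missing.size == 0:
            continue
        dists = []
        for j in range(n):
            both = observed[i] & observed[j]
            if j == i or not both.any():
                continue
            dists.append((np.mean((X[i, both] - X[j, both]) ** 2), j))
        dists.sort()
        for f in missing:
            # nearest rows that actually observe feature f
            donors = [j for _, j in dists if observed[j, f]][:k]
            if donors:
                filled[i, f] = X[donors, f].mean()
    return filled
```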

8.
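Most current recommender systems suffer from low recommendation accuracy, strong sensitivity to sparsity, and poor stability. Building on the Coclus clustering algorithm, a recommendation algorithm combining the rating matrix with co-clustering is therefore proposed. Coclus co-clustering uses graph modularity maximization to partition the rows and the columns of the rating matrix into g clusters each; row and column permutation then forms g×g low-rank rating submatrices. These submatrices are factorized to fill in missing values and improve recommendation quality. The factorization stage uses an improved non-negative matrix factorization algorithm that introduces L1 and L2 norms to improve feature selection and to prevent overfitting, respectively, and updates the parameters with an iterative coordinate descent algorithm. Experimental results show that, compared with baseline algorithms, the proposed algorithm achieves higher recommendation accuracy and greater stability.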
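The factorization stage combines non-negativity, L1 and L2 penalties, and coordinate descent. A minimal sketch of that combination, assuming the objective 0.5·||X - WH||² + l1·(ΣW + ΣH) + 0.5·l2·(||W||² + ||H||²) on a dense matrix, where each column of W and each row of H has a closed-form non-negative update. A recommender would restrict the loss to observed ratings; that masking, the paper's specific improvements and the Coclus step are not shown, and `nmf_elastic_cd` is a hypothetical name.

```python
import numpy as np


def nmf_elastic_cd(X, r, l1=0.0, l2=0.0, iters=200, seed=0):
    """Cyclic coordinate descent (HALS-style) for non-negative W (m x r), H (r x n)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, r))
    H = rng.random((r, n))
    for _ in range(iters):
        for t in range(r):
            # residual with component t removed; it does not depend on W[:, t] or H[t]
            R = X - W @ H + np.outer(W[:, t], H[t])
            h = H[t]
            # exact minimiser over W[:, t]: L1 shifts the numerator, L2 inflates the denominator
            W[:, t] = np.maximum(0.0, (R @ h - l1) / (h @ h + l2 + 1e-12))
            w = W[:, t]
            H[t] = np.maximum(0.0, (w @ R - l1) / (w @ w + l2 + 1e-12))
    return W, H
```

Because every update minimizes the objective over one block with the others fixed, the objective never increases from one sweep to the next.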

9.
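Biclustering has become an important tool for analyzing gene expression data: it searches simultaneously along the gene and condition dimensions for clusters with the same expression fluctuations. However, biclustering is a multi-objective local search procedure that easily falls into local optima on complex gene data. To improve its global search ability, a diversity-selective quantum-behaved particle swarm biclustering algorithm (Diversify-Optional QPSO, DOQPSO) is proposed. The algorithm first processes the gene data with DOQPSO and then uses an improved FLOC algorithm to search greedily and iteratively for biclusters, aiming for better results. Simulation experiments comparing it with the FLOC and QPSO algorithms show that DOQPSO biclustering has better global optimization ability and produces better clusters.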
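In quantum-behaved PSO a particle has no velocity: each coordinate is resampled around a local attractor between its personal best and the global best, with a spread proportional to its distance from the mean of all personal bests. A minimal sketch of standard QPSO on a continuous test function; DOQPSO's diversity-based selection and the FLOC refinement are not specified in the abstract and are not shown, and the linearly decreasing contraction-expansion coefficient `beta` is an assumed schedule.

```python
import math
import random
from typing import Callable, List, Tuple


def qpso(f: Callable[[List[float]], float], dim: int, n_particles: int = 20,
         iters: int = 300, bound: float = 5.0, beta_max: float = 1.0,
         beta_min: float = 0.5, seed: int = 0) -> Tuple[List[float], float]:
    rng = random.Random(seed)
    xs = [[rng.uniform(-bound, bound) for _ in range(dim)] for _ in range(n_particles)]
    pbest = [x[:] for x in xs]
    pval = [f(x) for x in xs]
    g = min(range(n_particles), key=pval.__getitem__)  # index of the global best
    for t in range(iters):
        beta = beta_max - (beta_max - beta_min) * t / iters
        mbest = [sum(p[d] for p in pbest) / n_particles for d in range(dim)]
        for i in range(n_particles):
            for d in range(dim):
                phi = rng.random()
                attractor = phi * pbest[i][d] + (1 - phi) * pbest[g][d]
                u = 1.0 - rng.random()  # in (0, 1], so log(1/u) is finite
                step = beta * abs(mbest[d] - xs[i][d]) * math.log(1.0 / u)
                xs[i][d] = attractor + step if rng.random() < 0.5 else attractor - step
            val = f(xs[i])
            if val < pval[i]:
                pbest[i], pval[i] = xs[i][:], val
                if val < pval[g]:
                    g = i
    return pbest[g], pval[g]
```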

10.
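Co-clustering is a class of algorithms that cluster the rows and columns of a data matrix simultaneously. This paper introduces two-level weighting into co-clustering and proposes a two-level subspace weighted co-clustering algorithm (TLWCC). TLWCC assigns one level of weights to co-clusters (blocks) and a second level to rows and columns, and computes these three sets of weights (blocks, rows, and columns) automatically during the iterations. TLWCC considers the distance of each block, row, and column from the corresponding block, row, and column center: the larger the distance, the stronger the noise is assumed to be and the smaller the weight; conversely, weaker noise receives a larger weight. By giving noisy information small weights, TLWCC effectively reduces the interference caused by noise and improves clustering. Four groups of experiments demonstrate TLWCC's ability to identify noisy information, the sensitivity of the clustering results to parameter choices, and the algorithm's clustering and runtime performance.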
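The abstract states the weighting principle (farther from the center means noisier, hence a smaller weight) but not the formula. One common choice that satisfies it, assumed here, is an exponential (entropy-regularized) weighting normalized to sum to 1; the sketch applies it to the rows of one co-cluster. `gamma` controls how sharply weights fall off, and both function names are hypothetical.

```python
import math
from typing import List, Sequence


def distance_weights(distances: Sequence[float], gamma: float = 1.0) -> List[float]:
    """w_j = exp(-D_j / gamma) / sum_l exp(-D_l / gamma): farther means smaller weight."""
    m = min(distances)
    exps = [math.exp(-(d - m) / gamma) for d in distances]  # shift by min for stability
    total = sum(exps)
    return [e / total for e in exps]


def row_weights(block: Sequence[Sequence[float]], gamma: float = 1.0) -> List[float]:
    """Weight each row of a co-cluster by its squared distance to the block's mean row."""
    n_cols = len(block[0])
    centre = [sum(r[j] for r in block) / len(block) for j in range(n_cols)]
    dists = [sum((r[j] - centre[j]) ** 2 for j in range(n_cols)) for r in block]
    return distance_weights(dists, gamma)
```

The same function would produce column and block weights from column-to-center and block-to-center distances.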

11.
Bio-chip data consist of high-dimensional attributes, with far more attributes than specimens, so it is difficult to estimate a covariance matrix for tens of thousands of genes from a small number of samples. Feature selection and extraction are therefore critical for removing noisy features and reducing dimensionality in microarray analysis. This study aims to fill this gap by developing a data mining framework with a proposed algorithm for cluster analysis of gene expression data, in which the correlation coefficient is used to arrange genes. Cluster analysis of microarray data can find coherent patterns of gene expression. The output is displayed as a table for convenient inspection. We use a breast cancer microarray dataset to demonstrate the practical viability of this approach.
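A minimal sketch of arranging genes by correlation coefficient, assuming the arrangement is by Pearson correlation with a chosen reference gene and that the output is a (gene, r) table; the abstract does not specify the exact ordering scheme, so `rank_genes` is illustrative only.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile has no defined correlation; treat as unrelated
    return sxy / math.sqrt(sxx * syy)


def rank_genes(expr: Dict[str, Sequence[float]], reference: str) -> List[Tuple[str, float]]:
    """Table of (gene, r) sorted by Pearson r with the reference gene, highest first."""
    ref = expr[reference]
    table = [(gene, pearson(profile, ref)) for gene, profile in expr.items()]
    return sorted(table, key=lambda row: row[1], reverse=True)
```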

12.
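For time-series gene expression data, traditional clustering algorithms use distance as the similarity measure and ignore similar trends in how genes change over time. Starting from the trends of gene changes, a new fuzzy similarity relation matrix is constructed, an improved clustering algorithm based on fuzzy similarity relations is proposed, and this algorithm is used to compute the initial cluster centers for fuzzy c-means (FCM). Applied to yeast gene expression data, the method not only overcomes FCM's tendency to fall into local minima and its sensitivity to initial values, but also discovers co-regulated genes whose expression patterns follow similar trends.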
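One standard way to cluster with a fuzzy similarity relation is to take its max-min transitive closure, which turns it into a fuzzy equivalence relation, and then cut it at a level λ. A minimal sketch, assuming a simple trend similarity (the fraction of time steps in which two genes move in the same direction: up, down or flat) as a stand-in for the paper's matrix, which the abstract does not define. The class means can serve as FCM initial centers; FCM itself is not shown.

```python
from typing import List, Sequence


def trend(series: Sequence[float]) -> List[int]:
    """+1 / -1 / 0 for each step up / down / flat."""
    return [(b > a) - (b < a) for a, b in zip(series, series[1:])]


def trend_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Fraction of time steps where both genes move in the same direction."""
    tx, ty = trend(x), trend(y)
    return sum(a == b for a, b in zip(tx, ty)) / len(tx)


def transitive_closure(R: List[List[float]]) -> List[List[float]]:
    """Repeat the max-min composition R o R until R stops changing."""
    n = len(R)
    while True:
        R2 = [[max(min(R[i][k], R[k][j]) for k in range(n)) for j in range(n)]
              for i in range(n)]
        if R2 == R:
            return R
        R = R2


def lambda_cut_clusters(R: List[List[float]], lam: float) -> List[int]:
    """Classes of the crisp equivalence relation {(i, j): R[i][j] >= lam}."""
    n = len(R)
    labels = [-1] * n
    next_label = 0
    for i in range(n):
        if labels[i] == -1:
            for j in range(n):
                if R[i][j] >= lam:
                    labels[j] = next_label
            next_label += 1
    return labels


def initial_centres(expr: Sequence[Sequence[float]], labels: List[int]) -> List[List[float]]:
    """Mean profile of each class, usable as FCM starting centres."""
    centres = []
    for c in range(max(labels) + 1):
        members = [expr[i] for i, lab in enumerate(labels) if lab == c]
        centres.append([sum(col) / len(members) for col in zip(*members)])
    return centres
```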

13.
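Existing research on driver pathway identification relies on traditional biological experiments, which are time-consuming, labor-intensive, and costly. To address this, a new binary cancer driver pathway identification method, PEA-BLMWS, is proposed. First, using existing gene expression data, potential gene mutation data are mined by comparing the expression levels of normal and mutated genes. Second, protein-protein interaction network data are introduced to build an improved binary linear maximum weight submatrix model. Finally, a biparental co-evolutionary algorithm is proposed to solve this model. Experimental results on the GBM (glioblastoma) and OVCA (ovarian cancer) datasets show that, compared with other state-of-the-art identification methods (Dendrix, CCA-NMWS, and CGP-NCM), more of the genes identified by PEA-BLMWS are enriched in known signaling pathways, and the genes not enriched in signaling pathways are also closely related to cancer development. The method can therefore serve as an effective tool for driver pathway identification.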
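The maximum weight submatrix model used by Dendrix, one of the baselines above, scores a gene set by the coverage minus the overlap of its mutated samples. A minimal sketch of that weight on a toy mutation table (gene mapped to the set of mutated sample IDs), with an exhaustive search that is only feasible for tiny inputs; PEA-BLMWS's protein-interaction extension and its co-evolutionary solver are not reproduced, and the function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def coverage(mutations: Dict[str, Set[int]], genes: Iterable[str]) -> Set[int]:
    """Samples in which at least one of the genes is mutated."""
    covered: Set[int] = set()
    for g in genes:
        covered |= mutations[g]
    return covered


def mws_weight(mutations: Dict[str, Set[int]], genes: Iterable[str]) -> int:
    """W(M) = 2*|cov(M)| - sum_g |cov(g)|, i.e. coverage minus overlap.

    It rewards gene sets that together cover many samples while being mutated
    in (nearly) mutually exclusive samples."""
    genes = list(genes)
    return 2 * len(coverage(mutations, genes)) - sum(len(mutations[g]) for g in genes)


def best_gene_set(mutations: Dict[str, Set[int]], k: int) -> Tuple[Tuple[str, ...], int]:
    """Exhaustive search over all k-gene sets (toy inputs only)."""
    best = max(combinations(sorted(mutations), k), key=lambda m: mws_weight(mutations, m))
    return best, mws_weight(mutations, best)
```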

14.
Gene expression data are expected to be of significant help in developing efficient cancer diagnosis and classification platforms. One problem with these data is how to select a small subset of genes from thousands of genes measured on only a few samples, which are inherently noisy. This research aims to select, from gene expression data, a small subset of informative genes that maximizes classification accuracy. A model for gene selection and classification has been developed using a filter approach and an improved hybrid of a genetic algorithm and a support vector machine classifier. We show that the proposed model achieves classification accuracy useful for cancer classification on one widely used gene expression benchmark dataset.

15.
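The gene chip is a typical example of microarray technology: it offers high throughput and can measure the expression levels of all genes in a genome simultaneously. A main purpose of using microarray chips is to discover gene expression patterns, that is, to find clusters of genes at the genome level that are functionally similar or involved in related biological processes, or to classify samples and discover their subtypes, for example classifying cancer samples by gene expression level to discover molecular subtypes of a disease. Non-negative matrix factorization (NMF) is an unsupervised, non-orthogonal, parts-based matrix factorization method. In recent years it has been applied increasingly to classification analysis and cluster discovery in microarray data. This paper systematically introduces the principles, algorithms, and applications of NMF, the biological interpretation of its factorization results, quality assessment of its classification results, and classification software based on NMF, and it summarizes and evaluates the performance of NMF in microarray data classification and cluster discovery.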
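A minimal NumPy sketch of NMF for class discovery, assuming the classic Lee-Seung multiplicative updates for the Frobenius objective and the common convention of assigning each sample to the metagene with the largest coefficient in its column of H. The review covers other update rules and quality measures that are not shown, and both function names are hypothetical.

```python
import numpy as np


def nmf_multiplicative(V, k, iters=500, seed=0, eps=1e-10):
    """Lee-Seung multiplicative updates for min ||V - W H||_F^2 with W, H >= 0.

    V is genes x samples; W holds k metagenes, H the metagene coefficients per sample."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)  # eps guards against division by zero
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def cluster_samples(H):
    """Assign each sample (column) to the metagene with the largest coefficient."""
    return H.argmax(axis=0)
```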

16.
Cancer classification is one of the main steps in the course of patient treatment, which compels modern clinical researchers to use advanced bioinformatics methods for it. Cancer classification is usually performed on gene expression data obtained from microarray experiments, using advanced machine learning methods. Microarray experiments generate huge amounts of data, and processing them with machine learning methods is a major challenge. In this study, a two-step classification paradigm that combines genetic algorithm feature selection with machine learning classifiers is used. The genetic algorithm is designed in the MapReduce programming style, which makes it highly scalable on a Hadoop cluster. To improve its performance, it is extended into a parallel algorithm that processes microarray data in a distributed manner using the Hadoop MapReduce framework. The algorithm was tested on eleven GEMS datasets (9 tumors, 11 tumors, 14 tumors, brain tumor 1, lung cancer, brain tumor 2, leukemia 1, DLBCL, leukemia 2, SRBCT, and prostate tumor), and its accuracy reached 100% with fewer than 25 selected features. The proposed cloud-computing-based MapReduce parallel genetic algorithm performed well on gene expression data. In addition, the scalability of the algorithm is unlimited because of the underlying Hadoop MapReduce platform. The results indicate that the proposed method can be implemented effectively for real-world microarray data in a cloud environment, and the Hadoop MapReduce framework substantially decreases computation time.

17.
In cancer classification based on gene expression data, it is desirable to defer a decision for observations that are difficult to classify. For instance, an observation whose conditional probability of being cancer is around 1/2 should preferably undergo more advanced tests rather than receive an immediate decision. This motivates the use of a classifier with a reject option, which reports a warning for observations that are difficult to classify. In this paper, we consider the problem of gene selection with a reject option. Gene expression data typically comprise the expression levels of several thousand candidate genes. In such cases, an effective gene selection procedure is necessary both to provide a better understanding of the underlying biological system that generates the data and to improve prediction performance. We propose a machine learning approach in which we apply the L1 penalty to the SVM with a reject option; we refer to this method as the L1 SVM with a reject option. We develop a novel optimization algorithm for this SVM that is fast and stable enough to analyze gene expression data. The proposed algorithm computes the entire solution path with respect to the regularization parameter. Numerical studies show that, compared with the standard L1 SVM, the proposed method efficiently reduces prediction errors without hampering gene selectivity.
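A minimal sketch of the ingredients of an SVM with a reject option, assuming the generalized hinge loss of Bartlett and Wegkamp with reject cost `d` in (0, 1/2) and a rule that rejects when |f(x)| ≤ `delta`. The L1-penalized objective is only evaluated here for a linear score; the paper's solution-path algorithm is not reproduced, and the function names are hypothetical.

```python
from typing import Sequence


def generalized_hinge(z: float, d: float) -> float:
    """Generalized hinge loss for margin z = y * f(x) and reject cost d in (0, 1/2)."""
    if z < 0:
        return 1.0 - (1.0 - d) / d * z  # steeper than the ordinary hinge for errors
    if z < 1:
        return 1.0 - z
    return 0.0


def decide(score: float, delta: float) -> int:
    """Return +1 or -1 when |score| > delta, else 0 meaning 'reject / defer'."""
    if abs(score) <= delta:
        return 0
    return 1 if score > 0 else -1


def l1_reject_objective(w: Sequence[float], b: float, X: Sequence[Sequence[float]],
                        y: Sequence[int], d: float, lam: float) -> float:
    """Mean generalized hinge loss of the linear score w.x + b plus lam * ||w||_1."""
    scores = [sum(wi * xi for wi, xi in zip(w, x)) + b for x in X]
    loss = sum(generalized_hinge(yi * s, d) for yi, s in zip(y, scores)) / len(y)
    return loss + lam * sum(abs(wi) for wi in w)
```

The L1 term drives many gene coefficients in `w` to exactly zero, which is what provides the gene selection.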

18.
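An improved regularized extreme learning machine algorithm based on fish swarm optimization and Cholesky decomposition (FSC-RELM) is proposed for classifying gene expression data. In FSC-RELM, the fish swarm optimization algorithm first optimizes the RELM input-layer weights, with the objective function defined as the reciprocal of the error function; the computation of the RELM output-layer weight matrix is then carried out with Cholesky decomposition to speed up the algorithm and reduce training time. To evaluate its performance, experiments were conducted on several standard gene datasets; the results show that FSC-RELM achieves high classification accuracy in a short time and performs well.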
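In a regularized ELM the output weights solve (H^T H + I/C) β = H^T T, and that matrix is symmetric positive definite, so a Cholesky factorization applies. A minimal NumPy sketch, assuming sigmoid hidden units and random input weights standing in for the fish-swarm optimization, which is not shown; `relm_train` and `relm_predict` are hypothetical names.

```python
import numpy as np


def relm_train(X, T, n_hidden=20, C=1.0, seed=0):
    """Regularised ELM: random input layer (stand-in for fish-swarm tuning),
    output weights from (H^T H + I/C) beta = H^T T via a Cholesky factor."""
    rng = np.random.default_rng(seed)
    W_in = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W_in + b)))   # sigmoid hidden-layer outputs
    A = H.T @ H + np.eye(n_hidden) / C          # symmetric positive definite
    L = np.linalg.cholesky(A)                   # A = L L^T, L lower triangular
    # forward then backward substitution (a dedicated triangular solver would be faster)
    z = np.linalg.solve(L, H.T @ T)
    beta = np.linalg.solve(L.T, z)
    return W_in, b, beta


def relm_predict(model, X):
    W_in, b, beta = model
    H = 1.0 / (1.0 + np.exp(-(X @ W_in + b)))
    return H @ beta
```

For classification, `T` is a one-hot target matrix and the predicted class is the argmax of each output row.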

19.
This paper combines a powerful algorithm, called Dongguang Li (DGL) global optimization, with methods of cancer diagnosis through gene selection and microarray analysis. A generic approach to cancer classification based on gene expression monitoring by DNA microarrays is proposed and applied to two test cancer cases, colon cancer and leukemia. The study analyzes multiple sets of genes simultaneously to obtain an overall global solution for the genes' joint discriminative ability in assigning tumors to known classes. With the concepts and methodologies described here, the type and seriousness of a cancer can be classified accurately. Using orthogonal arrays for sampling and a search space reduction process, a computer program was written that runs on a personal laptop computer. Both the colon cancer and the leukemia microarray data can be classified 100% correctly without prior knowledge of their classes. The classification process is automated once the gene expression data have been input. Instead of examining a single gene at a time, the DGL method finds global optimum solutions and constructs a multi-subset pyramidal hierarchical class predictor containing up to 23 gene subsets from a given microarray gene expression dataset within several hours. An automatically derived class predictor makes reliable cancer classification and accurate tumor diagnosis possible in clinical practice.
