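A Protein Classification Method Based on Huffman Decision
Cite this article:HE Hong-zhou, ZHOU Ming-tian. A Protein Classification Method Based on Huffman Decision[J]. Computer Engineering (计算机工程), 2013(12):181-185,190.
Authors:HE Hong-zhou  ZHOU Ming-tian
Affiliations:[1] College of Mathematics and Computer Science, Mianyang Normal University, Mianyang, Sichuan 621000 [2] School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731
Funding:Natural Science Research Fund of the Sichuan Provincial Department of Education (12ZB070)
Abstract:Existing affinity propagation clustering algorithms do not adequately reflect the intrinsic cluster structure of complex protein sequences. To address this, a protein classification method based on Huffman decision is proposed. After the generalized substitution matching similarity is computed, the existing adaptive affinity propagation algorithm is used to cluster the protein sequences. Huffman coding is then applied, and the average code length is constrained so that the clustering result reflects the family cluster structure of the protein sequences. Experiments on six datasets from the Clusters of Orthologous Groups (COG) database and the Structural Classification of Proteins (SCOP) database show that, compared with adAP, spectral clustering, SMS and TribeMCL, the method not only obtains a number of clusters closer to the number of families in each dataset and a more compact cluster structure, but also achieves an average F-measure that is higher by 19.67%, 8.7%, 9.5% and 43.51%, respectively.

Keywords:clustering analysis  protein sequence  generalized substitution matching similarity  affinity propagation clustering  Huffman decision  F-measure index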

A Classification Method of Protein Based on Huffman Decision
HE Hong-zhou,ZHOU Ming-tian.A Classification Method of Protein Based on Huffman Decision[J].Computer Engineering,2013(12):181-185,190.
Authors:HE Hong-zhou  ZHOU Ming-tian
Affiliation:1. College of Mathematics & Computer Science, Mianyang Normal University, Mianyang 621000, China; 2. School of Computer Science and Engineering, University of Electronic Science & Technology, Chengdu 611731, China
Abstract:Existing Affinity Propagation (AP) clustering algorithms cannot adequately reflect the clustering structure of complex protein sequences. This paper proposes an adaptive AP classification method based on generalized SMS and Huffman decision (adAP/GSHD). Protein sequences are clustered using generalized Substitution Matching Similarity (gSMS) and the existing adaptive affinity propagation (adAP) algorithm. Huffman coding is then applied, and the average code length of the clustering result is constrained so that the result reflects the family clustering structure of the protein sequences. adAP/GSHD is tested and compared with four other classic clustering methods on six datasets from the Clusters of Orthologous Groups (COG) of proteins database and the Structural Classification of Proteins (SCOP) database. The results show that, for a given set of proteins, the method not only obtains a number of clusters closer to the correct number of families and a more compact clustering structure, but also achieves an average F-measure that is 19.67%, 8.7%, 9.5% and 43.81% higher than that of adAP, SMS, Spectral Clustering and TribeMCL, respectively.
Keywords:clustering analysis  protein sequence  generalized Substitution Matching Similarity (gSMS)  Affinity Propagation (AP) clustering  Huffman decision  F-measure index
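
The abstract refers to constraining the average Huffman code length of a clustering result but does not state the decision rule itself. As a minimal sketch of the quantity involved only, the following assumes that each cluster is treated as one symbol whose weight is its number of sequences; that modelling choice and the function name `huffman_average_code_length` are illustrative assumptions, not the authors' published procedure.

```python
import heapq


def huffman_average_code_length(cluster_sizes):
    """Average Huffman code length, in bits per sequence.

    Assumption for illustration: each cluster is one symbol, weighted by
    its size. This is not taken from the paper.
    """
    sizes = [s for s in cluster_sizes if s > 0]
    if not sizes:
        raise ValueError("need at least one non-empty cluster")
    if len(sizes) == 1:
        return 0.0  # a single cluster needs no bits to be identified

    total = sum(sizes)
    heap = list(sizes)
    heapq.heapify(heap)

    # Repeatedly merge the two lightest nodes. The sum of all merged
    # weights equals the weighted path length of the Huffman tree.
    weighted_length = 0
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        weighted_length += a + b
        heapq.heappush(heap, a + b)

    return weighted_length / total


if __name__ == "__main__":
    # Four equal clusters need 2 bits each; a skewed split needs fewer on average.
    print(huffman_average_code_length([5, 5, 5, 5]))
    print(huffman_average_code_length([1, 1, 2, 4]))
```

Under this assumption, an uneven split into clusters yields a shorter average code than an even split with the same number of clusters, which is the kind of signal a bound on average code length could exploit.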
This article is indexed in VIP (维普) and other databases.