DNA Sequence Classification Method Based on Hidden Markov Model

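Cite this article:	GUO Yan-Ming, CHEN Li-Fei, GUO Gong-De. DNA Sequence Classification Method Based on Hidden Markov Model[J]. Computer Systems & Applications, 2014, 23(7): 24-30.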

Authors:	GUO Yan-Ming, CHEN Li-Fei, GUO Gong-De

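Affiliation:	School of Mathematics and Computer Science, Fujian Normal University, Fuzhou 350007; School of Mathematics and Computer Science, Fujian Normal University, Fuzhou 350007; School of Mathematics and Computer Science, Fujian Normal University, Fuzhou 350007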

Funding:	National Natural Science Foundation of China (61175123)

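Abstract:	DNA sequence classification is a fundamental task in bioinformatics; its goal is to predict the class of a DNA sequence from similarity in structure or function. For effective classification, it is difficult to map sequences into a feature vector space while preserving as much as possible of the order relationships between the bases in a sequence. To overcome problems in existing methods, such as classification accuracy degrading when bases are missing from a DNA sequence, a new feature representation for DNA sequences is proposed. The new method first trains a hidden Markov model (HMM) for each sequence, then projects the DNA sequence into the vector space formed by the eigenvectors of the HMM state transition probability matrix. On top of this representation, a K-NN classifier is constructed to classify DNA sequences. Experimental results show that the new representation preserves the relationships between different bases in a DNA sequence fairly completely and fully reflects the structural information of the sequence, thereby effectively improving classification accuracy.
Keywords:	DNA sequence classification; feature representation; hidden Markov model; eigenvalue decomposition
Received:	2013/11/24
Revised:	2013/12/24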
DNA Sequence Classification Method Based on Hidden Markov Model

GUO Yan-Ming, CHEN Li-Fei and GUO Gong-De. DNA Sequence Classification Method Based on Hidden Markov Model[J]. Computer Systems & Applications, 2014, 23(7): 24-30.

Authors:	GUO Yan-Ming, CHEN Li-Fei and GUO Gong-De

Affiliation:	School of Mathematics and Computer Science, Fujian Normal University, Fuzhou 350007, China;School of Mathematics and Computer Science, Fujian Normal University, Fuzhou 350007, China;School of Mathematics and Computer Science, Fujian Normal University, Fuzhou 350007, China

Abstract:	DNA sequence classification is a basic task in bioinformatics: it aims to predict the category of a DNA sequence from its structural or functional similarity to other sequences. For classification to be effective, the sequences must be mapped into a feature vector space in a way that preserves, as far as possible, the order relationships among the nucleotides, and this remains difficult. Existing methods tend to lose classification accuracy because they represent the nucleotides in a DNA sequence incompletely. To address this, the paper proposes a new feature representation for DNA sequences. A Hidden Markov Model (HMM) is first trained on each sequence; the sequence is then projected onto the vector space spanned by the eigenvectors of the HMM state transition probability matrix. A K-Nearest Neighbour (K-NN) classifier is built on this representation to classify the sequences. Experimental results show that the new representation preserves the order relationships between different nucleotides in a DNA sequence more completely and reflects the structural information of the sequence more fully, which in turn improves classification accuracy.
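
The pipeline outlined in the abstract (a per-sequence probabilistic model, spectral features from its transition matrix, then K-NN) can be illustrated with a rough sketch. The abstract does not give the HMM topology, the training procedure, or the exact way the eigenvectors are turned into a feature vector, so the code below is not the authors' method. It substitutes an observable first-order Markov chain over the four nucleotides for the HMM, and it assumes a feature vector made of the stationary distribution plus the eigenvalue magnitudes. The names `transition_matrix`, `spectral_features` and `knn_predict`, and the `pseudocount` parameter, are hypothetical.

```python
from collections import Counter

import numpy as np

ALPHABET = "ACGT"


def transition_matrix(seq, pseudocount=1.0):
    """Row-stochastic 4x4 matrix of nucleotide-to-nucleotide transitions.

    Assumption: an observable first-order Markov chain stands in for the
    HMM of the paper. Symbols outside ACGT are dropped, so the bases on
    either side of a gap are treated as adjacent (a simplification).
    """
    index = {c: i for i, c in enumerate(ALPHABET)}
    symbols = [index[c] for c in seq.upper() if c in index]
    # The pseudocount keeps every entry positive, so each row normalises
    # cleanly and the leading eigenvalue (1) is simple.
    counts = np.full((4, 4), float(pseudocount))
    for a, b in zip(symbols, symbols[1:]):
        counts[a, b] += 1.0
    return counts / counts.sum(axis=1, keepdims=True)


def spectral_features(seq):
    """Assumed feature vector: stationary distribution + |eigenvalues|."""
    P = transition_matrix(seq)
    # Eigenvectors of P.T are the left eigenvectors of P.
    eigvals, eigvecs = np.linalg.eig(P.T)
    order = np.argsort(-np.abs(eigvals))
    lead = np.real(eigvecs[:, order[0]])
    stationary = lead / lead.sum()
    return np.concatenate([stationary, np.abs(eigvals[order])])


def knn_predict(train_seqs, train_labels, query, k=3):
    """Majority vote among the k nearest training sequences (Euclidean)."""
    X = np.array([spectral_features(s) for s in train_seqs])
    q = spectral_features(query)
    nearest = np.argsort(np.linalg.norm(X - q, axis=1))[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

A call such as `knn_predict(train_seqs, train_labels, query, k=3)` takes a list of labelled training sequences and returns the predicted label for `query`. Because every sequence is reduced to a fixed-length vector regardless of its length, sequences of different lengths can be compared directly, which is the property the feature-space mapping in the abstract relies on.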

Keywords:	DNA sequence classification; feature representation; Hidden Markov Models (HMM); eigenvalue decomposition