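Application of correlation features in metagenomic binning
Cite this article: ZHANG Qianqian. Application of correlation features in metagenomic binning[J]. Journal of Electron Devices (电子器件), 2013, 36(4): 450-454.
Author: ZHANG Qianqian (张倩倩)
Affiliation: School of Biological Science and Medical Engineering, Southeast University
Funding: National Natural Science Foundation project (2012CB316501)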
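Abstract: With the maturation of high-throughput sequencing technology, metagenomics has become an emerging and popular discipline. Correctly binning DNA fragments from mixed microbial sequencing reads has always been a challenge. Binning accuracy directly affects the depth and efficiency of metagenomic research, and the key to improving it is to extract an effective sequence feature from metagenomic sequencing fragments. Current mainstream binning methods fall into two categories: those based on sequence similarity comparison and those based on sequence features. This paper studies the correlation between bases in depth and applies a binning method based on base correlation features (base-base correlation), using a machine learning algorithm to achieve accurate binning. The method maintains good performance when binning simulated metagenomic data sets at different taxonomic levels and of different complexity. It is also compared with the unsupervised binning software MetaCluster3.0 and with algorithms that use only trinucleotide or tetranucleotide frequencies for binning, and the results are discussed in depth.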

Keywords: metagenomics; binning; sequence correlation features

Application of relevance characteristics for the assignment of genomic fragments
ZHANG Qianqian, CAO Changchang, DING Xiao, SUN Xiao. Application of relevance characteristics for the assignment of genomic fragments[J]. Journal of Electron Devices, 2013, 36(4): 450-454.
Authors:ZHANG Qianqian  CAO Changchang  DING Xiao  SUN Xiao
Affiliation: State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, China
Abstract: With the development and spread of genome sequencing techniques, binning remains an ongoing challenge in the taxonomic characterization of DNA fragments obtained by sequencing a sample of mixed species. Binning accuracy directly affects the depth and efficiency of metagenomic research, and the key to improving it is to extract effective sequence features from metagenomic sequencing fragments. Most current binning methods fall into two categories: those that assign sequence fragments by comparing sequence similarity, and those based on sequence composition. These methods, however, can produce ambiguous results. In this study, we propose an unsupervised binning method based on the distribution of base relevance characteristics in a sequence, also known as Base-Base Correlation (BBC). In our experiments, BBC is applied to several data sets, including a eukaryotic data set and a large number of prokaryotic data sets. We show that our method can accurately bin DNA fragments of various lengths and relative species abundance ratios without using any reference or training data sets. We also compare the method with MetaCluster 3.0, and the results are discussed in depth.
Keywords:Metagenomics  binning  sequence relevance characteristics
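
The abstract contrasts base-base correlation (BBC) features with plain trinucleotide or tetranucleotide frequencies but does not give either formula. The following is a minimal sketch of both feature types, assuming the commonly cited form of BBC, in which each of the 16 ordered base pairs (i, j) gets the value T_ij = sum over gaps l = 1..K of p_ij(l) * log2(p_ij(l) / (p_i * p_j)). The function names, the `max_gap` parameter (K), and its default of 16 are illustrative assumptions, not details taken from the paper.

```python
import math
from collections import Counter
from itertools import product

BASES = "ACGT"


def kmer_frequencies(seq, k=4):
    """Normalized k-mer frequency vector with 4**k entries (k=4: tetranucleotides).

    k-mers containing characters other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def bbc_features(seq, max_gap=16):
    """16-dimensional base-base correlation vector (assumed definition).

    For each ordered pair (i, j), sums p_ij(l) * log2(p_ij(l) / (p_i * p_j))
    over gaps l = 1..max_gap, where p_ij(l) is the frequency of base i
    followed by base j at distance l.
    """
    seq = seq.upper()
    base_counts = Counter(b for b in seq if b in BASES)
    n = sum(base_counts.values())
    if n == 0:
        return [0.0] * 16
    p = {b: base_counts[b] / n for b in BASES}

    features = {a + b: 0.0 for a in BASES for b in BASES}
    for gap in range(1, max_gap + 1):
        # Count ordered base pairs separated by exactly `gap` positions.
        pair_counts = Counter(
            seq[i] + seq[i + gap]
            for i in range(len(seq) - gap)
            if seq[i] in BASES and seq[i + gap] in BASES
        )
        total = sum(pair_counts.values())
        if total == 0:
            continue
        for pair, c in pair_counts.items():
            p_ij = c / total
            features[pair] += p_ij * math.log2(p_ij / (p[pair[0]] * p[pair[1]]))
    return [features[a + b] for a in BASES for b in BASES]
```

Under these assumptions each fragment maps to a fixed-length vector (16 values for BBC, 256 for tetranucleotide frequencies) regardless of its length, which is what allows fragments to be compared or grouped by a clustering or other machine learning step.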
This article is indexed in CNKI and other databases.