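A Graph-Partition-Based Parallel Whole-Genome Assembly Algorithm
Cite this article: Lin Jiao, Chen Wenguang, Li Qiang, Zheng Weimin, Zhang Yimin. A Graph-Partition-Based Parallel Whole-Genome Assembly Algorithm[J]. Journal of Computer Research and Development, 2006, 43(8): 1323-1329.
Authors: Lin Jiao, Chen Wenguang, Li Qiang, Zheng Weimin, Zhang Yimin
Affiliation: 1. Department of Computer Science and Technology, Tsinghua University, Beijing 100084; Ministry of Education Key Laboratory on Bioinformatics, Tsinghua University, Beijing 100084
2. Intel China Research Center, Beijing 100080
Funding: National High Technology Research and Development Program of China (863 Program)
Abstract: This paper proposes a parallel whole-genome assembly algorithm based on graph partitioning. The algorithm recasts the data partitioning problem as a graph partitioning problem, which resolves the node load imbalance found in traditional data partitioning algorithms. When building the relation graph, it also makes effective use of the length information and mate-pair information between reads provided by WGS sequencing, so that the read-relation graph reflects the relationships among the data more accurately and the data partitioning becomes more accurate. Experimental results show that the algorithm accurately partitions a variety of simulated and real data sets, and that partition quality improves markedly over traditional data partitioning algorithms.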

Keywords: sequence assembly; whole-genome shotgun sequencing; parallel assembly; graph partition
Received: 2005-12-26
Revised: 2006-03-01

A New Data Clustering Algorithm for Parallel Whole-Genome Shotgun Sequence Assembly
Lin Jiao, Chen Wenguang, Li Qiang, Zheng Weimin, Zhang Yimin. A New Data Clustering Algorithm for Parallel Whole-Genome Shotgun Sequence Assembly[J]. Journal of Computer Research and Development, 2006, 43(8): 1323-1329.
Authors: Lin Jiao, Chen Wenguang, Li Qiang, Zheng Weimin, Zhang Yimin
Affiliation: 1. Department of Computer Science and Technology, Tsinghua University, Beijing 100084; 2. Ministry of Education Key Laboratory on Bioinformatics, Tsinghua University, Beijing 100084; 3. Intel China Research Center Ltd., Beijing 100080
Abstract: This paper presents a data clustering method based on graph partitioning for parallel whole-genome sequence assembly. The algorithm transforms the data clustering problem into a graph partitioning problem, which helps resolve load imbalance in the parallel assembly stage. In addition, the method improves clustering quality by adding paired-mate information to the read-relation graph, so that the graph reflects the relationships between reads more accurately. Experiments on both artificial and real genome data sets show that the method produces high-quality clusters and significantly outperforms the traditional method.
Keywords: sequence assembly; whole-genome shotgun sequencing; parallel assembly; graph partition
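
The idea summarized in the abstract, turning read clustering into a balanced graph partitioning problem, can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify how edges are weighted or which partitioner is used. The function names, the `mate_weight` parameter, the choice to add overlap lengths and mate-pair links on one weight scale, and the greedy region-growing partitioner are all assumptions made for illustration.

```python
from collections import defaultdict


def build_read_graph(overlaps, mate_pairs, mate_weight=20.0):
    """Build a weighted read-relation graph (hypothetical weighting scheme).

    overlaps:   iterable of (read_a, read_b, overlap_length)
    mate_pairs: iterable of (read_a, read_b) for paired reads
    Returns an adjacency map: graph[a][b] = edge weight.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for a, b, length in overlaps:
        graph[a][b] += length
        graph[b][a] += length
    for a, b in mate_pairs:
        # Assumption: a mate-pair link adds a fixed bonus to the edge weight.
        graph[a][b] += mate_weight
        graph[b][a] += mate_weight
    return graph


def greedy_balanced_partition(graph, reads, k):
    """Split reads into k parts of at most ceil(n / k) reads each.

    Each part grows from a seed by repeatedly absorbing the unassigned read
    with the largest total edge weight into the part. Ties are broken by
    read name so the result is deterministic.
    """
    target = -(-len(reads) // k)  # ceiling division bounds the part size
    unassigned = set(reads)
    parts = []
    for _ in range(k):
        part = set()
        gain = defaultdict(float)  # connection weight of each read to this part
        while unassigned and len(part) < target:
            candidates = [r for r in gain if r in unassigned]
            if candidates:
                pick = min(candidates, key=lambda r: (-gain[r], r))
            else:
                pick = min(unassigned)  # new seed for a disconnected region
            part.add(pick)
            unassigned.remove(pick)
            for neighbor, weight in graph.get(pick, {}).items():
                if neighbor in unassigned:
                    gain[neighbor] += weight
        parts.append(part)
    return parts


def cut_weight(graph, parts):
    """Total weight of edges whose endpoints fall in different parts."""
    part_of = {read: i for i, part in enumerate(parts) for read in part}
    total = 0.0
    for a, neighbors in graph.items():
        for b, weight in neighbors.items():
            if a < b and part_of[a] != part_of[b]:
                total += weight
    return total


if __name__ == "__main__":
    overlaps = [("r1", "r2", 40), ("r2", "r3", 35), ("r3", "r4", 5),
                ("r4", "r5", 45), ("r5", "r6", 30)]
    mates = [("r1", "r3"), ("r4", "r6")]
    reads = ["r1", "r2", "r3", "r4", "r5", "r6"]
    g = build_read_graph(overlaps, mates)
    parts = greedy_balanced_partition(g, reads, k=2)
    print(parts, cut_weight(g, parts))
```

The size bound on each part stands in for the load-balancing goal mentioned in the abstract, and the cut weight is one plausible measure of partition quality. A production pipeline would more likely hand the graph to a dedicated multilevel partitioner than rely on a greedy pass like this one.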
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.