Gene association study with SVM, MLP and cross validation for diagnosis of diseases
Authors: Junying Zhang, Shenling Liu, Yue Wang
Affiliation: a) School of Computer Science and Engineering, Xidian University, Xi'an 710071, China; b) Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Alexandria, VA 22314, USA
Abstract: Gene association study is one of the major challenges of biochip technology, both for gene diagnosis, where only a subset of genes is responsible for a given disease, and for handling the curse of dimensionality, which is especially severe in DNA microarray datasets that contain thousands of genes but only a small number of experiments (samples). This paper presents a gene selection method that trains linear support vector machine (SVM) and nonlinear multi-layer perceptron (MLP) classifiers and tests them with cross validation in order to find a gene subset that is optimal or suboptimal for the diagnosis of binary or multiple disease types. Genes are first selected with a linear SVM classifier for each pair of disease types and tested by leave-one-out cross validation. The gene subset is then initialized as the union of these pairwise selections, and genes are deleted from it one by one by removing the gene which brings the greatest decrease of the generalization power on the gene subset after removal, where generalization is measured by training MLPs with leave-one-out and leave-4-out cross validation. The proposed method was tested on real DNA microarray data, namely the MIT data and the NCI data. The results show that it outperforms the conventional SNR method in the separability of the data in terms of expression levels on the selected genes. The MIT data comprise 7129 effective genes with only 72 labeled samples belonging to 2 disease classes, and the NCI data comprise 2308 effective genes with only 64 labeled samples belonging to 4 disease classes; only 11 and 6 genes, respectively, are selected as diagnostic genes. The selected genes are tested by classifying the samples on these genes with SVM under leave-one-out cross validation and with MLP under both leave-one-out and leave-4-out cross validation. No sample is misclassified, which indicates that the selected genes can reasonably be regarded as diagnostic genes for the corresponding diseases.
Keywords: DNA microarray data; Curse of dimensionality; Gene selection; Diagnostic genes; SVM; MLP; Cross validation
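The cross-validation test described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: a nearest-centroid rule stands in for the SVM/MLP classifiers, the held-out groups are assumed to be consecutive disjoint blocks of k samples (the abstract does not say how the leave-4-out groups are formed), and the function names and toy expression values are hypothetical.

```python
def nearest_centroid_predict(train_x, train_y, x):
    """Stand-in classifier (NOT the paper's SVM/MLP): label of the closest class mean."""
    sums, counts = {}, {}
    for row, label in zip(train_x, train_y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    best_label, best_dist = None, float("inf")
    for label, acc in sums.items():
        # Squared Euclidean distance from x to this class's mean vector.
        dist = sum((v - s / counts[label]) ** 2 for v, s in zip(x, acc))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_k_out_errors(samples, labels, genes, k=1):
    """Count misclassified held-out samples using only the given gene columns.

    Assumption: held-out groups are consecutive, disjoint blocks of k samples.
    """
    data = [[row[g] for g in genes] for row in samples]
    errors = 0
    for start in range(0, len(data), k):
        held = range(start, min(start + k, len(data)))
        train_x = [data[i] for i in range(len(data)) if i not in held]
        train_y = [labels[i] for i in range(len(data)) if i not in held]
        for i in held:
            if nearest_centroid_predict(train_x, train_y, data[i]) != labels[i]:
                errors += 1
    return errors


# Hypothetical toy data: 8 samples x 3 genes; only gene 0 separates the classes.
samples = [
    [1.0, 3.0, 7.0], [5.0, 3.1, 6.9], [1.1, 2.9, 7.2], [5.1, 3.2, 7.1],
    [0.9, 3.1, 6.8], [4.9, 2.8, 7.0], [1.2, 3.0, 7.1], [5.2, 3.0, 6.9],
]
labels = [0, 1, 0, 1, 0, 1, 0, 1]

loo_errors = leave_k_out_errors(samples, labels, genes=[0], k=1)
l4o_errors = leave_k_out_errors(samples, labels, genes=[0], k=4)
print(loo_errors, l4o_errors)
```

In this sketch a candidate gene subset is judged by the number of held-out samples it misclassifies, and a count of zero corresponds to the "no misclassification" outcome the abstract reports for the selected genes.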
Source: 《自然科学进展》 (Progress in Natural Science)