Gene association study with SVM, MLP and cross validation for diagnosis of diseases |
| |
Authors: | Junying Zhang, Shenling Liu, Yue Wang |
| |
Affiliation: | a) School of Computer Science and Engineering, Xidian University, Xi'an 710071, China;
b) Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Alexandria, VA 22314, USA |
| |
Abstract: | Gene association study is one of the major challenges of biochip technology, both for gene diagnosis, where only a subset of genes is responsible for a given disease, and for coping with the curse of dimensionality, which arises especially in DNA microarray datasets that contain thousands of genes but only a small number of experiments (samples). This paper presents a gene selection method that trains linear support vector machine (SVM) and nonlinear multi-layer perceptron (MLP) classifiers and tests them with cross validation to find a gene subset that is optimal or suboptimal for the diagnosis of binary or multiple disease types. Genes are first selected with a linear SVM classifier for the diagnosis of each pair of disease types and tested by leave-one-out cross validation. The gene subset is then initialized as the union of these selections, and genes are deleted from it one by one, each time removing the gene that brings the greatest decrease of the generalization power, for samples, on the gene subset after removal, where generalization is measured by training MLPs with leave-one-out and leave-4-out cross validation. The proposed method was tested on two real DNA microarray datasets, the MIT data and the NCI data. The results show that it outperforms the conventional SNR method in the separability of the data when expression levels are taken on the selected genes. The MIT data comprise 7129 effective genes with only 72 labeled samples belonging to 2 disease classes, and the NCI data comprise 2308 effective genes with only 64 labeled samples belonging to 4 disease classes; only 11 and 6 genes, respectively, are selected as diagnostic genes. The selected genes are tested by classifying the samples on these genes with SVM under leave-one-out cross validation and with MLP under both leave-one-out and leave-4-out cross validation. The absence of any misclassification indicates that the selected genes can indeed be regarded as diagnostic genes for the corresponding diseases. |
| |
Keywords: | DNA microarray data; Curse of dimensionality; Gene selection; Diagnostic genes; SVM; MLP; Cross validation |
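
The second stage described in the abstract (deleting genes one by one from an initial subset while checking generalization by cross validation) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: a nearest-centroid classifier stands in for the MLP so the example needs no external libraries, only leave-one-out cross validation is used, the removal rule is read as "drop the gene whose removal leaves the highest cross-validated accuracy, and stop once accuracy would fall", and the names `nearest_centroid_predict`, `loo_accuracy`, `backward_eliminate` and the toy data are hypothetical.

```python
import math


def nearest_centroid_predict(train_X, train_y, x, genes):
    """Predict the label of x from class centroids computed on `genes` only."""
    sums, counts = {}, {}
    for row, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(genes))
        for k, g in enumerate(genes):
            acc[k] += row[g]
        counts[label] = counts.get(label, 0) + 1
    best_label, best_dist = None, math.inf
    for label in sorted(sums):
        # Squared Euclidean distance from x to this class centroid.
        dist = sum((x[g] - s / counts[label]) ** 2
                   for g, s in zip(genes, sums[label]))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(X, y, genes):
    """Leave-one-out accuracy using only the given gene indices."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_X, train_y, X[i], genes) == y[i]:
            correct += 1
    return correct / len(X)


def backward_eliminate(X, y, genes):
    """Drop genes one by one while leave-one-out accuracy does not fall."""
    genes = list(genes)
    best = loo_accuracy(X, y, genes)
    while len(genes) > 1:
        # Score every subset obtained by removing a single gene.
        scored = [(loo_accuracy(X, y, [h for h in genes if h != g]), g)
                  for g in genes]
        # Highest accuracy wins; ties are broken toward the lowest gene index.
        acc, drop = max(scored, key=lambda t: (t[0], -t[1]))
        if acc < best:
            break
        genes.remove(drop)
        best = acc
    return genes, best


# Toy data (hypothetical): gene 0 separates the classes, genes 1 and 2 do not.
X = [
    [0.0, 1.0, 2.0], [0.5, 2.0, 1.0], [0.2, 1.0, 1.0],
    [10.0, 1.0, 2.0], [10.5, 2.0, 1.0], [10.2, 1.0, 1.0],
]
y = [0, 0, 0, 1, 1, 1]
selected, accuracy = backward_eliminate(X, y, [0, 1, 2])
print(selected, accuracy)
```

On the toy data the two uninformative genes are eliminated and only gene 0 remains. In a setting closer to the one in the abstract, the scoring function would instead train an MLP on each candidate subset and could average leave-one-out and leave-4-out results.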
|
Source: | Progress in Natural Science (《自然科学进展》) |