Similar literature
1.
Cancer, in virtually any of its types, is a major cause of death worldwide. In cancer analysis, classifying the various tumor types is of the greatest importance. Analysis of microarray gene expression datasets has been shown to provide a successful framework for studying tumors and genetic diseases. Although standard machine learning (ML) strategies have been used effectively to identify significant genes and to classify new cases, the usual limitations of DNA microarray data, such as the small number of samples and the very large number of features, still restrict its investigative, medical, and logical uses. Increasing the interpretability of prediction approaches while retaining high accuracy would help to analyze gene expression profiles in DNA microarray datasets more reasonably and efficiently. This paper presents a new methodology based on gene expression profiles to classify human cancer diseases. The proposed methodology combines Information Gain (IG) and the Standard Genetic Algorithm (SGA). It first uses Information Gain for feature selection, then uses a Genetic Algorithm (GA) for feature reduction, and finally uses Genetic Programming (GP) for cancer type classification. The proposed system is evaluated by classifying cancer diseases in seven cancer datasets, and the results are compared with the most recent approaches. Comparing the proposed system with other machine learning methodologies on the cancer datasets shows that no classification technique consistently outperforms all the others; however, the Genetic Algorithm generally improves the classification performance of other classifiers.
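The Information Gain step named in this abstract can be illustrated with a small sketch. The abstract does not say how expression values are discretized, so the median split, the function names, and the toy data below are assumptions for illustration only, not the authors' implementation.

```python
import math
from collections import Counter
from statistics import median


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction obtained by splitting one gene at `threshold`."""
    low = [y for x, y in zip(values, labels) if x <= threshold]
    high = [y for x, y in zip(values, labels) if x > threshold]
    if not low or not high:
        return 0.0  # a one-sided split carries no information
    weighted = (len(low) * entropy(low) + len(high) * entropy(high)) / len(labels)
    return entropy(labels) - weighted


def rank_genes(expression, labels):
    """Rank gene names by information gain, best first.

    Assumption: each gene is split at its median expression value.
    """
    scores = {gene: information_gain(values, labels, median(values))
              for gene, values in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: two genes measured on four samples.
toy_expression = {
    "gene_a": [0.1, 0.2, 0.9, 1.1],   # separates the classes cleanly
    "gene_b": [0.5, 0.4, 0.6, 0.5],   # carries little class information
}
toy_labels = ["tumor", "tumor", "normal", "normal"]
print(rank_genes(toy_expression, toy_labels))
```

In a filter stage of this kind, only the top-ranked genes would be passed on to the later reduction and classification steps.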

2.
Gene selection is an important issue in cancer classification based on gene expression profiles. Filter and wrapper approaches are widely used for gene selection; the former have difficulty measuring the relationships between genes, and the latter require a great deal of computation. We present a novel method, called gene boosting, that selects relevant gene subsets by integrating filter and wrapper approaches. It repeatedly selects a set of top-ranked informative genes with a filtering algorithm, applied to a temporal training dataset constructed according to the classification result on the original training dataset. Empirical results on three microarray benchmark datasets show that the proposed method is effective and efficient in finding a relevant and concise gene subset. It achieved competitive performance with fewer genes in a reasonable time, and it also identified some genes that were selected frequently.

3.
Properly designing a wavelet neural network (WNN) is crucial for achieving optimal generalization performance. In this paper, two different approaches are proposed for improving the predictive capability of WNNs. First, the types of activation functions used in the hidden layer of the WNN were varied. Second, the proposed enhanced fuzzy c-means clustering algorithm, specifically the modified point symmetry-based fuzzy c-means (MSFCM) algorithm, was employed to select the locations of the translation vectors of the WNN. The modified WNN was then applied to heterogeneous cancer classification using four different microarray benchmark datasets. The comparative experimental results showed that the proposed methodology achieved almost 100% classification accuracy in multiclass cancer prediction, leading to superior performance with respect to other clustering algorithms. Subsequently, performance comparisons with other classifiers were made. An assessment analysis showed that the proposed approach outperformed most of the other classifiers.

4.
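Based on gene expression profiles, this paper proposes a method that extracts feature genes according to the class-weighted Bhattacharyya distance and uses an artificial neural network (ANN) to identify tumor subtypes. After analyzing gene expression data for small round blue cell tumors (SRBCTs) of childhood, the class-weighted Bhattacharyya distance of each gene was computed on the training set, and feature genes were selected accordingly to build several ANN models. Their classification ability was validated on an independent test set, and a feature gene combination containing 40 genes was determined by the principle of minimum classification error rate. The ANN model based on this feature gene combination not only correctly identified the subtypes of all diseased samples but could also distinguish non-diseased samples.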
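As a rough illustration of scoring a gene by Bhattacharyya distance, the sketch below uses the standard closed form for two univariate Gaussian classes. The abstract does not define its class weighting, so no weighting is applied here; the function name and sample values are hypothetical.

```python
import math
from statistics import mean, pvariance


def bhattacharyya_distance(class_a, class_b):
    """Bhattacharyya distance between two classes for one gene.

    Assumes each class is roughly Gaussian with non-zero variance:
        D = 1/4 * (mu_a - mu_b)^2 / (var_a + var_b)
          + 1/2 * ln((var_a + var_b) / (2 * sd_a * sd_b))
    """
    mu_a, mu_b = mean(class_a), mean(class_b)
    var_a, var_b = pvariance(class_a), pvariance(class_b)
    mean_term = 0.25 * (mu_a - mu_b) ** 2 / (var_a + var_b)
    var_term = 0.5 * math.log((var_a + var_b) / (2.0 * math.sqrt(var_a * var_b)))
    return mean_term + var_term


# Hypothetical expression values of one gene in two tumor subtypes.
subtype_1 = [0.0, 2.0]
subtype_2 = [10.0, 12.0]
print(bhattacharyya_distance(subtype_1, subtype_2))
```

Genes with larger distances separate the two classes better, so a selection step would keep the highest-scoring genes. Handling more than two classes, as in the SRBCT data, would need a pairwise or weighted combination that this sketch leaves out.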

5.
In this paper, we present a medical diagnosis decision support model for gastrointestinal cancer. It is intended for use by general practitioners whenever there is a suspicion that a patient has this type of cancer. To build our model, we used Case-Based Reasoning (CBR) and Rule-Based Reasoning (RBR). We used real patient data as inputs to our model. We applied RBR to improve the CBR retrieval process. The model's output presents the probability of the patient having a specific cancer. To adjust the attribute weights, we collected data from a general practitioner. To validate our model, we used K-fold cross-validation and the paired t-test. The results showed that, with our approach, the accuracy of the diagnosis increased by 22.92% compared to a CBR approach not using RBR in case retrieval. Furthermore, we evaluated our approach with an online questionnaire and semi-structured interviews. Although the number of respondents does not allow us to generalize our conclusions, the results indicate that our approach would be useful for general practitioners.

6.
Accurately predicting the prognosis of breast cancer from high-throughput microarray data is often a challenging task. Although many statistical methods and machine learning techniques have been applied to predict the prognosis of breast cancer, they suffer from low prediction accuracy (usually lower than 70%). In this paper, we propose a better method, a genetic algorithm-support vector machine that we call GASVM, to significantly improve the prediction accuracy for breast cancer from gene expression profiles. To further improve the classification performance, we also apply the GASVM model to combined clinical and microarray data. We evaluate the performance of the GASVM model on data from 97 breast cancer patients. Four kinds of gene selection methods are used: all genes (All), 70 correlation-selected genes (C70), 15 medical literature-selected genes (R15), and 50 T-test-selected genes (T50). With the optimized parameter values identified by the GASVM model, the average predictive accuracy of our model approaches 95% for T50 and 90% for C70 or R15 with all four kernel functions using integrated clinical and microarray data. Our model is more accurate than the average 70% predictive accuracy of other machine learning methods. The results indicate that the GASVM model has the potential to better assist physicians in the prognosis of breast cancer through the use of both clinical and microarray data.

7.
In this paper, we propose a microcalcification classification scheme, assisted by content-based mammogram retrieval, for breast cancer diagnosis. We recently developed a machine learning approach for mammogram retrieval where the similarity measure between two lesion mammograms was modeled after expert observers. In this work, we investigate how to use retrieved similar cases as references to improve the performance of a numerical classifier. Our rationale is that by adaptively incorporating local proximity information into a classifier, it can help to improve its classification accuracy, thereby leading to an improved “second opinion” to radiologists. Our experimental results on a mammogram database demonstrate that the proposed retrieval-driven approach with an adaptive support vector machine (SVM) could improve the classification performance from 0.78 to 0.82 in terms of the area under the ROC curve.

8.
This paper presents a new visualization software package called pairheatmap, which generates and compares two heatmaps in order to compare the expression patterns of gene groups. It adds a conditioning variable such as time to the heatmap, and provides separate clustering for row groups in the first heatmap in order to visualize pattern changes between the two heatmaps. pairheatmap is developed in the R statistical environment. It provides: (1) a flexible framework for comparing two heatmaps; and (2) high-quality figures based on the R package grid. The general architecture can be efficiently incorporated into bioinformatics pipelines. The package and user documentation are free to download at http://cran.r-project.org/web/packages/pairheatmap/index.html.

9.
The mixture-Gaussian model-based clustering method has received much attention in clustering gene expression profiles in the literature of bioinformatics. However, this method suffers from two difficulties in applications. The first one is on the parameter estimation, which becomes difficult when the dimension of the data is high or the size of a cluster is small. The second one is on the normality assumption for gene expression levels, which is seldom satisfied by real data. In this paper, we propose to overcome these two difficulties by the probit transformation in conjunction with the singular value decomposition (SVD). SVD reduces the dimensionality of the data, and the probit transformation converts the scaled eigensamples, which can be interpreted as correlation coefficients as explained in the text, into Gaussian random variables. Our numerical results show that the SVD-based probit transformation enhances the ability of the mixture-Gaussian model-based clustering method for identifying prominent patterns of the data. As a by-product, we show that the SVD-based probit transformation also improves the performance of the model-free clustering methods, such as hierarchical, K-means and self-organizing maps (SOM), for the data sets containing scattered genes. In this paper, we also propose a run test-based rule for selection of eigensamples used for clustering.
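The probit step mentioned in this abstract can be sketched in a few lines. The abstract does not give the exact scaling, so this sketch assumes the scaled values lie in (-1, 1), like correlation coefficients, and maps them linearly onto (0, 1) before applying the inverse standard normal CDF. The function name, the clipping constant `eps`, and the sample values are hypothetical, and the SVD step is not shown.

```python
from statistics import NormalDist


def probit_transform(values, eps=1e-6):
    """Map correlation-like values in (-1, 1) to the real line.

    Assumption: r is first rescaled to p = (r + 1) / 2, then the
    standard normal quantile function (probit) is applied. Values are
    clipped away from 0 and 1 so that the quantile stays finite.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    transformed = []
    for r in values:
        p = (r + 1.0) / 2.0
        p = min(max(p, eps), 1.0 - eps)
        transformed.append(std_normal.inv_cdf(p))
    return transformed


# Hypothetical scaled eigensample entries for three genes.
print(probit_transform([-0.5, 0.0, 0.5]))
```

The transformed values are unbounded and symmetric around zero, which is closer to what a Gaussian mixture model expects than values confined to a fixed interval.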

10.
11.
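Application of the Boosting algorithm to sample classification of gene expression profiles
A sample classification method for gene expression profiles is proposed based on the structure of gene expression profiles. First, the Bhattacharyya distance of each gene is used to measure the sample class information it carries, and noise genes with small Bhattacharyya distances are filtered out. Next, the repeated edited nearest neighbor algorithm is modified to remove noisy samples. A combined support vector machine classifier is then built using the Boosting algorithm. Finally, classification experiments are carried out on colon cancer gene expression profile samples. The experimental results show that the method is simple and effective and is highly practical for classifying gene expression profile samples.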

12.
Principal components analysis (PCA) is useful for reproducing the total variation among hundreds or thousands of continuously-scaled variables with a much smaller number of unobservable variables called 'latent factors'. The CLUSFAVOR computer program was used to implement PCA for identifying groups of genes with similar expression profiles from a large number of genes used on DNA microarrays. This paper describes the principal components solution to the factor model of the correlation matrix R, calculation of eigenvalues and eigenvectors of R, extraction of factors, calculation of factor loadings, and identification of genes with similar loading patterns to construct groups of genes with similar expression profiles. With regard to extraction of factors, it was found that more than 90% of the total variance in input data could be accounted for by extracting factors whose eigenvalues exceed unity. Bipolar factors containing strong positive and negative loadings can also be used for identifying two unique groups of genes, since expression profiles of genes that load positive are unlike expression profiles of genes that load negative on the same factor. While PCA does not provide the absolute answer to a multidimensional problem, it nevertheless can provide a heuristic with which natural groupings of genes with similar expression profiles can be assembled. While cluster analysis essentially generates a single dendrogram (tree branch) containing every gene in the input data, PCA can be used to assemble gene expression profiles that strongly correlate with the latent factors accounting for a majority of total variance. Example results for CLUSFAVOR computer program runs are provided.
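The factor extraction rule described here (keep factors whose eigenvalues exceed unity) and the usual definition of factor loadings can be sketched as follows. The eigenvalues are taken as given, for example from an eigendecomposition of the correlation matrix R computed elsewhere. The function names and numbers are hypothetical and are not taken from CLUSFAVOR.

```python
import math


def retain_factors(eigenvalues, threshold=1.0):
    """Keep eigenvalues above `threshold` and report the variance they explain.

    For a correlation matrix the eigenvalues sum to the number of
    variables, so the returned fraction is the share of total variance
    accounted for by the retained factors.
    """
    ordered = sorted(eigenvalues, reverse=True)
    kept = [ev for ev in ordered if ev > threshold]
    explained = sum(kept) / sum(ordered)
    return kept, explained


def factor_loadings(eigenvector, eigenvalue):
    """Loadings of each variable on one factor: eigenvector * sqrt(eigenvalue)."""
    scale = math.sqrt(eigenvalue)
    return [component * scale for component in eigenvector]


# Hypothetical eigenvalues of a 5 x 5 correlation matrix.
kept, explained = retain_factors([3.2, 1.5, 0.2, 0.1])
print(kept, round(explained, 2))
```

Genes could then be grouped by the factor on which they load most strongly, with positive and negative loadings on a bipolar factor treated as separate groups.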

13.
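With the development of large-scale gene expression profiling technology, cancer diagnosis based on gene expression profiles is becoming a fast and effective diagnostic method in clinical medicine. However, because gene expression data are very high-dimensional, have few samples, and are noisy, correctly extracting the feature genes related to cancer becomes the key issue. Taking colon cancer gene expression profile data as an example, a hybrid feature gene extraction method combining the Fisher weight function, the discrete Fourier transform, and principal component analysis is proposed, with multivariate logistic regression analysis and Bayesian decision used as classifiers for tumor classification and detection. Experimental results show that the method reaches a CV recognition accuracy of 96.80% on the colon cancer dataset.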

14.
The aim of this study is to design a classifier-based expert system for early diagnosis of the organ in constraint phase, so that informed decisions can be reached without biopsy by using some selected features. The other purpose is to investigate the relationship between BMI (body mass index), smoking, and prostate cancer. The data used in this study were collected from 300 men (100 with prostate adenocarcinoma, 200 with chronic prostatism or benign prostatic hyperplasia). Weight, height, BMI, PSA (prostate-specific antigen), free PSA, age, prostate volume, density, smoking, systolic, diastolic, pulse, and Gleason score features were used, and an independent-samples t-test was applied for feature selection. To classify the data, we used the following classifiers: the scaled conjugate gradient (SCG), Broyden–Fletcher–Goldfarb–Shanno (BFGS), and Levenberg–Marquardt (LM) training algorithms of artificial neural networks (ANN), and the linear, polynomial, and radial basis kernel functions of the support vector machine (SVM). It was determined that smoking is a factor that increases prostate cancer risk, whereas BMI does not affect prostate cancer. Since the PSA, volume, density, and smoking features were found to be statistically significant, they were chosen for classification. The proposed system was designed with the polynomial kernel function, which had the best performance (accuracy: 79%). In the Turkish Family Health System, family physicians, to whom patients first apply, could use an expert system such as the one proposed to help extract the risk map of the illness and direct patients to the correct treatments.

15.
This paper introduces a number of reliability criteria for computer-aided diagnostic systems for breast cancer. These criteria are then used to analyze some published neural network systems. It is also shown that the property of monotonicity for the data is rather natural in this medical domain, and it has the potential to significantly improve the reliability of breast cancer diagnosis while maintaining a general representation power. A central part of this paper is devoted to the representation/narrow vicinity hypothesis, upon which existing computer-aided diagnostic methods heavily rely. The paper also develops a framework for determining the validity of this hypothesis. The same framework can be used to construct a diagnostic procedure with improved reliability.

16.
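For relatively large-scale colon cancer gene expression profile data, the role of noise handling in the problem of extracting gene signatures is studied. Without considering noise, the ReCorre algorithm is used to determine the classification genes, and the plus-l-minus-r search algorithm is then used to determine groups of gene signatures; leave-one-out cross-validation based on a support vector machine is applied to each group to determine the optimal gene signature. The influence of noise is then analyzed: data noise is filtered out by wavelet threshold denoising, and useless genes are handled with an alternating selection algorithm, after which the gene signature is determined again. Experiments show that handling the noise in tumor gene expression profiles helps to obtain gene signatures with better classification ability.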

17.
Discovering groups of similar samples, which cannot be identified manually, is a common first step in the analysis of biomedical data. Many supervised and unsupervised machine learning and statistical approaches have been developed to solve this problem. Clustering is an unsupervised learning approach that organizes the data into similar groups and is used to discover the intrinsic hidden structure of the data. In this paper, we use the clustering by fast search and find of density peaks (CDP) approach for cancer subtyping and for distinguishing normal tissues from tumor tissues. In addition, we address the impact of preprocessing and of the underlying distance matrix on the final groups. We have performed extensive experiments on real-world and synthetic cancer gene expression microarray data sets and compared the results with state-of-the-art clustering approaches.
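The density peaks idea can be sketched in plain Python. Each point gets a local density (the number of neighbours within a cutoff distance) and a distance to its nearest point of higher density. Points where both are large become cluster centres, and every other point inherits the label of its nearest higher-density neighbour. The cutoff value, the choice of centres by the product of density and distance, the index-based tie-breaking, and the toy points are assumptions of this sketch, not details taken from the paper.

```python
import math


def density_peaks(points, cutoff, n_clusters):
    """Minimal density-peaks clustering; returns one label per point."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density: number of other points closer than the cutoff.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < cutoff)
           for i in range(n)]

    # Visit points from densest to sparsest (ties broken by index).
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n      # distance to the nearest denser point
    parent = [-1] * n      # index of that denser point
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])  # densest point: use its largest distance
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i], parent[i] = dist[i][j], j

    # Centres: the points with the largest rho * delta.
    centers = sorted(range(n), key=lambda i: rho[i] * delta[i],
                     reverse=True)[:n_clusters]
    labels = [-1] * n
    for label, i in enumerate(centers):
        labels[i] = label

    # Remaining points take the label of their nearest denser neighbour.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels


# Hypothetical 2-D points forming two well-separated groups.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(density_peaks(toy_points, cutoff=1.5, n_clusters=2))
```

For expression data, each point would be a sample's expression vector, and the distance function could be swapped for a correlation-based distance, which is the kind of choice the abstract says affects the final groups.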

18.
Multimedia Tools and Applications - This paper aims at early Breast Cancer (BC) detection by Mammography (MG), based on the production of excellent images and competent interpretation. This...

19.
Neural Computing and Applications - Breast cancer is a serious disease for women worldwide and ranks second among cancers for women in many countries. Computer-aided diagnosis provides a second...

20.
Many cellular processes exhibit periodic behaviors. Hence, one of the important tasks in gene expression data analysis is to detect the subset of genes that exhibit cyclicity or periodicity in their gene expression time series profiles. Unfortunately, gene expression time series profiles are usually very short, contain very few periods, are irregularly sampled, and are highly contaminated with noise. This makes the detection of periodic profiles a very challenging problem. Recently, a hypothesis testing method based on the Fisher g-statistic with correction for multiple testing has been proposed to detect periodic gene expression profiles. However, it was observed that the test is not reliable if the signal length is too short. In this paper, we performed an extensive simulation study to investigate the statistical power of the test as a function of noise distribution, signal length, SNR, and the false discovery rate (FDR). We found that the number of periodic profiles can be severely underestimated for short-length signals. The findings indicate that caution needs to be exercised when interpreting the test result for very short-length signals.
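For reference, the Fisher g-statistic is the largest periodogram ordinate divided by the sum of all ordinates, and under Gaussian white noise its exact tail probability has a closed form. The sketch below assumes an evenly sampled series and does not include the multiple-testing correction. The function names and the sine-wave example are hypothetical.

```python
import cmath
import math


def periodogram(series):
    """Periodogram at the Fourier frequencies k = 1 .. floor((n - 1) / 2)."""
    n = len(series)
    avg = sum(series) / n
    centered = [x - avg for x in series]
    power = []
    for k in range(1, (n - 1) // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        power.append(abs(coeff) ** 2 / n)
    return power


def fisher_g(series):
    """Fisher's g: the share of total periodogram power in the largest peak."""
    power = periodogram(series)
    return max(power) / sum(power)  # undefined for a constant series


def fisher_g_pvalue(g, m):
    """Exact P(G > g) under Gaussian white noise with m periodogram ordinates."""
    upper = min(m, int(1.0 / g))
    return sum((-1) ** (j - 1) * math.comb(m, j) * (1.0 - j * g) ** (m - 1)
               for j in range(1, upper + 1))


# Hypothetical profile: two full cycles over 12 evenly spaced time points.
profile = [math.sin(2 * math.pi * 2 * t / 12) for t in range(12)]
g = fisher_g(profile)
print(g, fisher_g_pvalue(g, m=(len(profile) - 1) // 2))
```

With only a handful of time points, m is small and g can take only a narrow range of values, which gives some intuition for why the abstract reports reduced reliability for short signals.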
