Similar literature
 20 similar documents found (search time: 15 ms)
1.
Gene expression data are expected to be of significant help in the development of efficient cancer diagnosis and classification platforms. One problem arising from these data is how to select a small subset of genes from thousands of genes and a few samples that are inherently noisy. This research aims to select a small subset of informative genes from the gene expression data that maximizes classification accuracy. A model for gene selection and classification has been developed using a filter approach and an improved hybrid of a genetic algorithm and a support vector machine classifier. We show that the proposed model achieves useful classification accuracy for cancer classification on one widely used gene expression benchmark data set.

2.
Accurately predicting the prognosis of breast cancer from high-throughput microarray data is often a challenging task. Although many statistical methods and machine learning techniques have been applied to predict the prognosis of breast cancer, they suffer from low prediction accuracy (usually lower than 70%). In this paper, we propose a method that combines a genetic algorithm with a support vector machine, which we call GASVM, to significantly improve the accuracy of breast cancer prognosis prediction from gene expression profiles. To further improve classification performance, we also apply the GASVM model to combined clinical and microarray data. We evaluate the performance of the GASVM model on data from 97 breast cancer patients. Four kinds of gene selection methods are used: all genes (All), 70 correlation-selected genes (C70), 15 medical literature-selected genes (R15), and 50 T-test-selected genes (T50). With the optimized parameter values identified by the GASVM model, the average predictive accuracy of our model approaches 95% for T50 and 90% for C70 or R15 with all four kernel functions when using integrated clinical and microarray data. Our model is more accurate than other machine learning methods, whose average predictive accuracy is 70%. The results indicate that the GASVM model has the potential to better assist physicians in the prognosis of breast cancer through the use of both clinical and microarray data.

3.
A genetic algorithm-based method for feature subset selection   (Total citations: 5; self-citations: 2; citations by others: 3)
As a commonly used technique in data preprocessing, feature selection selects a subset of informative attributes or variables to build models describing data. By removing redundant, irrelevant, or noisy features, feature selection can improve the predictive accuracy and the comprehensibility of predictors or classifiers. Many feature selection algorithms with different selection criteria have been introduced by researchers. However, no single criterion has been found to be best for all applications. In this paper, we propose a framework based on a genetic algorithm (GA) for feature subset selection that combines various existing feature selection methods. The advantages of this approach include the ability to accommodate multiple feature selection criteria and to find small subsets of features that perform well for the particular inductive learning algorithm used to build the classifier. We conducted experiments using three data sets and three existing feature selection methods. The experimental results demonstrate that our approach is robust and effective at finding subsets of features with higher classification accuracy and/or smaller size than those found by each individual feature selection algorithm.
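
A GA-based subset search of the kind this abstract describes can be pictured with a bit-mask encoding, where bit i says whether feature i is kept. The sketch below is illustrative only and is not the authors' framework: the operators (binary tournament, one-point crossover, bit-flip mutation), all parameter values, and the `toy_fitness` function with its `INFORMATIVE` set are assumptions made for the example. In practice the fitness would be a classifier's validation accuracy, possibly combined with several selection criteria.

```python
import random

def ga_feature_selection(n_features, fitness, pop_size=20, generations=30,
                         crossover_rate=0.9, mutation_rate=0.05, seed=0):
    """Toy generational GA over bit masks; returns the best mask seen."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)

    def pick():
        # Binary tournament: the fitter of two randomly drawn individuals wins.
        a, b = rng.choice(population), rng.choice(population)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_features)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation: each bit is inverted with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
        candidate = max(population, key=fitness)
        if fitness(candidate) > fitness(best):
            best = candidate
    return best

# Hypothetical fitness: reward three "informative" features, penalise subset size.
INFORMATIVE = {0, 3, 7}

def toy_fitness(mask):
    return sum(mask[i] for i in INFORMATIVE) - 0.1 * sum(mask)

best_mask = ga_feature_selection(10, toy_fitness)
print(best_mask, toy_fitness(best_mask))
```

The size penalty in `toy_fitness` is what pushes the search toward small subsets; without it the all-ones mask would be as good as any other mask that contains the informative features.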

4.
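Clustering is a commonly used technique for processing gene expression data, but it is also subjective, and how to choose a clustering algorithm that matches the intrinsic distribution of the data is a problem in urgent need of a solution. Empirically, once the optimal number of clusters k has been chosen, repeatedly clustering the target data with a suitable clustering algorithm gives fairly stable results. A stability-based method for selecting a clustering algorithm is therefore proposed. The method combines the between-cluster separation, the within-cluster compactness, and the stability of the clustering results. Validation and application on three data sets show that, compared with traditional evaluation methods, stability-based clustering algorithm selection is more objective and more reliable.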

5.
A multi-objective particle swarm optimization for project selection problem   (Total citations: 2; self-citations: 0; citations by others: 2)
Selecting the most appropriate projects out of a given set of investment proposals is recognized as a critical issue for which the decision maker takes several aspects into consideration. Since many of these aspects may conflict, the problem becomes a multi-objective one. Consequently, we consider a multi-objective project selection problem in this study where total benefits are to be maximized while total risk and total cost must be minimized simultaneously. Since solving an NP-hard problem becomes demanding as the number of projects grows, a multi-objective particle swarm with new selection regimes for the global best and personal best of swarm members is designed to find the locally Pareto-optimal frontier. It is compared with a salient multi-objective genetic algorithm, i.e. SPEAII, on random instances using several comparison metrics.

6.
Gene expression technology, namely microarrays, offers the ability to measure the expression levels of thousands of genes simultaneously in biological organisms. Microarray data are expected to be of significant help in the development of an efficient cancer diagnosis and classification platform. A major problem with these data is that the number of genes greatly exceeds the number of tissue samples. These data also contain noisy genes. Literature reviews have shown that selecting a small subset of informative genes can lead to improved classification accuracy. Therefore, this paper aims to select a small subset of informative genes that are most relevant for cancer classification. To achieve this aim, an approach using two hybrid methods is proposed. The approach is evaluated on two well-known microarray data sets and shows competitive results. This work was presented in part at the 13th International Symposium on Artificial Life and Robotics, Oita, Japan, January 31–February 2, 2008.

7.
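Because the initial number of classes in the EM algorithm is hard to determine and the iterations often end in a local optimum, the K-means algorithm is combined with EM-based clustering to give a new model-based clustering method suited to gene expression data. The new method first exploits the global character and high efficiency of K-means to obtain an initial partition of the clusters quickly, uses that partition to set the initial parameter values of a Gaussian mixture model, and then clusters with the EM method to obtain the final clustering result. The new algorithm was compared with the K-means algorithm and with the EM algorithm in two experiments on real data sets. The experimental results show that the new algorithm is an effective clustering method and that the accuracy of the clustering results is improved.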
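
The two-stage idea in this abstract (K-means for a fast initial partition, then EM on a Gaussian mixture started from that partition) can be sketched on scalar data. This is a minimal illustration under assumptions of my own, not the authors' implementation: the data are one-dimensional, the K-means centres are seeded from quantiles of the sorted data, and the names `kmeans_1d`, `em_gmm_1d`, and `min_var` are hypothetical.

```python
import math

def kmeans_1d(xs, k, iters=20):
    """Lloyd's algorithm on scalars; centres start at quantiles of the sorted data."""
    s = sorted(xs)
    centres = [s[(2 * j + 1) * len(s) // (2 * k)] for j in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda c: abs(x - centres[c]))
            groups[nearest].append(x)
        # An empty cluster keeps its previous centre.
        centres = [sum(g) / len(g) if g else centres[j]
                   for j, g in enumerate(groups)]
    return centres, groups

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(xs, k, iters=50, min_var=1e-6):
    """EM for a 1-D Gaussian mixture, initialised from the K-means partition."""
    centres, groups = kmeans_1d(xs, k)
    n = len(xs)
    weights = [len(g) / n for g in groups]
    means = centres[:]
    variances = [max(sum((x - m) ** 2 for x in g) / len(g), min_var) if g else 1.0
                 for g, m in zip(groups, means)]
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300  # guard against numerical underflow
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weight, mean, and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave a dead component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj,
                min_var)
    return weights, means, variances

xs = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
weights, means, variances = em_gmm_1d(xs, 2)
print(weights, means, variances)
```

Starting EM from the K-means partition matters because EM only climbs to a nearby local optimum of the likelihood, so a sensible starting point reduces the chance of ending in a poor one. The variance floor `min_var` is a common safeguard against a component collapsing onto a single point.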

8.
Cancer diagnosis is an important emerging clinical application of microarray data. Accurate prediction of the type or size of tumors relies on adopting powerful and reliable classification models, so that patients can be provided with better treatment or respond better to therapy. However, the high dimensionality of microarray data may cause problems for traditional classification models, such as over-fitting, poor performance, and low efficiency. Thus, one of the challenging tasks in cancer diagnosis is how to identify, from the thousands of genes in microarray data, the salient expressed genes that directly contribute to the phenotype or symptoms of a disease. In this paper, we propose a new ensemble gene selection method (EGS) that chooses multiple gene subsets for classification, where the significance of a gene is measured by conditional mutual information or its normalized form. Different gene subsets are obtained by setting different starting points for the search procedure; they are then used to train multiple base classifiers, which are aggregated into a consensus classifier by majority voting. The proposed method is compared with five popular gene selection methods on six public microarray datasets, and the comparison results show that our method works well.
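
The aggregation step mentioned in this abstract, combining base classifiers by majority voting, is simple enough to sketch directly. The function name `majority_vote`, the label strings, and the tie-breaking rule below are assumptions for illustration; the abstract does not say how ties are resolved.

```python
from collections import Counter

def majority_vote(predictions):
    """predictions[i][j] is the label that base classifier i assigns to sample j."""
    consensus = []
    for votes in zip(*predictions):
        # most_common orders equal counts by first occurrence, so a tie
        # goes to the label voted by the earliest classifier in the list.
        consensus.append(Counter(votes).most_common(1)[0][0])
    return consensus

# Hypothetical outputs of three base classifiers on three samples.
base_predictions = [
    ["tumor", "normal", "tumor"],
    ["tumor", "tumor", "normal"],
    ["normal", "normal", "tumor"],
]
consensus = majority_vote(base_predictions)
print(consensus)  # ['tumor', 'normal', 'tumor']
```

Using an odd number of base classifiers avoids ties in two-class problems, which is one reason ensembles of this kind are often built with three or five members.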

9.
Supply chain network (SCN) design aims to provide an optimal platform for efficient and effective supply chain management. It is an important strategic operations management problem in supply chain management and usually involves multiple conflicting objectives such as cost, service level, and resource utilization. This paper proposes a new solution procedure based on genetic algorithms to find the set of Pareto-optimal solutions for the multi-objective SCN design problem. To handle multiple objectives and enable the decision maker to evaluate a greater number of alternative solutions, two different weight approaches are implemented in the proposed solution procedure. An experimental study using actual data from a company that produces plastic products in Turkey is carried out in two stages. In the first stage, the effects of the weight approaches on the performance of the proposed solution procedure are investigated; in the second stage, the proposed solution procedure and simulated annealing are compared according to the quality of their Pareto-optimal solutions.
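
Several abstracts in this list refer to Pareto-optimal solutions. As a reminder of what that set is, the sketch below filters a list of candidate solutions down to the non-dominated ones. It assumes every objective is to be minimised, and the `designs` values (cost, unmet demand) are made up for the example; it is not the solution procedure of this paper.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(points):
    """Keep the points that no other point dominates (all objectives minimised)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (cost, unmet demand) pairs for five candidate network designs.
designs = [(10, 5), (8, 7), (12, 4), (9, 9), (11, 6)]
front = pareto_front(designs)
print(front)  # [(10, 5), (8, 7), (12, 4)]
```

The quadratic scan is fine for small candidate sets; evolutionary algorithms that maintain large populations usually use faster non-dominated sorting instead.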

10.
In this paper, we propose a genetic algorithm with silhouette statistics as discriminant function (GASS) for gene selection and pattern recognition. The proposed method evaluates gene expression patterns for discriminating heterogeneous cancers. Distance metrics and classification rules have also been analyzed to design a GASS with high classification accuracy. Moreover, the proposed method is compared to previously published methods. Various experimental results show that our method is effective for classifying the NCI60, GCM, and SRBCTs datasets. GASS also outperforms other existing methods in both leave-one-out cross-validation and the independent test on novel data.

11.
Cancer, in virtually any of its types, is a significant cause of death around the world. In cancer analysis, the classification of different tumor types is of the greatest importance. Analysis of microarray gene expression datasets appears to provide a successful framework for studying tumors and genetic diseases. Although standard machine learning (ML) strategies have been used effectively to identify significant genes and classify new cases, common limitations of DNA microarray data analysis, such as the small number of samples and the very large number of features, still limit its investigative, medical, and scientific uses. Improving the interpretability of prediction approaches while maintaining high accuracy would help to analyze gene expression profiles in DNA microarray datasets more sensibly and efficiently. This paper presents a new methodology based on gene expression profiles to classify human cancer diseases. The proposed methodology combines Information Gain (IG) and the Standard Genetic Algorithm (SGA). It first uses Information Gain for feature selection, then uses the Genetic Algorithm (GA) for feature reduction, and finally uses Genetic Programming (GP) to classify cancer types. The proposed system is evaluated by classifying cancer diseases in seven cancer datasets, and the results are compared with the most recent approaches. Applying the proposed system to the cancer datasets and comparing it with other machine learning methodologies shows that no classification technique consistently outperforms all the others; however, the Genetic Algorithm generally improves the classification performance of other classifiers.
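
The first stage of the pipeline in this abstract ranks genes by Information Gain, that is, by how much knowing a gene's value reduces the entropy of the class label. The sketch below computes it for an already discretised feature. The discretisation into "hi"/"lo" levels and the sample labels are assumptions for the example; the abstract does not say how continuous expression values are binned.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """H(labels) minus the expected entropy of labels given the feature value."""
    n = len(labels)
    by_value = {}
    for v, y in zip(feature_values, labels):
        by_value.setdefault(v, []).append(y)
    conditional = sum(len(ys) / n * entropy(ys) for ys in by_value.values())
    return entropy(labels) - conditional

# Hypothetical discretised expression levels of two genes across four samples.
labels = ["cancer", "cancer", "normal", "normal"]
gene_a = ["hi", "hi", "lo", "lo"]   # separates the classes perfectly
gene_b = ["hi", "lo", "hi", "lo"]   # carries no class information
print(information_gain(gene_a, labels), information_gain(gene_b, labels))
```

A filter of this kind scores each gene independently, which is cheap but blind to redundancy between genes; that is the usual motivation for following it with a subset search such as a GA.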

12.
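Application of a new clustering algorithm to gene expression data analysis   (Total citations: 2; self-citations: 1; citations by others: 1)
The self-organizing feature map (SOM) neural network and hierarchical clustering are two classical clustering algorithms for analyzing gene expression data, but because gene expression data are complex and unstable, each algorithm has its own strengths and weaknesses. Based on a comparison of the differences between the two, a new algorithm is proposed: the gene expression data are first clustered with the SOM algorithm, and the neuron weights corresponding to each cluster are then clustered a second time with hierarchical clustering. The algorithm is applied to yeast gene expression data, and experiments show that the improved algorithm overcomes some shortcomings of the self-organizing algorithm and improves the effectiveness of gene clustering.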

13.
This paper evaluates different forms of rank-based selection that are used with genetic algorithms and genetic programming. Many types of rank-based selection have exactly the same expected value in terms of the sampling rate allocated to each member of the population. However, the variance associated with that sampling rate can vary depending on how selection is implemented. We examine two forms of tournament selection and compare these to linear rank-based selection using an explicit formula. Because selective pressure has a direct impact on population diversity, we also examine the interaction between selective pressure and different mutation strategies.
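
The claim that different rank-based schemes can share the same expected sampling rate can be checked on a small case. The sketch below is not taken from the paper: it uses the textbook linear ranking formula p(r) = (2 - s + 2(s - 1) r / (N - 1)) / N, with rank r = 0 for the worst individual and selective pressure s, and compares it with a binary tournament between two distinct, uniformly drawn individuals. Both function names are hypothetical, and exact fractions are used so the comparison is not blurred by rounding.

```python
from fractions import Fraction

def linear_rank_probs(n, pressure=2):
    """Selection probability per rank (0 = worst, n-1 = best), pressure in [1, 2]."""
    s = Fraction(pressure)
    return [(2 - s + 2 * (s - 1) * Fraction(r, n - 1)) / n for r in range(n)]

def binary_tournament_probs(n):
    """Two distinct individuals are drawn uniformly; the better-ranked one wins.

    Rank r wins exactly the r pairs in which its rival has a lower rank,
    out of n * (n - 1) / 2 equally likely pairs.
    """
    pairs = Fraction(n * (n - 1), 2)
    return [Fraction(r) / pairs for r in range(n)]

linear = linear_rank_probs(5)
tournament = binary_tournament_probs(5)
print(linear == tournament)  # True: identical expected sampling rates
```

With pressure 2 the two schemes assign every rank the same probability, so they differ only in the variance of the realised sample counts, which depends on how the sampling is carried out.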

14.
Feature selection has always been a critical step in pattern recognition, in which evolutionary algorithms, such as the genetic algorithm (GA), are most commonly used. However, the individual encoding scheme used in various GAs either imposes a bias on the solution or requires a pre-specified number of features, and hence may lead to less accurate results. In this paper, a tribe competition-based genetic algorithm (TCbGA) is proposed for feature selection in pattern classification. The population of individuals is divided into multiple tribes, and the initialization and evolutionary operations are modified to ensure that the number of selected features in each tribe follows a Gaussian distribution. Thus each tribe focuses on exploring a specific part of the solution space. Meanwhile, tribe competition is introduced into the evolution process, which allows the winning tribes, those that produce better individuals, to grow, i.e. to have more individuals searching their parts of the solution space. The algorithm therefore avoids both the bias on solutions and the requirement for a pre-specified number of features. We have evaluated our algorithm against several state-of-the-art feature selection approaches on 20 benchmark datasets. Our results suggest that the proposed TCbGA algorithm can identify the optimal feature subset more effectively and produce more accurate pattern classification.

15.
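Materialized view selection for multidimensional data is an NP-complete problem. A constraint-based multi-objective optimization genetic algorithm is proposed that treats query cost and maintenance cost separately, so as to solve complex materialized view selection problems more effectively. Experimental results show that the algorithm performs better, particularly in the distribution of the Pareto front obtained.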

16.
Background: The application of microarray data to cancer classification is important. Researchers have tried to analyze gene expression data using various computational intelligence methods. Purpose: We propose a novel method for gene selection that uses particle swarm optimization combined with a decision tree as the classifier to select, from the thousands of genes in the data, a small number of informative genes that can contribute to identifying cancers. Conclusion: Experiments on 11 gene expression cancer datasets and statistical analysis of the results reveal that our proposed method outperforms other popular classifiers, i.e., support vector machine, self-organizing map, back propagation neural network, and C4.5 decision tree.

17.
There is an ever-increasing need to use optimization methods for the thermal design of data centers and the hardware populating them. Airflow simulations of cabinets and data centers are computationally intensive, and this problem is exacerbated when the simulation model is integrated with a design optimization method. Generally speaking, thermal design of data center hardware can be posed as a constrained multi-objective optimization problem. A popular approach for solving this kind of problem is to use Multi-Objective Genetic Algorithms (MOGAs). However, the large number of simulation evaluations needed by MOGAs has prevented their application to realistic engineering design problems. In this paper, details of a substantially more efficient MOGA are formulated and demonstrated through a thermal analysis simulation model of a data center cabinet. First, a reduced-order model of the cabinet problem is constructed using Proper Orthogonal Decomposition (POD). The POD model is then used to form the objective and constraint functions of an optimization model. Next, this optimization model is integrated with the new MOGA. The new MOGA uses a “kriging”-guided operation in addition to conventional genetic algorithm operations to search the design space for globally optimal design solutions. This approach to optimal design is essential for handling complex multi-objective situations, where the optimal solutions may not be obvious from simple analyses or intuition. It is shown that, in optimizing the data center cabinet problem, the new MOGA outperforms a conventional MOGA by estimating the Pareto front using 50% fewer simulation calls, which makes it very promising for complex thermal design problems. Recommended by: Monem Beitelmal

18.
In recent years, a number of multi-objective immune algorithms (MOIAs) have been proposed, inspired by information processing in the biological immune system. Since most MOIAs use the clonal selection principle to encourage searching around boundary and less-crowded areas, they have been shown to be effective on various kinds of multi-objective optimization problems (MOPs). The crowding distance is often used in MOIAs as a diversity metric that reflects the state of the population's diversity and is employed to select less-crowded individuals for cloning. However, this kind of cloning may encounter difficulties when tackling some complicated MOPs (e.g., the UF problems with variable linkages). To alleviate these difficulties, a novel MOIA with a decomposition-based clonal selection strategy (MOIA-DCSS) is proposed in this paper. Each individual is associated with one subproblem using the decomposition approach, so that the performance improvement on each subproblem can be easily quantified. A novel decomposition-based clonal selection strategy is then designed to clone the solutions with the larger improvements for their subproblems, which encourages searching around these subproblems. Moreover, differential evolution is employed in MOIA-DCSS to strengthen the exploration ability and to improve the population's diversity. To evaluate the performance of MOIA-DCSS, twenty-eight test problems with complicated Pareto-optimal sets and fronts are used. The experimental results validate the superiority of MOIA-DCSS over four state-of-the-art multi-objective algorithms (i.e., NSLS, MOEA/D-M2M, MOEA/D-DRA and MOEA/DD) and three competitive MOIAs (i.e., NNIA, HEIA, and AIMA).
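
The crowding distance mentioned in this abstract is not defined there. The sketch below assumes the definition popularised by NSGA-II: for each objective, sort the solutions, give the two extremes an infinite distance, and add to every interior solution the normalised gap between its two neighbours. The function name and the sample `front` are hypothetical.

```python
def crowding_distance(points):
    """Crowding distance of each objective vector in a single front."""
    n = len(points)
    if n == 0:
        return []
    distance = [0.0] * n
    for m in range(len(points[0])):
        order = sorted(range(n), key=lambda i: points[i][m])
        lo, hi = points[order[0]][m], points[order[-1]][m]
        # Boundary solutions are always treated as least crowded.
        distance[order[0]] = distance[order[-1]] = float("inf")
        if hi == lo:
            continue  # all values equal: this objective adds nothing
        for pos in range(1, n - 1):
            gap = points[order[pos + 1]][m] - points[order[pos - 1]][m]
            distance[order[pos]] += gap / (hi - lo)
    return distance

# Hypothetical two-objective front.
front = [(0.0, 1.0), (0.25, 0.5), (0.5, 0.4), (1.0, 0.0)]
distances = crowding_distance(front)
print(distances)
```

A larger value means the solution sits in a sparser region of the front, which is why schemes based on this metric favour such solutions when choosing what to clone or keep.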

19.
Classical approaches to the layout design problem tend to maximise the efficiency of the layout, measured by the handling cost related to the interdepartmental flow and to the distance among the departments. However, the actual problem involves several conflicting objectives and hence requires a multi-objective formulation. Recently proposed multi-objective approaches in most cases lead to the maximisation of a weighted sum of score functions. Such an approach is hard to apply in practice because of the difficulty of normalising these functions and of quantifying the weights. In this paper, this difficulty is overcome by approaching the problem in two successive steps: in the first step, the Pareto-optimal solutions are determined by a multi-objective constrained genetic algorithm; in the second, the optimal solution is selected by means of the multi-criteria decision-making procedure Electre. This procedure allows the decision maker to express preferences on the basis of knowledge of the candidate solution set. Quantitative (handling cost) and qualitative (adjacency and distance requests between departments) objectives are considered with reference to a bay structure-based layout model, which also makes it possible to take into account practical constraints such as the aspect ratio of departments. The results obtained confirm the effectiveness of the proposed procedure as a practicable support tool for layout designers.

20.
Microarray technologies enable quantitative simultaneous monitoring of expression levels for thousands of genes under various experimental conditions. This technology has provided a new way of performing biological classification on a genome-wide scale. However, predictive accuracy is affected by the presence of thousands of genes, many of which are unnecessary from the classification point of view. A key issue in microarray data classification is therefore to identify the smallest possible set of genes that can achieve good predictive accuracy. In this study, we propose a novel Markov blanket-embedded genetic algorithm (MBEGA) for the gene selection problem. In particular, the embedded Markov blanket-based memetic operators add features (or genes) to, or delete them from, a genetic algorithm (GA) solution so as to quickly improve the solution and fine-tune the search. Empirical results on synthetic and microarray benchmark datasets suggest that MBEGA is effective and efficient in eliminating irrelevant and redundant features based on both the Markov blanket and predictive power in the classifier model. A detailed comparative study with other methods from each of the filter, wrapper, and standard GA families shows that MBEGA gives the best compromise among all four evaluation criteria, i.e., classification accuracy, number of selected genes, computational cost, and robustness.
