Similar literature
 20 similar documents found (search time: 140 ms)
1.
There are many sources of systematic variations in cDNA microarray experiments which affect the measured gene expression levels. Print-tip lowess normalization is widely used in situations where dye biases can depend on spot overall intensity and/or spatial location within the array. However, print-tip lowess normalization performs poorly in situations where error variability for each gene is heterogeneous over intensity ranges. We first develop support vector machine quantile regression (SVMQR) by extending support vector machine regression (SVMR) for the estimation of linear and nonlinear quantile regressions, and then propose some new print-tip normalization methods based on SVMR and SVMQR. We apply our proposed normalization methods to previous cDNA microarray data of apolipoprotein AI-knockout (apoAI-KO) mice, diet-induced obese mice, and genistein-fed obese mice. From our comparative analyses, we find that our proposed methods perform better than the existing print-tip lowess normalization method.
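
As a rough illustration of what quantile regression optimizes, the sketch below uses the standard check (pinball) loss and picks the sample value that minimizes it. It is not the SVMQR method of the abstract; the function names and the toy log-ratio values are assumptions made for the example.

```python
def pinball_loss(residual, tau):
    """Check (pinball) loss for quantile level tau, applied to one residual."""
    return tau * residual if residual >= 0 else (tau - 1.0) * residual


def empirical_quantile_by_loss(values, tau):
    """Return the sample value that minimizes the total pinball loss."""
    return min(values, key=lambda q: sum(pinball_loss(v - q, tau) for v in values))


# Hypothetical log-ratios from a single print-tip group.
log_ratios = [-0.9, -0.4, -0.1, 0.0, 0.2, 0.3, 0.8]
median_estimate = empirical_quantile_by_loss(log_ratios, 0.5)
upper_estimate = empirical_quantile_by_loss(log_ratios, 0.9)
```

With tau = 0.5 the minimizer is the sample median; other values of tau shift the fit toward the tails, which is what makes quantile fits informative when the error variance changes with intensity.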

2.
In this paper, the classification of two binary bioinformatics datasets, leukemia and colon tumor, is further studied using the recently developed neural network-based finite impulse response extreme learning machine (FIR-ELM). A time series analysis of the microarray samples is first performed to determine the filtering properties of the hidden layer of the FIR-ELM neural classifier for feature identification. The linear separability of the data patterns in the microarray datasets is then studied. To improve the robustness of the neural classifier against noise and errors, a frequency domain gene feature selection algorithm is also proposed. Simulation results show that the FIR-ELM algorithm performs excellently on the classification of bioinformatics data in comparison with many existing classification algorithms.

3.

Cancer classification is one of the main steps in the patient healing process, which compels modern clinical researchers to use advanced bioinformatics methods for it. Cancer classification is usually performed using gene expression data obtained from microarray experiments together with advanced machine learning methods. A microarray experiment generates a huge amount of data, and processing it with machine learning methods is a major challenge. In this study, a two-step classification paradigm that merges genetic algorithm feature selection with machine learning classifiers is used. The genetic algorithm is built in the MapReduce programming spirit, which makes it highly scalable on a Hadoop cluster. To improve its performance, the proposed algorithm is extended into a parallel algorithm that processes microarray data in a distributed manner using the Hadoop MapReduce framework. The algorithm was tested on eleven GEMS data sets (9 tumors, 11 tumors, 14 tumors, brain tumor 1, lung cancer, brain tumor 2, leukemia 1, DLBCL, leukemia 2, SRBCT, and prostate tumor), and its accuracy reached 100% for fewer than 25 selected features. The proposed cloud computing-based MapReduce parallel genetic algorithm performed well on gene expression data. In addition, the scalability of the suggested algorithm is unlimited because of the underlying Hadoop MapReduce platform. The presented results indicate that the proposed method can be effectively implemented for real-world microarray data in the cloud environment. In addition, the Hadoop MapReduce framework demonstrates a substantial decrease in computation time.

4.
The construction of ultra-high-rise and long-span structures places higher demands on the integrity testing of piles. Acoustic signal detection has been verified as an efficient and accurate nondestructive testing method. The integrity of a pile is closely related to the onset time of the signals, and the accuracy of the onset time directly affects the integrity evaluation. To achieve high-precision onset detection, continuous wavelet transform (CWT) preprocessing and machine learning algorithms were integrated into the software of high-sampling-rate testing equipment. Waveform distortion, which could interfere with detection accuracy, was eliminated by CWT preprocessing. To make full use of the collected waveform data, three types of machine learning algorithms were used to classify data points as ambient or ultrasonic signals. The models involve a commonly used classifier (ELM), an individual classification tree model (DTC), an ensemble tree model (RFC) and a deep learning model (DBN). The classification accuracy of these models on ambient and ultrasonic signals was compared by 5-fold validation. Results indicate that RFC performs better than DBN and DTC after training and is more suitable for classifying points in waveforms. A method for detecting onset time based on the classification results was then proposed to minimize the effect of classification errors on detection. In addition to the three data mining methods, the autocorrelation function method was selected as a control to compare the proposed data-mining-based methods with a traditional one. Accuracy and error analysis of 300 waveforms demonstrated the feasibility and stability of the proposed method. The RFC-based detection method is recommended because it has the highest accuracy, the lowest errors, and the most favorable error distribution among the four onset detection methods. Successful applications demonstrate that it could provide a new way of ensuring accurate testing of pile foundation integrity.

5.
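When the geometric parameters of shell-type parts are inspected by machine vision with fixtureless positioning, the geometric features of the part must first be recognized intelligently so that the inspection path can be planned. To address this problem, an intelligent geometric feature recognition method based on supervised machine learning is proposed. A feature matrix is built from the relationships between the center positions of the features to be recognized on the shell part, and a supervised machine learning algorithm performs the recognition. An error correction method based on feature uniqueness is proposed to correct recognition errors produced during classification. In the example studied, the part has 4 holes to be recognized, and the recognition accuracy reaches 100% after 5 rounds of supervised training.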

6.
In this study, hepatitis, a very common and important disease, is diagnosed with a machine learning method. We propose a novel machine learning method that hybridizes the support vector machine (SVM) and simulated annealing (SA). Simulated annealing is a stochastic method currently in wide use for difficult optimization problems. The support vector machine has been intensively explored because of its several unique advantages and has been successfully verified as a prediction method in recent years. The dataset used in our study is taken from the UCI machine learning database. The classification accuracy is obtained via 10-fold cross validation. Our method achieves a classification accuracy of 96.25%, which is very promising compared with the other classification methods in the literature for this problem.

7.
DNA microarray technology has emerged as a prospective tool for the diagnosis and classification of cancer. It provides better insight into the many cancer-associated genetic mutations occurring within a cell. However, the thousands of gene expressions measured for each biological sample by a microarray pose a great challenge. Many statistical and machine learning methods have been applied to obtain the most relevant genes prior to cancer classification. A two-phase hybrid model for cancer classification is proposed that integrates Correlation-based Feature Selection (CFS) with improved Binary Particle Swarm Optimization (iBPSO). This model selects a low-dimensional set of prognostic genes to classify biological samples of binary and multi-class cancers using a naive Bayes classifier with stratified 10-fold cross-validation. The proposed iBPSO also controls the problem of early convergence to a local optimum found in traditional BPSO. The proposed model has been evaluated on 11 benchmark microarray datasets of different cancer types. Experimental results are compared with seven other well-known methods, and our model exhibited better results in terms of classification accuracy and the number of selected genes in most cases. In particular, it achieved up to 100% classification accuracy for seven out of eleven datasets with a very small prognostic gene subset (up to <1.5%) for all eleven datasets.

8.
Gene selection can help the analysis of microarray gene expression data. However, it is very difficult to obtain a satisfactory classification result with machine learning techniques because of both the curse-of-dimensionality problem and the over-fitting problem: the features have too many dimensions while the samples are too few. In this study, we designed an approach that attempts to avoid these two problems and then used it to select a small set of significant biomarker genes for diagnosis. Finally, we attempted to use these markers for the classification of cancer. The approach was tested on a number of microarray datasets in order to demonstrate that it performs well and is both useful and reliable.

9.
Cancer classification is the critical basis for patient-tailored therapy. Conventional histological analysis tends to be unreliable because different tumors may have similar appearance. The advances in microarray technology make individualized therapy possible. Various machine learning methods can be employed to classify cancer tissue samples based on microarray data. However, few methods can be elegantly adopted for generating accurate and reliable as well as biologically interpretable rules. In this paper, we introduce an approach for classifying cancers based on the principle of minimal rough fringe. For training rough hypercuboid classifiers from gene expression data sets, the method dynamically evaluates all available genes and sifts the genes with the smallest implicit regions as the dimensions of implicit hypercuboids. An unseen object is predicted to be a certain class if it falls within the corresponding class hypercuboid. Based upon the method, ensemble rough hypercuboid classifiers are subsequently constructed. Experimental results on some open cancer gene expression data sets show that the proposed method is capable of generating accurate and interpretable rules compared with some other machine learning methods. Hence, it is a feasible way of classifying cancer tissues in biomedical applications.

10.
Although a machine tool can meet its specifications when it is new, after a long period of cutting operations the abrasion of contact surfaces and the deformation of structures degrade its accuracy by increasing the geometric errors in six degrees of freedom. Maintaining accuracy for product quality control is therefore of crucial importance for machine tools. In this paper, machining accuracy reliability is defined as the ability of a machine tool to achieve its specified machining accuracy under stated conditions for a given period of time, and a new method for analyzing the sensitivity of machining accuracy reliability to geometric errors is proposed. By applying multi-body system theory, a comprehensive volumetric model that explains how individual geometric errors affect machining accuracy (the coupling relationship) was established. Based on the Monte Carlo simulation method, models for the machining accuracy reliability and sensitivity analysis of machine tools were developed. Taking machining accuracy reliability as a measure of machine tool capability and reliability sensitivity as a reference for optimizing the basic parameters of machine tools, an illustrative example of a three-axis machine tool was used to demonstrate the effectiveness of the proposed method.
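
The Monte Carlo idea behind a reliability estimate of this kind can be sketched as follows. This is a minimal sketch under strong assumptions that are not taken from the abstract: the volumetric error is modeled as a hypothetical linear combination of independent zero-mean Gaussian geometric errors, and every name and number below is illustrative.

```python
import random


def machining_reliability(error_sigmas, sensitivities, tolerance, trials=20000, seed=0):
    """Estimate P(|volumetric error| <= tolerance) by Monte Carlo sampling.

    Assumed (hypothetical) model: the volumetric error is a weighted sum of
    independent zero-mean Gaussian geometric errors.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    within = 0
    for _ in range(trials):
        total = sum(s * rng.gauss(0.0, sd) for s, sd in zip(sensitivities, error_sigmas))
        if abs(total) <= tolerance:
            within += 1
    return within / trials


sigmas = [0.002, 0.003, 0.001]   # illustrative standard deviations of three error terms
weights = [1.0, 0.5, 2.0]        # illustrative sensitivity coefficients
reliability = machining_reliability(sigmas, weights, tolerance=0.005)
always_ok = machining_reliability(sigmas, weights, tolerance=1.0)
```

Re-running the estimate after perturbing one standard deviation at a time gives a crude view of which error term the reliability is most sensitive to.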

11.
Training data plays an essential role in modern applications of machine learning. However, gathering labeled training data is time-consuming. Therefore, labeling is often outsourced to less experienced users, or completely automated. This can introduce errors, which compromise valuable training data, and lead to suboptimal training results. We thus propose a novel approach that uses the power of pretrained classifiers to visually guide users to noisy labels, and let them interactively check error candidates, to iteratively improve the training data set. To systematically investigate training data, we propose a categorization of labeling errors into three different types, based on an analysis of potential pitfalls in label acquisition processes. For each of these types, we present approaches to detect, reason about, and resolve error candidates, as we propose measures and visual guidance techniques to support machine learning users. Our approach has been used to spot errors in well-known machine learning benchmark data sets, and we tested its usability during a user evaluation. While initially developed for images, the techniques presented in this paper are independent of the classification algorithm, and can also be extended to many other types of training data.

12.
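A DNA microarray image processing algorithm based on clustering and statistical analysis   Total citations: 1, self-citations: 0, citations by others: 1
DNA microarrays can simultaneously monitor the expression of thousands of genes. Image analysis is an important step in microarray experiments and directly affects subsequent processing, analysis, and research, such as identifying and predicting the functions of differentially expressed genes. Microarray image analysis consists of three steps: gridding, segmentation, and information extraction. This paper focuses on segmentation and information extraction. First, a new segmentation method based on K-means clustering is proposed; second, a new statistical-analysis-based correction method for background and foreground segmentation is suggested for more accurate information extraction. The advantage of the new method is that it places no shape restriction on the spot images in the microarray. A comparison on real images with GenePix, currently the most popular microarray image analysis software, shows that the proposed algorithm is accurate and effective.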
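
To make the clustering-based segmentation idea concrete, here is a minimal sketch of one-dimensional K-means with K = 2 on pixel intensities. It is a generic illustration rather than the algorithm of the paper; the function name and the pixel values are assumptions.

```python
def two_means_segment(intensities, iterations=50):
    """Label each pixel as background (0) or foreground (1) with 1-D K-means, K=2."""
    low, high = min(intensities), max(intensities)  # initial cluster centers
    labels = [0] * len(intensities)
    for _ in range(iterations):
        # Assignment step: each pixel joins the nearer center.
        labels = [1 if abs(v - high) < abs(v - low) else 0 for v in intensities]
        fg = [v for v, lab in zip(intensities, labels) if lab == 1]
        bg = [v for v, lab in zip(intensities, labels) if lab == 0]
        # Update step: centers move to the mean of their members.
        new_low = sum(bg) / len(bg) if bg else low
        new_high = sum(fg) / len(fg) if fg else high
        if (new_low, new_high) == (low, high):
            break
        low, high = new_low, new_high
    return labels, low, high


pixels = [12, 15, 11, 14, 200, 210, 190, 13, 205]  # illustrative spot-window intensities
labels, background_mean, foreground_mean = two_means_segment(pixels)
```

Because each pixel is labeled only by its intensity, nothing in the procedure assumes that a spot is circular, which matches the claim that no shape restriction is imposed.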

13.
Increasing attention is being paid to the classification of ground objects using hyperspectral spectrometer images. A key challenge of most hyperspectral classifications is the cost of training samples. It is difficult to acquire enough effective marked label sets using classification model frameworks. In this paper, a semi-supervised classification framework of hyperspectral images is proposed to better solve problems associated with hyperspectral image classification. The proposed method is based on an iteration process, making full use of the small amount of labeled data in a sample set. In addition, a new unlabeled data trainer in the self-training semi-supervised learning framework is explored and implemented by estimating the fusion evidence entropy of unlabeled samples using the minimum trust evaluation and maximum uncertainty. Finally, we employ different machine learning classification methods to compare the classification performance of different hyperspectral images. The experimental results indicate that the proposed approach outperforms traditional state-of-the-art methods in terms of low classification errors and better classification charts using few labeled samples.

14.
A hybrid Huberized support vector machine (HHSVM) with an elastic-net penalty has been developed for cancer tumor classification based on thousands of gene expression measurements. In this paper, we develop a Bayesian formulation of the hybrid Huberized support vector machine for binary classification. For the coefficients of the linear classification boundary, we propose a new type of prior, which can select variables and group them together simultaneously. Our proposed prior is a scale mixture of normal distributions and independent gamma priors on a transformation of the variance of the normal distributions. We establish a direct connection between the Bayesian HHSVM model with our special prior and the standard HHSVM solution with the elastic-net penalty. We propose a hierarchical Bayes technique and an empirical Bayes technique to select the penalty parameter. In the hierarchical Bayes model, the penalty parameter is selected using a beta prior. For the empirical Bayes model, we estimate the penalty parameter by maximizing the marginal likelihood. The proposed model is applied to two simulated data sets and three real-life gene expression microarray data sets. Results suggest that our Bayesian models are highly successful in selecting groups of similarly behaved important genes and predicting the cancer class. Most of the genes selected by our models have shown strong association with well-studied genetic pathways, further validating our claims.

15.
Microarray experiments have raised challenging questions such as how to accurately identify a set of marker genes responsible for various cancers. In statistics, this specific task can be posed as a feature selection problem. Since a support vector machine can deal with a vast number of features, it has gained widespread use in microarray data analysis. We propose a stepwise feature selection method using the generalized logistic loss, which is a smooth approximation of the usual hinge loss. We compare the proposed method with the support vector machine with recursive feature elimination on both real and simulated datasets. The results illustrate that the proposed method can improve the quality of feature selection through standardization while retaining predictive performance similar to that of recursive feature elimination.
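
The relationship between a smooth logistic-type loss and the hinge loss can be checked numerically. The sketch below assumes the commonly used form (1/c) log(1 + exp(c(1 - m))) for margin m, which may differ in detail from the definition used in the paper; the parameter name `c` and the margins are illustrative.

```python
import math


def hinge_loss(margin):
    """Standard hinge loss max(0, 1 - margin)."""
    return max(0.0, 1.0 - margin)


def generalized_logistic_loss(margin, c=10.0):
    """Assumed smooth surrogate (1/c) * log(1 + exp(c * (1 - margin)))."""
    z = c * (1.0 - margin)
    # Numerically stable softplus: log(1 + e^z) = max(z, 0) + log1p(e^(-|z|)).
    return (max(z, 0.0) + math.log1p(math.exp(-abs(z)))) / c


margins = [-1.0, 0.0, 0.5, 1.0, 2.0]
gaps = [generalized_logistic_loss(m, c=50.0) - hinge_loss(m) for m in margins]
```

Under this form the gap to the hinge loss is never negative and is largest at margin 1, where it equals log(2)/c, so larger values of `c` give a tighter but less smooth approximation.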

16.
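Twin support vector machines (TWSVM) have recently become a hot topic in machine learning. TWSVM offers high classification accuracy and fast training, but it does not make full use of the statistical information in the samples during training. As an improvement on TWSVM, the Mahalanobis-distance-based twin support vector machine (TMSVM) takes the covariance information of each class into account during classification and performs well in many practical problems. However, the training speed of TMSVM leaves room for improvement, and it applies only to binary classification. To address these two problems, the least squares idea is introduced into TMSVM: the inequality constraints in TMSVM are replaced with equality constraints, so that solving the quadratic programming problems reduces to solving two systems of linear equations. This yields the Mahalanobis-distance-based least squares twin support vector machine (LSTMSVM). Combined with the directed acyclic graph (DAG) strategy, a Mahalanobis-distance-based least squares twin multi-class support vector machine is then designed. To reduce error accumulation in the DAG structure, a Mahalanobis-distance-based measure of between-class separability is constructed. Experiments on artificial and UCI datasets show that the proposed algorithm is effective and that its classification performance is clearly better than that of traditional multi-class SVMs.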

17.
In microarray processing, the appearance of artifacts, donuts, and irregularly shaped spots is a problem. In current microarray analysis, most approaches stress the segmentation of pixel intensities rather than emphasizing ratio estimators. To avoid segmenting spot target areas and to minimize sensitivity to aberrant pixels, we propose a robust ratio estimator of gene expression via inverse-variance weighting. Moreover, a metric is proposed to evaluate the spot quality. Both the simulation and numerical examples explored reveal that the proposed algorithm is superior to existing approaches with respect to mean square error. The acceptance quality measure recommended confirms the validity of the proposed ratio estimator.
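
Inverse-variance weighting in its generic form can be sketched as below. This shows only the textbook combination rule, not the specific ratio estimator of the abstract; the function name, the ratio values, and the variances are assumptions.

```python
def inverse_variance_weighted_mean(estimates, variances):
    """Combine estimates with weights 1/variance; return (combined estimate, its variance)."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    combined = sum(w * e for w, e in zip(weights, estimates)) / total
    return combined, 1.0 / total


# Illustrative ratio estimates; the third is treated as noisy (large variance).
ratios = [2.0, 2.2, 8.0]
variances = [0.1, 0.1, 10.0]
ratio, ratio_variance = inverse_variance_weighted_mean(ratios, variances)
```

The noisy value receives a weight one hundred times smaller than the others, so the combined ratio stays near 2.1, which illustrates why such weighting reduces sensitivity to aberrant pixels.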

18.
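Classification based on support vector machine ensembles   Total citations: 6, self-citations: 0, citations by others: 6
Wei Ling (魏玲), Zhang Wenxiu (张文修). Computer Engineering (《计算机工程》), 2004, 30(13): 1-2, 17
The support vector machine is a classification technique based on the structural risk minimization principle. This paper proposes the idea of classifying with an ensemble of support vector machine classifiers. First, sub-support vector machines are built from the original samples to obtain sub-predictions for the samples to be tested; these sub-predictions are then appropriately combined to determine the final class prediction for each sample. Simulation results show that the method achieves clearly higher classification accuracy than a single support vector machine.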
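
One simple way to combine sub-predictions is a plurality vote, sketched below. The abstract does not state which combination rule is used, so the voting rule, the function name, and the toy label lists are assumptions for illustration only.

```python
from collections import Counter


def majority_vote(sub_predictions):
    """Combine per-classifier label lists into one label per sample by plurality vote."""
    combined = []
    for votes in zip(*sub_predictions):  # votes for one sample across all classifiers
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Hypothetical outputs of three sub-classifiers on four test samples.
sub_svm_outputs = [
    [1, -1, 1, 1],
    [1, 1, -1, 1],
    [-1, -1, -1, 1],
]
final_labels = majority_vote(sub_svm_outputs)
```

Using an odd number of sub-classifiers on a two-class problem avoids ties in the vote.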

19.
In a DNA microarray dataset, gene expression data often has a huge number of features (referred to as genes) but a small number of samples. With the development of DNA microarray technology, the number of dimensions is increasing even faster than before, which can lead to the curse of dimensionality. To obtain good classification performance, it is necessary to preprocess the gene expression data. Support vector machine recursive feature elimination (SVM-RFE) is a classical method for gene selection. However, SVM-RFE suffers from high computational complexity. To remedy this, this paper enhances SVM-RFE for gene selection by incorporating feature clustering; the resulting method is called feature clustering SVM-RFE (FCSVM-RFE). The proposed method first performs a rough gene selection and then ranks the selected genes. First, a clustering algorithm is used to cluster genes into gene groups, in each of which the genes have similar expression profiles. Then, a representative gene is found for each gene group, which yields a representative gene set. Finally, SVM-RFE is applied to rank these representative genes. FCSVM-RFE can reduce both the computational complexity and the redundancy among genes. Experiments on seven public gene expression datasets show that FCSVM-RFE achieves better classification performance and lower computational complexity than state-of-the-art methods such as SVM-RFE.

20.
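GA/SVM-based feature selection and classification of microarray data   Total citations: 2, self-citations: 0, citations by others: 2
The small sample size and high dimensionality of microarray data make analysis difficult, and selecting the key genes is very important. This paper uses a genetic algorithm to select the key genes, with k-nearest-neighbor distance as the pattern recognition method; a diagnostic system is built with a support vector machine, and its predictive classification performance is tested with different kernel functions. On the classic leukemia dataset, the classification accuracy on a test set of 34 samples is 100%.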
