Similar Documents
20 similar documents found (search time: 31 ms)
1.
韩亮  杨婷  蒲秀娟  黄谦 《电子与信息学报》2021,43(11):3319-3326
Alzheimer's disease (AD) classification helps enable timely, targeted treatment and intervention in the early stages of AD, which is important for reducing AD incidence among the elderly and slowing disease progression. This paper proposes an improved Gaussian fuzzy logic feature selection method: feature importance scores are first computed with mutual information and homogeneity-of-variance analysis and normalized separately, then weighted by the improved Gaussian fuzzy logic method to obtain a final importance score, and features are selected according to that score. The paper also builds a heterogeneous ensemble classifier that uses logistic regression, random forest, LightGBM, a support vector machine, and a deep feed-forward network as first-level classifiers and a multinomial naive Bayes classifier as the second-level classifier, performing AD classification on the selected features. Experiments on the TADPOLE dataset confirm that the proposed feature selection method is effective, and that with it the multinomial-naive-Bayes-based heterogeneous ensemble outperforms conventional classifiers on AD classification.
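The heterogeneous stacking design described above can be sketched with scikit-learn. This is a minimal illustration on synthetic data (not TADPOLE); GradientBoostingClassifier stands in for LightGBM and the deep feed-forward network is omitted, so only the structure, base-classifier probabilities feeding a multinomial naive Bayes meta-classifier, is shown.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC

# Synthetic stand-in for the TADPOLE features (hypothetical data).
X, y = make_classification(n_samples=400, n_features=20, n_informative=8,
                           random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

base = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("rf", RandomForestClassifier(random_state=0)),
    ("gb", GradientBoostingClassifier(random_state=0)),  # stand-in for LightGBM
    ("svm", SVC(probability=True, random_state=0)),
]
# Base classifiers emit class probabilities (non-negative), which the
# multinomial naive Bayes meta-classifier can consume directly.
clf = StackingClassifier(estimators=base, final_estimator=MultinomialNB(),
                         stack_method="predict_proba")
clf.fit(Xtr, ytr)
acc = clf.score(Xte, yte)
```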

2.
To explore the potential of conventional image processing techniques in the classification of cervical cancer cells, a co-occurrence histogram method was employed for image feature extraction, and an ensemble classifier was developed by combining three base classifiers: an artificial neural network (ANN), a random forest (RF), and a support vector machine (SVM). A segmented Pap-smear cell image dataset was constructed with the k-means clustering technique and used to evaluate the performance of the ensemble classifier formed from these base classifiers. The results were compared with those of the individual base classifiers, and with the same classifiers trained on color, texture, and shape features. The maximum average classification accuracy of 93.44% was obtained when the ensemble classifier was trained with co-occurrence histogram features, indicating that this combination is more suitable and advantageous for the classification of cervical cancer cells.
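A co-occurrence histogram, counts of co-occurring quantized gray-level pairs at a fixed pixel offset, can be computed directly with NumPy. A minimal sketch (8 gray levels and a horizontal offset are illustrative choices, not necessarily the paper's settings):

```python
import numpy as np

def cooccurrence_histogram(img, levels=8, dx=1, dy=0):
    """Count co-occurring quantized gray-level pairs at offset (dy, dx)."""
    q = (img.astype(np.float64) / 256.0 * levels).astype(int).clip(0, levels - 1)
    h, w = q.shape
    hist = np.zeros((levels, levels), dtype=np.int64)
    a = q[: h - dy, : w - dx]          # reference pixels
    b = q[dy:, dx:]                    # neighbours at the chosen offset
    np.add.at(hist, (a.ravel(), b.ravel()), 1)
    return hist.ravel()                # flattened feature vector, length levels**2

rng = np.random.default_rng(0)
img = rng.integers(0, 256, (32, 32), dtype=np.uint8)
feat = cooccurrence_histogram(img)
```

The flattened histogram then serves as the per-image feature vector fed to the base classifiers.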

3.
To address the problem that conventional ensemble learning methods perform poorly when applied directly to one-class classifiers, this paper first proves that ensemble learning can improve the performance of a one-class classifier, and also proves that performance degrades after ensembling if the set of base classifiers is not selected. It then points out that applying classic ensemble methods directly to one-class classifier ensembles suffers from a severe lack of base-classifier diversity, and proposes a hybrid generation strategy for base one-class classifiers that improves diversity. Finally, by decomposing the loss function of the ensemble one-class classifier from the perspective of ensemble loss composition, a targeted pruning strategy is constructed, yielding a one-class classifier ensemble algorithm based on hybrid diversity generation and pruning, abbreviated PHD-EOC. Experimental results on UCI benchmark datasets and a malware behavior detection dataset show that PHD-EOC balances diversity and one-class classification performance, outperforms classic ensemble learning methods on various one-class classification metrics, and reduces the time complexity of the decision stage.
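PHD-EOC itself is not reproduced here, but the core idea of voting over diversified base one-class classifiers can be sketched with scikit-learn's OneClassSVM, using different RBF bandwidths as a crude stand-in for the paper's hybrid diversity-generation strategy:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_target = rng.normal(0.0, 1.0, (200, 2))       # training data: target class only

# Diversify base one-class SVMs via different RBF bandwidths -- a crude
# stand-in for the paper's hybrid diversity-generation strategy.
bases = [OneClassSVM(kernel="rbf", gamma=g, nu=0.1).fit(X_target)
         for g in (0.1, 0.5, 2.0)]

def ensemble_predict(Z):
    """Majority vote over base predictions (+1 = target class, -1 = outlier)."""
    votes = np.stack([b.predict(Z) for b in bases])
    return np.sign(votes.sum(axis=0))

outliers = rng.normal(6.0, 0.5, (50, 2))        # far from the target class
pred_out = ensemble_predict(outliers)
```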

4.
Internet attacks pose a severe threat to most online resources and are a prime concern of security administrators these days. In spite of many efforts, security techniques are unable to detect intrusions accurately. Most methods suffer from high false positive rates and low detection rates, and provide a single solution that lacks classification trade-offs. In this work, an effective two-stage method is proposed that produces a pool of non-dominated (Pareto-optimal) solutions as base models, together with their ensembles, for detecting intrusions accurately. In the first stage, Pareto-optimal solutions are generated over a chromosome structure, forming a Pareto front; in the second stage, another approximation to the Pareto front is made to obtain non-dominated ensembles. The final ensemble predictions are computed from the individual predictions by majority voting. The applicability of the method is validated on the benchmark NSL-KDD dataset. Experimental results show that it outperforms conventional ensemble techniques and generates Pareto-optimal solutions that improve detection accuracy for both minority and majority attack classes while handling the classification trade-off problem. The proposed method achieved a detection accuracy of 97% with an FPR of 2% on the NSL-KDD dataset. Its most attractive feature is that both the generation of the base classifiers and their ensembling are multi-objective in nature, addressing the issues of low detection accuracy and classification trade-offs.
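The majority-voting fusion step can be sketched in a few lines; the base-model predictions below are hypothetical placeholders for the Pareto-optimal models:

```python
import numpy as np

def majority_vote(predictions):
    """Fuse hard labels from a pool of base models by majority voting.

    predictions: array-like of shape (n_models, n_samples), integer labels.
    """
    preds = np.asarray(predictions)
    return np.array([np.bincount(col).argmax() for col in preds.T])

# Three hypothetical base models voting over four samples.
pool = [[0, 1, 1, 2],
        [0, 1, 0, 2],
        [1, 1, 0, 0]]
fused = majority_vote(pool)   # -> [0, 1, 0, 2]
```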

5.
This paper proposes a JPEG steganalysis scheme based on the ensemble classifier and high-dimensional feature space. We first combine three current feature sets and remove the unimportant features according to the correlation between different features parts so as to form a new feature space used for steganalysis. This way, the dependencies among cover and steganographic images can be still represented by the features with a reduced dimensionality. Furthermore, we design a proportion mechanism to manage the feature selection in two subspaces for each base learner of the ensemble classifier. Experimental results show that the proposed scheme can effectively defeat the MB and nsF5 steganographic methods and its performance is better than that of existing steganalysis approaches.  相似文献   

6.
Security is a major concern in the world of the Internet. Traditionally, encryption, firewalls, and other countermeasures are used to secure data. In the modern era of technology, however, the Intrusion Detection System (IDS) plays a major role in detecting the type of attack. An IDS is tuned so that it learns from historical network traffic data and detects both normal and abnormal connection events in the monitored system. Nevertheless, owing to the huge size of the historical data, such a system can suffer from issues of accuracy, false alarms, and execution time. In this paper, a new abridging algorithm is proposed that vertically reduces the size of a network traffic dataset, i.e. the number of instances, without affecting its statistical characteristics. In the literature, feature selection techniques are routinely used to reduce datasets, whereas the effect of instance-level (vertical) reduction has not been examined significantly. Apart from the abridging of instances, the Infinite Feature Selection technique is used to extract relevant features from the dataset, and a Support Vector Machine classifier is used to classify normal and anomalous instances. The performance of the proposed system is evaluated on datasets such as NSL-KDD and the Kyoto University benchmark dataset using parameters including accuracy, number of instances reduced, recall, precision, F1-score, t-value, and execution time.
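The abridging algorithm is not reproduced here; as a rough illustration of instance-level reduction, stratified subsampling keeps the class distribution (one simple statistical characteristic) intact before training the SVM. All data below are synthetic stand-ins:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Hypothetical stand-in for a network-traffic dataset (imbalanced classes).
X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           weights=[0.8, 0.2], random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.25, stratify=y,
                                      random_state=0)

# Keep only 20% of the training instances, stratified by class, so the
# class distribution (one simple statistical characteristic) is preserved.
X_small, _, y_small, _ = train_test_split(Xtr, ytr, train_size=0.2,
                                          stratify=ytr, random_state=0)

clf = SVC().fit(X_small, y_small)
acc = clf.score(Xte, yte)
```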

7.
This paper proposes a supervised classification algorithm for polarimetric SAR (PolSAR) images based on multi-feature fusion and ensemble learning. The algorithm first extracts multiple features from the PolSAR image, including EPFS features, Hoekman decomposition features, Huynen decomposition features, H/alpha/A decomposition features, and extended four-component decomposition features. To ensure both the diversity and the accuracy of the base classifiers in the ensemble, two different feature sets are randomly selected from the five groups each time and concatenated as the input of an SVM classifier. Finally, a random forest learning algorithm integrates the predicted probabilities of all base classifiers to output the final classification result. Pixel-level and region-level classification experiments demonstrate the effectiveness of the proposed algorithm.

8.
To improve the detection accuracy of the ensemble classifiers used for steganalysis, a steganalysis algorithm based on feature ranking is proposed. First, the mutual information score of each feature dimension is computed and the features are ranked by score. A boundary point then splits the features into an important-feature region and an ordinary-feature region; feature subspaces are formed by randomly sampling from the two regions according to preset sampling proportions and are used to train the ensemble classifier, which performs the final classification. Experimental results show that, for frequency-domain and spatial-domain images embedded with the nsF5 and S-UNIWARD steganographic algorithms respectively, the algorithm reduces the average detection error rate by about 0.0065 and 0.0062 compared with the conventional classifier. For these two steganographic algorithms in the frequency and spatial domains, the algorithm achieves higher detection accuracy than the conventional ensemble classifier.
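The ranking-and-sampling idea can be sketched with scikit-learn, assuming mutual_info_classif as the mutual-information score; the boundary point and sampling proportions below are illustrative, not the paper's values:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

X, y = make_classification(n_samples=300, n_features=40, n_informative=8,
                           random_state=0)

# Rank every feature dimension by its mutual information with the label.
scores = mutual_info_classif(X, y, random_state=0)
order = np.argsort(scores)[::-1]
split = 10                                  # illustrative boundary point
important, ordinary = order[:split], order[split:]

rng = np.random.default_rng(0)
def sample_subspace(n_imp=6, n_ord=4):
    """Draw one base learner's subspace, biased toward the important region."""
    return np.concatenate([rng.choice(important, n_imp, replace=False),
                           rng.choice(ordinary, n_ord, replace=False)])

sub = sample_subspace()
```

Each base learner of the ensemble would then be trained on the feature columns indexed by its own sampled subspace.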

9.

Employee turnover is an important issue for modern organizations. In this paper, a data-mining-based employee turnover predictor is developed, trained on an ORACLE ERP dataset to predict employee turnover with high accuracy. Preprocessing is first performed as a precautionary step before the core part of the proposed work. An Intensive Optimized PCA (Principal Component Analysis) is used for feature selection, and a Random Forest Classifier (RFC) is used for classification and prediction. The main objective of this work is to use Random Forest classification to analyze the fundamental causes of employee turnover, with the Intensive Optimized PCA data-mining technique performing the feature selection. A comparative study between the proposed work and existing methods shows its efficiency: the proposed method yields improved ROC, accuracy, precision, recall, and F1 score compared with other existing methodologies.
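A minimal sketch of the PCA-then-Random-Forest pipeline, with plain PCA standing in for the paper's "Intensive Optimized PCA" and synthetic data in place of the ORACLE ERP dataset (both are hypothetical substitutions):

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the ORACLE ERP employee data (hypothetical).
X, y = make_classification(n_samples=400, n_features=30, n_informative=10,
                           random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

# Reduce dimensionality first, then classify the retained components.
model = make_pipeline(PCA(n_components=15),
                      RandomForestClassifier(random_state=0))
model.fit(Xtr, ytr)
acc = model.score(Xte, yte)
```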


10.
In applying pattern recognition methods in remote sensing problems, an inherent limitation is that there is almost always only a small number of training samples with which to design the classifier. A hybrid decision tree classifier design procedure that produces efficient and accurate classifiers for this situation is proposed. In doing so, several key questions are addressed, among them the question of the feature extraction techniques to be used and the mathematical relationship between sample size, dimensionality, and risk value. Empirical tests comparing the hybrid design classifier with a conventional single layered one are presented. They suggest that the hybrid design produces higher accuracy with fewer features. The need for fewer features is an important advantage, because it reflects favorably on both the size of the training set needed and the amount of computation time that will be needed in analysis

11.
Improved iterative scaling (IIS) is an algorithm for learning maximum entropy (ME) joint and conditional probability models, consistent with specified constraints, that has found great utility in natural language processing and related applications. In most IIS work on classification, discrete-valued “feature functions” are considered, depending on the data observations and class label, with constraints measured based on frequency counts, taken over hard (0–1) training set instances. Here, we consider the case where the training (and test) set consist of instances of probability mass functions on the features, rather than hard feature values. IIS extends in a natural way for this case. This has applications (1) to ME classification on mixed discrete-continuous feature spaces and (2) to ME aggregation of soft classifier decisions in ensemble classification. Moreover, we combine these methods, yielding a method, with proved learning convergence, that jointly performs (soft) decision-level and feature-level fusion in making ensemble decisions. We demonstrate favorable comparisons against standard Adaboost.M1, input-dependent boosting, and other supervised combining methods, on data sets from the UC Irvine Machine Learning repository.
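IIS training is not reproduced here, but maximum-entropy aggregation of soft classifier decisions amounts to weighted log-linear pooling of posteriors; a minimal sketch under that assumption, with illustrative numbers:

```python
import numpy as np

def loglinear_pool(posteriors, weights):
    """Weighted log-linear (maximum-entropy style) pooling of posteriors.

    posteriors: list of (n_samples, n_classes) arrays; weights: one per model.
    """
    logp = sum(w * np.log(p + 1e-12) for w, p in zip(weights, posteriors))
    logp -= logp.max(axis=1, keepdims=True)      # numerical stability
    p = np.exp(logp)
    return p / p.sum(axis=1, keepdims=True)

# Two classifiers' soft decisions over two samples (illustrative numbers).
p1 = np.array([[0.9, 0.1], [0.4, 0.6]])
p2 = np.array([[0.7, 0.3], [0.2, 0.8]])
fused = loglinear_pool([p1, p2], weights=[1.0, 1.0])
```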

12.

The occurrence of life-threatening ventricular arrhythmias (VAs) such as ventricular tachycardia (VT) and ventricular fibrillation (VF) leads to sudden cardiac death, which requires detection at an early stage. The main aim of this work is to develop an automated system using machine learning tools for accurate prediction of VAs, which may reduce the mortality rate. In this paper, a novel method using variational mode decomposition (VMD) based features and a C4.5 classifier for the detection of ventricular arrhythmias is presented. The VMD model was used to decompose electrocardiography (ECG) signals to extract useful informative features. The method was tested on ECG signals obtained from the PhysioNet database; two standard databases, the CUDB (Creighton University Ventricular Tachyarrhythmia Database) and the VFDB (MIT-BIH Malignant Ventricular Ectopy Database), were considered for this work. A set of time–frequency features was extracted and ranked by the gain ratio attribute evaluation method. The ranked features were fed to support vector machine (SVM) and C4.5 classifiers for classification of the normal, VT, and VF classes. The best detection was obtained with a sensitivity of 97.97%, specificity of 99.15%, and accuracy of 99.18% for the C4.5 classifier with a 5 s data analysis window, better than the SVM classifier's average accuracy of 86.87%. Hence, the proposed method demonstrates efficiency in detecting life-threatening VAs and can serve as an assistive tool for clinicians in the diagnosis process.
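The gain-ratio attribute evaluation named above can be sketched from scratch; quantile discretization of the continuous time-frequency features is my assumption here, not necessarily the paper's choice:

```python
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def gain_ratio(feature, labels, bins=4):
    """C4.5-style gain ratio of a continuous feature after quantile binning."""
    edges = np.quantile(feature, np.linspace(0, 1, bins + 1)[1:-1])
    f = np.digitize(feature, edges)
    cond = sum((f == v).mean() * entropy(labels[f == v]) for v in np.unique(f))
    split_info = entropy(f)
    return (entropy(labels) - cond) / split_info if split_info > 0 else 0.0

rng = np.random.default_rng(0)
y = rng.integers(0, 3, 600)                     # normal / VT / VF stand-ins
informative = y + 0.3 * rng.normal(size=600)    # feature that tracks the class
noise = rng.normal(size=600)                    # feature unrelated to the class
ranked = sorted([("informative", gain_ratio(informative, y)),
                 ("noise", gain_ratio(noise, y))], key=lambda t: -t[1])
```

Features ranked this way would then be fed to the downstream classifiers.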


13.
Application (app) ratings are feedback provided voluntarily by users and serve as important evaluation criteria for apps. However, these ratings can often be biased owing to insufficient or missing votes. Additionally, significant differences have been observed between numeric ratings and user reviews. This study aims to predict the numeric ratings of Google apps using machine learning classifiers. It exploits numeric app ratings provided by users as training data and returns authentic mobile app ratings by analyzing user reviews. An ensemble learning model is proposed for this purpose that considers term frequency/inverse document frequency (TF/IDF) features. Three TF/IDF features, including unigrams, bigrams, and trigrams, were used. The dataset was scraped from the Google Play store, extracting data from 14 different app categories. Biased and unbiased user ratings were discriminated using TextBlob analysis to formulate the ground truth, from which the classifier prediction accuracy was then evaluated. The results demonstrate the high potential for machine learning-based classifiers to predict authentic numeric ratings based on actual user reviews.
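The TF/IDF n-gram features can be reproduced with scikit-learn's TfidfVectorizer; the toy reviews and labels below are hypothetical stand-ins for the scraped Google Play data:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical user reviews with crude sentiment labels (1 = positive).
reviews = ["great app love it", "crashes all the time", "love it works great",
           "terrible crashes often", "works fine great design", "awful waste of time"]
labels = [1, 0, 1, 0, 1, 0]

vec = TfidfVectorizer(ngram_range=(1, 3))   # unigrams + bigrams + trigrams
X = vec.fit_transform(reviews)
clf = LogisticRegression().fit(X, labels)
```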

14.
Recently, the ensemble classifier has been shown to be an effective way to enhance prediction performance. However, it usually suffers from the problem of how to construct an appropriate classifier from complex data, for example, data with many dimensions or hierarchical attributes. This study proposes a method to construct an ensemble classifier based on key attributes. In addition to the high precision common to ensemble classifiers, its results are highly intelligible and thus easy to understand. Furthermore, experimental results on real data collected from China Mobile show that the key-attributes-based ensemble classifier performs well on both classifier construction and customer churn prediction.

15.
By analyzing the power spectra of digital modulation signals and their nonlinear transforms, two new modulation feature parameters are proposed and the descriptions of two related features are improved. A recognition scheme using a neural network ensemble classifier is designed, achieving automatic recognition of common digital modulation signals over the AWGN channel. Simulations show that the spectral-shape feature parameters have good noise robustness, the discrete-spectral-line feature parameters are more robust to the signal's modulation parameters, and the neural network ensemble classifier significantly outperforms a single network classifier; for SNR > 5 dB the overall recognition rate of the method exceeds 94%.
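One standard spectral-shape feature from the modulation-recognition literature is γ_max, the peak of the spectral power density of the normalized-centred instantaneous amplitude; whether it coincides with the paper's new parameters is not claimed. A sketch:

```python
import numpy as np

def gamma_max(signal):
    """Peak DFT power of the normalized-centred instantaneous amplitude.

    Near zero for constant-envelope signals (PSK/FSK); large when the
    envelope carries amplitude modulation.
    """
    a = np.abs(signal)
    a_cn = a / a.mean() - 1.0
    return float((np.abs(np.fft.fft(a_cn)) ** 2).max() / a_cn.size)

n = np.arange(1024)
carrier = np.exp(2j * np.pi * 0.12 * n)
fsk_like = carrier                                        # constant envelope
am_like = (1 + 0.5 * np.cos(2 * np.pi * 0.01 * n)) * carrier
```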

16.
We present a new image quantification and classification method for improved pathological diagnosis of human renal cell carcinoma. This method combines different feature extraction methodologies and is designed to provide consistent clinical results even in the presence of tissue structural heterogeneities and data acquisition variations. The methodologies used for feature extraction include image morphological analysis, wavelet analysis, and texture analysis, which are combined to develop a robust classification system based on a simple Bayesian classifier. We have achieved classification accuracies of about 90% with this heterogeneous dataset. The misclassified images are significantly different from the rest of the images in their class and therefore cannot be attributed to weakness in the classification system.

17.
To improve the accuracy of weakly-supervised semantic segmentation, a segmentation and optimization algorithm combining multi-scale features is proposed. The algorithm first constructs a multi-scale feature model based on a transfer learning algorithm. In addition, a new classifier is introduced for category prediction to reduce segmentation failures caused by erroneous prediction of target-class information. The designed multi-scale model is then fused with the original transfer learning model using different weights to enhance the generalization performance of the model. Finally, the credibility of the predicted class is used to adjust the credibility of the corresponding class of pixels in the segmentation map, avoiding false-positive segmentation regions. The proposed algorithm was tested on the challenging VOC 2012 dataset, achieving a mean intersection-over-union of 58.8% on the validation set and 57.5% on the test set, outperforming the original transfer learning algorithm by 12.9% and 12.3% respectively, and comparing favorably with other segmentation methods that use weakly-supervised information based on category labels.

18.
Face Feature Selection and Recognition Based on Different Margins   (Total citations: 1; self-citations: 0; citations by others: 1)
Margin is of great significance in machine learning; margin-based feature selection analyzes the weight of each feature in a feature set from the perspective of classification. This paper analyzes different margins and proposes using the sample-margin and the hypothesis-margin, respectively, as feature selection criteria to improve the SBS feature selection method, and then designs an SVM polynomial classifier with optimal hyperparameters for face recognition. Experiments were performed on the FERET face image database and compared with the Relief feature selection method; results for the SVM and NN classifiers were also analyzed. The experimental results show that the proposed feature selection and recognition method for face recognition is effective and applicable.
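The hypothesis-margin of a sample is half the difference between its distance to the nearest example of another class (nearmiss) and to the nearest example of its own class (nearhit); a from-scratch sketch:

```python
import numpy as np

def hypothesis_margin(X, y, i):
    """theta(x_i) = 0.5 * (||x_i - nearmiss|| - ||x_i - nearhit||)."""
    d = np.linalg.norm(X - X[i], axis=1)
    d[i] = np.inf                        # exclude the point itself
    near_hit = d[y == y[i]].min()        # nearest same-class neighbour
    near_miss = d[y != y[i]].min()       # nearest other-class neighbour
    return 0.5 * (near_miss - near_hit)

X = np.array([[0.0, 0.0], [0.2, 0.0], [3.0, 0.0], [3.1, 0.0]])
y = np.array([0, 0, 1, 1])
m = hypothesis_margin(X, y, 0)           # nearhit = 0.2, nearmiss = 3.0
```

Summing such margins over samples, restricted to a candidate feature subset, yields the per-subset score used to drive the selection.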

19.
Event detection in a multimodal Twitter dataset is considered. We treat the hashtags in the dataset as instances with two modes: text and geolocation features. The text feature consists of a bag-of-words representation. The geolocation feature consists of geotags (i.e., geographical coordinates) of the tweets. Fusing the multimodal data, we aim to detect, in terms of topic and geolocation, the interesting events and the associated hashtags. To this end, a generative latent variable model is assumed, and a generalized expectation-maximization (EM) algorithm is derived to learn the model parameters. The proposed method is computationally efficient and lends itself to big datasets. Experimental results on a Twitter dataset from August 2014 show the efficacy of the proposed method.

20.
Handwriting classifiers fail to perform well when trained and tested on data from different databases. In this paper, we propose a novel large-margin domain adaptation algorithm that learns a transformation between the training and test datasets, in addition to adapting the classifier parameters, using few or even no labeled training samples from the target handwriting dataset. Additionally, we developed an ensemble projection feature learning framework for dataset representation as a front end to our algorithm, exploiting the abundant unlabeled samples in the target domain. Experiments on adaptation between different handwritten digit datasets demonstrate that the proposed large-margin domain adaptation algorithm achieves superior classification accuracy compared with state-of-the-art methods. Quantitative evaluation shows that semi-supervised adaptation using one sample per class from the target domain reduces the error rate by 64.72% compared with a corresponding SVM classifier.


Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号