Similar Articles
19 similar articles found
1.
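Typical soft subspace clustering methods for high-dimensional network data use a soft subspace clustering algorithm and decide whether a clustering is optimal from the optimal solution of an objective function. Computing that optimum tends to overfit and become trapped in local optima, which lowers classification accuracy. This paper therefore proposes a decision-tree-based soft subspace clustering method for high-dimensional network data. Decision tree nodes are selected by information gain, and a split-information term is added to the information gain to keep nodes from over-partitioning the data, yielding attribute-category partitions at the different tree nodes. Post-pruning then removes nodes that contain noise or interfering attributes, and the partition containing the most samples is taken as the soft subspace clustering result. Simulation results show that the clustering accuracy of the proposed method increases with the dimensionality of the dataset and that its running time grows only modestly as the number of samples increases, so the method has high practical value.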
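The split-information correction described above is the same idea as C4.5's gain ratio: information gain divided by the entropy of the split itself. A minimal Python sketch, assuming categorical attributes and plain class labels; it illustrates the criterion and is not the paper's implementation:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (bits) of a sequence of labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def split_groups(values, labels):
    """Group the class labels by the value the attribute takes."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    return list(groups.values())

def information_gain(values, labels):
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in split_groups(values, labels))
    return entropy(labels) - remainder

def split_information(values):
    """Entropy of the partition itself; large when an attribute has many values."""
    return entropy(list(values))

def gain_ratio(values, labels):
    si = split_information(values)
    return information_gain(values, labels) / si if si > 0 else 0.0
```

On a toy set with labels a, a, b, b, a two-valued attribute and an ID-like four-valued attribute both have 1 bit of information gain, but the four-valued one has split information 2 and therefore a gain ratio of only 0.5. This is how the split-information term discourages over-partitioning.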

2.
Sun Yue, Yuan Jian. 《电子科技》 (Electronic Science and Technology), 2019, 32(4): 60-64
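The classic single-machine random forest algorithm cannot meet the demands of massive data processing. This paper therefore designs and implements an improved random forest algorithm using Spark distributed storage and computing. First, the importance of each feature is computed, and the features are divided into common features, unique features and unimportant features. Then features are randomly selected from each feature subspace, in order and in proportion. Finally, experiments on a Spark cluster analyze the classification performance, speedup and efficiency of the improved algorithm. The results confirm that the improved algorithm builds random forests more efficiently, scales well, and can be applied to massive data mining problems.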
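The abstract does not say how features are assigned to the three groups or what proportions are used. The sketch below assumes the grouping has already been done and uses hypothetical proportions. It shows only the per-tree step of drawing features from each subspace in a fixed order and in proportion; `sample_features` is an illustrative name:

```python
import random

def sample_features(groups, proportions, k, rng=None):
    """Draw up to k distinct feature indices, visiting groups in the given order.

    groups:      dict mapping group name -> list of feature indices
    proportions: dict mapping group name -> share of k taken from that group
                 (hypothetical values; the paper's proportions are not stated)
    """
    rng = rng or random.Random()
    chosen = []
    for name, share in proportions.items():  # dicts keep insertion order
        pool = groups[name]
        quota = min(len(pool), round(share * k))  # never ask for more than the pool holds
        chosen.extend(rng.sample(pool, quota))
    return chosen[:k]
```

For example, with groups `common`, `unique` and `unimportant` and shares 0.5, 0.3 and 0.2, a tree that needs six features gets three common, two unique and one unimportant feature.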

3.
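An Ensemble Transductive Classification Method Based on Multiple Graphs   (Total citations: 1; self-citations: 0; citations by others: 1)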
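Graph-based transductive classifiers depend on the graph structure. High-dimensional data usually contain redundant and noisy features, so a graph built on them cannot fully reflect the data distribution, and classifier performance drops. To address this, the paper proposes a multi-graph construction method and applies it to transductive classification. The method first generates multiple random subspaces and performs semi-supervised discriminant analysis in each. It then builds a graph in each discriminant subspace and trains a transductive classifier on it. Finally, these classifiers are fused by voting into a single ensemble classifier. Experimental results show that, compared with other transductive classifiers, the ensemble classifier achieves higher classification accuracy and is robust to its parameter settings.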
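A simplified sketch of the random-subspace, multi-graph, majority-vote pipeline. It omits the semi-supervised discriminant analysis step and substitutes a plain kNN graph with hard-clamped label propagation as the transductive classifier. These are stand-ins chosen for illustration, not the classifier used in the paper. Unlabeled samples are marked `None`:

```python
import random
from collections import Counter

def knn_graph(points, k):
    """Symmetric, unweighted k-nearest-neighbour graph as adjacency sets."""
    n = len(points)
    adj = [set() for _ in range(n)]
    for i in range(n):
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(points[i], points[j])), j)
            for j in range(n) if j != i
        )
        for _, j in dists[:k]:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def propagate(adj, labels, iters=20):
    """Unlabeled nodes repeatedly take the majority label among labeled neighbours."""
    current = list(labels)
    for _ in range(iters):
        nxt = list(current)
        for i, y in enumerate(labels):
            if y is not None:
                continue  # known labels stay clamped
            votes = Counter(current[j] for j in adj[i] if current[j] is not None)
            if votes:
                nxt[i] = votes.most_common(1)[0][0]
        current = nxt
    return current

def multi_graph_ensemble(X, labels, n_subspaces=5, dim=2, k=2, seed=0):
    """One graph and one transductive classifier per random subspace, then a vote."""
    rng = random.Random(seed)
    predictions = []
    for _ in range(n_subspaces):
        feats = rng.sample(range(len(X[0])), dim)
        sub = [[row[f] for f in feats] for row in X]
        predictions.append(propagate(knn_graph(sub, k), labels))
    final = []
    for i in range(len(X)):
        votes = Counter(p[i] for p in predictions if p[i] is not None)
        final.append(votes.most_common(1)[0][0] if votes else None)
    return final
```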

4.
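When processing big data, parallel deep forest suffers from too many redundant and irrelevant features, overly long class vectors, slow model convergence and low parallel training efficiency. To address these problems, a parallel deep forest algorithm based on Spark and three-way interaction information (PDF-STWII) is proposed. First, a feature selection based on feature interaction (FSFI) strategy filters the original features and removes irrelevant and redundant ones. Second, a multi-grained vector elimination (MGVE) strategy merges similar class vectors to shorten them. Third, a cascade forest feature enhancement (CFFE) strategy improves information utilization and speeds up model convergence. Finally, a multi-level load balancing (MLB) strategy built on the Spark framework improves parallel training efficiency through adaptive sub-forest partitioning and partitioning of heterogeneous, skewed data. Experimental results show that the proposed algorithm significantly improves classification performance and shortens parallel training time.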
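The abstract names three-way interaction information but does not give the scoring formula FSFI builds on it. As a sketch, the quantity itself can be estimated from discrete columns through joint entropies. Sign conventions differ across the literature; this version uses I(X;Y|Z) - I(X;Y), so positive values indicate synergy and negative values indicate redundancy:

```python
from collections import Counter
from math import log2

def H(*columns):
    """Joint Shannon entropy (bits) of one or more discrete columns."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum(c / n * log2(c / n) for c in Counter(rows).values())

def mutual_information(x, y):
    return H(x) + H(y) - H(x, y)

def conditional_mi(x, y, z):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return H(x, z) + H(y, z) - H(x, y, z) - H(z)

def interaction_information(x, y, z):
    """I(X;Y|Z) - I(X;Y): > 0 means synergy, < 0 means redundancy."""
    return conditional_mi(x, y, z) - mutual_information(x, y)
```

Two independent bits and their XOR give +1 bit (each bit alone says nothing about the XOR, together they determine it), while three copies of the same bit give -1 bit of pure redundancy.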

5.
Chen Ruonan, Sun Xiaoying, Liu Guohong. 《电子学报》 (Acta Electronica Sinica), 2017, 45(7): 1553-1558
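Primary-user spectrum sensing algorithms in kernel space share a common problem: a heavy computational load. To address it, a new low-complexity Nyström subspace matching (NSM) algorithm is proposed. Relying on the independent and identically distributed property of the data samples, the algorithm randomly selects a subset of the data. The Nyström approximation is applied in the high-dimensional kernel space to obtain the principal eigenvectors, which are used to construct Nyström eigen-subspaces for the primary user's feature signal and for the secondary user's received signal, respectively. The Frobenius distance between these subspaces is then computed to detect the primary user. Computer simulations show that, compared with representative kernel-space primary-user spectrum sensing algorithms, the proposed algorithm reduces computational complexity by nearly 66% while maintaining good detection performance.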

6.
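Clustering is an important part of data mining, and clustering high-dimensional data with mixed features remains challenging. Because Euclidean distance loses its meaning on high-dimensional mixed-feature data, a forest clustering algorithm based on an ensemble of randomized greedy tree-structured base learners is proposed. The model exploits the strengths of tree-based ensembles to handle data that mix discrete and continuous features as well as high-dimensional data. Following the way random forests compute a similarity matrix, it computes in the clustering forest...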
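The random forest similarity matrix mentioned above is usually the proximity matrix: two samples are similar in proportion to the number of trees in which they land in the same leaf. A minimal sketch, assuming the leaf index of every sample in every tree has already been recorded (how the clustering forest grows its trees is not shown):

```python
def proximity_matrix(leaf_ids):
    """Random forest proximity: share of trees in which samples i and j share a leaf.

    leaf_ids[t][i] is the leaf index reached by sample i in tree t.
    """
    n_trees, n = len(leaf_ids), len(leaf_ids[0])
    counts = [[0] * n for _ in range(n)]
    for leaves in leaf_ids:
        for i in range(n):
            for j in range(n):
                if leaves[i] == leaves[j]:
                    counts[i][j] += 1
    return [[c / n_trees for c in row] for row in counts]
```

The value 1 - sim[i][j] can then serve as a dissimilarity for any clustering method that accepts a precomputed matrix, which avoids relying on Euclidean distance over mixed features.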

7.
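An optimized random subspace classifier ensemble algorithm, CEORS, is proposed. It optimizes randomly selected feature subsets with two methods, wrapper feature selection and LSA dimensionality reduction, and builds the classifier ensemble on the optimized feature subspaces. Experimental results show that the ensemble classifier based on optimized feature subspaces outperforms Bagging and AdaBoost.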

8.
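Many factors can affect student performance. Data mining tools can predict and analyze students' course grades, and the predictions can then be used to correct poor learning behavior promptly and to check teaching effectiveness. First, an improved random forest algorithm is parallelized for a big data platform. The parallelized algorithm is then run in simulation experiments on a Spark+Kudu big data platform. The parallelization relies mainly on the random forest's decision tree partitioning strategy and on a multi-population strategy built with simulated annealing. Experimental results show that the parallelization strategy effectively improves classification efficiency on the dataset and greatly shortens execution time.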

9.
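Head pose estimation is key to understanding human behavior and attention, and it is affected by many factors such as illumination, noise, identity and occlusion. To improve estimation accuracy and robustness in unconstrained environments, this paper proposes multi-class head pose estimation in unconstrained environments using a tree-structured hierarchical random forest. First, to eliminate noise from different environments, combined texture features of the face region are extracted and the face region is classified into positive face sub-regions; the classification result is fed to the tree-structured hierarchical random forest as prior knowledge. Second, a tree-structured hierarchical random forest algorithm is proposed that estimates head pose hierarchically over multiple degrees of freedom. Third, to strengthen the algorithm's classification ability, an adaptive Gaussian mixture model is used as the voting model at the leaf nodes of the multi-level sub-forests. Head pose estimation experiments under a variety of unconstrained conditions on several public datasets show that the proposed algorithm achieves good accuracy and robustness on images of varying quality.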

10.
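Aviation data are becoming increasingly high-dimensional and massive, while traditional models lack sufficient single-machine computing resources for big data. This paper proposes a Spark-based parallel flight delay prediction model that incorporates meteorological data. The model uses DataFrames to merge flight data with weather data, appending weather observations for different hours to each flight record. Random forest feature splitting and tree generation are then parallelized, enabling fast flight delay prediction. Experimental results show that adding weather data improves both recall and accuracy, and that when delays are predicted against different delay-time thresholds, accuracy is higher for larger thresholds. In addition, the parallel model converges faster than the single-machine model and achieves a strong speedup.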
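The paper performs the merge with Spark DataFrames; the plain-Python sketch below only illustrates the join logic of appending several hours of weather to each flight. The field names (`origin`, `sched_dep`) and the default look-back of 0, 1 and 2 hours are hypothetical:

```python
from datetime import datetime, timedelta

def attach_weather(flights, weather, hours_before=(0, 1, 2)):
    """Return copies of the flight records with weather for several earlier hours appended.

    flights: list of dicts with hypothetical keys "origin" and "sched_dep" (datetime)
    weather: dict mapping (airport, datetime truncated to the hour) -> dict of fields
    """
    merged = []
    for f in flights:
        row = dict(f)  # leave the input record untouched
        dep_hour = f["sched_dep"].replace(minute=0, second=0, microsecond=0)
        for h in hours_before:
            obs = weather.get((f["origin"], dep_hour - timedelta(hours=h)), {})
            for field, value in obs.items():
                row[f"{field}_t-{h}h"] = value  # e.g. "visibility_km_t-1h"
        merged.append(row)
    return merged
```

Hours with no observation simply contribute no columns; a real pipeline would need an explicit policy for such gaps.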

11.
Real-world classification tasks may involve high-dimensional missing data. The traditional approach to handling missing data is to impute the data first and then apply traditional classification algorithms to the imputed data. This approach assumes that there exists a distribution or feature relations among the data, and then estimates missing items from the observed values. A reasonable assumption is a necessary guarantee of accurate imputation. The distribution or feature relations of the data, however, are often complex or even impossible to capture in high-dimensional data sets, leading to inaccurate imputation. In this paper, we propose a complete-case projection subspace ensemble framework, in which two alternative partition strategies, namely bootstrap subspace partition and missing pattern-sensitive subspace partition, are developed for incomplete datasets with even and uneven missing patterns, respectively. Multiple component classifiers are then trained separately in these subspaces, and a final ensemble classifier is constructed by a weighted majority vote of the component classifiers. In the experiments, we demonstrate the effectiveness of the proposed framework on eight high-dimensional UCI datasets. We also apply the two proposed partition strategies to data sets with different missing patterns. The results indicate that the proposed algorithm significantly outperforms existing imputation methods in most cases.
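A sketch of the idea behind missing pattern-sensitive subspace partition, under our own reading of the abstract: each distinct missing pattern defines a subspace made of its observed features, and the training set for that subspace is every sample that is complete on it, so no imputation is needed. Missing values are represented as `None`; the exact partition rules in the paper may differ:

```python
def missing_patterns(X):
    """Map each distinct set of observed feature indices to the rows that have it."""
    patterns = {}
    for i, row in enumerate(X):
        observed = frozenset(j for j, v in enumerate(row) if v is not None)
        patterns.setdefault(observed, []).append(i)
    return patterns

def complete_case_subspaces(X):
    """One (features, rows) pair per missing pattern: the pattern's observed
    features, plus every row that is complete on those features."""
    subspaces = []
    for observed in missing_patterns(X):
        feats = sorted(observed)
        rows = [i for i, row in enumerate(X) if all(row[j] is not None for j in feats)]
        subspaces.append((feats, rows))
    return subspaces
```

A component classifier would then be trained on each (features, rows) pair, and a test sample would be scored only by the classifiers whose features it actually observes.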

12.
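Random forest is an ensemble learning algorithm developed in recent years that achieves good classification accuracy. To reduce its high computational complexity, a random forest algorithm based on spectral clustering partitioning is proposed. First, spectral clustering, which clusters well, is applied to each class of the original sample set. Then one sample is randomly selected from each cluster as its representative, and these representatives form a new training set. Finally, a random forest classifier is trained on the new training set. By pre-partitioning the original samples with spectral clustering and representing several nearby samples with a single sample from their cluster, the algorithm greatly reduces the number of training samples. Experiments on the Corel Image recognition dataset show that the algorithm reaches high classification accuracy with less classification time.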
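The data-reduction step can be sketched as follows, assuming per-class cluster labels are already available. The paper uses spectral clustering; in this sketch any clustering method can supply the labels, and `cluster_representatives` is an illustrative name:

```python
import random
from collections import defaultdict

def cluster_representatives(samples, classes, clusters, seed=0):
    """Keep one randomly chosen sample per (class, cluster) pair.

    clusters[i] is the cluster id of sample i, obtained by clustering each
    class separately, so the same id in two classes means two different clusters.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for i, (c, k) in enumerate(zip(classes, clusters)):
        groups[(c, k)].append(i)
    chosen = sorted(rng.choice(idx) for idx in groups.values())
    return [samples[i] for i in chosen], [classes[i] for i in chosen]
```

The reduced set returned here would then be the training set for the random forest.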

13.
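Raman Spectroscopy Identification of Foodborne Pathogens Based on the Random Forest Algorithm   (Total citations: 1; self-citations: 0; citations by others: 1)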
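Drug and food safety has long been a major public concern. Compared with traditional spectroscopic methods for detecting foodborne pathogens, Raman spectroscopy offers a wide detection range, flexible detection and distinctive spectral features. Taking common foodborne pathogens as the research object, this study used a Raman spectrometer to collect 132 Raman spectra from samples of 11 foodborne pathogens and proposed a classification model based on principal component analysis and random forest. Experimental results show that the combined principal component analysis and random forest model can distinguish the foodborne pathogens, with a classification accuracy of 91.36%.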

14.
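For semiconductor production scheduling control, an integrated scheduling algorithm based on data classification with a multi-way-tree random forest is proposed. First, taking the product types in the weekly production plan as input, the multi-way-tree random forest classifies the data, using product name, production quantity, due date and category as the scheduling features. Second, based on the classification results and with the goal of reducing machine changeover time, the daily product types and quantities are determined. Finally, an application study verifies the feasibility of the algorithm. Experimental results show that the proposed algorithm effectively reduces machine changeover time, demonstrating its effectiveness and advantages.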

15.
To address a defect of the voting principle in the random forest algorithm, namely that it cannot distinguish between strong and weak classifiers, a weighted voting improvement was proposed, and on this basis an improved random forest classification approach (IRFCM) was proposed to detect Android malware. IRFCM selects permission information and intent information from AndroidManifest.xml files as attribute features and optimizes them, then applies the model to classify the final feature vectors. Experimental results in the Weka environment show that IRFCM achieves better classification accuracy and classification efficiency.
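The abstract does not state how IRFCM weights each tree. A minimal sketch of weighted voting, assuming each tree's weight is a quality score such as its out-of-bag accuracy:

```python
from collections import defaultdict

def weighted_vote(predictions, weights):
    """Combine per-tree class predictions, each tree voting with its own weight
    (assumed here to be a quality score such as out-of-bag accuracy)."""
    scores = defaultdict(float)
    for label, w in zip(predictions, weights):
        scores[label] += w
    return max(scores, key=scores.get)
```

With weights 0.4, 0.45 and 0.9, a single strong tree voting `malware` outweighs two weak trees voting `benign`, whereas a plain majority vote (all weights 1) would return `benign`.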

16.
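Oblique decision tree algorithms generally have low time efficiency, and some can only handle binary classification. To address this, a clustering decision tree algorithm based on weighted distance is proposed. The Relief-F algorithm computes weights for the predictive attributes, and these weights are used when clustering the data at each tree node. The resulting clusters define a multi-way split of the node, yielding a decision tree that applies directly to multi-class problems. Theoretical analysis and experimental results show that, compared with classic axis-parallel decision trees, the algorithm generalizes better at a similar time complexity, and that compared with most oblique decision trees, it achieves similar accuracy and model simplicity at a lower computational cost.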
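Relief-F is a standard attribute-weighting algorithm. The sketch below follows Kononenko's multi-class formulation for numeric attributes: range-normalized differences, k nearest hits and misses, and misses from each other class weighted by that class's prior. How the paper then plugs these weights into the node-level clustering distance is not shown:

```python
from collections import Counter

def relieff(X, y, k=1):
    """Relief-F weights for numeric attributes, iterating over every sample."""
    n, d = len(X), len(X[0])
    span = [(max(r[a] for r in X) - min(r[a] for r in X)) or 1.0 for a in range(d)]
    prior = {c: cnt / n for c, cnt in Counter(y).items()}

    def diff(a, i, j):
        # Range-normalized difference on attribute a, in [0, 1]
        return abs(X[i][a] - X[j][a]) / span[a]

    def nearest(i, cls):
        # k nearest samples of class cls (Manhattan distance over diffs)
        cands = [j for j in range(n) if j != i and y[j] == cls]
        cands.sort(key=lambda j: sum(diff(a, i, j) for a in range(d)))
        return cands[:k]

    w = [0.0] * d
    for i in range(n):
        hits = nearest(i, y[i])
        misses = {c: nearest(i, c) for c in prior if c != y[i]}
        for a in range(d):
            # Differences to same-class neighbours lower the weight
            w[a] -= sum(diff(a, i, j) for j in hits) / (n * k)
            # Differences to other-class neighbours raise it, scaled by class prior
            for c, js in misses.items():
                scale = prior[c] / (1 - prior[y[i]])
                w[a] += scale * sum(diff(a, i, j) for j in js) / (n * k)
    return w
```

On a toy set where the first attribute separates the two classes and the second is noise, the first weight comes out positive and the second negative.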

17.
Statistical classification of hyperspectral data is challenging because the inputs are high in dimension and represent multiple classes that are sometimes quite mixed, while the amount and quality of ground truth in the form of labeled data is typically limited. The resulting classifiers are often unstable and have poor generalization. This work investigates two approaches based on the concept of random forests of classifiers implemented within a binary hierarchical multiclassifier system, with the goal of achieving improved generalization of the classifier in analysis of hyperspectral data, particularly when the quantity of training data is limited. A new classifier is proposed that incorporates bagging of training samples and adaptive random subspace feature selection within a binary hierarchical classifier (BHC), such that the number of features that is selected at each node of the tree is dependent on the quantity of associated training data. Results are compared to a random forest implementation based on the framework of classification and regression trees. For both methods, classification results obtained from experiments on data acquired by the National Aeronautics and Space Administration (NASA) Airborne Visible/Infrared Imaging Spectrometer instrument over the Kennedy Space Center, Florida, and by Hyperion on the NASA Earth Observing 1 satellite over the Okavango Delta of Botswana are superior to those from the original best basis BHC algorithm and a random subspace extension of the BHC.

18.
Link quality prediction is vital to the upper-layer protocol design of wireless sensor networks. Selecting high-quality links with the help of link quality prediction mechanisms can improve data transmission reliability and network communication efficiency. A Gaussian mixture model algorithm based on unsupervised clustering was employed to divide link quality into levels. Zero-phase component analysis (ZCA) whitening was applied to remove the correlation between samples. The mean and variance of the signal-to-noise ratio, link quality indicator, and received signal strength indicator were taken as the estimation parameters of link quality, and a link quality estimation model was constructed using a random forest classification algorithm. The random forest regression algorithm was used to build a link quality prediction model, which predicts the link quality level at the next moment. In different scenarios, compared with exponentially weighted moving average, triangle metric, support vector regression and linear regression prediction models, the proposed prediction model has higher prediction accuracy.

19.
A differential privacy algorithm based on random forests, DiffPRFs, was proposed. The exponential mechanism was used to select the split point and split attribute while building each decision tree, and noise was added according to the Laplace mechanism, so that the differential privacy protection requirement is satisfied throughout the whole process. Compared to existing algorithms, the proposed method does not require pre-discretization of continuous attributes, which significantly reduces the preprocessing cost on large multi-dimensional datasets. Classification is achieved conveniently and efficiently while maintaining high accuracy. Experimental results demonstrate the effectiveness and superiority of the algorithm compared to other classification algorithms.
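A minimal sketch of the two building blocks named above. The exponential mechanism picks a split candidate with probability proportional to exp(ε·u/(2Δu)), and the Laplace mechanism adds Laplace(0, Δ/ε) noise to a count (sensitivity 1 here). The utility function, sensitivity and privacy-budget split used in DiffPRFs are not given in the abstract, so `utility` is a caller-supplied placeholder:

```python
import math
import random

def exponential_mechanism(candidates, utility, epsilon, sensitivity, rng=random):
    """Pick a candidate with probability proportional to exp(eps * u / (2 * sensitivity))."""
    scores = [utility(c) for c in candidates]
    top = max(scores)  # shift by the max so exp() cannot overflow
    weights = [math.exp(epsilon * (s - top) / (2 * sensitivity)) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]

def laplace_noise(scale, rng=random):
    """Laplace(0, scale) sample: the difference of two i.i.d. exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def noisy_count(true_count, epsilon, rng=random):
    """Laplace mechanism for a count query, whose sensitivity is 1."""
    return true_count + laplace_noise(1 / epsilon, rng)
```

In a private tree, the candidates would be (attribute, split point) pairs scored by some split-quality utility, and leaf class counts would be released through `noisy_count`. A larger ε makes the best split more likely to be chosen and adds less noise, at the price of weaker privacy.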
