Similar Articles
1.
Ma Zongjie, Liu Huawen. Journal of Computer Applications, 2014, 34(7): 2058-2060
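Multi-label data raise two problems: label correlation and high dimensionality. To address both, this paper proposes a multi-label classification algorithm based on singular value decomposition and partial least squares regression (SVD-PLSR). The algorithm performs both dimensionality reduction and regression analysis on multi-label data. First, the label set is treated as a whole so that correlations among labels are taken into account. Second, singular value decomposition (SVD) produces score vectors for the sample space and the label space, which reduces dimensionality. Finally, a multi-label classification model is built on partial least squares regression (PLSR). On four real-world high-dimensional datasets, the algorithm achieves effective classification results.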
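The abstract does not give the model's equations. The sketch below illustrates one way SVD can enter PLSR. It relies on a standard fact: the first PLS weight vector is the dominant left singular vector of the cross-product matrix X^T Y, where X and Y are column-centered. Here that vector is found by power iteration. The function names and the use of power iteration are assumptions, not the authors' implementation.

```python
import math


def center(rows):
    """Subtract each column's mean from a row-major matrix."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def matmul_t(a, b):
    """Return a^T b for row-major matrices a (n x p) and b (n x q)."""
    p, q = len(a[0]), len(b[0])
    return [[sum(a[k][i] * b[k][j] for k in range(len(a))) for j in range(q)]
            for i in range(p)]


def first_pls_weight(x, y, iters=200):
    """Dominant left singular vector of C = X^T Y, via power iteration on C C^T.

    Returns the weight vector w and the first X score vector t = X w.
    """
    xc, yc = center(x), center(y)
    c = matmul_t(xc, yc)                  # p x q cross-product matrix
    p, q = len(c), len(c[0])
    w = [1.0] * p
    for _ in range(iters):
        v = [sum(c[i][j] * w[i] for i in range(p)) for j in range(q)]  # C^T w
        w = [sum(c[i][j] * v[j] for j in range(q)) for i in range(p)]  # C v
        norm = math.sqrt(sum(t * t for t in w))
        w = [t / norm for t in w]
    scores = [sum(r[i] * w[i] for i in range(p)) for r in xc]
    return w, scores
```

Later components would be obtained by deflating X and repeating the same step.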

2.
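This paper proposes a recursive dimensionality reduction algorithm based on boundary discriminant analysis. The algorithm adds a new constraint to the support vector machine (SVM) optimization problem: the normal vector of the hyperplane being sought must be orthogonal to the boundary discriminant vectors already obtained. The modified SVM is then solved recursively, which yields an orthogonal basis of boundary discriminant vectors. Finally, the data samples are projected onto these orthogonal vectors to reduce dimensionality. Existing dimensionality reduction algorithms struggle with small-sample datasets and are sensitive to how the data are distributed. The proposed algorithm overcomes both problems, and the features it extracts also give better classification performance. Simulation experiments demonstrate its effectiveness.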

3.
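Anomaly detection on high-dimensional networks that carry large volumes of traffic requires dimensionality reduction, which eases the system's transmission and storage load. This paper describes the process of traffic anomaly detection in high-speed networks and the ways dimensionality reduction is applied there. It then reviews commonly used traffic features and recent advances in dimensionality reduction techniques. Existing algorithms are grouped under the two approaches to reducing feature dimensionality, traffic feature selection and traffic feature extraction, and the principle, strengths, and weaknesses of each algorithm are described. The paper also lists commonly used datasets and evaluation metrics, analyzes the challenges that dimensionality reduction research faces in traffic anomaly detection, and discusses future directions.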

4.
Huang Lili, Tang Jin, Sun Dengdi, Luo Bin. Journal of Computer Applications, 2012, 32(10): 2888-2890
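Traditional feature selection algorithms are limited to single-label data. This paper therefore proposes a feature selection algorithm for multi-label data, called multi-label ReliefF. The algorithm draws on the co-occurrence of class labels in multi-label data and assumes that every label of a sample contributes equally. It combines three methods of computing these contributions and modifies the feature-weight update formula to obtain effective classification features. In classification experiments with the same number of features, multi-label ReliefF achieves clearly higher accuracy than traditional feature selection algorithms.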
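Below is a minimal sketch of a multi-label ReliefF weight update. It makes several assumptions: Manhattan distance, one nearest hit and one nearest miss per label, and an equal share of 1/|labels| for each label of a sample. The equal share is the simplest reading of the abstract; the paper's three contribution schemes are not reproduced. The function and parameter names are hypothetical.

```python
def manhattan(a, b):
    return sum(abs(p - q) for p, q in zip(a, b))


def multilabel_relieff(x, labels, value_range):
    """Hypothetical multi-label ReliefF sketch.

    x: list of feature vectors; labels: list of label sets, one per sample;
    value_range: per-feature (max - min), used to normalize differences.
    Every sample is visited once, and each of its labels contributes 1/|labels|.
    """
    n, d = len(x), len(x[0])
    w = [0.0] * d
    for i in range(n):
        share = 1.0 / len(labels[i])          # equal contribution per label
        for lab in labels[i]:
            hits = [j for j in range(n) if j != i and lab in labels[j]]
            misses = [j for j in range(n) if lab not in labels[j]]
            if not hits or not misses:
                continue
            h = min(hits, key=lambda j: manhattan(x[i], x[j]))
            m = min(misses, key=lambda j: manhattan(x[i], x[j]))
            for f in range(d):
                diff_h = abs(x[i][f] - x[h][f]) / value_range[f]
                diff_m = abs(x[i][f] - x[m][f]) / value_range[f]
                # Reward features that separate misses and agree on hits.
                w[f] += share * (diff_m - diff_h) / n
    return w
```

Features with large positive weights would be kept. A noise feature that varies as much within a label as across labels ends up with a weight near or below zero.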

5.
Software Engineer, 2017, (8): 7-13
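Machine learning has been a research hotspot in recent years, and dimensionality reduction is an essential tool within it. Starting from a definition of dimensionality reduction, this paper introduces several typical algorithms, both linear and nonlinear; manifold learning is presented as the representative family of nonlinear methods. For each algorithm it describes how the algorithm is constructed, what characterizes it, and what its core idea is. It then analyzes the execution efficiency of all the algorithms in terms of time and space complexity. The paper closes with a summary intended as a reference for later researchers.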

6.
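Feature selection and dimensionality reduction are widely used in machine learning, pattern recognition, and data mining. The two are related, yet little work has studied how to use them together, so this paper proposes an approach that fuses them. The approach combines principal component analysis (PCA) with a genetic algorithm in a method called PGS. Applied to predictive classification of gene microarray data, PGS gave good results.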

7.
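Traditional PCA is sensitive to outliers and feature noise, and L2,1-norm-based PCA alleviates these shortcomings. Existing L2,1-norm PCA algorithms reduce dimensionality by lowering the rank of a matrix, but rank is costly to compute. This paper proposes a new dimensionality reduction algorithm that replaces the matrix rank with the trace norm, which simplifies the computation of L2,1-PCA and improves efficiency. The model is solved with a method based on Lagrange multipliers, and the algorithm is applied to image denoising on the Extended Yale B face dataset. Visualized experimental results show that the proposed algorithm is effective.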
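The small sketch below makes the two norms concrete. It assumes each row of a residual matrix is one sample. The L2,1 norm sums the per-sample Euclidean norms, so an outlier contributes linearly rather than quadratically, as it would under the squared Frobenius loss of classical PCA. The trace (nuclear) norm, the convex surrogate for rank, is shown only for the 2 x 2 case, where (σ1 + σ2)^2 = ||A||_F^2 + 2|det A|. This does not reproduce the paper's Lagrange-multiplier solver.

```python
import math


def l21_norm(rows):
    """L2,1 norm: the sum over rows (samples) of each row's Euclidean norm."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in rows)


def frobenius_sq(rows):
    """Squared Frobenius norm, the reconstruction loss of classical PCA."""
    return sum(v * v for row in rows for v in row)


def nuclear_norm_2x2(a, b, c, d):
    """Trace (nuclear) norm of [[a, b], [c, d]], i.e. sigma1 + sigma2.

    For 2 x 2 matrices sigma1 * sigma2 = |det A|, so
    (sigma1 + sigma2)^2 = ||A||_F^2 + 2 |det A|.
    """
    fro2 = a * a + b * b + c * c + d * d
    return math.sqrt(fro2 + 2 * abs(a * d - b * c))
```

For the residual rows [0.3, 0.4], [0, 0], [30, 40], the outlier row contributes 50 of the L2,1 total of 50.5, but 2500 of the squared-Frobenius total of 2500.25.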

8.
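Existing air quality prediction methods have shortcomings. To overcome them, this paper proposes an air quality prediction method that fuses an improved discrete artificial fish swarm algorithm, fractal dimension, and an SVM. First, the swarming behavior, foraging behavior, and movement scheme of the artificial fish swarm algorithm are discretized, and a strategy for escaping local optima and a parallel mechanism are introduced. Next, the improved discrete algorithm is combined with fractal dimension to reduce the air quality dataset. Finally, a prediction model is built with a Gaussian-kernel SVM. Experiments on roughly two years of air quality data from Beijing, Shanghai, and Guangzhou show that the method predicts well and is highly stable and reliable.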

9.
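Semi-supervised dimensionality reduction uses side information together with a large number of unlabeled samples to find an optimal low-dimensional discriminant space within a high-dimensional data space, which makes subsequent classification or clustering easier. It is regarded as an effective way to understand high-dimensional data such as gene sequences, text, and face images. This paper proposes a general framework for semi-supervised dimensionality reduction based on pairwise constraints (SSPC). The method first learns a discriminant adjacency matrix from the pairwise constraints and from the intrinsic geometric structure of the unlabeled samples. It then applies the learned projection to map the data into a low-dimensional space. In that space, samples within a cluster become more compact and samples in different clusters are pushed as far apart as possible. The algorithm finds an optimal linear discriminant subspace and can also reveal the nonlinear structure of manifold data. Experiments on several real-world datasets show that the new method outperforms current mainstream pairwise-constraint-based dimensionality reduction algorithms.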

10.
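Attribute selection is an effective data preprocessing method. For a multivariate time series, it preserves both the temporal relationships among the important variables and their physical meaning. Because many real datasets carry no class information, this paper proposes an unsupervised attribute selection algorithm and analyzes its complexity. First, it designs a method for computing the fractal dimension of a multivariate time series without phase-space reconstruction, and treats that fractal dimension as the series' intrinsic dimension. The change in an attribute subset's fractal dimension, taken together with the change in its number of attributes, serves as the criterion for judging the subset's quality. Next, it optimizes a discrete particle swarm algorithm to cope with the combinatorial explosion of searching a high-dimensional attribute space. Finally, it runs simulations on multivariate time series generated by typical chaotic dynamical systems and on five datasets from the UCI repository. The results show that the algorithm finds good attribute subsets in a short time and has good overall performance.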
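The abstract does not specify the paper's fractal-dimension estimator. As a stand-in, the sketch below uses the classical correlation dimension: the slope of log C(r) against log r, where C(r) is the fraction of point pairs closer than r. It treats each time step of a chosen attribute subset directly as a point, which is one assumed reading of "without phase-space reconstruction". The two radii are assumptions and would need tuning on real data.

```python
import math


def correlation_integral(points, r):
    """Fraction of distinct point pairs whose Euclidean distance is below r."""
    n = len(points)
    close = 0
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) < r:
                close += 1
    return close / (n * (n - 1) / 2)


def correlation_dimension(points, r_small, r_large):
    """Slope of log C(r) between two radii: a crude fractal-dimension estimate."""
    c_small = correlation_integral(points, r_small)
    c_large = correlation_integral(points, r_large)
    return ((math.log(c_large) - math.log(c_small))
            / (math.log(r_large) - math.log(r_small)))
```

Points spread along a line give an estimate close to 1, whatever the number of attributes. Comparing the estimate for a subset with the subset's size therefore hints at redundancy: an attribute whose removal barely changes the dimension is likely redundant.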

11.
Microaggregation is a statistical disclosure control technique for microdata disseminated in statistical databases. Raw microdata (i.e., individual records or data vectors) are grouped into small aggregates prior to publication. Each aggregate should contain at least k data vectors to prevent disclosure of individual information, where k is a constant value preset by the data protector. No exact polynomial-time algorithm is known to date for optimal microaggregation, i.e., microaggregation with minimal variability loss. Methods in the literature rank the data and partition them into fixed-size groups; in the multivariate case, ranking is performed by projecting data vectors onto a single axis. In this paper, candidate optimal solutions to the multivariate and univariate microaggregation problems are characterized. In the univariate case, two heuristics based on hierarchical clustering and genetic algorithms are introduced; they are data-oriented in that they try to preserve natural data aggregates. In the multivariate case, fixed-size and hierarchical clustering microaggregation algorithms are presented that do not require data to be projected onto a single dimension; such methods clearly reduce variability loss compared with conventional multivariate microaggregation on projected data.
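The sketch below shows the conventional univariate method the abstract contrasts with: rank the values, then cut them into fixed-size groups. It is a baseline, not the paper's hierarchical-clustering or genetic heuristics. Merging a short final group into the previous one is an assumed convention that keeps every group at k records or more. The returned within-group sum of squares is the variability loss the methods try to minimize.

```python
def fixed_size_microaggregation(values, k):
    """Univariate fixed-size microaggregation (conventional baseline).

    Sort the values, cut them into consecutive groups of k, merge a short
    final group into the previous one, and replace each value by its group
    mean. Returns (masked values in the original order, within-group SSE).
    """
    if len(values) < k:
        raise ValueError("need at least k records")
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [order[s:s + k] for s in range(0, len(order), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        last = groups.pop()
        groups[-1].extend(last)          # every group keeps >= k records
    masked = [0.0] * len(values)
    sse = 0.0
    for g in groups:
        mean = sum(values[i] for i in g) / len(g)
        for i in g:
            masked[i] = mean
            sse += (values[i] - mean) ** 2
    return masked, sse
```

With k = 3, the values [10, 1, 11, 2, 12, 3, 50] become the groups {1, 2, 3} and {10, 11, 12, 50}. The outlier 50 inflates the second group's loss, and this kind of cost is what data-oriented grouping aims to reduce.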

12.
A new dimension reduction method is proposed for functional multivariate regression with a multivariate response and a functional predictor by extending the functional sliced inverse regression model. Naive application of existing dimension reduction techniques for univariate response will create too many hyper-rectangular slices. To avoid this curse of dimensionality, a new slicing method is proposed by clustering over the space of the multivariate response, which generates a much smaller set of slices of flexible shapes. The proposed method can be applied to any number of response variables and can be particularly useful for exploratory analysis. In addition, a new eigenvalue-based method for determining the dimensionality of the reduced space is developed. Real and simulation data examples are then presented to demonstrate the effectiveness of the proposed method.

13.
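Most existing clustering ensemble algorithms are unsupervised and handle high-dimensional data poorly. To address this, this paper designs a pairwise-constrained semi-supervised clustering ensemble algorithm based on PCA dimensionality reduction (SSCEDR). SSCEDR first reduces the dimensionality of the original data with principal component analysis (PCA). It then applies semi-supervised clustering ensemble techniques in the reduced space, feeding prior knowledge such as pairwise constraints into the ensemble process. Experiments on multiple datasets verify...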
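The abstract does not say how the constraints enter the ensemble. One common consensus device is the co-association matrix. The sketch below builds it from several base clusterings, which could, for example, be run on PCA-reduced data. As a hypothetical way to use prior knowledge, it overwrites must-link pairs with 1 and cannot-link pairs with 0, then extracts connected components. This is an illustration, not SSCEDR itself.

```python
def co_association(partitions, must_link=(), cannot_link=()):
    """Co-association matrix of a clustering ensemble.

    partitions: list of label lists, one per base clustering, all of length n.
    Entry (i, j) is the fraction of base clusterings that put i and j together.
    Hypothetical constraint handling: must-link pairs are forced to 1.0 and
    cannot-link pairs to 0.0.
    """
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    m = [[c / len(partitions) for c in row] for row in counts]
    for i, j in must_link:
        m[i][j] = m[j][i] = 1.0
    for i, j in cannot_link:
        m[i][j] = m[j][i] = 0.0
    return m


def consensus_clusters(m, threshold=0.5):
    """Label the connected components of the graph with edges where m > threshold."""
    n = len(m)
    label = [-1] * n
    current = 0
    for start in range(n):
        if label[start] != -1:
            continue
        label[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if label[j] == -1 and m[i][j] > threshold:
                    label[j] = current
                    stack.append(j)
        current += 1
    return label
```

Components are formed transitively, so a cannot-link pair can still end up in the same cluster if a chain of strong links connects them. A real method would need to handle that case.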

14.
This survey article considers methods and algorithms that quickly estimate distance and similarity measures between data items by forming low-dimensional real-valued vectors from them. The methods do not use learning and rely mainly on random projection and sampling. The input data are mainly high-dimensional vectors with various distance measures (Euclidean, Manhattan, statistical, etc.) and similarity measures (dot product, etc.). Vector representations of non-vector data are also considered. The resulting vectors can also be used in similarity search algorithms, machine learning, and other applications.
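Below is a minimal sketch of the random-projection idea the survey covers. It assumes a dense Gaussian matrix with N(0, 1/k) entries, so squared Euclidean distances are preserved in expectation (Johnson-Lindenstrauss style). The function names are illustrative; sparse or sign-based matrices are common alternatives.

```python
import math
import random


def gaussian_projection_matrix(d, k, seed=0):
    """k x d matrix with i.i.d. N(0, 1/k) entries."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def project(matrix, vec):
    """Map a d-dimensional vector to k dimensions."""
    return [sum(r * v for r, v in zip(row, vec)) for row in matrix]
```

The same matrix must be applied to every vector being compared. The distance between the projections then estimates the original distance, and a larger k lowers the estimate's variance.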

15.
This paper is concerned with data science and analytics as applied to data from dynamic systems for the purpose of monitoring, prediction, and inference. Collinearity is inevitable in industrial operation data. Therefore, we focus on latent variable methods that achieve dimension reduction and collinearity removal. We present a new dimension-reduction formulation of the state-space framework that unifies dynamic latent variable analytics for process data, dynamic factor models for econometrics, subspace identification of multivariate dynamic systems, and machine learning algorithms for dynamic feature analysis. We unify or differentiate these methods in terms of model structure, objectives with constraints, and parsimony of parameterization. Kalman filter theory in the latent space is used to give a system-theoretic foundation to some empirical treatments in data analytics. We provide a unifying review of the connections among dynamic latent variable methods, dynamic factor models, subspace identification methods, and dynamic feature extraction, and of their uses for prediction and process monitoring. Both unsupervised dynamic latent variable analytics and their supervised counterparts are reviewed. Illustrative examples show the similarities and differences among the analytics in extracting features for prediction and monitoring.
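The review grounds latent-variable analytics in Kalman filtering over the latent space. As a generic textbook sketch, not the paper's multivariate formulation, here is a scalar Kalman filter for a latent AR(1) state x_t = a x_{t-1} + w_t observed through y_t = c x_t + v_t. The parameter names are assumptions.

```python
def kalman_filter_1d(observations, a, c, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter for x_t = a x_{t-1} + w_t, y_t = c x_t + v_t.

    q, r: variances of the process noise w_t and the measurement noise v_t.
    x0, p0: prior mean and variance of the latent state.
    Returns the filtered estimates E[x_t | y_1..y_t].
    """
    x, p = x0, p0
    estimates = []
    for y in observations:
        # Predict: propagate the latent state and its variance.
        x = a * x
        p = a * p * a + q
        # Update: blend in the new observation using the Kalman gain.
        gain = p * c / (c * p * c + r)
        x = x + gain * (y - c * x)
        p = (1.0 - gain * c) * p
        estimates.append(x)
    return estimates
```

With a near-constant latent state (a = 1, tiny q), the filter behaves like a running average of y_t / c. Larger q lets it track a drifting state at the cost of more noise.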

16.
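As data acquisition technology develops, data can be obtained in increasingly diverse ways, and the data often come with multiple views, forming multi-view data. Multi-view learning aims to exploit the different information carried by each view and to design learning strategies that improve classifier performance. To make better use of multi-view data and to promote the practical use of dimensionality reduction algorithms, this paper studies multi-view dimensionality reduction. It analyzes multi-view data and multi-view learning, and traces multi-view CCA and kernel CCA back to canonical correlation analysis (CCA). It describes how multi-view dimensionality reduction algorithms evolved from two views to many and from linear to nonlinear. It also summarizes algorithms that incorporate discriminant information and neighborhood information. On this basis, it compares the algorithms' characteristics and open problems and discusses future research directions.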

17.
IDR/QR: an incremental dimension reduction algorithm via QR decomposition (total citations: 1; self-citations: 0; citations by others: 1)
Dimension reduction is a critical data preprocessing step for many database and data mining applications, such as efficient storage and retrieval of high-dimensional data. In the literature, a well-known dimension reduction algorithm is linear discriminant analysis (LDA). The common aspect of previously proposed LDA-based algorithms is the use of singular value decomposition (SVD). Due to the difficulty of designing an incremental solution for the eigenvalue problem on the product of scatter matrices in LDA, there has been little work on designing incremental LDA algorithms that can efficiently incorporate new data items as they become available. In this paper, we propose an LDA-based incremental dimension reduction algorithm, called IDR/QR, which applies QR decomposition rather than SVD. Unlike other LDA-based algorithms, this algorithm does not require the whole data matrix in main memory. This is desirable for large data sets. More importantly, with the insertion of new data items, the IDR/QR algorithm can constrain the computational cost by applying efficient QR-updating techniques. Finally, we evaluate the effectiveness of the IDR/QR algorithm in terms of classification error rate on the reduced dimensional space. Our experiments on several real-world data sets reveal that the classification error rate achieved by the IDR/QR algorithm is very close to the best possible one achieved by other LDA-based algorithms. However, the IDR/QR algorithm has much less computational cost, especially when new data items are inserted dynamically.
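As we understand the published IDR/QR algorithm, its first stage orthogonalizes the class-centroid matrix with a QR decomposition and projects the data onto the resulting Q. LDA is then solved in that small reduced space. The sketch below covers only the first stage, using modified Gram-Schmidt. It recomputes everything from scratch instead of applying the QR-updating step that makes the method incremental. The function names are illustrative.

```python
import math


def class_centroids(x, y):
    """Return the sorted class labels and the centroid of each class."""
    classes = sorted(set(y))
    cents = []
    for c in classes:
        members = [row for row, lab in zip(x, y) if lab == c]
        cents.append([sum(col) / len(members) for col in zip(*members)])
    return classes, cents


def gram_schmidt(vectors, tol=1e-12):
    """Modified Gram-Schmidt: an orthonormal basis (columns of Q) of span(vectors)."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            dot = sum(a * b for a, b in zip(w, q))
            w = [a - dot * b for a, b in zip(w, q)]
        norm = math.sqrt(sum(a * a for a in w))
        if norm > tol:                      # skip linearly dependent centroids
            basis.append([a / norm for a in w])
    return basis


def idr_qr_stage1(x, y):
    """Project each sample onto Q from the QR decomposition of the centroid matrix."""
    _, cents = class_centroids(x, y)
    q = gram_schmidt(cents)
    return [[sum(a * b for a, b in zip(row, qi)) for qi in q] for row in x]
```

The reduced dimension is at most the number of classes. This is why the second-stage LDA problem stays small no matter how many features the data have.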

18.
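Upper bound on the VC dimension of binary decision tree induction algorithms (total citations: 1; self-citations: 1; citations by others: 0)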
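In statistical learning theory, and especially for classification problems, the VC dimension plays a central role, yet the VC dimension of most commonly used algorithms is unknown. This paper derives an upper bound on the VC dimension of binary decision tree induction algorithms (Theorem 2). The bound grows with the complexity of the tree and with the number of adjustable parameters per node. As a supplement, it derives an upper bound on the VC dimension of the non-leaf nodes of univariate decision trees (Theorem 3). To assess the numerical results of Theorem 2, related empirical conclusions were verified experimentally; they agree with practice when tree complexity is large. Theorem 2 and the empirical conclusions differ considerably in value but follow the same trend. The paper discusses the causes of this gap and what the theorems imply for practical applications.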

19.
The rapid development of new technologies such as artificial intelligence and big data analysis requires cloud computing technology to develop alongside them. IoT-to-cloud settings are now widely used across industry sectors, for example in sensor-cloud systems, which combine wireless sensor networks with cloud computing. As the amount and variety of collected data grow, companies need to reduce the dimensionality of the massive data held in cloud servers in order to obtain analysis reports quickly. Because cloud server data leaks are frequent, companies must also adequately protect the privacy of confidential data. To this end, we designed a dimension reduction method for ciphertext data in the sensor-cloud system. It is based on the CKKS encryption scheme and on the principal component analysis (PCA) and linear discriminant analysis (LDA) dimension reduction algorithms. Encrypted data cannot be processed directly by the traditional PCA and LDA algorithms, so we add interactive operations and iterative calculations that replace some steps of the traditional algorithms. Finally, using the IRIS classification dataset commonly used in machine learning, we identify the best encryption and calculation parameters through extensive experiments and efficiently implement dimension reduction of ciphertext data.
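CKKS supports approximate addition and multiplication on encrypted real-valued vectors but does not directly support division or square roots. This is presumably why some PCA steps need replacing. The plaintext sketch below shows one iterative substitute, power iteration for the leading principal component. The loop reduces to matrix-vector products plus one normalization per step. The abstract does not say which steps the authors made interactive, so treating the normalization as the interactive step is an assumption. No encryption is performed here.

```python
import math


def covariance(rows):
    """Sample covariance matrix of row-major data (one sample per row)."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[v - m for v, m in zip(r, means)] for r in rows]
    d = len(means)
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_component(cov, iters=100):
    """Leading eigenvector of a covariance matrix by power iteration."""
    v = [1.0] * len(cov)
    for _ in range(iters):
        # Matrix-vector product: additions and multiplications only.
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        # Normalization needs a square root and a division, the operations
        # an encrypted pipeline would have to handle specially (assumed).
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v
```

Convergence speed depends on the gap between the two largest eigenvalues. This matters under CKKS, where each extra multiplication consumes noise budget.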

20.
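High-dimensional data are ubiquitous in the real world, but they often contain a great deal of redundant and noisy information. As a result, many traditional clustering algorithms perform poorly on them. In practice, the cluster structure of high-dimensional data is often embedded in a lower-dimensional subspace, which makes dimensionality reduction a key technique for uncovering that structure. Among the many dimensionality reduction methods, graph-based methods are an active research topic. However, most graph-based algorithms have two problems: (1) they must compute or learn an adjacency graph, which is computationally expensive; (2) the reduction does not take into account how the reduced data will be used. To address both problems, this paper proposes MEDR, a fast unsupervised dimensionality reduction algorithm based on maximum entropy. MEDR combines linear projection with a maximum-entropy clustering model. It uses an efficient iterative optimization algorithm to find the latent optimal cluster structure of high-dimensional data embedded in a low-dimensional subspace. MEDR needs no precomputed adjacency graph as input and runs in time linear in the number of samples. Experiments on real-world datasets show that, compared with traditional dimensionality reduction methods, MEDR finds projection matrices that map high-dimensional data into low-dimensional subspaces more effectively, so that the projected data are easier to cluster.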
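The abstract does not give MEDR's joint projection-and-clustering update. The sketch below shows only the maximum-entropy clustering component, applied to data that has already been projected. It assumes the standard entropy-regularized objective, whose soft memberships have the closed form u_ij ∝ exp(-||x_i - c_j||^2 / γ). The projection step is omitted, and the function name is an assumption.

```python
import math


def max_entropy_clustering(points, centers, gamma, iters=50):
    """Entropy-regularized (soft) clustering on already projected data.

    Minimizes sum_ij u_ij ||x_i - c_j||^2 + gamma * sum_ij u_ij log u_ij
    subject to sum_j u_ij = 1, by alternating closed-form updates.
    Returns the final centers and the membership matrix u.
    """
    centers = [list(c) for c in centers]
    memberships = []
    for _ in range(iters):
        # Membership update: softmax of negative squared distances / gamma.
        memberships = []
        for x in points:
            d = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in centers]
            dmin = min(d)                      # shift for numerical stability
            e = [math.exp(-(di - dmin) / gamma) for di in d]
            s = sum(e)
            memberships.append([ei / s for ei in e])
        # Center update: membership-weighted means.
        for j in range(len(centers)):
            weight = sum(u[j] for u in memberships)
            centers[j] = [sum(u[j] * x[k] for u, x in zip(memberships, points)) / weight
                          for k in range(len(points[0]))]
    return centers, memberships
```

Each iteration costs O(n·c·d) for n samples, c clusters, and d dimensions, which is linear in n. A smaller γ makes the memberships harder, approaching k-means.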
