Similar literature
 20 similar documents found (search time: 15 ms)
1.
Lu Gui-Fu, Zhao Jinbiao 《Applied Intelligence》2022,52(6):6539-6551

How to design effective multi-view subspace clustering (MVSC) algorithms has recently become a research hotspot. In this paper, we propose a new MVSC algorithm, termed latent multi-view self-representation for clustering via the tensor nuclear norm (LMVS/TNN), which can seamlessly unify multi-view clustering and dimensionality reduction into a framework. Specifically, for each view data, LMVS/TNN learns the transformed data from the original space, which can maintain the original manifold structure, and each subspace representation matrix from the transformed latent space simultaneously. Furthermore, to use the high-order correlations and complementary information from multi-view data, LMVS/TNN constructs a third-order tensor by taking the representation matrix extracted from the transformed latent space as the frontal slice of the third-order tensor and the tensor is constrained by a new low-rank tensor constraint, i.e., the tensor nuclear norm (TNN). In addition, based on the augmented Lagrangian scheme, we develop an efficient procedure to solve LMVS/TNN. To verify the performance of LMVS/TNN, we conduct experiments on public datasets and find that LMVS/TNN outperforms some representative clustering algorithms.


2.
Subspace clustering finds sets of objects that are homogeneous in subspaces of high-dimensional datasets, and has been successfully applied in many domains. In recent years, a new breed of subspace clustering algorithms, which we denote as enhanced subspace clustering algorithms, has been proposed to (1) handle the increasing abundance and complexity of data and (2) improve the clustering results. In this survey, we present these enhanced approaches to subspace clustering by discussing the problems they solve, their cluster definitions, and their algorithms. Besides enhanced subspace clustering, we also present basic subspace clustering and related work in high-dimensional clustering.

3.
A major challenge in subspace clustering is that it may generate an explosive number of clusters at high computational cost, which severely restricts its use. The problem gets even worse as the dimensionality of the data increases. In this paper, we propose to summarize the set of subspace clusters into k representative clusters to alleviate the problem. Typically, subspace clusters can be clustered further into k groups, and a representative cluster can be selected from each group. In this way, only the most representative subspace clusters are returned to the user. Unfortunately, when the size of the set of representative clusters is specified, the problem of finding the optimal set is NP-hard. To solve this problem efficiently, we present two approximate methods: PCoC and HCoC. The greatest advantage of our methods is that they need only a subset of the subspace clusters as input instead of the complete set. Specifically, only the clusters in low-dimensional subspaces are computed, and these are assembled into representative clusters in high-dimensional subspaces. The approximate results can be found in polynomial time. Our performance study shows both the effectiveness and the efficiency of these methods.

4.
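As data dimensionality increases, traditional clustering algorithms suffer from poor clustering performance. SubKMeans is a powerful subspace clustering algorithm that searches for an optimal subspace for K-Means-type algorithms in order to reduce the impact of high dimensionality. However, it requires the user to specify the number of clusters K in advance, and in practice an accurate K sometimes cannot be given. To address this problem, pairwise constraints are introduced and combined with the silhouette coefficient, yielding a pairwise-constraint-based algorithm for determining the number of clusters in SubKMeans. The improved silhouette coefficient evaluates clustering performance more accurately and thereby determines K; experimental results demonstrate the effectiveness of the method.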
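The idea of choosing K by the silhouette coefficient can be illustrated with a minimal sketch. This is the plain, unconstrained silhouette coefficient, not the pairwise-constraint variant the abstract proposes, and it does not involve SubKMeans. The function name `silhouette_score`, the toy points, and the candidate labelings are all hypothetical.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (plain, unconstrained version)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    scores = []
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            scores.append(0.0)  # common convention: singleton clusters score 0
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, points[j]) for j in members) / len(members)
            for lab, members in clusters.items() if lab != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Toy data: two well-separated pairs of points (hypothetical).
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
# Candidate partitions for K = 2 and K = 3, as a clustering algorithm might return them.
candidates = {
    2: [0, 0, 1, 1],
    3: [0, 0, 1, 2],
}
# Pick the K whose partition has the highest mean silhouette.
best_k = max(candidates, key=lambda k: silhouette_score(points, candidates[k]))
```

In this toy case the two-cluster partition scores higher, so `best_k` is 2. A constrained variant would additionally adjust the score according to must-link and cannot-link pairs.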

5.
When dealing with high-dimensional data, clustering faces the curse of dimensionality. In such data sets, clusters of objects exist in subspaces rather than in the whole feature space. Subspace clustering algorithms have been introduced to tackle this problem. However, noisy data points in this type of data can have a great impact on the clustering results. To overcome both problems simultaneously, fuzzy soft subspace clustering with noise detection (FSSC-ND) is proposed. The algorithm is based on entropy-weighting soft subspace clustering and noise clustering. FSSC-ND uses a new objective function and update rules to achieve these goals and to present more interpretable clustering results. Several experiments have been conducted on artificial and UCI benchmark datasets to assess the performance of the proposed algorithm. In addition, a number of cancer gene expression datasets are used to evaluate its performance on high-dimensional data. The results of these experiments demonstrate the superiority of FSSC-ND over state-of-the-art clustering algorithms developed in earlier research.

6.
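Adaptive soft subspace clustering algorithm   Total citations: 6, self-citations: 0, citations by others: 6
Chen Lifei (陈黎飞), Guo Gongde (郭躬德), Jiang Qingshan (姜青山) 《软件学报》 (Journal of Software) 2010,21(10):2513-2523
Soft subspace clustering is an important tool for high-dimensional data analysis. Existing algorithms usually require the user to set several global key parameters in advance and do not consider optimization of the subspaces. A new optimization objective function for soft subspace clustering is proposed that minimizes the within-cluster compactness of the subspace clusters while maximizing the projected subspace in which each cluster lies. A new local feature-weighting scheme is derived from it, and on this basis an adaptive k-means-type soft subspace clustering algorithm is proposed. During clustering, the algorithm dynamically computes its optimal parameters from information about the data set and its partition. Experimental results on real-world applications and synthetic data sets show that the algorithm substantially improves clustering accuracy and the stability of the clustering results.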

7.
Shared Nearest Neighbours (SNN) techniques are well known to overcome several shortcomings of traditional clustering approaches, notably high dimensionality and metric limitations. However, previous methods were limited to a single information source, whereas such methods appear to be very well suited to heterogeneous data, typically in multi-modal contexts. In this paper, we propose a new technique to accelerate the calculation of shared neighbours and we introduce a new multi-source shared neighbours scheme applied to multi-modal image clustering. We first extend existing SNN-based similarity measures to the case of multiple sources and we introduce an original automatic source selection step when building candidate clusters. The key point is that each resulting cluster is built with its own optimal subset of modalities, which improves robustness to noisy or outlier information sources. We evaluate our method on multi-modal search result clustering, visual search mining, and subspace clustering. Experimental results on both synthetic and real data involving different information sources and several datasets show the effectiveness of our method.
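As a rough illustration of the single-source starting point, the sketch below counts the neighbours that two points share among their k nearest neighbours. It is only the basic SNN similarity under Euclidean distance, not the accelerated or multi-source scheme of the paper. The names `knn_sets` and `snn_similarity` and the toy points are hypothetical.

```python
import math

def knn_sets(points, k):
    """Return, for each point, the set of indices of its k nearest neighbours."""
    neighbours = []
    for i, p in enumerate(points):
        # Sort all other points by Euclidean distance to point i.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        neighbours.append(set(others[:k]))
    return neighbours

def snn_similarity(neighbours, i, j):
    """Number of neighbours that points i and j have in common."""
    return len(neighbours[i] & neighbours[j])

# Two tight groups of three points each (hypothetical toy data).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
nn = knn_sets(points, k=2)

same_group = snn_similarity(nn, 0, 1)    # points 0 and 1 share neighbour 2
cross_group = snn_similarity(nn, 0, 3)   # points from different groups share none
```

Because the similarity depends only on neighbour ranks and not on raw distances, it is less sensitive to distance concentration in high dimensions, which is the property the abstract refers to.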

8.
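Fu Wenjin (傅文进), Wu Xiaojun (吴小俊) 《软件学报》 (Journal of Software) 2017,28(12):3347-3357
Subspace clustering has been widely applied to motion segmentation and face clustering, with good clustering results. To address the problem that the representation coefficients obtained by sparse subspace clustering and least squares regression subspace clustering are too sparse within classes and too dense between classes, this paper uses the l2 norm to propose a weighted low-rank subspace clustering algorithm that is based on Euclidean distance and has a grouping effect. Through Euclidean-distance-based weighting, the final representation coefficients preserve the connections between data points in the same subspace while reducing the connections between data points in different subspaces. These representation coefficients are used to build a similarity matrix J, and J is fed to spectral clustering to obtain the clustering result. Experimental results show that, compared with currently popular algorithms, the proposed algorithm achieves better clustering performance.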
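The step from representation coefficients to a similarity matrix can be sketched as follows. The abstract does not say how J is built, so the symmetrization (|Z| + |Z|^T) / 2 used here is an assumption: it is a common choice in self-representation-based subspace clustering, not necessarily the paper's. The function name and the coefficient values are hypothetical.

```python
def affinity_from_coefficients(Z):
    """Build a symmetric, non-negative similarity matrix from coefficients Z.

    Assumption: J = (|Z| + |Z|^T) / 2, a common construction; the paper's
    exact definition of J is not given in the abstract.
    """
    n = len(Z)
    return [[(abs(Z[i][j]) + abs(Z[j][i])) / 2 for j in range(n)] for i in range(n)]

# Hypothetical coefficient matrix: row i holds the weights used to
# reconstruct sample i from the other samples.
Z = [
    [0.0, 0.8, -0.1],
    [0.6, 0.0, 0.0],
    [0.1, 0.2, 0.0],
]
J = affinity_from_coefficients(Z)
```

The resulting J is symmetric with non-negative entries, which is what a spectral clustering routine expects as a precomputed affinity.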

9.
Liang Naiyao, Yang Zuyuan, Li Zhenni, Han Wei 《Applied Intelligence》2022,52(13):14607-14623

Incomplete multi-view clustering (IMC) has attracted widespread attention because it can fuse multi-view information when some view samples are unobserved. Recently, it has been shown that clustering performance in the subspace can be improved by preserving the clustering structure of each view, but the inconsistent clustering structure caused by incomplete graphs is seldom considered, which restricts clustering performance. Motivated by the clustering interpretation of orthogonal non-negative matrix factorization, we employ it to unify the clustering structure of the data and propose a new model called Incomplete Graph-regularized Orthogonal Non-negative Matrix Factorization (IGONMF). In IGONMF, a reproduced representation is developed, based on which a set of incomplete graphs is used to take full advantage of the geometric structure of the data. Orthogonality is further employed to alleviate the problem of inconsistent clustering structure. We also design an effective iterative updating algorithm to solve the proposed model, together with an analysis of its convergence and computational cost. Finally, experimental results on several real-world datasets indicate that our method is superior to related state-of-the-art methods.


10.
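Tao Yang (陶洋), Bao Linglang (鲍灵浪), Hu Hao (胡昊) 《计算机工程》 (Computer Engineering) 2021,47(4):56-61,67
Subspace clustering can recover the latent subspace structure of high-dimensional data, but existing algorithms cannot simultaneously reveal the global low-rank structure and the local sparse structure of the data, which limits clustering performance. A structure-constrained symmetric low-rank representation algorithm is proposed for subspace clustering. A structure constraint and a symmetry constraint are added to the objective function to restrict the structure of the low-rank representation solution, and a weighted sparse and symmetric low-rank affinity graph is constructed; on this basis, spectral clustering is applied to achieve efficient subspace clustering. Experimental results show that the algorithm accurately represents complex subspace structures: its average clustering errors on the Extended Yale B and Hopkins 155 benchmark data sets are 1.37% and 1.43%, respectively, and its clustering performance is better than that of algorithms such as LRR, SSC, and LRRSC.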

11.
Almost all subspace clustering algorithms proposed so far are designed for numeric datasets. In this paper, we present a k-means-type clustering algorithm that finds clusters in data subspaces in mixed numeric and categorical datasets. In this method, we compute the contribution of each attribute to the different clusters, and we propose a new cost function for a k-means-type algorithm. One advantage of this algorithm is its complexity, which is linear in the number of data points. The algorithm is also useful for describing cluster formation in terms of attribute contributions to the different clusters. It is tested on various synthetic and real datasets to show its effectiveness. The clustering results are explained using the attribute weights in the clusters and are also compared with published results.

12.
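Objective: The high dimensionality and nonlinear structure of hyperspectral images bring the "curse of dimensionality" and linear inseparability to the clustering task. Previous work separates feature extraction from clustering, which makes the two hard to optimize jointly. To solve these problems, a new embedded deep neural network fuzzy C-means clustering method (EDFCC) is proposed. Method: To extract more effective deep features, EDFCC jointly optimizes feature extraction and clustering for hyperspectral images by embedding the fuzzy C-means clustering algorithm into a deep autoencoder network. This retains the advantage of jointly optimizing the two tasks while exploiting the deep autoencoder's capacity for dimensionality reduction and for approximating arbitrary nonlinear functions, progressively mapping the original data into a latent feature space and extracting deep features. The method uses fuzzy C-means clustering to constrain feature extraction, learns deep features of hyperspectral data that are suited to clustering, and dynamically adjusts the cluster indicator matrix. Results: Experiments show that EDFCC reaches clustering accuracies of 42.95% and 60.59% on the Indian Pines and Pavia University hyperspectral data sets, respectively, improvements of 3% and 4% over the popular low-rank subspace clustering algorithm (LRSC) and of 2% and 3% over the autoencoder-based data clustering algorithm (AEKM). Conclusion: EDFCC extracts more effective deep features from the high-dimensional spectral information of hyperspectral images and improves clustering accuracy; because it needs no additional training process, it also greatly improves clustering efficiency.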

13.
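Because the predictive subspace clustering (PSC) algorithm is built on the PCA model, it cannot perform principal component analysis robustly, so its clustering performance degrades severely on data with strong noise. To improve the robustness of PSC to noise, the RPCA decomposition technique, which has attracted wide attention in recent years, is used to obtain the low-rank structure of the data and to extract subspaces robustly. Specifically, by integrating the RPCA model into PSC, an RPCA-based predictive subspace clustering algorithm is proposed. The algorithm detects highly influential points under the RPCA model, which not only enables effective variable selection and model selection but, more importantly, improves the clustering performance of PSC in noisy environments. Experimental results on real gene expression data sets show that the improved algorithm has a certain clustering advantage and good robustness compared with the classical PSC algorithm, both without and with added noise.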

14.
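Graph embedding regularized projective non-negative matrix factorization for face image feature extraction   Total citations: 2, self-citations: 2, citations by others: 0
Objective: To address the inability of projective non-negative matrix factorization (PNMF) to reveal the manifold geometric structure and discriminative information of the data space, a graph embedding regularized projective non-negative matrix factorization (GEPNMF) method for face image feature extraction is proposed. Method: First, two nearest-neighbour graphs are constructed to describe the manifold geometric structure of the data space and the between-class separability. Their Laplacian matrices are then used to design a graph embedding regularization term, which is merged with the PNMF objective function to form the GEPNMF objective function. Thanks to the graph embedding regularization term, the subspace obtained by GEPNMF preserves the manifold geometric structure of the data space while maximizing the between-class margin. In addition, an orthogonality regularization term is introduced into the GEPNMF objective function to ensure that the GEPNMF subspace basis vectors can represent the data locally. Finally, the multiplicative update rule (MUR) for solving the GEPNMF objective function is derived in detail and its convergence is proved theoretically. Results: Face recognition experiments on the ORL, Yale, and CMU PIE face image databases achieve recognition rates of 94.00%, 64.33%, and 98.58%, respectively. Conclusion: The experimental results show that face image features extracted by GEPNMF yield a high recognition rate when used for face recognition.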

15.
Zhang Guang-Yu, Chen Xiao-Wei, Zhou Yu-Ren, Wang Chang-Dong, Huang Dong, He Xiao-Yu 《Applied Intelligence》2022,52(1):716-731

Multi-view subspace clustering has been an important and powerful tool for partitioning multi-view data, especially multi-view high-dimensional data. Despite great success, most of the existing multi-view subspace clustering methods still suffer from three limitations. First, they often recover the subspace structure in the original space, which can not guarantee the robustness when handling multi-view data with nonlinear structure. Second, these methods mostly regard subspace clustering and affinity matrix learning as two independent steps, which may not well discover the latent relationships among data samples. Third, many of them ignore the different importance of multiple views, whose performance may be badly affected by the low-quality views in multi-view data. To overcome these three limitations, this paper develops a novel subspace clustering method for multi-view data, termed Kernelized Multi-view Subspace Clustering via Auto-weighted Graph Learning (KMSC-AGL). Specifically, the proposed method implicitly maps the multi-view data from linear space into nonlinear space via kernel-induced functions, so as to exploit the nonlinear structure hidden in data. Furthermore, our method aims to enhance the clustering performance by learning a set of view-specific representations and their affinity matrix in a general framework. By integrating the view weighting strategy into this framework, our method can automatically assign the weights to different views, while learning an optimal affinity matrix that is well-adapted to the subsequent spectral clustering. Extensive experiments are conducted on a variety of multi-view data sets, which have demonstrated the superiority of the proposed method.


16.
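Objective: To address the non-convergence of the existing generalized balanced fuzzy C-means clustering, a new improved generalized balanced fuzzy clustering algorithm is proposed and extended to the reproducing kernel Hilbert space to improve the generality of this class of algorithms. Method: Starting from the objective function of the existing generalized balanced fuzzy C-means clustering, a new optimization objective function is constructed using the properties of the limit expression of the Schweizer T-norm. The Lagrange multiplier method is then used to obtain the membership and cluster-center expressions for its iterative solution; the iterative cluster-center expression is also modified, giving a modified clustering algorithm with markedly better clustering performance. Finally, a nonlinear function maps the data samples into a high-dimensional feature space to obtain the kernel-space generalized balanced fuzzy clustering algorithm. Results: Tests on clustering the standard Iris data set and on gray-scale image segmentation show that the proposed improved generalized balanced fuzzy clustering algorithm and its modified version have good classification performance. Compared with the existing improved fuzzy C-means clustering algorithm that incorporates between-cluster distance (FCS) and the improved reproducing-kernel-space fuzzy local C-means clustering algorithm (KFLICM), the kernel-space generalized balanced fuzzy clustering algorithm reduces the image segmentation misclassification rate by 10% to 30%. Conclusion: The proposed algorithm overcomes the defects of the existing generalized balanced fuzzy C-means clustering algorithm while improving clustering performance, and suits the needs of complex data clustering analysis.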

17.
The background error covariance matrix, B, is often used in variational data assimilation for numerical weather prediction as a static and hence poor approximation to the fully dynamic forecast error covariance matrix, Pf. In this paper the concept of an Ensemble Reduced Rank Kalman Filter (EnRRKF) is outlined. In the EnRRKF the forecast error statistics in a subspace defined by an ensemble of states forecast by the dynamic model are found. These statistics are merged in a formal way with the static statistics, which apply in the remainder of the space. The combined statistics may then be used in a variational data assimilation setting. It is hoped that the nonlinear error growth of small-scale weather systems will be accurately captured by the EnRRKF, to produce accurate analyses and ultimately improved forecasts of extreme events.

18.
The performance of clustering in document space can be degraded by the high dimensionality of the vectors, because high-dimensional vectors contain a great deal of redundant information, which may make the similarity between vectors inaccurate. Hence, it is highly desirable to derive a low-dimensional subspace that contains less redundant information, so that document vectors can be grouped more reasonably. In general, learning a subspace and clustering vectors are treated as two independent steps; in this case, we cannot tell whether the subspace is appropriate for the clustering method or vice versa. To overcome this drawback, this paper combines subspace learning and clustering into an iterative procedure named adaptive subspace learning (ASL). First, the intracluster similarity and the intercluster separability of vectors are increased via the initial cluster indicators in the subspace learning step; then affinity propagation is adopted to partition the vectors into a specific number of clusters, so as to update the cluster indicators and repeat subspace learning. In ASL, the obtained subspace becomes more suitable for clustering as the iterative optimization proceeds. The proposed method is evaluated on the NG20, Classic3 and K1b datasets, and the results are shown to be superior to those of conventional document clustering methods.

19.
As one of the most popular algorithms for cluster analysis, fuzzy c-means (FCM) and its variants have been widely studied. In this paper, a novel generalized version called double indices-induced FCM (DI-FCM) is developed from another perspective. DI-FCM introduces a power exponent r into the constraints of the objective function such that the fuzziness index m is generalized and a new criterion of selecting an appropriate fuzziness index m is defined. Furthermore, it can be explained from the viewpoint of entropy concept that the power exponent r facilitates the introduction of entropy-based constraints into fuzzy clustering algorithms. As an attractive and judicious application, DI-FCM is integrated with a fuzzy subspace clustering (FSC) algorithm so that a new fuzzy subspace clustering algorithm called double indices-induced fuzzy subspace clustering (DI-FSC) algorithm is proposed for high-dimensional data. DI-FSC replaces the commonly used Euclidean distance with the feature-weighted distance, resulting in having two fuzzy matrices in the objective function. A convergence proof of DI-FSC is also established by applying Zangwill’s convergence theorem. Several experiments on both artificial data and real data were conducted and the experimental results show the effectiveness of the proposed algorithm.
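For orientation, the role of the fuzziness index m can be seen in the membership update of standard FCM, sketched below. This is the classical update only, under the usual constraint that each point's memberships sum to 1; it does not include the power exponent r of DI-FCM or the feature weights of DI-FSC. The function name `fcm_memberships` and the toy values are hypothetical.

```python
import math

def fcm_memberships(points, centers, m=2.0):
    """Standard FCM membership update.

    u[i][c] = 1 / sum_k (d_ic / d_ik) ** (2 / (m - 1)),
    where d_ic is the Euclidean distance from point i to center c and m > 1.
    """
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if 0.0 in dists:
            # The point coincides with one or more centers: split full
            # membership among those centers and give the others zero.
            zeros = [d == 0.0 for d in dists]
            memberships.append([1.0 / sum(zeros) if z else 0.0 for z in zeros])
            continue
        row = [1.0 / sum((d_c / d_k) ** exponent for d_k in dists) for d_c in dists]
        memberships.append(row)
    return memberships

# One point at distance 1 from the first center and 3 from the second.
u = fcm_memberships([(1.0, 0.0)], centers=[(0.0, 0.0), (4.0, 0.0)])
# A larger m gives softer (more uniform) memberships.
u_soft = fcm_memberships([(1.0, 0.0)], centers=[(0.0, 0.0), (4.0, 0.0)], m=3.0)
```

With m = 2 the point receives memberships 0.9 and 0.1; with m = 3 the split moves to 0.75 and 0.25, which shows how m controls the degree of fuzziness.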

20.
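Objective: Multi-view clustering in a big-data setting is a valuable and highly challenging problem. Existing methods suited to large-scale multi-view data clustering can, to some extent, overcome the local minima caused by the non-convexity of the objective function, but they do not consider robustness to outliers and they ignore view diversity during sample selection. To address these problems, a robust and diverse multi-view clustering model based on self-paced learning (RD-MSPL) is proposed. Method: 1) Outliers are modeled by introducing the structured sparsity norm L2,1 into the objective function; 2) an anti-structured-sparsity constraint is imposed on the sample weight matrix in the self-paced regularization term to increase the diversity of the samples selected across views. Results: Experiments on the public Extended Yale B, Notting-Hill, COIL-20, and Scene15 data sets show that: 1) on all four data sets, RD-MSPL outperforms the two most closely related existing multi-view clustering methods. Compared with the robust multi-view clustering method (RMKMC), clustering accuracy improves by 4.9%, 4.8%, 3.3%, and 1.3%, respectively; compared with MSPL, accuracy improves by 7.9%, 4.2%, 7.1%, and 6.5%, respectively. 2) Self-comparison experiments confirm the effectiveness of considering robustness and sample diversity in the model. 3) Comparisons with single-view clustering and with simple concatenation of multiple views show that RD-MSPL explores the relationships among views more effectively. Conclusion: This paper proposes a robust and diverse multi-view clustering model based on self-paced learning and designs an efficient algorithm for solving it. The method effectively overcomes the impact of outliers on clustering performance and gradually incorporates diverse samples from different views during clustering, which avoids local minima while better capturing the complementary information of different views. Experimental results show that the method outperforms existing related methods.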
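The L2,1 norm mentioned in the method can be written out in a few lines. The sketch below assumes the row-wise convention (sum of the Euclidean norms of the rows); some papers sum over columns instead, and the abstract does not say which one RD-MSPL uses. The function name `l21_norm` and the residual matrix are hypothetical.

```python
import math

def l21_norm(matrix):
    """L2,1 norm under the row-wise convention (an assumption):
    the sum of the Euclidean norms of the rows of `matrix`."""
    return sum(math.sqrt(sum(x * x for x in row)) for row in matrix)

# Hypothetical residual matrix with one row per sample.
E = [
    [3.0, 4.0],  # large residual: a likely outlier, contributes 5
    [0.0, 0.0],  # perfectly fitted sample, contributes 0
    [0.0, 1.0],  # small residual, contributes 1
]
value = l21_norm(E)
```

Because each row enters through its unsquared norm, a large residual grows the penalty linearly rather than quadratically, which is why this norm is often used to limit the influence of outlying samples.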
