Similar articles
 20 similar articles found (search time: 62 ms)
1.
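To obtain good clustering results on data sets with only a few known labels, a semi-supervised clustering algorithm based on graph contraction is proposed. First, the data in the whole sample space are represented as a weighted graph; the graph is then modified by edge contraction according to the given must-link constraints, which strengthens those constraints. On this basis, the graph Laplacian is introduced and, combined with the cannot-link constraints, the sample space is projected onto a feature subspace. Finally, cluster analysis is performed in the subspace. Experimental results show that the method not only improves clustering results on complex data but also performs well when the number of constraint pairs is small.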
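
The edge-contraction step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `contract_must_links`, the edge-dictionary input format, and the choice to sum the weights of parallel edges after merging are all assumptions made for illustration.

```python
from collections import defaultdict


def contract_must_links(n, edges, must_links):
    """Merge must-linked nodes into super-nodes (hypothetical helper).

    n          -- number of nodes, labelled 0..n-1
    edges      -- dict mapping (u, v) pairs to weights of an undirected graph
    must_links -- iterable of (u, v) pairs that must share a cluster
    Returns (rep, contracted): rep[i] is the super-node of node i, and
    contracted maps (a, b) with a < b to the summed weight between super-nodes.
    """
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in must_links:
        ru, rv = find(u), find(v)
        if ru != rv:
            # The larger root is attached to the smaller one.
            parent[max(ru, rv)] = min(ru, rv)

    rep = [find(i) for i in range(n)]
    contracted = defaultdict(float)
    for (u, v), w in edges.items():
        a, b = rep[u], rep[v]
        if a != b:  # edges inside a super-node would be self-loops; drop them
            # Assumption: parallel edges created by the merge are summed.
            contracted[(min(a, b), max(a, b))] += w
    return rep, dict(contracted)
```

For instance, with edges `{(0, 1): 1.0, (1, 2): 0.5, (2, 3): 2.0, (0, 3): 0.25}` and must-links `[(0, 1), (1, 2)]`, nodes 0, 1 and 2 collapse into super-node 0 and the only remaining edge is `(0, 3)` with weight 2.25. A Laplacian-based embedding would then be computed on the contracted graph.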

2.
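Semi-supervised graph kernel dimensionality reduction method   Cited by: 1 (self-citations: 0, citations by others: 1)
Graph-based data representation and analysis are attracting increasing attention in machine learning. Previous research has mainly focused on defining a kernel function that measures the similarity between graph data, i.e., a graph kernel; once a graph kernel is defined, a standard support vector machine (SVM) can be used to classify graph data. This work extends the graph kernel approach: kernel principal component analysis (kPCA) is first used to reduce the dimensionality of the data in the high-dimensional feature space induced by the graph kernel, yielding low-dimensional vector representations corresponding to the original graph data, which are then analyzed with conventional machine learning methods. By exploiting supervision in the form of pairwise constraints on the graph data within kPCA, a graph-kernel-based semi-supervised dimensionality reduction method is obtained. Experimental results on standard graph data sets such as MUTAG and PTC verify the effectiveness of the proposed method.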

3.
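Spectral clustering is a clustering algorithm based on spectral graph partitioning theory. Traditional spectral clustering is an unsupervised learning algorithm and can only use a single kind of data for clustering. To address this, a semi-supervised spectral clustering algorithm based on a density-adaptive neighborhood similarity graph (DAN-SSC) is proposed. DAN-SSC combines traditional spectral clustering with the idea of semi-supervised learning, which resolves the problem that traditional spectral clustering cannot make full use of all the data and has to discard some labeled data. It diffuses a small amount of prior pairwise-constraint information over the whole space so that this information can better guide the clustering process. Experimental results show that the DAN-SSC algorithm is feasible and effective.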

4.
We propose a graph model for the mutual-information-based clustering problem. This problem was originally formulated as a constrained optimization problem with respect to the conditional probability distribution of clusters. Based on the stationary distribution induced from the problem setting, we propose a function that measures the relevance among data objects under the problem setting. This function is used to capture the relation among data objects, and the entire set of objects is represented as an edge-weighted graph in which pairs of objects are connected by edges weighted by their relevance. We show that, in hard assignment, the clustering problem can be approximated as a combinatorial problem over the proposed graph model when the data are uniformly distributed. By representing the data objects as a graph based on our graph model, various graph-based algorithms can be used to solve the clustering problem over the graph. The proposed approach is evaluated on the text clustering problem over the 20 Newsgroup and TREC datasets. The results are encouraging and indicate the effectiveness of our approach.

5.
Multi-way partitioning of an undirected weighted graph, where pairwise similarities are assigned as edge weights, provides an important tool for data clustering but is an NP-hard problem. Spectral relaxation is a popular relaxation, leading to spectral clustering, where the clustering is performed by the eigen-decomposition of the (normalized) graph Laplacian. Semidefinite relaxation, on the other hand, is an alternative way of relaxing a combinatorial optimization problem, leading to a convex optimization problem. In this paper we employ a semidefinite programming (SDP) approach to graph equipartitioning for clustering, where sufficient conditions for strong duality hold. The method is referred to as semidefinite spectral clustering, where the clustering is based on the eigen-decomposition of the optimal feasible matrix computed by SDP. Numerical experiments with several data sets demonstrate the useful behavior of our semidefinite spectral clustering compared to existing spectral clustering methods.
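
For reference, the spectral relaxation mentioned in this abstract is usually written as follows. This is the standard textbook form, not a formula taken from the paper; the symbols $W$ (symmetric similarity matrix), $D$ (degree matrix), $n$ (number of points), $k$ (number of clusters) and $Y$ (relaxed indicator matrix) are notation assumed here.

```latex
D = \operatorname{diag}(W\mathbf{1}), \qquad
L = D - W, \qquad
L_{\mathrm{sym}} = D^{-1/2} L D^{-1/2} = I - D^{-1/2} W D^{-1/2},

\min_{Y \in \mathbb{R}^{n \times k}} \operatorname{tr}\!\left( Y^{\top} L_{\mathrm{sym}} Y \right)
\quad \text{subject to} \quad Y^{\top} Y = I_k .
```

The minimizer is spanned by the $k$ eigenvectors of $L_{\mathrm{sym}}$ with the smallest eigenvalues, which is why the clustering reduces to an eigen-decomposition; the rows of $Y$ are then typically clustered with a method such as k-means.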

6.
This paper presents a novel pairwise constraint propagation approach that decomposes the challenging constraint propagation problem into a set of independent semi-supervised classification subproblems, which can be solved in quadratic time using label propagation based on $k$-nearest neighbor graphs. Considering that this time cost is proportional to the number of all possible pairwise constraints, our approach provides an efficient solution for exhaustively propagating pairwise constraints throughout the entire dataset. The resulting exhaustive set of propagated pairwise constraints is further used to adjust the similarity matrix for constrained spectral clustering. Beyond traditional constraint propagation on single-source data, our approach is also extended to the more challenging constraint propagation on multi-source data, where each pairwise constraint is defined over a pair of data points from different sources. This multi-source constraint propagation has an important application to cross-modal multimedia retrieval. Extensive results have shown the superior performance of our approach.

7.
Data clustering plays an important role in many disciplines, including data mining, machine learning, bioinformatics, pattern recognition, and other fields where there is a need to learn the inherent grouping structure of data in an unsupervised manner. Many clustering approaches with different quality/complexity tradeoffs have been proposed in the literature. Each clustering algorithm works on its own domain space, and none is optimal for all datasets of different properties, sizes, structures, and distributions. In this paper, a novel cooperative clustering (CC) model is presented. It involves cooperation among multiple clustering techniques with the goal of increasing the homogeneity of objects within the clusters. The CC model is capable of handling datasets with different properties by developing two data structures: a histogram representation of the pairwise similarities and a cooperative contingency graph. The two data structures are designed to find the matching sub-clusters between different clusterings and to obtain the final set of clusters through a coherent merging process. The cooperative model is consistent and scalable in terms of the number of adopted clustering approaches. Experimental results show that the cooperative clustering model outperforms the individual clustering algorithms over a number of gene expression and text document datasets.

8.
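Pan Zhenjun, Liang Cheng, Zhang Huaxiang. 《计算机应用》 (Journal of Computer Applications), 2021, 41(12): 3438-3446
To address the problems that multi-view data analysis is susceptible to noise in the original data set and that an extra step is needed to compute the clustering result, a Robust Multi-view subspace clustering algorithm based on Consistent Graph Learning (RMCGL) is proposed. First, a latent robust representation of the data in the subspace is learned for each view, and a similarity matrix for each view is obtained from this representation. A unified similarity graph is then learned from the resulting similarity matrices. Finally, a rank constraint is imposed on the Laplacian matrix of the similarity graph to ensure that the graph has the optimal clustering structure, so that the final clustering result can be obtained directly. The whole procedure is carried out in a unified optimization framework that simultaneously learns the latent robust representation, the similarity matrices and the consistent graph. The clustering accuracy (ACC) of RMCGL is 3.36, 5.82 and 5.71 percentage points higher than that of the Graph-based Multi-view Clustering (GMC) algorithm on the BBC, 100leaves and MSRC data sets, respectively. Experimental results show that the algorithm achieves good clustering performance.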

9.
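Existing semi-supervised clustering ensemble methods can use prior information to improve the accuracy, robustness and stability of the ensemble. However, when pairwise constraints are added at the ensemble stage, these methods consider only the given constraints and ignore the relationship between a constraining point and the neighbors of the point it constrains. To address this problem, a semi-supervised fuzzy clustering ensemble method based on data correlation is proposed. The method first uses a semi-supervised fuzzy clustering algorithm to build an ensemble information matrix and converts it into a similarity matrix. It then modifies the similarity matrix using the known constraints and the relationships between constraining points and the neighbors of the constrained points. Finally, a graph partitioning algorithm produces the final clustering result. Experimental results on real data show that the proposed method effectively improves clustering quality.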

10.
11.
In this paper, the main components of a workflow system that are relevant to correctness in the presence of concurrency are formalized based on set theory and graph theory. The formalization, which constitutes the theoretical basis of the correctness criterion provided, can be summarized as follows:
- Activities of a workflow are represented through a notation based on set theory to make it possible to formalize the conceptual grouping of activities.
- Control-flow is represented as a special graph based on this set definition, and it includes serial composition, parallel composition, conditional branching, and nesting of individual activities and conceptual activities themselves.
- Data-flow is represented as a directed acyclic graph in conformance with the control-flow graph.
The formalization of correctness of concurrently executing workflow instances is based on this framework by defining two categories of constraints on the workflow environment with which the workflow instances and their activities interact. These categories are:
- Basic constraints that specify the correct states of a workflow environment.
- Inter-activity constraints that define the semantic dependencies among activities, such as an activity requiring the validity of a constraint that is set or verified by a preceding activity.
A basic constraints graph and an inter-activity constraints graph, which are in conformance with the control-flow and data-flow graphs, are then defined to represent these constraints. These graphs are used in formalizing the intervals among activities where an inter-activity constraint should be maintained and the intervals where a basic constraint remains invalid.
A correctness criterion is defined for an interleaved execution of workflow instances using the constraints graphs. A concurrency control mechanism, namely the Constraint Based Concurrency Control technique, is developed based on the correctness criterion. The performance analysis shows the superiority of the proposed technique. Other possible approaches to the problem are also presented.

12.
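Consistency checking is a fundamental theoretical problem in cardinal direction relation reasoning. A new method is proposed that performs consistency checking using a coordinate graph in Euclidean space. The problem is first defined and the coordinate-graph representation of direction relations is described, so that checking the consistency of a set of direction relation constraints on point objects is converted into detecting whether the graph contains a cycle. The method is realized in three stages: consistency determination, cycle detection, and implementation. The time complexity of the algorithm is O(n+e), which is better than the traditional O(n²).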
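
The reduction from consistency checking to cycle detection can be sketched in Python as follows. This is a simplified illustration, not the algorithm from the paper: it assumes a half-plane reading of the relations (for example, "a N b" only requires that a's y coordinate exceed b's), and the names `RELATIONS`, `is_consistent` and `_acyclic` are hypothetical. Under that assumption each relation yields strict orderings per axis, and the constraint set is satisfiable exactly when both per-axis order graphs are acyclic, which Kahn's topological sort decides in O(n+e).

```python
from collections import defaultdict, deque

# Assumed encoding: (axis, +1) means coord(a) > coord(b) for "a REL b".
RELATIONS = {
    "N": [("y", 1)], "S": [("y", -1)],
    "E": [("x", 1)], "W": [("x", -1)],
    "NE": [("x", 1), ("y", 1)], "NW": [("x", -1), ("y", 1)],
    "SE": [("x", 1), ("y", -1)], "SW": [("x", -1), ("y", -1)],
}


def _acyclic(nodes, succ):
    """Kahn's algorithm: True if the directed graph has no cycle."""
    indeg = {v: 0 for v in nodes}
    for u in nodes:
        for v in succ[u]:
            indeg[v] += 1
    queue = deque(v for v in nodes if indeg[v] == 0)
    seen = 0
    while queue:
        u = queue.popleft()
        seen += 1
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    # Every node is removed exactly when no cycle blocks the ordering.
    return seen == len(nodes)


def is_consistent(points, constraints):
    """points: list of names; constraints: list of (a, rel, b) triples.

    Every name used in a constraint must appear in points.
    """
    # One order graph per axis; an edge u -> v means coord(u) < coord(v).
    succ = {"x": defaultdict(list), "y": defaultdict(list)}
    for a, rel, b in constraints:
        for axis, sign in RELATIONS[rel]:
            lo, hi = (b, a) if sign > 0 else (a, b)
            succ[axis][lo].append(hi)
    return all(_acyclic(points, succ[axis]) for axis in ("x", "y"))
```

For example, the constraints A NE B and B NE C are consistent, but adding C E A creates the cycle x_A > x_B > x_C > x_A, so the check fails. Relations that force equal coordinates would need additional handling that this sketch omits.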

13.
We study the problem of detecting a maximum embedded network submatrix in a {−1,0,+1}-matrix. Our aim is to solve the problem to optimality. We introduce a 0–1 integer linear programming formulation for this problem based on its representation over a signed graph. A polyhedral study is presented and a branch-and-cut algorithm is described for finding an optimal solution to the problem. Some computational experiments are carried out over a set of instances available in the literature as well as over a set of random instances.

14.
In recent years, semi-supervised clustering (SSC) has attracted considerable interest from the machine learning and data mining communities. In this paper we propose a novel SSC approach with enhanced spectral embedding (ESE), which not only considers the geometric structure information contained in data sets but can also make use of given side information such as pairwise constraints. Specifically, we first construct a symmetry-favored k-NN graph, which is highly robust to noise and outliers and can reflect the underlying manifold structures of data sets. Then we learn the enhanced spectral embedding towards an ideal data representation that is as consistent with the given pairwise constraints as possible. Finally, by using the regularization of spectral embedding, we formulate learning the new data representation as a semidefinite-quadratic-linear programming (SQLP) problem, which can be efficiently solved. Experimental results on a variety of synthetic and real-world data sets show that our ESE approach outperforms the state-of-the-art SSC algorithms in terms of speed and quality on both vector-based and graph-based clustering.

15.
To obtain a user-desired and accurate clustering result in practical applications, one way is to use additional pairwise constraints that indicate the relationship between two samples, that is, whether these samples belong to the same cluster or not. In this paper, we put forward a discriminative learning approach that can incorporate pairwise constraints into the recently proposed two-class maximum margin clustering framework. In particular, a set of pairwise loss functions is proposed, which features robust detection and penalization of violations of the pairwise constraints. Consequently, the proposed method is able to directly find the partitioning hyperplane, which separates the data into two groups and satisfies the given pairwise constraints as much as possible. In this way, it makes fewer assumptions about the distance metric or similarity matrix for the data, which may be complicated in practice, than existing popular constrained clustering algorithms. Finally, an iterative updating algorithm is proposed for the resulting optimization problem. The experiments on a number of real-world data sets demonstrate that the proposed pairwise constrained two-class clustering algorithm outperforms several representative pairwise constrained clustering counterparts in the literature.

16.
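A semi-supervised document clustering algorithm combined with active learning   Cited by: 1 (self-citations: 0, citations by others: 1)
Semi-supervised document clustering, which uses a small amount of supervised data to assist unsupervised document clustering, has become a hot topic in machine learning and data mining in recent years. Because obtaining a large amount of supervision is time-consuming and laborious, researchers have considered how to obtain a small amount of supervision that significantly improves clustering performance. A semi-supervised document clustering algorithm combined with active learning is proposed. Pairwise constraints are introduced to guide the clustering process of DBSCAN and improve clustering performance, yielding a semi-supervised document clustering algorithm called Cons-DBSCAN. By measuring the amount of information contained in the constraint set and analyzing the DBSCAN algorithm itself, a heuristic active learning algorithm is proposed that selects highly informative sets of pairwise constraints and thus assists semi-supervised document clustering more efficiently. Experimental results show that the proposed algorithm clusters documents efficiently, and that the pairwise constraint sets obtained by the active learning algorithm significantly improve clustering performance. Moreover, the algorithm outperforms two representative semi-supervised clustering algorithms that incorporate active learning.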

17.
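Chen Xian, Hu Liying, Lin Xiaowei, Chen Lifei. 《计算机应用》 (Journal of Computer Applications), 2021, 41(12): 3447-3454
Most existing directed graph clustering algorithms assume approximately linear relationships between nodes in a vector space and ignore the nonlinear correlations between nodes. To address this problem, a directed graph clustering algorithm based on Kernel Nonnegative Matrix Factorization (KNMF) is proposed. First, a kernel learning method is introduced to project the adjacency matrix of the directed graph into a kernel space, and the similarity between nodes in the original space and in the kernel space is constrained by a specific regularization term. Second, the objective function of a graph-regularized kernel asymmetric NMF algorithm is formulated, and a clustering algorithm is derived by gradient descent under the nonnegativity constraint. The algorithm takes the direction of the edges between nodes into account while modeling the nonlinear relationships between nodes with the kernel learning method, and thus accurately reveals the latent structural information in the directed graph. Finally, experimental results on the Patent Citation Network (PCN) data set show that, when the number of clusters is 2, the proposed algorithm improves the DB value and the DQF value by about 0.25 and 8%, respectively, compared with the baseline algorithms, achieving better clustering quality.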

18.
Context-based email classification requires an understanding of the semantic and structural attributes of email. Most research has focused on generating semantic properties through the structural components of email. By viewing emails as events (a major subset of the class of emails), a rich contextual test-bed representation for understanding the semantic attributes of emails has been devised. Event-based emails have traditionally been studied based on simple structural properties. In this paper, we present a novel approach that first represents this class of emails as graphs and then heuristically applies a graph mining and matching algorithm to pick templates representing contextual and semantic attributes that help classify emails. The classification templates use three key event classes: social, personal and professional. Results show that our template-based approach, supported by graph mining and matching, performs consistently well over an event email data set, with high accuracy.

19.
Multi-view learning algorithms typically assume a complete bipartite mapping between the different views in order to exchange information during the learning process. However, many applications provide only a partial mapping between the views, creating a challenge for current methods. To address this problem, we propose a multi-view algorithm based on constrained clustering that can operate with an incomplete mapping. Given a set of pairwise constraints in each view, our approach propagates these constraints using a local similarity measure to those instances that can be mapped to the other views, allowing the propagated constraints to be transferred across views via the partial mapping. It uses co-EM to iteratively estimate the propagation within each view based on the current clustering model, transfer the constraints across views, and then update the clustering model. By alternating the learning process between views, this approach produces a unified clustering model that is consistent with all views. We show that this approach significantly improves clustering performance over several other methods for transferring constraints and allows multi-view clustering to be reliably applied when given a limited mapping between the views. Our evaluation reveals that the propagated constraints have high precision with respect to the true clusters in the data, explaining their benefit to clustering performance in both single- and multi-view learning scenarios.

20.
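To address the difficulty of characterizing the complementarity and consistency among multi-view data, a multi-view clustering method based on graph convolutional neural networks is proposed. The consistency among multi-view data is exploited by constraining the representations that the graph convolutional network learns for the adjacency subgraphs shared across different views of the samples. The complementarity among multi-view data is exploited by sharing the graph convolutional network parameters, learning embedded representations of the complete adjacency graph of each view, and concatenating them into a multi-view representation. A relative entropy constraint is added to this multi-view representation so that the final learned representation is improved and suited to clustering. The method achieves the best clustering results on five data sets, which indicates that the proposed graph-convolutional-network-based clustering method can effectively exploit the complementarity and consistency between views and improve clustering performance.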
