Similar literature
 20 similar articles found (search time: 656 ms)
1.
Attribute-based format is the main data representation format used by machine learning algorithms. When the attributes do not properly describe the initial data, performance starts to degrade. Some algorithms address this problem by internally changing the representation space, but the newly constructed features rarely have any meaning. We seek to construct, in an unsupervised way, new attributes that are more appropriate for describing a given dataset and, at the same time, comprehensible for a human user. We propose two algorithms that construct the new attributes as conjunctions of the initial primitive attributes or their negations. The generated feature sets have reduced correlations between features and succeed in capturing some of the hidden relations between individuals in a dataset. For example, a feature like $sky \wedge \neg building \wedge panorama$ would be true for non-urban images and is more informative than simple features expressing the presence or the absence of an object. The notion of Pareto optimality is used to evaluate feature sets and to obtain a balance between total correlation and the complexity of the resulting feature set. Statistical hypothesis testing is employed in order to automatically determine the values of the parameters used for constructing a data-dependent feature set. We experimentally show that our approaches achieve the construction of informative feature sets for multiple datasets.
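
As a rough illustration of the kind of feature this abstract describes, the sketch below evaluates a conjunction of primitive Boolean attributes and their negations on a few records. The attribute names come from the abstract's own example; the function name `conjunction_feature`, the record layout, and the sample images are assumptions made purely for illustration. This shows only how such a feature is evaluated, not the authors' construction algorithms.

```python
from typing import Dict, List, Tuple

# A literal is (attribute_name, expected_value); ("building", False) means "NOT building".
Literal = Tuple[str, bool]


def conjunction_feature(record: Dict[str, bool], literals: List[Literal]) -> bool:
    """Return True when every literal of the conjunction holds for the record."""
    return all(record[name] == expected for name, expected in literals)


# sky AND (NOT building) AND panorama: the example feature from the abstract
non_urban: List[Literal] = [("sky", True), ("building", False), ("panorama", True)]

# Hypothetical image annotations, invented for this sketch
images = [
    {"sky": True, "building": False, "panorama": True},
    {"sky": True, "building": True, "panorama": True},
    {"sky": False, "building": False, "panorama": False},
]

# One Boolean value of the constructed feature per image
feature_column = [conjunction_feature(img, non_urban) for img in images]
print(feature_column)
```

Only the first hypothetical image satisfies all three literals, so the constructed feature separates it from the other two even though two of them share the primitive attribute `sky`.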

2.
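Dynamically changing graph data is widespread in real-world applications, and mining anomalies in dynamic networks effectively has significant scientific and practical value. Most traditional anomaly detection algorithms for dynamic networks focus on structural anomalies and overlook node and edge attributes as well as the role of network changes. This paper proposes an anomaly detection algorithm based on graph neural networks that brings graph structure, attributes, and dynamic change information into the model to learn representation vectors for anomaly detection. Specifically, it improves DGI, an unsupervised graph neural network framework, and proposes Dynamic-DGI, an unsupervised representation learning algorithm for dynamic networks. The method can extract both the anomalous characteristics of the network itself and those of network changes for use in learning the representation vectors. Experimental results show that anomaly detection using the network representation vectors learned by this algorithm outperforms SpotLight, the latest subgraph anomaly detection algorithm, and significantly outperforms traditional network representation learning algorithms. Besides improving detection accuracy, the algorithm can also uncover anomalies in the network that have practical meaning.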

3.
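To improve prediction accuracy, this paper draws on the stacking ensemble framework from machine learning, combines multiple classifiers to learn label distributions, and proposes a heterogeneous ensemble learning algorithm based on label distribution learning (HELA-LDL). The algorithm builds a two-layer model framework: the first layer performs heterogeneous ensemble learning on the sample data in a combined manner and fuses the learning results of the individual classifiers; the fused result is fed to the second-layer classifier, whose prediction is a label distribution with confidence values. Comparative experiments on dedicated data sets show that HELA-LDL can draw on the strengths of the various algorithms in different scenarios, and a stability analysis further demonstrates the effectiveness of the algorithm.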

4.
Discretisation in Lazy Learning Algorithms   Cited by: 1 (self-citations: 0, citations by others: 1)
This paper adopts the idea of discretising continuous attributes (Fayyad and Irani 1993) and applies it to lazy learning algorithms (Aha 1990; Aha, Kibler and Albert 1991). This approach converts continuous attributes into nominal attributes at the outset. We investigate the effects of this approach on the performance of lazy learning algorithms and examine it empirically using both real-world and artificial data to characterise the benefits of discretisation in lazy learning algorithms. Specifically, we have shown that discretisation achieves an effect of noise reduction and increases lazy learning algorithms' tolerance for irrelevant continuous attributes. The proposed approach constrains the representation space in lazy learning algorithms to hyper-rectangular regions that are orthogonal to the attribute axes. Our generally better results obtained using a more restricted representation language indicate that employing a powerful representation language in a learning algorithm is not always the best choice as it can lead to a loss of accuracy.

5.
This paper focuses on the semantics of Telos, a language for representing knowledge about information systems. Telos is intended to support the development of information systems, especially in the requirements modeling phase. An object-oriented representational framework is supported by Telos. Its features include aggregation, generalization, and classification, the treatment of attributes as first-class objects and the explicit representation of time. Telos also provides an assertion sublanguage for expressing deductive rules and integrity constraints. A possible-worlds semantics is defined for Telos knowledge bases. This semantics is intended to capture the peculiarities involved in the interpretation of temporal expressions. The integration of time has also inspired the treatment of existence in Telos. An ontology of objects based on the property of existence is proposed. In the spirit of KRYPTON, Telos knowledge bases are specified functionally, in terms of the operations provided for querying and updating them. This knowledge-level analysis will allow us to specify exactly what a knowledge base can be ASK-ed or TELL-ed about the domain of discourse. Soundness, consistency, and completeness results have also been proven to complete the specification of Telos knowledge bases. This formal account of the language provides a logical framework that can be used to verify the correctness of any proposed implementation of the system.

6.
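李慧博, 赵云霄, 白亮. 《计算机应用》 (Journal of Computer Applications), 2021, 41(12): 3432-3437
Learning latent vector representations of the nodes in a graph is an important and ubiquitous task that aims to capture various properties of those nodes. A large body of work has shown that static graph representations can already learn part of the node information; real-world graphs, however, evolve over time. To address the problem that most dynamic network algorithms cannot effectively preserve node neighbourhood structure and temporal information, a dynamic network representation learning method, DynAEGRU, based on deep neural networks (DNN) and gated recurrent units (GRU) is proposed. The method uses an autoencoder as its framework: the encoder first aggregates neighbourhood information with a DNN to obtain low-dimensional feature vectors, then uses a GRU network to extract temporal information about the nodes, and finally the decoder reconstructs the adjacency matrix, which is compared with the real graph to build the loss. Experiments against several static and dynamic graph representation learning algorithms on three data sets show that DynAEGRU achieves good performance gains.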

7.
Traditional rough set theory is mainly used to extract rules from and reduce attributes in databases in which attributes are characterized by partitions, while the covering rough set theory, a generalization of traditional rough set theory, does the same yet characterizes attributes by covers. In this paper, we propose a way to reduce the attributes of covering decision systems, which are databases characterized by covers. First, we define consistent and inconsistent covering decision systems and their attribute reductions. Then, we state the sufficient and the necessary conditions for reduction. Finally, we use a discernibility matrix to design algorithms that compute all the reducts of consistent and inconsistent covering decision systems. Numerical tests on four public data sets show that the proposed attribute reductions of covering decision systems achieve better classification performance than those of traditional rough sets.

8.
Quotients and factors are important notions in the design of various computational procedures for regular languages and for the analysis of their logical properties. We propose a new representation of regular languages, by linear systems of language equations, which is suitable for the following computations: language reversal, left quotients and factors, right quotients and factors, and factor matrices. We present algorithms for the computation of all these notions, and indicate an application of the factor matrix to the computation of solutions of a particular language reconstruction problem. The advantage of these algorithms is that they all operate only on linear systems of language equations, while the design of the same algorithms for other representations often requires translation to other representations.

9.
Ranking items is an essential problem in recommendation systems. Since comparing two items is the simplest type of query for measuring the relevance of items, the problem of aggregating pairwise comparisons to obtain a global ranking has been widely studied. Furthermore, ranking with pairwise comparisons has recently received a lot of attention in crowdsourcing systems where binary comparative queries can be used effectively to make assessments faster for precise rankings. In order to learn a ranking based on a training set of queries and their labels obtained from annotators, machine learning algorithms are generally used to find the appropriate ranking model which best describes the data set. In this paper, we propose a probabilistic model for learning multiple latent rankings by using pairwise comparisons. Our novel model can capture multiple hidden rankings underlying the pairwise comparisons. Based on the model, we develop an efficient inference algorithm to learn multiple latent rankings as well as an effective inference algorithm for active learning to update the model parameters in crowdsourcing systems whenever new pairwise comparisons are supplied. The performance study with synthetic and real-life data sets confirms the effectiveness of our model and inference algorithms.

10.
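Multi-valued multi-label data classification studies the problem in which a sample not only belongs to multiple classes at once but may also take multiple values under some attributes. A multi-valued multi-label classification framework that combines multi-value decomposition with multi-label learning (MDML) is proposed: four different multi-value decomposition strategies convert the problem into a multi-label problem, which is then learned with three classical multi-label algorithms. Experimental results show that, compared with existing multi-valued multi-label decision tree algorithms, MDML effectively improves classification performance, and that different combinations suit data sets with different characteristics.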

11.
A random set based knowledge representation framework for learning linguistic models is presented. Within this framework a number of algorithms for learning prototypes are proposed, based on grouping certain sets of attributes and evaluating joint mass assignments on labels. These mass assignments can then be combined with a Semi-Naïve Bayes classifier in order to determine classification probabilities. The potential of such linguistic classifiers is then illustrated by their application to a number of toy and benchmark problems. This framework also allows for the evaluation of linguistic queries, as will be demonstrated on several well-known data sets.

12.
Modelling capability for products to be designed and manufactured plays an important role in effectively constructing and utilizing CAD/CAM systems. Product models should represent all the information about products that is utilized in manufacturing processes. Therefore it is required that they describe functional structures of machine products, and include not only geometric information but also various non-geometric data, such as physical, technological and management data. Presently there do not seem to exist definite methods or theories for constructing product models. In this paper, we first investigate the whole manufacturing process, and propose a system structure for integration of CAD/CAM, in which product modelling plays a fundamental role. Then requirements for product modelling are studied thoroughly, and a new representation framework for product models is proposed. It consists of an object concept called frame, relations among frames and attributes, and it can incorporate various existing modelling capabilities, such as solid modelling. We use this representation framework in combination with our solid modelling package GEOMAP-III, and show the effectiveness of this approach by performing illustrative design experiments.

13.

Deep learning has catalysed progress in tasks such as face recognition and analysis, leading to a quick integration of technological solutions in multiple layers of our society. While such systems have proven to be accurate by standard evaluation metrics and benchmarks, a surge of work has recently exposed the demographic bias that such algorithms exhibit, highlighting that accuracy does not entail fairness. Clearly, deploying biased systems under real-world settings can have grave consequences for affected populations. Indeed, learning methods are prone to inheriting, or even amplifying, the bias present in a training set, manifested by uneven representation across demographic groups. In facial datasets, this particularly relates to attributes such as skin tone, gender, and age. In this work, we address the problem of mitigating bias in facial datasets by data augmentation. We propose a multi-attribute framework that can successfully transfer complex, multi-scale facial patterns even if these belong to underrepresented groups in the training set. This is achieved by relaxing the rigid dependence on a single attribute label, and further introducing a tensor-based mixing structure that captures multiplicative interactions between attributes in a multilinear fashion. We evaluate our method with an extensive set of qualitative and quantitative experiments on several datasets, with rigorous comparisons to state-of-the-art methods. We find that the proposed framework can successfully mitigate dataset bias, as evinced by extensive evaluations on established diversity metrics, while significantly improving fairness metrics such as equality of opportunity.

14.
Supervisor Synthesis for Real-Time Discrete Event Systems   Cited by: 1 (self-citations: 1, citations by others: 1)
This paper introduces a formal framework to logically analyze and control real-time discrete event systems (RTDESs). Time Petri nets are extended to controlled time Petri nets (CtlTPNs) to model the dynamics of RTDESs that can be controlled by real-time supervisors. The logical behaviors of CtlTPNs are represented by control class graphs (CCGs) which are untimed automata with timing and control information in their state transition labels. We prove that the CCG corresponding to a CtlTPN expresses completely the logical behavior of the CtlTPN. The real-time supervisor is based on a nondeterministic logical supervisor for the CCG, including the delay for control computations to ensure the supervisor is acceptable in a true real-time environment. We prove the existence of a unique maximal controllable sublanguage of a given specification language and present an algorithm to construct the sublanguage. We also prove that the real-time supervisor meets the prespecified real-time behavior and present an online control algorithm to implement real-time supervisors. The concepts and algorithms are illustrated for an example of packet reception processes in a communication network.

15.
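To eliminate information islands in urban geographic information systems, an urban geographic information service integration framework that supports dynamic service assembly and loosely coupled integration is studied. Using the latest grid architecture, OGSA, a virtual organization is established in which shareable urban spatial information is provided to users or applications in the form of grid services. Applications and users can create transient services, discover these services, and obtain their properties. Establishing an OGSA-based virtual organization for sharing urban spatial information enables access to and integration of urban spatial information across heterogeneous, geographically distributed environments, and offers a practical and feasible approach and implementation plan for the comprehensive sharing of spatial information.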

16.
This paper introduces a number of reliability criteria for computer-aided diagnostic systems for breast cancer. These criteria are then used to analyze some published neural network systems. It is also shown that the property of monotonicity for the data is rather natural in this medical domain, and it has the potential to significantly improve the reliability of breast cancer diagnosis while maintaining a general representation power. A central part of this paper is devoted to the representation/narrow vicinity hypothesis, upon which existing computer-aided diagnostic methods heavily rely. The paper also develops a framework for determining the validity of this hypothesis. The same framework can be used to construct a diagnostic procedure with improved reliability.

17.
18.
We present a method and a tool for the verification of causal and temporal properties for embedded systems. We analyze trace streams resulting from the execution of virtual prototypes that combine simulated hardware and embedded software. The main originality lies in the use of logical clocks to abstract away irrelevant information from the trace. We propose a model-based approach that relies on domain-specific languages (DSLs). A first DSL, called TISL (trace item specification language), captures the relevant data structures. A second DSL, called STML (simulation trace mapping language), abstracts the raw simulation data into logical clocks, turning simulation data into relevant observation probes and thus reducing the size of the trace streams. The third DSL, called TPSL, defines a set of behavioral patterns that include widely used temporal properties. This is meant for users who are not familiar with temporal logics. Each pattern is transformed into an automaton. All the automata are executed concurrently, and each one raises an error if and when the related TPSL property is violated. The contribution is the integration of this pattern-based property specification language into the SimSoC virtual prototyping framework without requiring recompilation of all the simulation models when the properties evolve. We illustrate our approach with experiments that show the possibility of using multi-core platforms to parallelize the simulation and verification processes, thus reducing the verification time.

19.
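Previous semi-supervised multi-instance learning algorithms usually decompose unlabeled bags into sets of instances and use traditional semi-supervised single-instance learning algorithms to determine the latent labels of these instances in order to exploit them. However, such methods assume that the classification of multi-instance samples is closely tied to their probability density distribution, and they do not consider the influence of bag structure on bag class labels. A bag-level semi-supervised multi-instance kernel learning method is proposed that uses unlabeled bags directly to train the semi-supervised learner. First, bags are converted into concept vector representations by clustering the instance space; then the Hamming distances between concept vectors are computed; on this basis the graph Laplacian matrix describing the smoothness of the bags is computed, from which the bag-level semi-supervised kernel is derived; finally, the algorithm is tested on standard multi-instance learning data sets and on image data sets. The tests show that the algorithm yields a clear improvement.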
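
Two of the intermediate steps named in this abstract, the Hamming distance between concept vectors and the graph Laplacian over bags, can be sketched as follows. The abstract does not say how distances are turned into edge weights or which Laplacian variant is used, so the affinity `1 - hamming / length`, the unnormalized form L = D - W, the function names, and the sample concept vectors are all assumptions made for illustration; the clustering step and the final kernel are not shown.

```python
from typing import List


def hamming_distance(u: List[int], v: List[int]) -> int:
    """Number of positions at which two equal-length concept vectors differ."""
    if len(u) != len(v):
        raise ValueError("concept vectors must have the same length")
    return sum(a != b for a, b in zip(u, v))


def graph_laplacian(vectors: List[List[int]]) -> List[List[float]]:
    """Unnormalized graph Laplacian L = D - W over bags.

    Assumption (not from the abstract): the affinity between two bags is
    1 - hamming / length, so identical concept vectors get weight 1.
    """
    n = len(vectors)
    dim = len(vectors[0])
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                w[i][j] = 1.0 - hamming_distance(vectors[i], vectors[j]) / dim
    degree = [sum(row) for row in w]  # diagonal of D
    return [
        [(degree[i] if i == j else 0.0) - w[i][j] for j in range(n)]
        for i in range(n)
    ]


# Hypothetical concept vectors: entry k is 1 if the bag has an instance in cluster k
bags = [[1, 0, 1, 0], [1, 0, 0, 0], [0, 1, 0, 1]]
laplacian = graph_laplacian(bags)
```

Each row of the resulting matrix sums to zero, which is the usual sanity check for an unnormalized Laplacian; a different affinity (for example a Gaussian of the distance) would change the weights but not that property.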

20.
Graph neural networks (GNNs) have shown great power in learning on graphs. However, it is still a challenge for GNNs to model information far away from the source node. The ability to preserve global information can enhance graph representation and hence improve classification precision. In this paper, we propose a new learning framework named G-GNN (Global information for GNN) to address the challenge. First, the global structure and global attribute features of each node are obtained via unsupervised pre-training, and those global features preserve the global information associated with the node. Then, using the pre-trained global features and the raw attributes of the graph, a set of parallel kernel GNNs is used to learn different aspects from these heterogeneous features. Any general GNN can be used as a kernel and easily obtain the ability to preserve global information, without having to alter its own algorithm. Extensive experiments have shown that state-of-the-art models, e.g., GCN, GAT, GraphSAGE and APPNP, can achieve improvement with G-GNN on three standard evaluation datasets. In particular, we establish new benchmark precision records on Cora (84.31%) and Pubmed (80.95%) when learning on attributed graphs.
