Similar Literature
 Found 20 similar articles (search time: 15 ms)
1.
A Bayesian Method for the Induction of Probabilistic Networks from Data   Total citations: 108, self-citations: 3, citations by others: 108
This paper presents a Bayesian method for constructing probabilistic networks from databases. In particular, we focus on constructing Bayesian belief networks. Potential applications include computer-assisted hypothesis testing, automated scientific discovery, and automated construction of probabilistic expert systems. We extend the basic method to handle missing data and hidden (latent) variables. We show how to perform probabilistic inference by averaging over the inferences of multiple belief networks. Results are presented of a preliminary evaluation of an algorithm for constructing a belief network from a database of cases. Finally, we relate the methods in this paper to previous work, and we discuss open problems.
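The abstract does not state the scoring formula, so the following is only a sketch of one widely used Bayesian score for discrete networks with complete data and uniform Dirichlet priors, often called the K2 metric. Whether it matches the paper's exact formulation is an assumption; the function name `log_k2_score` and the toy data are hypothetical.

```python
from collections import Counter
from math import lgamma


def log_k2_score(data, child, parents, arity):
    """Log marginal likelihood of one node given a candidate parent set.

    Assumes complete discrete data and uniform Dirichlet priors:
    sum_j [ log (r-1)! - log (N_ij + r - 1)! + sum_k log N_ijk! ].
    Parent configurations that never occur contribute zero.
    """
    joint_counts = Counter()   # (parent configuration, child value) -> N_ijk
    parent_counts = Counter()  # parent configuration -> N_ij
    for row in data:
        config = tuple(row[p] for p in parents)
        joint_counts[(config, row[child])] += 1
        parent_counts[config] += 1

    r = arity[child]  # number of values the child can take
    score = 0.0
    for n_ij in parent_counts.values():
        score += lgamma(r) - lgamma(n_ij + r)
    for n_ijk in joint_counts.values():
        score += lgamma(n_ijk + 1)
    return score


# Hypothetical toy data in which B always copies A.
toy_data = [{"A": 0, "B": 0}, {"A": 0, "B": 0}, {"A": 1, "B": 1}, {"A": 1, "B": 1}]
toy_arity = {"A": 2, "B": 2}
score_with_parent = log_k2_score(toy_data, "B", ["A"], toy_arity)
score_without_parent = log_k2_score(toy_data, "B", [], toy_arity)
```

On this toy data the structure with the edge A → B scores higher than the structure without it, which is the kind of comparison a score-based structure search relies on.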

2.
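Cluster analysis is divided into Q-type clustering and R-type clustering according to the kind of object being classified. This paper presents a Q-type clustering algorithm based on the Bayesian method and describes its basic idea and implementation in detail. Experimental results show that the algorithm is feasible and that it has some reference value for data mining.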

3.
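A constructive neural network is used to cluster large-scale patterns, with quotient-space granularity analysis used to select the optimal clustering granularity. The method exploits the low computational complexity of constructive neural networks while using quotient space theory to choose the optimal granularity. Clustering experiments on large-scale complex data show that the method is effective.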

4.
Bayesian Networks for Data Mining   Total citations: 80, self-citations: 0, citations by others: 80
A Bayesian network is a graphical model that encodes probabilistic relationships among variables of interest. When used in conjunction with statistical techniques, the graphical model has several advantages for data modeling. One, because the model encodes dependencies among all variables, it readily handles situations where some data entries are missing. Two, a Bayesian network can be used to learn causal relationships, and hence can be used to gain understanding about a problem domain and to predict the consequences of intervention. Three, because the model has both a causal and probabilistic semantics, it is an ideal representation for combining prior knowledge (which often comes in causal form) and data. Four, Bayesian statistical methods in conjunction with Bayesian networks offer an efficient and principled approach for avoiding the overfitting of data. In this paper, we discuss methods for constructing Bayesian networks from prior knowledge and summarize Bayesian statistical methods for using data to improve these models. With regard to the latter task, we describe methods for learning both the parameters and structure of a Bayesian network, including techniques for learning with incomplete data. In addition, we relate Bayesian-network methods for learning to techniques for supervised and unsupervised learning. We illustrate the graphical-modeling approach using a real-world case study.

5.
We present a method of constructive induction aimed at learning tasks involving multivariate time series data. Using metafeatures, the scope of attribute-value learning is expanded to domains with instances that have some kind of recurring substructure, such as strokes in handwriting recognition, or local maxima in time series data. The types of substructures are defined by the user, but are extracted automatically and are used to construct attributes. Metafeatures are applied to two real domains: sign language recognition and ECG classification. Using metafeatures we are able to generate classifiers that are either comprehensible or accurate, producing results that are comparable to hand-crafted preprocessing and comparable to human experts.

6.
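Building on the EM algorithm and assuming a given Bayesian network structure, this paper analyzes the Voting EM algorithm and uses it for online parameter learning of a Bayesian network for flood-control decision making. The learning results are compared with those of the EM algorithm. The results show that Voting EM not only supports online parameter learning but also achieves high learning accuracy.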

7.
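Research on a Bayesian Network Structure Learning Algorithm Based on a Niche Genetic Algorithm*   Total citations: 1, self-citations: 1, citations by others: 1
This paper discusses a structure learning algorithm for Bayesian networks when data are missing. The algorithm combines a niche genetic algorithm with the EM algorithm, and experiments demonstrate its effectiveness.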

8.
Interactive Concept-Learning and Constructive Induction by Analogy   Total citations: 1, self-citations: 0, citations by others: 1
The available concept-learners only partially fulfill the needs imposed by the learning apprentice generation of learners. We present a novel approach to interactive concept-learning and constructive induction that better fits the requirements imposed by the learning apprentice paradigm. The approach is incorporated in the system Clint-Cia, which integrates several user-friendly features into one working whole: it is interactive, generates examples, shifts its bias, identifies concepts in the limit, copes with indirect relevance, recovers from errors, performs constructive induction and invents new concepts by analogy to previously learned ones.

9.
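A Bayesian network is a graphical model for describing the latent dependencies among uncertain variables. Learning Bayesian networks from complete data sets is an active research topic. This paper analyzes common theoretical methods for constructing Bayesian networks from complete data sets.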

10.
Recently, a massive quantity of data is being produced from many different sources, and the volume of data created daily on the Internet has exceeded two exabytes. At the same time, clustering is one of the efficient techniques for mining big data to extract the useful and hidden patterns that exist in it. Density-based clustering techniques have gained significant attention because they help to effectively recognize complex patterns in spatial datasets. Big data clustering is a non-trivial process owing to the increasing quantity of data, a problem that can be addressed with the MapReduce tool. With this motivation, this paper presents an efficient MapReduce-based hybrid density-based clustering and classification algorithm for big data analytics (MR-HDBCC). The proposed MR-HDBCC technique is executed on the MapReduce tool to handle big data. In addition, the MR-HDBCC technique involves three distinct processes, namely pre-processing, clustering, and classification. The proposed model utilizes the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) technique, which is capable of detecting clusters of arbitrary shape in noisy data. To improve the performance of the DBSCAN technique, a hybrid model using the cockroach swarm optimization (CSO) algorithm is developed to explore the search space and determine the optimal parameters for density-based clustering. Finally, a bidirectional gated recurrent neural network (BGRNN) is employed for the classification of big data. The proposed MR-HDBCC technique is validated experimentally on a benchmark dataset, and the simulation outcomes demonstrate the promising performance of the proposed model in terms of different measures.
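For readers unfamiliar with the clustering step named above, here is a minimal single-machine sketch of plain DBSCAN. It is not the MR-HDBCC pipeline: there is no MapReduce, CSO parameter tuning, or BGRNN classifier, and the function names, `eps`, `min_pts`, and the sample points are illustrative assumptions.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of point i (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE (-1) marks outliers."""
    labels = [None] * len(points)  # None means "not yet visited"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbors)
    return labels


# Hypothetical sample: two dense groups and one isolated point.
sample_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
sample_labels = dbscan(sample_points, eps=1.5, min_pts=3)
```

The result depends heavily on `eps` and `min_pts`, which is the sensitivity the abstract says the CSO step is meant to address.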

11.
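A Naive Bayes Classifier Based on Incomplete Data   Total citations: 1, self-citations: 0, citations by others: 1
Bayesian networks have become a powerful tool for handling incomplete data because of their ability to express causal relationships among attributes. However, the vast majority of Bayesian classifiers are built on complete data, whereas real-world data are often incomplete, so building effective Bayesian classifiers from incomplete data is an important and challenging problem. By analyzing the shortcomings of the well-known RBC classifier for incomplete data, this paper gives a classifier construction method for incomplete data based on the BC (Bound and Collapse) method and the EM algorithm. Experimental results demonstrate the effectiveness of the algorithm.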

12.
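An Analysis of Bayesian Network Structure Learning   Total citations: 5, self-citations: 0, citations by others: 5
The goal of Bayesian network structure learning (hereafter, structure learning) is to find the network structure that best fits prior knowledge and data. There are two approaches to structure learning: model selection, which chooses a single best network structure, and selective model averaging, which chooses a suitable number of network structures to represent all network structures. Taking the two categories of restricted and unrestricted structure learning, we …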

13.
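A Wireless Sensor Network Localization Method Based on Interval Number Clustering   Total citations: 2, self-citations: 0, citations by others: 2
Peng Yu, Luo Qinghua, Wang Dan, Peng Xiyuan. Acta Automatica Sinica, 2012, 38(7): 1190-1199
When localization methods for wireless sensor networks (WSNs) based on received signal strength indicator (RSSI) ranging are applied, the assumption that signal strength is linear in the logarithm of the corresponding communication distance almost never holds in real wireless environments, which leads to large localization errors. To address this problem, this paper first uses an interval number representation, combined with statistics of RSSI data from the actual localization environment, to represent the distribution region of RSSI, and applies interval number clustering to estimate distance, reducing the distance estimation error caused by RSSI uncertainty. These distance estimates are then used in a range-based WSN localization method. Localization experiments using RSSI measurements from three real communication environments show that the proposed localization method, based on interval-number-clustering RSSI-to-distance (RSSI-D) estimation, effectively improves localization accuracy.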
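The log-linear assumption criticized in this abstract is commonly written as the log-distance path-loss model, RSSI(d) = RSSI(d0) - 10 n log10(d / d0). The sketch below inverts that baseline model to estimate distance; it is not the paper's interval number clustering method, and the function name and default parameter values are illustrative assumptions rather than values from the paper.

```python
def rssi_to_distance(rssi_dbm, rssi_at_ref_dbm=-40.0,
                     path_loss_exponent=2.0, ref_distance_m=1.0):
    """Estimate distance by inverting the log-distance path-loss model.

    All default values are hypothetical; in practice they would be
    calibrated for each environment.
    """
    exponent = (rssi_at_ref_dbm - rssi_dbm) / (10.0 * path_loss_exponent)
    return ref_distance_m * 10.0 ** exponent


distance_at_ref = rssi_to_distance(-40.0)  # reading equals the reference RSSI
distance_weaker = rssi_to_distance(-60.0)  # 20 dB weaker than the reference
```

Because distance depends exponentially on the reading, a few dB of RSSI fluctuation moves the estimate substantially, which is one way to see why uncertainty in RSSI translates into large ranging errors.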

14.
In this paper we outline a new method for clustering that is based on a binary representation of data records. The binary database relates each entity to all possible attribute values (domain) that entity may assume. The resulting binary matrix allows for similarity and clustering calculation by using the positive (1 bits) of the entity vector. We formulate two indexes: Pair Similarity Index (PSI) to measure similarity between two entities and Group Similarity Index (GSI) to measure similarity within a group of entities. A threshold factor for each attribute domain is defined that is dependent on the domain but independent of the number of entities in the group. The similarity measure provides simplicity of storage and efficiency of calculation. A comparison of our similarity index to other indexes is made. Experiments with sample data indicate a 48% improvement of group similarity over standard methods pointing to the potential and merit of the binary approach to clustering and data mining.

15.
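Learning a Bayesian network can be divided into structure learning and parameter learning. The expectation-maximization (EM) algorithm is commonly used for parameter learning from incomplete data, but because EM is computationally expensive, converges slowly, and tends to get stuck in local maxima, the traditional EM algorithm struggles with large-scale data sets. This paper studies the main problems of the EM algorithm and partitions a large-scale data set into blocks, processing it as smaller sample sets, which reduces the computational cost of EM and also improves accuracy. Experiments show that the improved EM algorithm performs well.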

16.
Bayesian Clustering by Dynamics   Total citations: 4, self-citations: 0, citations by others: 4
Ramoni, Marco; Sebastiani, Paola; Cohen, Paul. Machine Learning, 2002, 47(1): 91-121
This paper introduces a Bayesian method for clustering dynamic processes. The method models dynamics as Markov chains and then applies an agglomerative clustering procedure to discover the most probable set of clusters capturing different dynamics. To increase efficiency, the method uses an entropy-based heuristic search strategy. A controlled experiment suggests that the method is very accurate when applied to artificial time series in a broad range of conditions and, when applied to clustering sensor data from mobile robots, it produces clusters that are meaningful in the domain of application.
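To make "models dynamics as Markov chains" concrete, the sketch below estimates a first-order transition matrix from one discrete time series by counting transitions. This is a plain maximum-likelihood estimate chosen for brevity, not the paper's Bayesian estimation or its clustering procedure; the function name and the toy series are hypothetical.

```python
def transition_matrix(series, n_states):
    """Row-normalized transition frequencies for a first-order Markov chain.

    States are assumed to be integers in range(n_states). Rows for states
    that are never left stay all zeros.
    """
    counts = [[0] * n_states for _ in range(n_states)]
    for current, following in zip(series, series[1:]):
        counts[current][following] += 1

    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Hypothetical two-state series with transitions 0->1, 1->0, 0->1, 1->1.
toy_series = [0, 1, 0, 1, 1]
toy_matrix = transition_matrix(toy_series, n_states=2)
```

Each series is summarized by one such matrix, so clustering dynamics amounts to grouping series whose transition behavior is similar.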

17.
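The co-evolutionary algorithm is a new evolutionary algorithm. Because it uses a separated encoding of the solution space, it can effectively overcome the premature convergence inherent in general evolutionary algorithms. Addressing data clustering, an important topic in current data mining and exploratory data analysis, this paper abstracts the clustering problem as a partitioning problem on a weighted graph and solves it with a co-evolutionary algorithm, so that the clustering result does not depend on the initial cluster centers. The performance of the algorithm is analyzed. The algorithm is compared with a general genetic algorithm, and experiments demonstrate its superior performance.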

18.
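The traditional K-modes algorithm uses simple attribute matching to compute the distance between different values of the same attribute, and gives all attributes equal weight when computing the distance between samples. Building on this, and taking into account the order relation among attribute values in ordinal categorical data, the similarity between different attribute values in nominal categorical data, and the relationships among attributes, this paper proposes an improved clustering algorithm better suited to mixed categorical data. The algorithm uses different distance measures for nominal and ordinal categorical data and assigns weights using average entropy. Experimental results show that the improved algorithm clusters better than K-modes and its existing improved variants on both synthetic and real data sets.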
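The simple matching distance that the abstract attributes to traditional K-modes can be sketched as follows: each attribute contributes 0 if the two values match and 1 otherwise, with equal weight for every attribute. The improved ordinal and entropy-weighted measures are not shown because the abstract does not define them; the function name and sample records are hypothetical.

```python
def simple_matching_distance(record_a, record_b):
    """Number of attributes on which two categorical records differ."""
    if len(record_a) != len(record_b):
        raise ValueError("records must have the same number of attributes")
    return sum(1 for x, y in zip(record_a, record_b) if x != y)


# Hypothetical records that differ only in the second attribute.
example_distance = simple_matching_distance(
    ("red", "small", "round"),
    ("red", "large", "round"),
)
```

Under this measure "small" versus "large" costs the same as "small" versus "medium", which is the loss of ordinal information the abstract sets out to fix.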

19.
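A Semi-Supervised K-Means Clustering Algorithm for Multi-Relational Data   Total citations: 3, self-citations: 1, citations by others: 3
This paper proposes a semi-supervised K-means clustering algorithm for multi-relational data. Building on K-means, the algorithm extends the selection of initial clusters and the object similarity measure so that it can be used for semi-supervised learning on multi-relational data. To achieve high performance, the algorithm makes full use of labeled data, object attributes, and the various kinds of relational information during clustering. Experimental results on the multi-relational database Movie verify its effectiveness.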

20.
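In data mining, missing data are unavoidable, so data preprocessing is an indispensable preliminary step. In traditional data preprocessing, the naive Bayes algorithm is the most commonly used method for repairing missing data. However, real-world data often violate its attribute independence assumption, and the classification results are unsatisfactory. Based on the idea of cluster analysis, this article proposes an improved Bayesian algorithm. Computational results on a large amount of data show that the method is more reasonable and more reliable than the naive Bayes algorithm.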
