Similar Documents
19 similar documents found
1.
Research on Automatic Classification of Chinese Web Pages   (cited 2 times: 0 self-citations, 2 by others)
Drawing on practical experience, this paper analyzes the common structure of web page source code and, taking full account of the textual characteristics of Chinese web pages, presents a classification method for Chinese web pages together with its implementation. Results show that the method is effective.

2.
Research on Processing Uncertain Knowledge with Bayesian Networks   (cited 15 times: 4 self-citations, 11 by others)
Bayesian networks have recently attracted attention in data mining and related fields for their strengths in handling uncertain knowledge. Compared with popular data-mining techniques such as decision trees, neural networks, and genetic algorithms, Bayesian networks are easier to interpret, predict well, and are suited to domains with inherent uncertainty. After comparing the advantages of Bayesian networks for processing uncertain knowledge, this paper describes the process of data mining with Bayesian networks and its main research directions, and concludes with an analysis and outlook on application areas, the state of the art, and future prospects.

3.
Design of an Automatic Web Page Classification System   (cited 2 times: 0 self-citations, 2 by others)
This paper presents the design of an automatic web page classification system, covering the design of its preprocessing, batch training, feature selection, online testing, and re-archiving modules. The system uses supervised learning, with Naive Bayes as the classification model and information gain as the feature selection method. Test results show that the system achieves good accuracy.
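The abstract above pairs information gain with a Naive Bayes classifier. As an illustration only (the toy corpus and token sets below are invented, not taken from the paper), information gain for a candidate term can be computed as the drop in class entropy when the corpus is split by the term's presence:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(docs, labels, term):
    # Split the corpus by presence/absence of `term` and measure
    # the resulting drop in class entropy (the information gain).
    with_term = [lab for d, lab in zip(docs, labels) if term in d]
    without = [lab for d, lab in zip(docs, labels) if term not in d]
    n = len(labels)
    cond = sum(len(part) / n * entropy(part)
               for part in (with_term, without) if part)
    return entropy(labels) - cond

# Toy corpus: token sets per page, with category labels
docs = [{"足球", "比赛"}, {"足球", "联赛"}, {"股票", "市场"}, {"基金", "市场"}]
labels = ["sports", "sports", "finance", "finance"]
```

Terms whose presence best separates the categories (here 足球 or 市场) score highest and would be retained as features.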

4.
This paper describes a web page classification method based on qualitative reasoning, which derives its results from the correlation between web page attributes and the training sample set. Experiments testing the method yielded satisfactory results.

5.
Ontology Uncertainty Reasoning Based on Bayesian Networks   (cited 1 time: 0 self-citations, 1 by others)
OWL is extended to represent uncertainty in ontology domain knowledge, and uncertainty reasoning over that knowledge is implemented with Bayesian networks. Experiments show that combining Bayesian networks with ontologies exploits both the descriptive strength of ontologies and the inferential power of Bayesian networks, allowing knowledge to be acquired from probabilistic descriptions of partial information and used to guide practice.

6.
A Bayesian-Network-Based Method for Optimizing Fault Diagnosis Strategies   (cited 10 times: 0 self-citations, 10 by others)
By analyzing the main problems in equipment fault diagnosis and repair and the limitations of commonly used diagnosis strategies, this paper studies a Bayesian-network-based method for optimizing fault diagnosis strategies. It proposes a Bayesian network structure built on fault-hypothesis, observation, and repair-action nodes that is well suited to expressing diagnosis problems, and describes the basic idea and optimization algorithm of the method. The approach accounts for multiple faults, observation actions, and dependencies between actions. An application example confirms the method's effectiveness for diagnosis and repair decisions under uncertain information.

7.
王冠, 裘正定 《微机发展》2005, 15(3): 136-138, 141
The AIP (All-Day Information Pursue) platform is a solution that lets enterprises or groups with broad interests monitor new information on the Internet, compensating for some shortcomings of search engines. It retrieves each day's new information from the Internet and uses automatic web page classification to filter out irrelevant articles. Through the platform, users can browse information by time or by category, and can annotate articles to recommend them to other readers.

8.
A Web Robot Based on Automatic Classification   (cited 2 times: 0 self-citations, 2 by others)
康平波, 王文杰 《计算机工程》2003, 29(21): 123-124, 127
With the spread and growth of the Internet, information resources on the network are increasingly abundant, and efficient, intelligent tools are needed to collect them. This paper discusses combining a web page crawler (also called a robot) with an automatic text classifier to collect pages in a user-specified domain: the crawler follows relevant links and avoids irrelevant ones, saving hardware and network resources and improving crawler efficiency.

9.
Bayesian networks are a powerful tool for representing and reasoning about uncertain knowledge in artificial intelligence. This paper introduces the concept of Bayesian networks, gives a worked example, and analyzes the method and process of Bayesian network inference.
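As a concrete sketch of the kind of worked example the abstract mentions, the classic rain/sprinkler network below is queried by brute-force enumeration of the joint distribution (the CPT numbers are illustrative, not from the paper):

```python
from itertools import product

# CPTs for a tiny network: Rain -> WetGrass <- Sprinkler (numbers invented)
P_R = {True: 0.2, False: 0.8}
P_S = {True: 0.1, False: 0.9}
P_W = {(True, True): 0.99, (True, False): 0.9,
       (False, True): 0.9, (False, False): 0.0}

def joint(r, s, w):
    # Chain rule over the network: P(R) * P(S) * P(W | R, S)
    pw = P_W[(r, s)]
    return P_R[r] * P_S[s] * (pw if w else 1 - pw)

def posterior_rain(w=True):
    # P(Rain | WetGrass = w) by summing out the sprinkler variable
    num = sum(joint(True, s, w) for s in (True, False))
    den = sum(joint(r, s, w) for r, s in product((True, False), repeat=2))
    return num / den
```

With these numbers, observing wet grass raises the probability of rain from the 0.2 prior to roughly 0.72.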

10.
郑津, 景彦昊 《福建电脑》2014, (4): 153-154
This paper analyzes the functions and techniques needed for automatic classification of Chinese web pages and presents the basic architecture of a feasible automatic classification system for Chinese web pages.

11.
The number of Internet users and the number of web pages being added to the WWW increase dramatically every day, so web pages must be classified into web directories automatically and efficiently. This helps search engines provide users with relevant results quickly. Because web pages are represented by thousands of features, feature selection helps web page classifiers cope with this large-scale dimensionality problem. This paper proposes a new feature selection method using Ward's minimum variance measure. The measure is first used to identify clusters of redundant features in a web page; in each cluster, the best representative features are retained and the others are eliminated. Removing such redundant features minimizes resource utilization during classification. The proposed method is compared with other common feature selection methods. Experiments on a benchmark data set, WebKB, show that the proposed method outperforms most of the other feature selection methods in reducing both the number of features and the classifier modeling time.
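A minimal sketch of the idea, under assumed details: each feature is described by its value profile across documents, profiles are agglomerated under Ward's minimum-variance criterion (merge the pair whose union least increases the within-cluster sum of squares), and one member per cluster would then be kept. The data and stopping rule here are invented for illustration.

```python
def cluster_sse(points):
    # Within-cluster sum of squared deviations from the centroid
    dims = len(points[0])
    total = 0.0
    for d in range(dims):
        vals = [p[d] for p in points]
        mean = sum(vals) / len(vals)
        total += sum((v - mean) ** 2 for v in vals)
    return total

def ward_feature_clusters(profiles, k):
    # profiles[i] = feature i's values across the document collection
    clusters = [[i] for i in range(len(profiles))]

    def cost(a, b):
        pts = lambda idx: [profiles[i] for i in idx]
        # Ward criterion: increase in SSE caused by merging a and b
        return cluster_sse(pts(a + b)) - cluster_sse(pts(a)) - cluster_sse(pts(b))

    while len(clusters) > k:
        _, i, j = min((cost(a, b), i, j)
                      for i, a in enumerate(clusters)
                      for j, b in enumerate(clusters) if i < j)
        clusters[i] += clusters.pop(j)   # i < j, so index i is unaffected
    return clusters
```

Redundant features (identical or near-identical profiles) merge first at zero cost, which is exactly what makes the clusters candidates for pruning.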

12.
Structure Learning for Bayesian Networks under Uncertain Prior Information   (cited 2 times: 1 self-citation, 1 by others)
To address Bayesian network learning when prior information is uncertain, this paper proposes a structure learning method for Bayesian networks with non-deterministic prior structural information. The work covers: a representation for uncertain prior information about Bayesian network structure; an improved MDL score, the SMDL score, that accounts for the uncertainty of prior information during learning; and a simulated-annealing-based solution procedure. Experiments verify the feasibility of the algorithm.

13.
Chinese web page classification is an active research area in data mining, and the support vector machine (SVM) is an efficient classification method with distinctive advantages for high-dimensional pattern recognition. This paper proposes an SVM-based method for classifying Chinese web pages and describes its key techniques, including web text preprocessing, feature extraction, and multi-class classification algorithms. Experiments show that the method greatly reduces the required training data, trains efficiently, and achieves good precision and recall.
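The abstract does not give its SVM formulation; as a hedged sketch, a linear SVM over bag-of-words page vectors can be trained with Pegasos-style stochastic subgradient descent on the L2-regularized hinge loss. The data and hyperparameters below are invented, not the paper's.

```python
def train_linear_svm(X, y, lam=0.01, epochs=200):
    # Pegasos-style subgradient descent; labels y must be in {-1, +1}
    dim = len(X[0])
    w = [0.0] * dim
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1 - eta * lam) * wj for wj in w]   # regularization shrink
            if margin < 1:                            # hinge-loss violation
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w

def predict(w, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1

# Toy bag-of-words vectors for two page categories
X = [[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1]]
y = [1, 1, -1, -1]
```

Multi-class classification, as mentioned in the abstract, is typically built on top of such binary classifiers via one-vs-rest or one-vs-one schemes.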

14.
With the development of mobile technology, users' browsing habits have gradually shifted from pure information retrieval to active recommendation. Mapping between user interests and web content has become increasingly difficult given the volume and variety of web pages. Large news portals and social media companies hire more editors to label new concepts and words, and use computing servers with larger memory to handle massive document classification based on traditional supervised or semi-supervised machine learning methods. This paper provides an optimized classification algorithm for massive web pages using semantic networks such as Wikipedia and WordNet. Using the Wikipedia data set, a few category entity words are initialized as class words. A weight estimation algorithm based on the depth and breadth of the Wikipedia network computes the class weight of every Wikipedia entity word. A kinship-relation association based on the content similarity of entities is then proposed to correct the imbalance that arises when a category node inherits probability from multiple parent nodes. Keywords are extracted from a page's title and main text using N-grams over Wikipedia entity words, and a Bayesian classifier estimates the page's class probability. Experimental results show that the proposed method achieves good scalability, robustness, and reliability on massive web pages.

15.
Bayesian networks are graphical models that describe dependency relationships between variables and are powerful tools for studying probabilistic classifiers. At present, causal Bayesian network learning methods are used to construct Bayesian network classifiers, while the contribution of attributes to the class is overlooked. This paper proposes a Bayesian network tailored to classification, the restricted Bayesian classification network. Combining dependency analysis between variables, classification accuracy evaluation criteria, and a search algorithm, a learning method for restricted Bayesian classification networks is presented. Experiments on data sets from the UCI machine learning repository show that the restricted Bayesian classification network is more accurate than other well-known classifiers.

16.
史建国, 高晓光 《计算机应用》2012, 32(7): 1943-1946
Discrete dynamic Bayesian networks are an important tool for modeling and reasoning over time series, with broad modeling applications, but their inference algorithms still need refinement. To address the difficulty of understanding, programming, and efficiently running inference in discrete dynamic Bayesian networks, this paper presents data structures for implementing the inference algorithm, derives the algorithm and programming steps for computer implementation, and verifies the computation with an example.
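To make the inference procedure concrete under assumed details (the two-state transition and observation tables below are invented), exact filtering in a discrete dynamic Bayesian network with an HMM-shaped 2-slice structure follows the recursion belief_t(x) ∝ P(o_t | x) Σ_x' P(x | x') belief_{t-1}(x'):

```python
def dbn_filter(prior, trans, emit, observations):
    # Exact filtering: propagate the belief state through each time slice,
    # multiply in the observation likelihood, and renormalize.
    belief = dict(prior)
    for obs in observations:
        states = list(belief)
        new = {x: emit[x][obs] * sum(trans[xp][x] * belief[xp] for xp in states)
               for x in states}
        z = sum(new.values())
        belief = {x: p / z for x, p in new.items()}
    return belief

# Illustrative two-state model (numbers invented)
prior = {"up": 0.5, "down": 0.5}
trans = {"up": {"up": 0.7, "down": 0.3}, "down": {"up": 0.3, "down": 0.7}}
emit = {"up": {"a": 0.9, "b": 0.1}, "down": {"a": 0.2, "b": 0.8}}
```

After two "a" observations the belief concentrates on the "up" state, since "a" is far more likely there.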

17.
The extreme learning machine (ELM), unlike traditional neural network training algorithms such as backpropagation (BP), is an efficient learning algorithm for single-hidden-layer feedforward networks (SLFNs). This paper applies the ELM to Chinese web page classification: pages are preprocessed to extract their characteristic information, forming a page feature tree from which fixed-length encodings are produced as input to the ELM. Experimental results show that the method classifies web pages effectively.
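A minimal ELM sketch, assuming numpy is available; the fixed-length page encodings are abstracted to plain numeric vectors, and the data below are invented. The defining trick is that the hidden layer is random and only the output weights are solved, in closed form, via the pseudo-inverse:

```python
import numpy as np

def train_elm(X, y, hidden=40, seed=1):
    # Random, untrained hidden layer; closed-form least-squares output weights
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], hidden))
    b = rng.normal(size=hidden)
    H = np.tanh(X @ W + b)            # hidden-layer activations
    beta = np.linalg.pinv(H) @ y      # output weights by pseudo-inverse
    return W, b, beta

def predict_elm(model, X):
    W, b, beta = model
    return np.tanh(X @ W + b) @ beta

# Invented toy encodings: XOR labeling, which no linear model can fit
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1.0, 1.0, 1.0, -1.0])
```

Because training reduces to one matrix pseudo-inverse, ELMs train orders of magnitude faster than BP on the same SLFN architecture.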

18.
Processing lineages (also called provenances) over uncertain data consists of tracing the origin of uncertainty through the process of data production and evolution. In this paper, we focus on the representation and processing of lineages over uncertain data, adopting the Bayesian network (BN), one of the most popular and important probabilistic graphical models (PGMs), as the framework for uncertainty representation and inference. Starting from lineages expressed as Boolean formulae for SPJ (Selection–Projection–Join) queries over uncertain data, we propose a method to equivalently transform a lineage expression into a directed acyclic graph (DAG). Specifically, we discuss the corresponding probabilistic semantics and properties to guarantee, theoretically, that the graphical model supports effective probabilistic inference in lineage processing. We then propose a function-based method to compute the conditional probability table (CPT) of each node in the DAG. The resulting BN for representing lineage expressions over uncertain data, called the lineage BN and abbreviated LBN, is generally suitable for both safe and unsafe query plans. We give a variable-elimination-based algorithm for exact inference in LBNs to obtain the probabilities of query results, called LBN-based query processing. We then address obtaining the probabilities of input or intermediate tuples conditioned on query results, called LBN-based inference query processing, and give a Gibbs-sampling-based algorithm for approximate inference in LBNs. Experimental results show the efficiency and effectiveness of our methods.
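As a small, assumption-laden sketch of what LBN-style exact inference computes: for a lineage that is a Boolean formula over independent base tuples, the query-result probability is the total weight of satisfying assignments. The formula and tuple probabilities below are invented, and brute-force enumeration stands in for variable elimination.

```python
from itertools import product

def lineage_prob(lineage, probs):
    """P(lineage is true) for a Boolean lineage over independent base tuples.
    `lineage` is a predicate over an assignment dict; `probs` maps each
    tuple name to its existence probability."""
    names = list(probs)
    total = 0.0
    for bits in product((False, True), repeat=len(names)):
        assign = dict(zip(names, bits))
        if lineage(assign):
            p = 1.0
            for n in names:
                p *= probs[n] if assign[n] else 1 - probs[n]
            total += p
    return total
```

For the lineage (t1 ∧ t2) ∨ t3 with all tuple probabilities 0.5, inclusion-exclusion gives 0.25 + 0.5 − 0.125 = 0.625, which the enumeration reproduces.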

19.
The complexity of web information environments and multiple-topic web pages are negative factors significantly affecting the performance of focused crawling. A highly relevant region in a web page may be obscured by the low overall relevance of that page, so segmenting web pages into smaller units significantly improves performance. Traversing irrelevant pages to reach a relevant one (tunneling) can improve the effectiveness of focused crawling by expanding its reach. This paper presents a heuristic-based method to enhance focused crawling performance. The method uses a Document Object Model (DOM)-based page partition algorithm to segment a web page into content blocks with a hierarchical structure, and investigates how to use block-level evidence to enhance focused crawling by tunneling. Page segmentation can transform an uninteresting multi-topic web page into several single-topic content blocks, some of which may be interesting; the focused crawler can then pursue the interesting blocks to retrieve relevant pages. Experimental results indicate that this approach outperforms the Breadth-First, Best-First, and Link-context algorithms in harvest rate, target recall, and target length. Copyright © 2007 John Wiley & Sons, Ltd.
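A rough sketch of block-level segmentation, using Python's stdlib HTML parser rather than a full DOM library; the block-tag set and the keyword-overlap relevance score are simplified assumptions, not the paper's algorithm.

```python
from html.parser import HTMLParser

BLOCK_TAGS = {"div", "p", "td", "li", "section", "article"}

class BlockSegmenter(HTMLParser):
    # Starts a new content block each time a block-level tag opens
    def __init__(self):
        super().__init__()
        self.blocks = [[]]

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.blocks.append([])

    def handle_data(self, data):
        if data.strip():
            self.blocks[-1].append(data.strip())

def segment(html):
    parser = BlockSegmenter()
    parser.feed(html)
    return [" ".join(b) for b in parser.blocks if b]

def block_relevance(block, topic_words):
    # Fraction of topic keywords found in the block (crude stand-in
    # for the paper's block-level relevance evidence)
    words = set(block.lower().split())
    return len(words & topic_words) / len(topic_words)
```

On a multi-topic page, per-block scores let a crawler pursue links inside the relevant block even when the page's overall relevance is low, which is the basis of the tunneling heuristic.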
