Similar Literature
20 similar documents found.
1.
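Machine Learning and Web Information Processing   Total citations: 2, self-citations: 0, citations by others: 2
Machine learning plays an important role in web information processing. GHunt is an intelligent system for acquiring and processing web information that employs several machine learning techniques. First, the system supports distributed, parallel web information search and content filtering. Second, it uses machine learning techniques, including text classification, clustering and text concept extraction, to understand textual information at the conceptual level. Third, it manages textual information in a unified way on the basis of a conceptual semantic space. Finally, it provides efficient concept-based semantic text retrieval, together with personalized topic organization and information push services. The paper focuses on the machine learning techniques used in the system.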

2.
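Applying rough set theory to a rule induction system, this paper proposes an incremental learning method, based on rough sets, for acquiring a rule knowledge base. The method handles inconsistent cases in the decision table effectively and uses a heuristic algorithm to obtain the minimal rules of the decision table. When new objects are added, the rule knowledge base is updated incrementally on top of the existing rule set, so the rule acquisition algorithm does not have to be re-run to update the rules. The algorithm was tested on several UCI datasets in terms of the number of rules in the rule set, the data compression ratio and predictive ability. The experiments demonstrate the effectiveness of the algorithm.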
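The incremental idea above can be sketched with one rule per equivalence class of the condition attributes. This is a simplified illustration only. It performs no attribute reduction and no heuristic rule minimisation. It keeps just enough bookkeeping to tell consistent classes (certain rules, i.e. the lower approximation) from inconsistent ones. With that bookkeeping, a new object updates only its own class instead of triggering a rebuild. The class name `IncrementalRuleBase` and the example attributes are hypothetical.

```python
from collections import defaultdict


class IncrementalRuleBase:
    """Hypothetical sketch: one rule per equivalence class of the condition attributes."""

    def __init__(self, condition_attrs, decision_attr):
        self.cond = list(condition_attrs)
        self.dec = decision_attr
        # condition-value tuple -> set of decisions observed for that equivalence class
        self.classes = defaultdict(set)

    def add(self, obj):
        """Absorb one object (dict attribute -> value); only its own class is touched."""
        key = tuple(obj[a] for a in self.cond)
        self.classes[key].add(obj[self.dec])

    def certain_rules(self):
        """Rules from consistent classes (lower approximation): condition tuple -> decision."""
        return {k: next(iter(d)) for k, d in self.classes.items() if len(d) == 1}

    def inconsistent(self):
        """Condition patterns that are mapped to more than one decision."""
        return {k for k, d in self.classes.items() if len(d) > 1}
```

For example, adding `{"outlook": "sunny", "windy": False, "play": "no"}` after a "sunny, not windy, play = yes" object moves that pattern from the certain rules to the inconsistent set. No other rule is recomputed.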

3.
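To build a real-time rule base for the intelligent analysis of distribution network monitoring information, a machine-learning-based method for constructing such a rule base is proposed. The rule headers of all distribution network monitoring rules in the rule base are sorted and set up as the main chain, and the rules are imported into linked lists to generate rule sets, ensuring that every monitoring data packet has a corresponding analysis rule. A machine-learning-based method for classifying distribution network fault data is used to identify fault data in the monitoring information, and frequent itemsets of the fault data are extracted...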

4.
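Document distribution in nuclear power suffers from low accuracy and long processing times. To address this, the paper proposes an automated, intelligent document distribution system and studies intelligent distribution from two angles: rule-based distribution and machine-learning-based distribution. Key information is identified and extracted from documents to build a rule matrix, and a rule-execution feedback mechanism is added to refine that matrix. In addition, the data sources and data storage of the system are analyzed and designed. Machine learning is used to refine the distribution rules, and algorithms compute document recommendation lists for the system to use. Adding a rule engine and an intelligent recommendation step to the original distribution workflow greatly increases the system's degree of automation and intelligence. In practical use, the intelligent distribution system effectively improves the accuracy and timeliness of distribution, achieving the leap from manual to intelligent distribution. To improve the results further, the paper proposes future improvements from the perspective of language algorithm models and deep learning frameworks.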

5.
Legal text retrieval traditionally relies on external knowledge sources such as thesauri and classification schemes, and accurate indexing of documents is often done manually. As a result, not all legal documents can be retrieved effectively. However, a number of current artificial intelligence techniques are promising for legal text retrieval. They support the acquisition of knowledge and the knowledge-rich processing of document texts and information needs, as well as the matching between them. Techniques for learning information needs, learning concept attributes of texts, information extraction, text classification and clustering, and text summarization need to be studied in legal text retrieval. The reason is their potential for improving retrieval and reducing the cost of manual indexing. The resulting query and text representations are semantically much richer than a set of key terms, and their use allows for more refined retrieval models in which some reasoning can be applied. This paper gives an overview of the state of the art of these innovative techniques and their potential for legal text retrieval.

6.
ACIRD: intelligent Internet document organization and retrieval   Total citations: 6, self-citations: 0, citations by others: 6
This paper presents an intelligent Internet information system, Automatic Classifier for the Internet Resource Discovery (ACIRD), which uses machine learning techniques to organize and retrieve Internet documents. ACIRD consists of a knowledge acquisition process, a document classifier and a two-phase search engine. The knowledge acquisition process automatically learns classification knowledge from already classified Internet documents. The document classifier applies this learned knowledge to assign newly collected Internet documents to one or more classes. Experimental results indicate that ACIRD performs as well as or better than human experts in both knowledge acquisition and document classification. The two-phase search engine combines the learned classification knowledge with the given class lattice. It answers user queries with hierarchically structured, navigable results instead of a conventional flat ranked list. This greatly helps users locate information among numerous, diverse Internet documents.

7.
Support Vector Machines (SVM) have been applied to Chinese official document classification in a One-against-All (OAA) multi-class scheme. Several data retrieval techniques, including sentence segmentation, term weighting and feature extraction, are used in preprocessing. We observe that most documents whose contents are indistinguishable are poorly classified. The traditional solution is to add misclassified documents to the training set in order to adjust the classification rules. In this paper, indistinguishable documents are observed to be informative for strengthening prediction performance, since the current model predicts their labels with low confidence. A general approach is proposed that uses SVM decision values to identify indistinguishable documents. Four learning strategies are proposed to improve classification performance. Each adds selected documents to the training sets, chosen from verified classification results and the distinguishability of documents. Experiments show that indistinguishable documents can be identified with high probability and are informative for the learning strategies. Furthermore, LMID adds both misclassified and indistinguishable documents to the training sets. In SVM classification of large sets of Chinese official documents, it is the most effective learning strategy in terms of computing efficiency and classification accuracy.
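The selection step of an LMID-style strategy can be sketched from one-against-all decision values. The abstract does not give the exact indistinguishability criterion, so the rule below is an assumption. A document is flagged when no class scores above `min_top`, or when the best class beats the runner-up by less than `min_gap`. Both thresholds and the function names are hypothetical.

```python
def is_indistinguishable(decision_values, min_top=0.0, min_gap=0.5):
    """Hypothetical criterion on OAA SVM decision values (dict class -> f_c(x))."""
    ranked = sorted(decision_values.values(), reverse=True)
    top = ranked[0]
    gap = top - ranked[1] if len(ranked) > 1 else float("inf")
    # No classifier claims the document, or the winner is not clearly ahead.
    return top <= min_top or gap < min_gap


def lmid_selection(docs, **criterion):
    """Return ids of documents to add to the training set: misclassified OR indistinguishable.

    docs: iterable of (doc_id, verified_label, decision_values).
    """
    selected = []
    for doc_id, label, scores in docs:
        predicted = max(scores, key=scores.get)   # OAA: class with the largest decision value
        if predicted != label or is_indistinguishable(scores, **criterion):
            selected.append(doc_id)
    return selected
```

Here a confidently and correctly classified document is left out. A correct but low-margin one and a misclassified one are both selected for retraining.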

8.
Induction of multiple fuzzy decision trees based on rough set technique   Total citations: 5, self-citations: 0, citations by others: 5
The integration of fuzzy sets and rough sets can lead to a hybrid soft-computing technique that has been applied successfully in many fields, such as machine learning, pattern recognition and image processing. The key to this technique is how to set up and make use of fuzzy attribute reducts in fuzzy rough set theory. A fuzzy information system may have many fuzzy attribute reducts, and each can contribute differently to decision-making. If only one reduct, perhaps the most important one, is selected to induce decision rules, some useful information hidden in the other reducts will inevitably be lost. To make full use of the information provided by every individual fuzzy attribute reduct in a fuzzy information system, this paper presents a novel induction of multiple fuzzy decision trees based on rough set technique. The induction consists of three stages. First, several fuzzy attribute reducts are found by a similarity-based approach. Then, a fuzzy decision tree is generated for each fuzzy attribute reduct according to the fuzzy ID3 algorithm. Finally, the fuzzy integral is used as a fusion tool to integrate the generated trees: it combines the outputs of the multiple fuzzy decision trees and forms the final decision. An illustration is given to show the proposed fusion scheme. A numerical experiment on real data covers learning problems with many attributes. On these problems, the proposed multiple-tree induction is superior to single-tree induction based on an individual reduct or on the entire feature set.
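The fusion stage can be illustrated with a fuzzy integral over the trees' class memberships. The abstract does not say which fuzzy integral or fuzzy measure is used. This sketch assumes a Sugeno integral with respect to a Sugeno lambda-fuzzy measure. The per-tree densities (tree importances) are hypothetical inputs. For each class, the trees' supports are sorted in descending order. The measure of each growing set is built recursively, and the integral is max over i of min(support_i, g(A_i)).

```python
def solve_lambda(densities):
    """Solve prod(1 + lam*g_i) = 1 + lam for the Sugeno lambda-measure (lam > -1, lam != 0).

    Assumes at least two densities, each strictly between 0 and 1.
    """
    if len(densities) < 2 or not all(0.0 < g < 1.0 for g in densities):
        raise ValueError("need at least two densities in (0, 1)")
    total = sum(densities)
    if abs(total - 1.0) < 1e-12:
        return 0.0                        # densities already additive

    def f(lam):
        p = 1.0
        for g in densities:
            p *= 1.0 + lam * g
        return p - (1.0 + lam)

    if total < 1.0:                       # nontrivial root is positive
        lo, hi = 1e-9, 1.0
        while f(hi) < 0.0:
            hi *= 2.0
    else:                                 # nontrivial root lies in (-1, 0)
        lo, hi = -1.0 + 1e-12, -1e-9
    for _ in range(200):                  # bisection on the sign change
        mid = 0.5 * (lo + hi)
        if (f(mid) < 0.0) == (f(lo) < 0.0):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sugeno_integral(supports, densities, lam):
    """supports[i]: membership of the class according to tree i; densities[i]: tree importance."""
    order = sorted(range(len(supports)), key=lambda i: supports[i], reverse=True)
    g_prev, best = 0.0, 0.0
    for i in order:
        g = densities[i] + g_prev + lam * densities[i] * g_prev   # measure of the growing set
        best = max(best, min(supports[i], g))
        g_prev = g
    return best


def fuse(tree_outputs, densities):
    """tree_outputs: one dict (class -> membership degree) per fuzzy decision tree."""
    lam = solve_lambda(densities)
    scores = {c: sugeno_integral([out[c] for out in tree_outputs], densities, lam)
              for c in tree_outputs[0]}
    return max(scores, key=scores.get), scores
```

With three equally weighted trees (density 0.3 each), the measure of the full set of trees comes out at 1 once lambda is solved, as a fuzzy measure requires.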

9.
Fuzzy rule induction in a set covering framework   Total citations: 1, self-citations: 0, citations by others: 1

10.
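Intelligent systems-on-chip (SoCs) that integrate multiple sensors are widely used in the Internet of Things. Classification algorithms that fuse data from multiple sensors face a limitation here: a traditional single support vector machine (SVM) classifier cannot directly perform small-sample incremental learning on sensor data streams. To address this problem, an ensemble incremental algorithm based on Bagging-SVM is proposed. The algorithm draws training sets from the incremental data by bootstrap sampling and builds an ensemble classifier that reflects the changes carried by the new information. It then combines the new and old classifiers to achieve ensemble incremental learning. Experimental results show that, compared with a single SVM classifier, the algorithm effectively reduces classification error and improves classification accuracy. It also generalizes well, meeting current intelligent sensor systems' need for online learning from small-sample data streams.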
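The Bagging-SVM incremental scheme can be sketched as follows, using only the standard library. Assumptions: each base learner is a linear SVM trained with Pegasos-style stochastic sub-gradient descent, a stand-in for whatever SVM solver the authors used. Every incoming batch adds `members_per_batch` members trained on bootstrap resamples of that batch. Old and new members are combined by an unweighted majority vote. The names, step counts and regularisation constant are hypothetical.

```python
import random


def train_linear_svm(data, lam=0.01, epochs=20, rng=None):
    """Pegasos-style training of a linear SVM (hinge loss + L2 penalty).

    data: list of (features, label) pairs, label in {-1, +1}.
    A constant 1.0 feature is appended so the bias is learned together with w
    (and is therefore also regularised, a common simplification).
    """
    rng = rng or random.Random(0)
    w = [0.0] * (len(data[0][0]) + 1)
    t = 0
    for _ in range(epochs * len(data)):
        t += 1
        x, y = rng.choice(data)
        x = list(x) + [1.0]
        eta = 1.0 / (lam * t)                          # Pegasos step size
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        w = [(1.0 - eta * lam) * wi for wi in w]       # shrink: L2 regularisation
        if margin < 1.0:                               # hinge loss is active
            w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def svm_predict(w, x):
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


class BaggingSVMEnsemble:
    """Ensemble that grows with each incoming batch of the data stream."""

    def __init__(self, members_per_batch=5, seed=0):
        self.members = []                  # weight vectors of every SVM trained so far
        self.members_per_batch = members_per_batch
        self.rng = random.Random(seed)

    def partial_fit(self, batch):
        """Train new members on bootstrap resamples of the new batch; old members are kept."""
        for _ in range(self.members_per_batch):
            sample = [self.rng.choice(batch) for _ in batch]
            self.members.append(train_linear_svm(sample, rng=self.rng))

    def predict(self, x):
        """Unweighted majority vote over old and new members."""
        votes = sum(svm_predict(w, x) for w in self.members)
        return 1 if votes >= 0 else -1
```

Keeping the old members means earlier knowledge is not discarded when a small new batch arrives. A weighted vote, for example by recent accuracy, would be a natural variant.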

11.
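Research on Heuristic Knowledge Acquisition Methods   Total citations: 3, self-citations: 0, citations by others: 3
Inductive learning is an effective approach to automatic knowledge acquisition. To address the problems of the ID3 algorithm, rough-set-based inductive learning and other inductive learning methods, a new inductive learning algorithm, ITIL, is proposed. ITIL uses information gain as a heuristic to select as few important attributes or attribute combinations as possible, and extracts rules on the basis of discernibility. Many examples show that the resulting rules are not only simple but also have little redundancy. As part of a knowledge acquisition module, ITIL has been integrated into the dynamic knowledge base subsystem of a "knowledge-discovery-based medical diagnosis assistance system".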
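The information-gain heuristic that ITIL uses to rank attributes can be sketched as below. This shows only the standard ID3-style gain computation. ITIL's handling of attribute combinations and its discernibility-based rule extraction are not reproduced. The helper names and the toy attributes are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attr, target):
    """Entropy of the target minus the expected entropy after splitting on attr."""
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([r[target] for r in rows]) - remainder


def best_attribute(rows, attrs, target):
    """Greedy choice: the attribute with the highest information gain."""
    return max(attrs, key=lambda a: information_gain(rows, a, target))
```

On a table where `fever` fully determines `flu` and `cough` is uninformative, the gain is 1 bit for `fever` and 0 for `cough`, so `fever` is chosen first.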

12.
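This paper analyzes how samples move between the support vector (SV) set and the non-SV set during SVM incremental learning. It takes into account how the initial non-SV set and the newly added samples affect the classification information, and improves the original KKT conditions. Combining these with an improved error-driven strategy, it proposes a new KKT-based error-driven incremental learning algorithm. Without reducing processing speed, the algorithm retains as much useful information from the original samples as possible. It also discards useless information from the new samples, improving classifier accuracy. Experiments show that the algorithm is effective at optimizing the classifier and improving its performance.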
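The error-driven step can be illustrated with the standard KKT check for a linear SVM. A new sample enters with multiplier alpha = 0, and the KKT conditions then require y·f(x) ≥ 1. Samples that satisfy this cannot change the current solution and can be set aside. Margin violators and misclassified samples trigger retraining. The paper's improved KKT conditions are not given in the abstract, so this sketch uses only the textbook condition. The function name and tolerance are hypothetical.

```python
def kkt_partition(w, b, samples, eps=1e-9):
    """Split new (x, y) samples by the KKT condition for alpha = 0: y * f(x) >= 1.

    w, b: current linear SVM, f(x) = w . x + b; y in {-1, +1}.
    """
    satisfied, margin_violators, misclassified = [], [], []
    for x, y in samples:
        yf = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        if yf >= 1.0 - eps:
            satisfied.append((x, y))          # outside the margin: no effect on the solution
        elif yf >= 0.0:
            margin_violators.append((x, y))   # correct side, but inside the margin
        else:
            misclassified.append((x, y))      # wrong side of the hyperplane
    return satisfied, margin_violators, misclassified
```

An error-driven learner would retrain on the current support vectors plus the samples in the last two lists.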

13.
With technology developing so rapidly, the sustainability and adaptability of computer applications are a challenge, and predicting the future has become important. Strong artificial intelligence (AI) has therefore gained importance, and statistical machine learning (ML) methods have been applied to serve it. These methods are very difficult to understand, and they predict the future without showing how. However, understanding how machines make their decisions is also important, especially in the information systems domain. Incremental covering algorithms (CA) can therefore be used to produce simple rules for making difficult decisions. Using a simple CA as the base of a strong AI agent would be a novel idea, but it is not possible with the methods currently available in CA. Accurately updating the discovered rules on the basis of new information was found to be a challenge in CA that needs extra attention. Specifically, incomplete data with missing classes is handled inappropriately, speed and data size are also concerns, and classes that do not yet exist are neglected. This paper therefore introduces a novel algorithm called RULES-IT, which solves these problems of incremental CA and brings it into strong AI. Within its family, and within CA as a whole, it is the first incremental algorithm to transfer rules between domains. The aim is to improve performance, generalize the induction, exploit past experience from a different domain and make the learner more intelligent. It is also the first to introduce intelligent aspects into incremental CA, including consciousness, subjective emotions, awareness and adjustment. Furthermore, all of its decisions can be understood, because the repository is represented simply as rules.
Finally, the performance of RULES-IT is benchmarked against six different methods and compared with its predecessors. This shows the effect of transferring rules during learning and demonstrates how RULES-IT overcomes the shortcomings of current incremental CA while improving overall performance.

14.
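Ma Jiangang (马建刚), Zhang Peng (张鹏), Ma Yinglong (马应龙). Journal of Computer Applications (计算机应用), 2019, 39(5): 1293-1298
Intelligent information systems are being built in judicial organs nationwide, and the massive volume of judicial documents they have accumulated provides a data foundation for intelligent judicial services. Similarity analysis of judicial documents makes it possible to recommend similar cases. This gives judicial staff intelligent decision support for case handling and improves its quality and efficiency. General-domain text classification methods ignore the complex structure and domain-knowledge semantics of judicial texts, so they classify judicial texts poorly. To address this, an efficient judicial document classification method based on judicial knowledge block summaries and Word Mover's Distance (WMD) is proposed. First, a domain ontology knowledge model is built for judicial documents. Using this ontology, information extraction techniques obtain summaries of the core knowledge blocks in each judicial document. Then, WMD is applied to these knowledge block summaries to compute document similarity. Finally, the k-nearest neighbor algorithm is used to classify the judicial texts. Case document sets for two typical criminal charges were used as experimental data, with the traditional WMD document similarity method as the baseline. Compared with that baseline, the proposed method clearly improves classification accuracy, by 5.5 and 9.9 percentage points respectively. It also reduces classification time, running 52.4 and 89.1 times faster respectively.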
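The WMD-plus-kNN pipeline can be sketched on toy data. The simplification here: both documents (e.g. two knowledge block summaries) have the same number n of tokens, each weighted 1/n. In that case the optimal transport plan is a permutation, by the Birkhoff-von Neumann theorem. Brute force over permutations then gives the exact WMD, but only for very short documents; real implementations solve a general transportation problem. The 2-D "embeddings", documents and labels are hypothetical stand-ins for trained word vectors and real case summaries.

```python
from collections import Counter
from itertools import permutations
from math import dist


def wmd_equal_length(doc_a, doc_b, emb):
    """Exact Word Mover's Distance for two equal-length token lists with uniform weights."""
    if len(doc_a) != len(doc_b):
        raise ValueError("this sketch only handles documents of equal length")
    n = len(doc_a)
    # Each permutation is one candidate transport plan moving 1/n mass per word.
    return min(
        sum(dist(emb[a], emb[b]) for a, b in zip(doc_a, perm)) / n
        for perm in permutations(doc_b)
    )


def knn_classify(query, train, emb, k=3):
    """k-nearest-neighbour vote, using WMD as the document distance."""
    neighbours = sorted(train, key=lambda item: wmd_equal_length(query, item[0], emb))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]


# Hypothetical toy embeddings and labelled summaries.
EMB = {"theft": (1.0, 0.0), "steal": (0.9, 0.1), "property": (0.8, 0.2),
       "injury": (0.0, 1.0), "assault": (0.1, 0.9), "wound": (0.2, 0.8)}
TRAIN = [(["theft", "property"], "theft"), (["steal", "property"], "theft"),
         (["assault", "wound"], "injury"), (["injury", "wound"], "injury")]
```

Shortening each document to its knowledge block summary keeps n small, which matters because exact WMD grows quickly with document length. This is consistent with the speedups reported above.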

15.
The degree of malignancy of brain glioma is assessed before surgery from magnetic resonance imaging (MRI) findings and clinical data. These data contain irrelevant features, as well as uncertainties and missing values. Rough set theory can deal with vagueness and uncertainty in data analysis and can efficiently remove redundant information. In this paper, a rough set method is applied to predict the degree of malignancy. Because feature selection can effectively improve classification accuracy, rough set feature selection algorithms are employed to select features. The selected feature subsets are used to generate decision rules for the classification task. The paper also proposes a rough set attribute reduction algorithm whose search is based on particle swarm optimization (PSO), and compares it with other rough set reduction algorithms. Experimental results show that the reducts found by the proposed algorithm are more efficient and yield decision rules with better classification performance. The rough-set rule-based method achieves higher classification accuracy than other intelligent analysis methods. These include neural networks, decision trees and FRE-FMMNN, a fuzzy rule extraction algorithm based on Fuzzy Min-Max Neural Networks. Moreover, the decision rules induced by the rough set rule induction algorithm reveal regular, interpretable patterns relating glioma MRI features to the degree of malignancy, which are helpful for medical experts.
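A PSO-based reduct search needs a fitness function for each candidate attribute subset (one particle). A common choice, assumed here because the abstract does not specify it, combines two terms. The first is the rough-set dependency degree gamma_R(D) = |POS_R(D)| / |U|, the share of objects whose R-equivalence class has a single decision. The second rewards subsets with fewer attributes. The weights `alpha` and `beta`, the function names and the toy attributes are hypothetical. The particle swarm search itself is not shown.

```python
from collections import defaultdict


def dependency_degree(rows, attrs, decision):
    """gamma_attrs(decision) = |POS| / |U| for a list of dict rows."""
    classes = defaultdict(list)
    for r in rows:
        classes[tuple(r[a] for a in attrs)].append(r[decision])
    # Objects in decision-consistent equivalence classes form the positive region.
    positive = sum(len(ds) for ds in classes.values() if len(set(ds)) == 1)
    return positive / len(rows)


def reduct_fitness(rows, subset, all_attrs, decision, alpha=0.9, beta=0.1):
    """Hypothetical particle fitness: reward dependency, penalise subset size."""
    size_term = (len(all_attrs) - len(subset)) / len(all_attrs)
    return alpha * dependency_degree(rows, subset, decision) + beta * size_term


ROWS = [  # toy decision table with made-up attributes
    {"edema": "yes", "enhancement": "ring", "grade": "high"},
    {"edema": "yes", "enhancement": "none", "grade": "high"},
    {"edema": "no", "enhancement": "none", "grade": "low"},
    {"edema": "no", "enhancement": "ring", "grade": "high"},
]
```

In this toy table, neither attribute alone separates the grades (gamma = 0.5 each), while both together do (gamma = 1.0). The fitness therefore prefers the full pair despite the size penalty.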

16.
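Minimal-Maximal Rule Learning and Its Application to Simplifying Decision Tree Rules   Total citations: 3, self-citations: 0, citations by others: 3
Inspired by the concept of a reduct in rough set theory, this paper proposes the concepts of minimal rules and maximal rules, together with minimal-maximal rule learning.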

17.
In this paper we study an evolutionary machine learning approach to data mining and knowledge discovery based on the induction of classification rules. We introduce AREX, a method for automatic rule induction that uses evolutionary induction of decision trees and automatic programming. The proposed algorithm is applied to a cardiovascular dataset. The dataset consists of different groups of attributes that may reveal specific cardiovascular problems in young patients. A case study shows how AREX is used to classify patients and to discover possible new medical knowledge from the dataset. In the defined knowledge discovery loop, a medical expert assesses the induced rules, and this assessment drives the evolution of rule sets towards more appropriate solutions. The final result is the discovery of possible new medical knowledge in the field of pediatric cardiology.

18.
Instance-Based Learning Algorithms   Total citations: 46, self-citations: 1, citations by others: 45
Storing and using specific instances improves the performance of several supervised learning algorithms. These include algorithms that learn decision trees, classification rules, and distributed networks. However, no investigation has analyzed algorithms that use only specific instances to solve incremental learning tasks. In this paper, we describe a framework and methodology, called instance-based learning, that generates classification predictions using only specific instances. Instance-based learning algorithms do not maintain a set of abstractions derived from specific instances. This approach extends the nearest neighbor algorithm, which has large storage requirements. We describe how storage requirements can be significantly reduced with, at most, minor sacrifices in learning rate and classification accuracy. While the storage-reducing algorithm performs well on several real-world databases, its performance degrades rapidly with the level of attribute noise in training instances. Therefore, we extended it with a significance test to distinguish noisy instances. This extended algorithm's performance degrades gracefully with increasing noise levels and compares favorably with a noise-tolerant decision tree algorithm.
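The storage-reducing idea can be sketched as a nearest-neighbour learner that stores a training instance only when its current memory misclassifies it. This follows the spirit of the storage-reducing algorithm described above (commonly known as IB2). The significance test used by the noise-tolerant extension is not shown. The class name and the toy points are illustrative.

```python
from math import dist


class StorageReducingNN:
    """1-nearest-neighbour learner that keeps only instances it got wrong."""

    def __init__(self):
        self.memory = []  # stored (point, label) pairs

    def predict(self, x):
        if not self.memory:
            return None                  # empty memory: no prediction yet
        _, label = min(self.memory, key=lambda m: dist(m[0], x))
        return label

    def learn(self, x, label):
        """Incremental update: store x only if the current memory misclassifies it."""
        if self.predict(x) != label:
            self.memory.append((x, label))
```

On two well-separated clusters presented one instance at a time, only the first instance of each cluster ends up stored. That is the storage saving the abstract describes. Noisy instances are also stored, because they are misclassified, which is why performance degrades with attribute noise.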

19.
Decision trees have been widely used in data mining and machine learning as a comprehensible knowledge representation. Ant colony optimization (ACO) algorithms have been applied successfully to extract classification rules, yet decision tree induction with ACO remains an almost unexplored research area. In this paper we propose a novel ACO algorithm to induce decision trees, combining commonly used strategies from both traditional decision tree induction algorithms and ACO. The proposed algorithm is compared against three decision tree induction algorithms, namely C4.5, CART and cACDT, on 22 publicly available data sets. Its predictive accuracy is statistically significantly higher than that of the well-known conventional algorithms C4.5 and CART. It is also higher than the accuracy of the ACO-based cACDT algorithm.

20.
The most widespread way of acquiring information for knowledge-based systems is to do it manually. However, the high cost of this approach and the availability of alternative knowledge sources have led to increasing use of automatic acquisition approaches. In this paper we present M-TURBIO, a Text-Based Intelligent System (TBIS) that extracts information contained in restricted-domain documents. The system learns part of its knowledge, namely the structure of the documents and the way information is presented in them (i.e., syntactic-semantic rules), from a training set of such documents. A database is then created by applying these syntactic-semantic rules to extract the information contained in each whole document.
