Similar Literature
20 similar documents found (search time: 390 ms)
1.
Most search techniques within ILP require the evaluation of a large number of inconsistent clauses. However, acceptable clauses typically need to be consistent, and are only found at the “fringe” of the search space. A search approach is presented, based on a novel algorithm called QG (Quick Generalization). QG carries out a random-restart stochastic bottom-up search which efficiently generates a consistent clause on the fringe of the refinement graph search without needing to explore the graph in detail. We use a Genetic Algorithm (GA) to evolve and re-combine clauses generated by QG. In this QG/GA setting, QG is used to seed a population of clauses processed by the GA. Experiments with QG/GA indicate that this approach can be more efficient than standard refinement-graph searches, while generating similar or better solutions. Editors: Ramon Otero, Simon Colton.
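As a hedged illustration of the bottom-up generalization idea in this abstract (not the authors' implementation), the sketch below drops literals from a bottom clause in random order, keeping a drop only while the clause stays consistent; the `covers` oracle, the literal representation, and all names are assumptions for the example.

```python
import random

def quick_generalization(bottom_clause, positives, negatives, covers, restarts=10):
    """Random-restart stochastic bottom-up generalization (illustrative sketch).

    covers(body, example) -> bool is an assumed coverage oracle; a clause
    is treated as consistent if it covers no negative example.
    """
    best = None
    for _ in range(restarts):
        body = set(bottom_clause)
        order = list(body)
        random.shuffle(order)
        # Greedily drop literals in random order; keep a drop only if the
        # generalized clause still covers no negative example.
        for lit in order:
            candidate = body - {lit}
            if not any(covers(candidate, n) for n in negatives):
                body = candidate
        score = sum(covers(body, p) for p in positives)
        if best is None or score > best[0]:
            best = (score, frozenset(body))
    return best[1] if best else None
```

Clauses produced by different restarts land near the consistency fringe and could then seed a GA population for recombination, in the spirit of the QG/GA setting.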

2.
This paper addresses the problem of scheduling parts in job shop cellular manufacturing systems, considering exceptional parts that need to visit machines in different cells and reentrant parts that need to visit some machines more than once in a non-consecutive manner. Initially, an integer linear programming (ILP) model is presented for the problem to minimize the makespan, which accounts for intercellular moves and non-consecutive multiple processing of parts on a machine. Due to the complexity of the model, a simulated annealing (SA) based solution approach is developed to solve the problem. To increase the efficiency of the search algorithm, a neighborhood structure based on the concept of blocks is applied. Subsequently, the efficiency of the ILP model and the performance of the proposed SA are assessed over a set of problem instances taken from the literature. The proposed ILP model was coded in Lingo 8.0, and the solutions obtained by the proposed SA were compared to the optimal values. The computational results demonstrate that the proposed ILP model and SA algorithm are effective and efficient for this problem.
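For readers unfamiliar with the SA side, below is a generic simulated annealing skeleton for a permutation-encoded schedule; `makespan` and `neighbor` (e.g. a block-based move) are assumed problem-specific callables, and the sketch is not the authors' exact algorithm.

```python
import math
import random

def simulated_annealing(schedule, makespan, neighbor,
                        t0=100.0, cooling=0.95, iters_per_temp=50, t_min=1e-3):
    """Minimize makespan over permutation-encoded schedules (sketch)."""
    current, cur_cost = schedule, makespan(schedule)
    best, best_cost = current, cur_cost
    t = t0
    while t > t_min:
        for _ in range(iters_per_temp):
            cand = neighbor(current)
            delta = makespan(cand) - cur_cost
            # Always accept improvements; accept worse moves with
            # Boltzmann probability exp(-delta / t).
            if delta < 0 or random.random() < math.exp(-delta / t):
                current, cur_cost = cand, cur_cost + delta
                if cur_cost < best_cost:
                    best, best_cost = current, cur_cost
        t *= cooling
    return best, best_cost
```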

3.
Inductive logic programming (ILP) algorithms are classification algorithms that construct classifiers represented as logic programs. ILP algorithms have a number of attractive features, notably the ability to make use of declarative background (user-supplied) knowledge. However, ILP algorithms deal poorly with large data sets (>10⁴ examples), and their widespread use of the greedy set-covering algorithm renders them susceptible to local maxima in the space of logic programs. This paper presents a novel approach to address these problems based on combining the local search properties of an inductive logic programming algorithm with the global search properties of an evolutionary algorithm. The proposed algorithm may be viewed as an evolutionary wrapper around a population of ILP algorithms. The evolutionary wrapper approach is evaluated on two domains. The chess-endgame (KRK) problem is an artificial domain that is a widely used benchmark in inductive logic programming, and Part-of-Speech Tagging is a real-world problem from the field of Natural Language Processing. In the latter domain, data originates from excerpts of the Wall Street Journal. Results indicate that significant improvements in predictive accuracy can be achieved over a conventional ILP approach when data is plentiful and noisy.

4.
Ontologies are expressed through lexical, syntax-level labels, so an ontology's own representation may suffer from ambiguity and misinterpretation. The meaning of some ontology concepts can be inferred from their context, but for others the available information does not clearly convey the exact meaning of the concept. To address this problem, we propose a background-knowledge-based ontology annotation method that annotates and clarifies the ontology itself. It comprises annotation methods based on WordNet and on Web search engines: WordNet is used to look up the correct sense of an ontology concept, and a Web search engine is used to retrieve snippets for the concept; the sense and the snippets are then attached to the ontology as properties of the concept. Experiments show an annotation coverage of 99.12%, demonstrating the feasibility of the method, and an annotation accuracy of 80.76%, higher than comparable methods.
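A minimal sketch of the WordNet half of this annotation step, using NLTK (an assumption; the paper does not name a toolkit). Word-sense disambiguation against the ontology context and the Web-snippet half are elided.

```python
from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')

def annotate_concept(label):
    """Return candidate WordNet glosses for an ontology concept label.

    A real system would pick the correct sense using the concept's
    ontological context; here every sense is listed as a candidate.
    """
    senses = wn.synsets(label.replace(' ', '_'))
    return [(s.name(), s.definition()) for s in senses]

# Example: candidate gloss annotations for the concept label "bank".
for name, gloss in annotate_concept("bank")[:3]:
    print(name, "->", gloss)
```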

5.
Inductive Logic Programming (ILP) studies learning from examples within the framework provided by clausal logic. ILP has become a popular subject in the field of data mining due to its ability to discover patterns in relational domains. Several ILP-based concept discovery systems have been developed, employing various search strategies, heuristics, and language pattern restrictions. LINUS, GOLEM, CIGOL, MIS, FOIL, PROGOL, ALEPH and WARMR are well-known ILP-based systems. In this work, introductory information about ILP is given first; then the above-mentioned systems and an ILP-based concept discovery system called C2D are briefly described, and the fundamentals of their mechanisms are demonstrated on a running example. Finally, a set of experimental results on real-world problems is presented in order to evaluate and compare the performance of the above-mentioned systems.

6.
Nearly two decades of research in the area of Inductive Logic Programming (ILP) have seen steady progress in clarifying its theoretical foundations and regular demonstrations of its applicability to complex problems in very diverse domains. These results are necessary, but not sufficient, for ILP to be adopted as a tool for data analysis in an era of very large machine-generated scientific and industrial datasets, accompanied by programs that provide ready access to complex relational information in machine-readable forms (ontologies, parsers, and so on). Besides the usual issues of ease of use, ILP is now confronted with questions of implementation. We are concerned here with two of these, namely: can an ILP system construct models efficiently when (a) dataset sizes are too large to fit in the memory of a single machine; and (b) search space sizes become prohibitively large to explore using a single machine. In this paper, we examine the applicability to ILP of a popular distributed computing approach that provides a uniform way of performing data and task parallel computations in ILP. The MapReduce programming model allows, in principle, very large numbers of processors to be used without any special understanding of the underlying hardware or software involved. Specifically, we show how the MapReduce approach can be used to perform the coverage test that is at the heart of many ILP systems, and to perform multiple searches required by a greedy set-covering algorithm used by some popular ILP systems. Our principal findings with synthetic and real-world datasets for both data and task parallelism are these: (a) ignoring overheads, the time to perform the computations concurrently increases with the size of the dataset for data parallelism and with the size of the search space for task parallelism; for data parallelism this increase is roughly in proportion to increases in dataset size; (b) if a MapReduce implementation is used as part of an ILP system, then benefits for data parallelism can only be expected above some minimal dataset size, and for task parallelism only above some minimal search-space size; and (c) the MapReduce approach appears better suited to exploit data parallelism in ILP.
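A toy, sequential stand-in for the data-parallel coverage test described above: each map call counts covered examples in one shard, and the reduce step sums the partial counts. `covers` and the shard layout are assumptions; a real deployment would run the mappers on a MapReduce cluster.

```python
from functools import reduce

def map_phase(shard, clause, covers):
    # Each mapper counts positive/negative examples covered in its shard.
    pos = sum(1 for ex, label in shard if label and covers(clause, ex))
    neg = sum(1 for ex, label in shard if not label and covers(clause, ex))
    return (pos, neg)

def reduce_phase(a, b):
    # The reducer sums partial (positive, negative) coverage counts.
    return (a[0] + b[0], a[1] + b[1])

def coverage(clause, shards, covers):
    """Coverage test expressed in the map/reduce style (sketch)."""
    partials = (map_phase(s, clause, covers) for s in shards)
    return reduce(reduce_phase, partials, (0, 0))
```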

7.
Context: The functionality of a software system is most often expressed in terms of concepts from its problem or solution domains. The process of finding where these concepts are implemented in the source code is known as concept location, and it is a prerequisite of software change. Objective: We investigate a static approach to concept location named DepIR that combines program dependency search (DepS) with information retrieval-based search (IR). In this approach, programmers explore the static program dependencies of the source code components retrieved by the IR search engine. Method: The paper presents an empirical study that compares DepIR with its constituent techniques. The evaluation is based on an empirical method of reenactment that emulates the steps of concept location for 50 past changes mined from software repositories of five software systems. Results: The results of the study indicate that DepIR significantly outperforms both DepS and IR. Conclusion: DepIR allows developers to perform concept location efficiently. It allows finding concepts even with queries that do not rank the relevant software components highly. Since formulating a good query is not always easy, this tolerance of lower-quality queries significantly broadens the usability of DepIR compared to traditional IR.
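A hedged sketch of the DepIR idea: rank components with TF-IDF retrieval, then expand the top hits one hop along static dependency edges. scikit-learn is used for the IR part; the dependency graph is assumed given as an adjacency dict, and nothing here is the paper's exact tooling.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def depir_candidates(query, docs, dep_graph, top_k=3):
    """IR-ranked seeds plus one hop of dependency exploration (sketch).

    docs: {component_name: source_text}; dep_graph: {name: [neighbors]}.
    """
    names = list(docs)
    vec = TfidfVectorizer()
    matrix = vec.fit_transform(docs[n] for n in names)
    scores = cosine_similarity(vec.transform([query]), matrix)[0]
    seeds = [n for _, n in sorted(zip(scores, names), reverse=True)[:top_k]]
    explored = set(seeds)
    for s in seeds:  # programmers would inspect these dependency neighbors
        explored.update(dep_graph.get(s, []))
    return seeds, explored
```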

8.
Theory Revision from Examples is the process of repairing incorrect theories and/or improving incomplete theories from a set of examples. This process usually results in more accurate and comprehensible theories than purely inductive learning. So far, however, progress on the use of theory revision techniques has been limited by the large search space they yield. In this article, we argue that it is possible to reduce the search space of a theory revision system by introducing stochastic local search (SLS). More precisely, we introduce a number of stochastic local search components at the key steps of the revision process, and implement them in a state-of-the-art revision system that uses the most specific clause to constrain the search space. We show that with these SLS techniques the revision system can run in feasible time, while still improving the initial theory and, in a number of cases, even reaching better accuracies than the deterministic revision process. Moreover, in some cases the revision process can be faster and still achieve better accuracies than an ILP system learning from an empty initial hypothesis or assuming an initial theory to be correct.
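One common stochastic local search scheme for such a step, shown purely as an illustration (the article's actual SLS components may differ): with some probability take a random candidate revision to escape local optima, otherwise take the best-scoring one.

```python
import random

def sls_pick_revision(candidates, score, p_random=0.3):
    """One SLS step over revision candidates (illustrative sketch).

    score(revision) -> float, higher is better, e.g. accuracy of the
    revised theory on the training examples.
    """
    if random.random() < p_random:
        return random.choice(candidates)  # random walk move
    return max(candidates, key=score)     # greedy move
```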

9.
The development of high-speed networking applications requires improvements to data communication protocols in order to efficiently provide the required services to the applications. In this survey paper, we first give a brief overview of protocol optimization techniques and of some high-speed transport protocols developed for specific application needs. We then present a new protocol architecture based on the “Application Level Framing” (ALF) and “Integrated Layer Processing” (ILP) concepts. ALF states that applications send data as a sequence of autonomous “frames”, which are at the same time the unit of transmission, the unit of control and the unit of processing. This allows for efficient design of application-specific protocols. It also enables ILP, i.e., the integration of multiple transmission control layers in a single loop, maximizing the efficiency of modern processors. We present both the architectural design aspects and the corresponding implementation problems. Practical experiments with ALF and ILP are also described and their benefits and limitations assessed. We also present a new approach to network architecture, namely the active networks approach, and discuss its pros and cons.

10.
11.
Theory revision systems are designed to improve the accuracy of an initial theory, producing more accurate and comprehensible theories than purely inductive methods. Such systems search for points where examples are misclassified and modify them using revision operators. This includes trying to add antecedents to clauses, usually following a top-down approach that considers all the literals of the knowledge base. Such an approach leads to a huge search space, which dominates the cost of the revision process. ILP Mode Directed Inverse Entailment systems restrict the search for antecedents to the literals of the bottom clause. In this work, the bottom clause and mode declarations are introduced into a first-order logic theory revision system, aiming to improve the efficiency of the antecedent addition operation and, consequently, of the whole revision process. Experimental results, compared to the revision system FORTE, show that the revision process is on average 55 times faster and generates more comprehensible theories, without significantly decreasing the accuracies obtained by the original revision process. Moreover, the results show that when the initial theory is approximately correct, it is more efficient to revise it than to learn from scratch, obtaining significantly better accuracies. They also show that using the proposed theory revision system to induce theories from scratch is faster and generates more compact theories than inducing the theory with a traditional ILP system, while obtaining competitive accuracies. This is an extended and revised version of the ILP 2008 paper (Duboc et al. 2008).

12.
Inductive logic programming (ILP) induces concepts from a set of positive examples, a set of negative examples, and background knowledge. ILP has been applied to tasks such as natural language processing, finite element mesh design, network mining, robotics, and drug discovery. These data sets usually contain numerical and multivalued categorical attributes; however, only a few relational learning systems are capable of handling them in an efficient way. In this paper, we present an evolutionary approach, called Grouping and Discretization for Enriching the Background Knowledge (GDEBaK), to deal with numerical and multivalued categorical attributes in ILP. This method uses evolutionary operators to create and test numerical splits and subsets of categorical values in accordance with a fitness function. The best subintervals and subsets are added to the background knowledge before constructing candidate hypotheses. We implemented GDEBaK embedded in Aleph and compared it to lazy discretization in Aleph and discretization in the Top-down Induction of Logical Decision Trees (TILDE) system. The results obtained showed that our method improves accuracy and reduces the number of rules in most cases. Finally, we discuss these results and possible lines for future work.

13.
In this paper, we propose an efficient and novel Lagrangian relaxation method which incorporates a new integer linear programming (ILP) formulation to optimally partition a giant tour in the context of a capacitated vehicle routing problem (CVRP). This approach, which we call Lagrangian split (Ls), is more versatile than the ILP, which, in most cases, can be intractable using a conventional solver. An effective repair mechanism followed by a local search is also embedded into the process. The mathematical validity of the repair mechanism and its time complexity are also provided. An integration of Ls into a powerful variable neighbourhood search (VNS) is also presented. Computational experiments are conducted to demonstrate that Ls provides encouraging results when applied on benchmark instances and that the integration of Ls into a metaheuristic scheme produces good results when compared to those found by state-of-the-art methods.
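For contrast, the classic way to optimally partition a fixed giant tour is a dynamic-programming shortest path on an auxiliary DAG (in the spirit of Prins' Split procedure); the sketch below shows that baseline, not the authors' Lagrangian split or its ILP.

```python
def split_giant_tour(tour, demand, dist, depot, capacity):
    """Optimal partition of a fixed giant tour into CVRP routes (sketch).

    Arc (i, j) of the auxiliary DAG is the route
    depot -> tour[i..j-1] -> depot, included only if capacity-feasible.
    """
    n = len(tour)
    cost = [float('inf')] * (n + 1)
    pred = [0] * (n + 1)
    cost[0] = 0.0
    for i in range(n):
        load, route_cost = 0, 0.0
        for j in range(i, n):
            load += demand[tour[j]]
            if load > capacity:
                break
            route_cost += (dist[depot][tour[j]] if j == i
                           else dist[tour[j - 1]][tour[j]])
            total = cost[i] + route_cost + dist[tour[j]][depot]
            if total < cost[j + 1]:
                cost[j + 1], pred[j + 1] = total, i
    routes, j = [], n
    while j > 0:  # walk predecessors back to recover the routes
        routes.append(tour[pred[j]:j])
        j = pred[j]
    return list(reversed(routes)), cost[n]
```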

14.
In this article, we focus on solving the power dominating set problem and its connected version. These problems are frequently used for finding optimal placements of phasor measurement units in power systems. We present an improved integer linear program (ILP) for both problems. In addition, a greedy constructive algorithm and a local search are developed. A greedy randomised adaptive search procedure (GRASP) algorithm is created to find near-optimal solutions for large-scale problem instances. The performance of the GRASP is further enhanced by extending it to the novel fixed set search (FSS) metaheuristic. Our computational results show that the proposed ILP has a significantly lower computational cost than existing ILPs for both versions of the problem. The proposed FSS algorithm manages to find all the optimal solutions that have been acquired using the ILP. In the last group of tests, it is shown that the FSS can significantly outperform the GRASP in both solution quality and computational cost.
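A sketch of a greedy constructive heuristic for power domination using the standard observation rules (a PMU observes its closed neighborhood; an observed vertex with exactly one unobserved neighbor propagates observation to it). This is a generic baseline, not the paper's GRASP or FSS.

```python
def observed_set(graph, pmus):
    """Closed neighborhoods of PMUs plus iterated propagation (sketch)."""
    obs = set()
    for v in pmus:
        obs.add(v)
        obs.update(graph[v])
    changed = True
    while changed:  # zero-forcing style propagation rule
        changed = False
        for v in list(obs):
            unobserved = [u for u in graph[v] if u not in obs]
            if len(unobserved) == 1:
                obs.add(unobserved[0])
                changed = True
    return obs

def greedy_pds(graph):
    """Add the PMU that observes the most vertices until all are observed."""
    pmus, obs = set(), set()
    while len(obs) < len(graph):
        best = max((v for v in graph if v not in pmus),
                   key=lambda v: len(observed_set(graph, pmus | {v})))
        pmus.add(best)
        obs = observed_set(graph, pmus)
    return pmus
```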

15.
In contrast to traditional multi-objective problems, the concept-based version of such problems involves sets of particular solutions, which represent predefined conceptual solutions. This paper addresses the concept-based multi-objective problem by proposing two novel multi-objective evolutionary algorithms. It also compares two major search approaches. The suggested algorithms deal with resource sharing among concepts, and within each concept, while simultaneously evolving concepts towards a Pareto front by way of their representative sets. The introduced algorithms, which use a simultaneous search approach, are compared with a sequential one. For this purpose, concept-based performance indicators are suggested and used. The comparison study includes both the computational time and the quality of the concept-based front representation. Finally, the effect on the computational time of both the concept fitness evaluation time and concept optimality, for both the sequential and simultaneous approaches, is highlighted.

16.
To date, Inductive Logic Programming (ILP) systems have largely assumed that all data needed for learning are provided at the onset of model construction. Increasingly, for application areas like telecommunications, astronomy, text processing, financial markets and biology, machine-generated data are produced continuously and on a vast scale. We see at least four kinds of problems that this presents for ILP: (1) it may not be possible to store all of the data, even in secondary memory; (2) even if it were possible to store the data, it may be impractical to construct an acceptable model using partitioning techniques that repeatedly perform expensive coverage or subsumption tests on the data; (3) models constructed at some point may become less effective, or even invalid, as more data become available (exemplified by the “drift” problem when identifying concepts); and (4) the representation of the data instances may need to change as more data become available (a kind of “language drift” problem). In this paper, we investigate the adoption of a stream-based on-line learning approach to relational data. Specifically, we examine the representation of relational data in both an infinite-attribute setting and in the usual fixed-attribute setting, and develop implementations that use ILP engines in combination with on-line model constructors. The behaviour of each program is investigated using a set of controlled experiments, and performance in practical settings is demonstrated by constructing complete theories for some of the largest biochemical datasets examined by ILP systems to date, including one with a million examples; to the best of our knowledge, this is the first time this has been empirically demonstrated with ILP on a real-world data set.

17.
Xing Shuangshuang, Liu Mingwei, Peng Xin. Journal of Software (软件学报), 2022, 33(11): 4027-4045
Code snippets in open-source and enterprise software projects and on software development websites are an important software development resource. However, many developers' code search needs reflect the high-level intent and topic of the code, which text-based information retrieval over code cannot match precisely. Semantic tags that capture the overall intent and topic of code are therefore valuable for improving code search and aiding code comprehension. Existing tag generation techniques mainly target textual content or depend on historical data, and cannot meet the need for large-scale semantic tagging of code to support search and comprehension. To address this problem, we propose KGCodeTagger, a knowledge-graph-based method for automatically generating semantic tags for code. The method builds a software knowledge graph from concepts and relations extracted from API documentation and software development Q&A text, which serves as the basis for tag generation. Given a piece of code, the method identifies and extracts generic API calls or concept mentions and links them to the corresponding concepts in the software knowledge graph. On this basis, it identifies other concepts related to the linked ones as candidates, ranks them by diversity and representativeness, and produces the final semantic tags. We evaluate each step of KGCodeTagger's knowledge graph construction and compare the quality of the generated tags against several existing baseline methods. The results show that the knowledge graph construction steps are reasonable and effective, and that the generated semantic tags are high-quality and meaningful, helping developers quickly understand the intent of the code.
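A toy sketch of the linking step only: spot concept mentions among code tokens against a knowledge-graph vocabulary and rank the linked concepts by frequency as candidate tags. The vocabulary format and names are assumptions; KGCodeTagger's related-concept expansion and diversity/representativeness ranking are elided.

```python
import re
from collections import Counter

def tag_code(snippet, kg_vocab, top_k=3):
    """Link code tokens to KG concepts and return frequent ones (sketch).

    kg_vocab: {surface_form: concept_id},
    e.g. {"HashMap": "java.util.HashMap"}.
    """
    tokens = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", snippet)
    hits = Counter(kg_vocab[t] for t in tokens if t in kg_vocab)
    return [concept for concept, _ in hits.most_common(top_k)]
```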

18.
Searching the hypothesis space bounded below by a bottom clause is the basis of several state-of-the-art ILP systems (e.g. Progol, Aleph). These systems use refinement operators together with search heuristics to explore a bounded hypothesis space. It is known that the search space of these systems is limited to a sub-graph of the general subsumption lattice. However, the structure and properties of this sub-graph have not been properly characterised. In this paper, firstly, we characterise the hypothesis space considered by the ILP systems which use a bottom clause to constrain the search. In particular, we discuss refinement in Progol as a representative of these ILP systems. Secondly, we study the lattice structure of this bounded hypothesis space. Thirdly, we give a new analysis of refinement operators, least generalisation and greatest specialisation in the subsumption order relative to a bottom clause. The results of this study are important for a better understanding of the constrained refinement space of ILP systems such as Progol and Aleph, which have proved successful for solving real-world problems (despite being incomplete with respect to the general subsumption order). Moreover, characterising this refinement sub-lattice can lead to more efficient ILP algorithms and operators for searching this particular sub-lattice. For example, it is shown that, unlike for the general subsumption order, efficient least generalisation operators can be designed for the subsumption order relative to a bottom clause.
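The key structural fact can be illustrated with a small sketch: relative to a bottom clause (with variables fixed by it), each hypothesis can be identified with a subset of the bottom clause's literals, so least generalisation and greatest specialisation reduce to set intersection and union, which is why efficient operators exist in this bounded lattice. The string representation below is purely illustrative.

```python
def lgg_relative_to_bottom(c1, c2):
    """Least generalisation in the bottom-clause-bounded lattice (sketch).

    With clauses as frozensets of bottom-clause literals, the least
    general clause subsuming both is their intersection.
    """
    return c1 & c2

def gs_relative_to_bottom(c1, c2):
    """Greatest specialisation: the union of the two literal subsets."""
    return c1 | c2

bottom = frozenset({"p(X)", "q(X,Y)", "r(Y)", "s(X)"})
h1 = frozenset({"p(X)", "q(X,Y)"})
h2 = frozenset({"q(X,Y)", "r(Y)"})
assert lgg_relative_to_bottom(h1, h2) == {"q(X,Y)"}
assert gs_relative_to_bottom(h1, h2) <= bottom  # still inside the lattice
```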

19.
Tan Zhe, Hu Xuegang. Journal of Computer Applications (计算机应用), 2009, 29(5): 1409-1411
Existing parallel/distributed construction algorithms for concept lattices must search a large number of irrelevant concepts when handling larger datasets, which degrades their performance. To address this, an index-based distributed construction method for concept lattices, named LCBI, is proposed. When inserting a new concept, LCBI first uses the index to quickly locate the new concept's maximal related concepts, and then searches the children of all maximal related concepts top-down in parallel to find their crossing child concepts, thereby narrowing the search scope. Theoretical analysis and experiments show that LCBI has a clear advantage over other distributed algorithms when handling large, dense datasets.
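For readers unfamiliar with the objects being built, a minimal sketch of the two derivation operators of formal concept analysis; a formal concept is a fixpoint (extent, intent) of their composition. The paper's index structure and parallel top-down search are elided.

```python
def intent(objects, context):
    """Attributes shared by all given objects (derivation operator)."""
    if not objects:
        return set().union(*context.values())  # convention: all attributes
    return set.intersection(*(set(context[o]) for o in objects))

def extent(attrs, context):
    """Objects possessing all given attributes (dual derivation operator)."""
    return {o for o, a in context.items() if attrs <= set(a)}

# Tiny formal context: objects g1..g3 with their attribute sets.
context = {"g1": {"a", "b"}, "g2": {"b", "c"}, "g3": {"b"}}
e = extent({"b"}, context)     # {'g1', 'g2', 'g3'}
print(e, intent(e, context))   # closed pair: extent and its intent {'b'}
```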

20.
In the case of a network malfunction, a network with restoration capability relies on spare capacity. Optimizing the spare capacity means finding the minimum amount of spare capacity the network needs to survive component failures. In this paper, the spare capacity optimization problem is investigated for wavelength division multiplexing (WDM) mesh networks without wavelength conversion. To minimize the spare capacity, we optimize both the routing and the wavelength assignment. This combinatorial problem is usually called the routing and wavelength assignment (RWA) problem and is well known to be NP-hard. We give an integer linear programming (ILP) formulation for the problem. Due to the excessive run-times of the ILP, we propose a hybrid genetic algorithm (GA) approach for the problem. For benchmarking purposes, simulated annealing (SA) and Tabu search (TS) are also applied to this problem. To validate the effectiveness of the proposed method, the approach is applied to the China network, which has a more complicated network topology. Simulation results are very favorable to the GA approach.
