Similar Literature
20 similar documents found (search time: 562 ms)
1.
Genetic process mining: an experimental evaluation
One of the aims of process mining is to retrieve a process model from an event log. The discovered models can be used as objective starting points during the deployment of process-aware information systems (Dumas et al., eds., Process-Aware Information Systems: Bridging People and Software Through Process Technology. Wiley, New York, 2005) and/or as a feedback mechanism to check prescribed models against enacted ones. However, current techniques have problems when mining processes that contain non-trivial constructs and/or when dealing with the presence of noise in the logs. Most of the problems happen because many current techniques are based on local information in the event log. To overcome these problems, we try to use genetic algorithms to mine process models. The main motivation is to benefit from the global search performed by this kind of algorithm. The non-trivial constructs are tackled by choosing an internal representation that supports them. The problem of noise is naturally tackled by the genetic algorithm because, by definition, these algorithms are robust to noise. The main challenge in a genetic approach is the definition of a good fitness measure because it guides the global search performed by the genetic algorithm. This paper explains how the genetic algorithm works. Experiments with synthetic and real-life logs show that the fitness measure indeed leads to the mining of process models that are complete (can reproduce all the behavior in the log) and precise (do not allow for extra behavior that cannot be derived from the event log). The genetic algorithm is implemented as a plug-in in the ProM framework.
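To make the completeness/precision idea concrete, here is a minimal Python sketch of a fitness measure over a toy model representation (a set of allowed direct-succession arcs). The actual genetic miner uses a causal-matrix representation and a more elaborate fitness, so the representation and the weights below are illustrative assumptions only.

```python
from collections import Counter

def directly_follows(traces):
    """Count direct-succession pairs (a, b) observed in a set of traces."""
    pairs = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            pairs[(a, b)] += 1
    return pairs

def fitness(model_pairs, log_traces):
    """Toy fitness: reward completeness (log behavior the model can replay)
    and precision (model behavior actually observed in the log)."""
    log_pairs = directly_follows(log_traces)
    total = sum(log_pairs.values())
    replayed = sum(n for p, n in log_pairs.items() if p in model_pairs)
    completeness = replayed / total if total else 1.0
    observed = len(set(log_pairs) & model_pairs)
    precision = observed / len(model_pairs) if model_pairs else 1.0
    return 0.6 * completeness + 0.4 * precision   # weights are arbitrary

log = [["a", "b", "c"], ["a", "c", "b"], ["a", "b", "c"]]
candidate = {("a", "b"), ("b", "c"), ("a", "c"), ("c", "b")}
print(round(fitness(candidate, log), 3))
```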

2.
Process mining can be seen as the “missing link” between data mining and business process management. The lion's share of process mining research has been devoted to the discovery of procedural process models from event logs. However, often there are predefined constraints that (partially) describe the normative or expected process, e.g., “activity A should be followed by B” or “activities A and B should never be both executed”. A collection of such constraints is called a declarative process model. Although it is possible to discover such models based on event data, this paper focuses on aligning event logs and predefined declarative process models. Discrepancies between log and model are mediated such that observed log traces are related to paths in the model. The resulting alignments provide sophisticated diagnostics that pinpoint where deviations occur and how severe they are. Moreover, selected parts of the declarative process model can be used to clean and repair the event log before applying other process mining techniques. Our alignment-based approach for preprocessing and conformance checking using declarative process models has been implemented in ProM and has been evaluated using both synthetic logs and real-life logs from a Dutch hospital.
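As a small illustration of what such declarative constraints look like operationally, the sketch below checks two Declare-style rules against a single trace; the paper's contribution (aligning deviating traces to paths of the declarative model) goes well beyond this check.

```python
def response_holds(trace, a, b):
    """Declare-style 'response' constraint: every occurrence of a
    must eventually be followed by an occurrence of b."""
    pending = False
    for event in trace:
        if event == a:
            pending = True
        elif event == b:
            pending = False
    return not pending

def not_coexistence_holds(trace, a, b):
    """'Not co-existence': a and b never both occur in the same trace."""
    return not (a in trace and b in trace)

trace = ["A", "C", "B", "A", "D"]
print(response_holds(trace, "A", "B"))         # False: the last A has no later B
print(not_coexistence_holds(trace, "A", "D"))  # False: both A and D occur
```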

3.
Process mining techniques aim at extracting knowledge from event logs. One of the most important tasks in process mining is process model discovery, in which an algorithm builds a process model from a given event log. In this paper, a new model to discover process models is proposed, based on a combination of a Genetic Algorithm (GA) and Simulated Annealing (SA). Genetic Algorithms have previously been used in this context, but earlier approaches had drawbacks in fitness evaluation that misguided the algorithm. Another problem was that the quality of the candidates in the population was low, which reduced the chance of finding a perfect answer. In this paper, a new fitness measure is proposed to evaluate process models based on event logs. Moreover, SA is used to improve the quality of the candidates in the population. Experiments demonstrate that the proposed model outperforms other approaches in the literature in terms of rediscovering process models, which is the result of better fitness evaluation and the increased quality of individuals. It is concluded that using GA and SA in combination can be effective in this context.
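The following sketch shows the general shape of such a hybrid, with a toy arc-set model representation and an arbitrary fitness; it is not the authors' encoding, and all parameters (population size, cooling rate, fitness weights) are illustrative assumptions.

```python
import math
import random

# Toy setup: a candidate model is a set of direct-succession arcs; the log
# defines which arcs we would like to cover without adding extra ones.
LOG_ARCS = {("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")}
ALL_ARCS = [(x, y) for x in "abcd" for y in "abcd" if x != y]

def fitness(model):
    """Reward covered log arcs, penalize arcs never seen in the log."""
    return len(model & LOG_ARCS) - 0.5 * len(model - LOG_ARCS)

def sa_refine(model, steps=100, temp=1.0, cooling=0.95):
    """Simulated-annealing local search: toggle one arc per step."""
    current = set(model)
    for _ in range(steps):
        neighbor = set(current)
        neighbor ^= {random.choice(ALL_ARCS)}          # add or remove one arc
        delta = fitness(neighbor) - fitness(current)
        if delta >= 0 or random.random() < math.exp(delta / temp):
            current = neighbor
        temp *= cooling
    return current

def ga_with_sa(pop_size=20, generations=30):
    """Toy GA whose offspring are polished by simulated annealing."""
    population = [set(random.sample(ALL_ARCS, 4)) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = random.sample(parents, 2)
            child = {arc for arc in p1 | p2 if random.random() < 0.7}
            children.append(sa_refine(child))          # SA polishes each child
        population = parents + children
    return max(population, key=fitness)

print(sorted(ga_with_sa()))
```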

4.
The practical relevance of process mining is increasing as more and more event data become available. Process mining techniques aim to discover, monitor and improve real processes by extracting knowledge from event logs. The two most prominent process mining tasks are: (i) process discovery: learning a process model from example behavior recorded in an event log, and (ii) conformance checking: diagnosing and quantifying discrepancies between observed behavior and modeled behavior. The increasing volume of event data provides both opportunities and challenges for process mining. Existing process mining techniques have problems dealing with large event logs referring to many different activities. Therefore, we propose a generic approach to decompose process mining problems. The decomposition approach is generic and can be combined with different existing process discovery and conformance checking techniques. It is possible to split computationally challenging process mining problems into many smaller problems that can be analyzed easily and whose results can be combined into solutions for the original problems.
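The core decomposition step, projecting each trace onto one activity cluster to obtain a smaller sublog, can be sketched as follows (how the clusters are found and how the partial results are recombined is not shown, and the cluster choice below is made up for illustration):

```python
def project(trace, activities):
    """Project a trace onto a subset of activities (decomposition step)."""
    return [a for a in trace if a in activities]

log = [["a", "b", "c", "d"], ["a", "c", "b", "d"]]
clusters = [{"a", "b"}, {"c", "d"}]
sublogs = [[project(t, cluster) for t in log] for cluster in clusters]
print(sublogs)   # two smaller logs, one per activity cluster
```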

5.
The aim of process mining is to discover the process model from the event log recorded by the information system. Typical steps of a process mining algorithm are: (1) generating event traces from the event log, (2) analyzing the event traces and obtaining ordering relations between tasks, and (3) generating a process model from these ordering relations. The first two steps can be very time consuming when millions of events and thousands of event traces are involved. This paper presents a novel algorithm (the λ-algorithm) which almost eliminates these two steps, generating event traces from the event log and analyzing them, so as to reduce the runtime of process mining. First, we retrieve the event multiset (the input data of the algorithm, denoted MS), which records the frequency of each event but ignores their order when extracted from the event log; each event in the multiset carries information about its post-activities. Second, we obtain ordering relations from the event multiset; these cover causal dependency, potential parallelism and non-potential parallelism. Finally, we discover a process model from the ordering relations. The complexity of the λ-algorithm depends only on the event classes (the set of distinct events in the event log), which significantly improves on the performance of existing process mining algorithms and is expected to be more practical for real-world process mining based on event logs, while still being able to discover SWF-nets, short loops and most implicit dependencies (generated by non-free-choice constructs).
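A minimal sketch of the multiset idea, assuming each event is stored together with its direct successor ("post-activity") while the rest of the trace order is discarded; the paper's actual data structure and relation-derivation rules are richer than this.

```python
from collections import Counter

def event_multiset(log):
    """Count each event together with its direct successor ('post-activity');
    trace order beyond that is deliberately discarded."""
    ms = Counter()
    for trace in log:
        for i, activity in enumerate(trace):
            post = trace[i + 1] if i + 1 < len(trace) else None
            ms[(activity, post)] += 1
    return ms

log = [["a", "b", "c"], ["a", "c", "b"], ["a", "b", "c"]]
print(event_multiset(log))
# e.g. ('a', 'b') is seen twice and ('a', 'c') once; the size of the multiset
# is bounded by the number of distinct (activity, post-activity) pairs,
# not by the number of traces.
```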

6.
Process mining aims at deriving order relations between tasks recorded in event logs in order to construct their corresponding process models. The quality of the results is determined not only by the mining algorithm being used, but also by the quality of the provided event logs. As a criterion of log quality, completeness measures how much of the information needed for process mining is covered by an event log. In this paper, we focus on the evaluation of the local completeness of an event log. In particular, we consider the direct succession (DS) relations between the tasks of a business process. Based on our previous work, an improved approach called CPL+ is proposed in this paper. Experiments show that the proposed CPL+ works better than other approaches on event logs that contain a small number of traces. Finally, by further investigating CPL+, we also found that the more distinct DSs are observed in an event log, the lower the local completeness of the log is.
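For illustration, the sketch below collects the distinct DS relations of a log and computes a naive completeness ratio against an assumed known total; the actual CPL+ estimator is considerably more sophisticated, so this is only a toy stand-in.

```python
def distinct_ds(log):
    """Collect the distinct direct-succession (DS) relations in an event log."""
    return {(a, b) for trace in log for a, b in zip(trace, trace[1:])}

def naive_local_completeness(log, known_ds_count):
    """Toy estimate: fraction of the (assumed known) total DS relations of the
    underlying process that the log has already exhibited."""
    return len(distinct_ds(log)) / known_ds_count

log = [["a", "b", "c"], ["a", "c"]]
print(distinct_ds(log))                              # 3 distinct DS relations
print(naive_local_completeness(log, known_ds_count=4))   # 0.75
```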

7.
徐杨, 袁峰, 林琪, 汤德佑, 李东. 《软件学报》(Journal of Software), 2018, 29(2): 396-416
Process mining is a research hotspot at the intersection of business process management and data mining. In real business environments, process execution data are often recorded separately in different event logs, and these logs must be merged into a single event log file before current process mining techniques, which assume a single log, can be applied. However, event log merging is highly challenging because of many-to-many matching relations between execution instances across logs and because information needed for merging may be missing. This paper formalizes the event log merging problem, shows that it is a search optimization problem, and proposes an event log merging method based on a hybrid artificial immune algorithm: the initial population is generated heuristically and, building on the clonal selection theory of artificial immune systems, immune evolution is used to obtain the "best" merging solution, thereby supporting log merging with many-to-many instance matching relations. Two instance-level factors are considered, the occurrence frequency of process execution paths and the temporal matching relations between process instances, so that individuals in the evolving population are evaluated along the two dimensions of "quantity" matching and "time" matching. An immune memory pool and a simulated annealing mechanism are introduced to preserve the diversity of each new generation and to reduce the chance of premature convergence. Experimental results show that the method achieves event log merging with many-to-many instance matching relations, and that, compared with randomly generated initial populations, heuristic initialization speeds up immune evolution. The paper also discusses the data partitioning problem in the distributed merging of large-scale event logs, with the aim of improving merging performance through distributed techniques.

8.
9.
Research on process mining methods that discover business process models from execution logs is flourishing. However, complex and changing operating environments inevitably make process logs diverse. Traditional process mining algorithms each have their own suitable targets, so choosing a process mining algorithm that fits diverse process logs has become a challenge. This paper proposes SoFi (survival of the fittest integrator), a business process mining method suited to diverse environments. The method classifies the log into sub-logs based on domain knowledge, uses multiple existing mining algorithms to produce a set of process models for each sub-log as the initial population of a genetic algorithm, and then exploits the optimization capability of the genetic algorithm to integrate them into a high-quality business process model. Experimental results on simulated logs and real logs from a telecommunications company show that, compared with any single mining algorithm, the process models produced by SoFi have higher overall quality in terms of replay fitness, precision, generalization, and simplicity.

10.
As a newly-developed information exchange and management platform, Building Information Modeling (BIM) is altering the way multiple engineers collaborate on civil engineering projects. During BIM implementation, a large number of event logs are automatically generated and accumulated to record details of the model evolution. For knowledge discovery from such huge logs, a novel BIM event log mining approach based on dynamic social network analysis is presented to examine designers' performance objectively; it has been verified on BIM event logs from an ongoing year-long design project. Relying on meaningful information extracted from time-stamped logs, networks at monthly intervals are built to graphically represent information and knowledge sharing among designers. Special emphasis is put on measuring designers' influence by a newly defined metric called the "impact score", which combines the k-shell method and 1-step neighbors to achieve comparatively low computational cost and highly accurate ranking. Besides, an emerging machine learning algorithm named CatBoost is utilized to predict designers' influence intelligently by learning features from both the network structure and human behavior. It has been found that the twelve networks can easily be distinguished into two collaborative patterns, whose characteristics in both network structure and designer behavior are significantly different. The most influential designers are similar within the same group but vary across groups. Extensive analytical results confirm that the method can potentially serve as month-by-month feedback to monitor the complex modeling process, which further supports managers in realizing data-driven decision making for better leadership and work plans toward an optimized collaborative design.
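A rough sketch of such an influence metric on a collaboration graph, using NetworkX's core_number for the k-shell index (NetworkX is assumed to be installed). The exact way the paper combines the k-shell value with the 1-step neighbors is not reproduced here, so the formula and the toy network below are assumptions.

```python
import networkx as nx

def impact_scores(G):
    """Hypothetical 'impact score': a node's k-shell index plus the k-shell
    indices of its 1-step neighbors (the paper's exact weighting may differ)."""
    ks = nx.core_number(G)                       # k-shell index per node
    return {n: ks[n] + sum(ks[m] for m in G.neighbors(n)) for n in G}

# Toy collaboration network: an edge means two designers worked on the same model.
G = nx.Graph([("ann", "bob"), ("ann", "cy"), ("bob", "cy"), ("cy", "dee")])
for designer, score in sorted(impact_scores(G).items(), key=lambda x: -x[1]):
    print(designer, score)
```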

11.
An automated process discovery technique generates a process model from an event log recording the execution of a business process. For it to be useful, the generated process model should be as simple as possible, while accurately capturing the behavior recorded in, and implied by, the event log. Most existing automated process discovery techniques generate flat process models. When confronted with large event logs, these approaches lead to overly complex or inaccurate process models. An alternative is to apply a divide-and-conquer approach by decomposing the process into stages and discovering one model per stage. It turns out, however, that existing divide-and-conquer process discovery approaches often produce less accurate models than flat discovery techniques when applied to real-life event logs. This article proposes an automated method to identify business process stages from an event log and an automated technique to discover process models based on a given stage-based process decomposition. An experimental evaluation shows that: (i) relative to existing automated process decomposition methods in the field of process mining, the proposed method leads to stage-based decompositions that are closer to decompositions derived by human experts; and (ii) the proposed stage-based process discovery technique outperforms existing flat and divide-and-conquer discovery techniques with respect to well-accepted measures of accuracy and achieves comparable results in terms of model complexity.

12.
Process mining techniques relate observed behavior (i.e., event logs) to modeled behavior (e.g., a BPMN model or a Petri net). Process models can be discovered from event logs and conformance checking techniques can be used to detect and diagnose differences between observed and modeled behavior. Existing process mining techniques can only uncover these differences, but the actual repair of the model is left to the user and is not supported. In this paper we investigate the problem of repairing a process model w.r.t. a log such that the resulting model can replay the log (i.e., conforms to it) and is as similar as possible to the original model. To solve the problem, we use an existing conformance checker that aligns the runs of the given process model to the traces in the log. Based on this information, we decompose the log into several sublogs of non-fitting subtraces. For each sublog, either a loop is discovered that can replay the sublog or a subprocess is derived that is then added to the original model at the appropriate location. The approach is implemented in the process mining toolkit ProM and has been validated on logs and models from several Dutch municipalities.
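The pre-processing step, grouping non-fitting behavior by the location at which a loop or subprocess would later be inserted, can be illustrated with a toy model given as a set of allowed direct-succession arcs; the real approach works on alignments of Petri net runs, so everything below is a simplification.

```python
from collections import defaultdict

def deviation_sublogs(log, allowed_arcs):
    """Toy version of the repair pre-step: group direct-succession arcs that
    the model cannot replay by the activity after which they occur, so that a
    subprocess (or loop) could later be added at that location."""
    sublogs = defaultdict(list)
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            if (a, b) not in allowed_arcs:
                sublogs[a].append((a, b))
    return dict(sublogs)

model_arcs = {("a", "b"), ("b", "c")}
log = [["a", "b", "c"],
       ["a", "b", "x", "b", "c"],
       ["a", "b", "x", "y", "b", "c"]]
print(deviation_sublogs(log, model_arcs))
# {'b': [('b', 'x'), ('b', 'x')], 'x': [('x', 'b'), ('x', 'y')], 'y': [('y', 'b')]}
```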

13.
Business process mining aims to discover, from recorded event logs, process models that satisfy users' needs. Most previous methods build process models from the direct dependencies between events, which has certain limitations; this paper therefore proposes an optimized process mining analysis method based on quasi-indirect dependencies. An initial model is constructed from the event log on the basis of behavioral profiles. Given the execution log, transition pairs with quasi-indirect dependency relations are identified through the basic constraints of an integer-linear-programming-based process discovery algorithm, and the model is refined accordingly to mine an optimized model. A concrete case study demonstrates the effectiveness of the method.

14.
In conformance checking for business process discovery, existing multi-perspective alignment methods between event logs and process models can compute the optimal alignment of only one trace at a time; moreover, the heuristic function used when solving for the optimal alignment is expensive to compute, so optimal alignments are obtained inefficiently. To address this, a multi-perspective alignment method between batches of traces from an event log and a process model, based on the minimum edit distance of traces, is proposed. First, several traces from the event log are selected to form a batch, and a process mining algorithm is used to obtain a log model of the batch. Next, the product model of the log model and the process model, together with its transition system, is constructed as the search space for the batch of traces. Then, a heuristic function based on the minimum edit distance between the set of Petri net transition sequences and the remaining trace is designed to speed up the A* algorithm. Finally, a multi-perspective cost function with adjustable weights for the data and resource perspectives is designed, and a method for computing the multi-perspective optimal alignment between each trace in the batch and the process model on the transition system of the product model is proposed. Simulation results show that, compared with existing work, the proposed method uses less memory and less runtime when computing multi-perspective alignments between batches of traces and a process model. The method speeds up the computation of the heuristic function for optimal alignment and can obtain all optimal alignments of a batch of traces at once, thereby improving the efficiency of multi-perspective alignment between event logs and process models.
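A compact illustration of alignment via A* with an admissible heuristic, reduced to a single trace and a single model run with unit move costs. A simple length-difference heuristic stands in for the paper's edit-distance-based one, and batching, the product transition system, and the weighted data/resource costs are omitted.

```python
import heapq

def align(trace, run):
    """A*-based optimal alignment of one trace against one model run.
    Moves: synchronous (cost 0), log-only (cost 1), model-only (cost 1).
    Heuristic: difference of remaining lengths (admissible and consistent)."""
    def h(i, j):
        return abs((len(trace) - i) - (len(run) - j))

    frontier = [(h(0, 0), 0, 0, 0, [])]          # (f, g, i, j, moves)
    seen = set()
    while frontier:
        f, g, i, j, moves = heapq.heappop(frontier)
        if (i, j) in seen:
            continue
        seen.add((i, j))
        if i == len(trace) and j == len(run):
            return g, moves                      # optimal cost and alignment
        if i < len(trace) and j < len(run) and trace[i] == run[j]:
            heapq.heappush(frontier, (g + h(i + 1, j + 1), g, i + 1, j + 1,
                                      moves + [("sync", trace[i])]))
        if i < len(trace):
            heapq.heappush(frontier, (g + 1 + h(i + 1, j), g + 1, i + 1, j,
                                      moves + [("log", trace[i])]))
        if j < len(run):
            heapq.heappush(frontier, (g + 1 + h(i, j + 1), g + 1, i, j + 1,
                                      moves + [("model", run[j])]))
    return None

print(align(["a", "x", "c"], ["a", "b", "c"]))   # cost 2: one log and one model move
```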

15.
Considerable amounts of data, including process events, are collected and stored by organisations nowadays. Discovering a process model from such event data and verification of the quality of discovered models are important steps in process mining. Many discovery techniques have been proposed, but none of them combines scalability with strong quality guarantees. We would like such techniques to handle billions of events or thousands of activities, to produce sound models (without deadlocks and other anomalies), and to guarantee that the underlying process can be rediscovered when sufficient information is available. In this paper, we introduce a framework for process discovery that ensures these properties while passing over the log only once and introduce three algorithms using the framework. To measure the quality of discovered models for such large logs, we introduce a model–model and model–log comparison framework that applies a divide-and-conquer strategy to measure recall, fitness, and precision. We experimentally show that these discovery and measuring techniques sacrifice little compared to other algorithms, while gaining the ability to cope with event logs of 100,000,000 traces and processes of 10,000 activities on a standard computer.

16.
Process mining is the research domain that is dedicated to the a posteriori analysis of business process executions. The techniques developed within this research area are specifically designed to provide profound insight by exploiting the untapped reservoir of knowledge that resides within event logs of information systems. Process discovery is one specific subdomain of process mining that entails the discovery of control-flow models from such event logs. Assessing the quality of discovered process models is an essential element, both for conducting process mining research as well as for the use of process mining in practice. In this paper, a multi-dimensional quality assessment is presented in order to comprehensively evaluate process discovery techniques. In contrast to previous studies, the major contribution of this paper is the use of eight real-life event logs. For instance, we show that evaluation based on real-life event logs significantly differs from the traditional approach to assess process discovery techniques using artificial event logs. In addition, we provide an extensive overview of available process discovery techniques and we describe how discovered process models can be assessed regarding both accuracy and comprehensibility. The results of our study indicate that the HeuristicsMiner algorithm is especially suited in a real-life setting. However, it is also shown that, particularly for highly complex event logs, knowledge discovery from such data sets can become a major problem for traditional process discovery techniques.

17.
Process mining is a tool to extract non-trivial and useful information from process execution logs. These so-called event logs (also called audit trails, or transaction logs) are the starting point for various discovery and analysis techniques that help to gain insight into certain characteristics of the process. In this paper we use a combination of process mining techniques to discover multiple perspectives (namely, the control-flow, data, performance, and resource perspective) of the process from historic data, and we integrate them into a comprehensive simulation model. This simulation model is represented as a colored Petri net (CPN) and can be used to analyze the process, e.g., evaluate the performance of different alternative designs. The discovery of simulation models is explained using a running example. Moreover, the approach has been applied in two case studies; the workflows in two different municipalities in the Netherlands have been analyzed using a combination of process mining and simulation. Furthermore, the quality of the CPN models generated for the running example and the two case studies has been evaluated by comparing the original logs with the logs of the generated models.

18.
Over the past decade process mining has emerged as a new analytical discipline able to answer a variety of questions based on event data. Event logs have a very particular structure; events have timestamps, refer to activities and resources, and need to be correlated to form process instances. Process mining results tend to be very different from classical data mining results, e.g., process discovery may yield end-to-end process models capturing different perspectives rather than decision trees or frequent patterns. A process-mining tool like ProM provides hundreds of different process mining techniques ranging from discovery and conformance checking to filtering and prediction. Typically, a combination of techniques is needed and, for every step, there are different techniques that may be very sensitive to parameter settings. Moreover, event logs may be huge and may need to be decomposed and distributed for analysis. These aspects make it very cumbersome to analyze event logs manually. Process mining should be repeatable and automated. Therefore, we propose a framework to support the analysis of process mining workflows. Existing scientific workflow systems and data mining tools are not tailored towards process mining and the artifacts used for analysis (process models and event logs). This paper structures the basic building blocks needed for process mining and describes various analysis scenarios. Based on these requirements we implemented RapidProM, a tool supporting scientific workflows for process mining. Examples illustrating the different scenarios are provided to show the feasibility of the approach.

19.
In this paper, we introduce a probabilistic modeling approach for addressing the problem of Web robot detection from Web-server access logs. More specifically, we construct a Bayesian network that classifies automatically access log sessions as being crawler- or human-induced, by combining various pieces of evidence proven to characterize crawler and human behavior. Our approach uses an adaptive-threshold technique to extract Web sessions from access logs. Then, we apply machine learning techniques to determine the parameters of the probabilistic model. The resulting classification is based on the maximum posterior probability of all classes given the available evidence. We apply our method to real Web-server logs and obtain results that demonstrate the robustness and effectiveness of probabilistic reasoning for crawler detection.
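As a simplified stand-in for the paper's Bayesian network, the sketch below classifies a session with naive Bayes over binary evidence and picks the class with the maximum posterior probability; the feature names, priors, and conditional probabilities are made up for illustration.

```python
import math

# Hypothetical binary evidence about a session (not the paper's exact features).
FEATURES = ["requests_robots_txt", "high_head_ratio", "empty_referrer", "loads_images"]

# Assumed conditional probabilities P(feature = 1 | class) and class priors.
P_GIVEN = {
    "crawler": {"requests_robots_txt": 0.7, "high_head_ratio": 0.6,
                "empty_referrer": 0.8, "loads_images": 0.1},
    "human":   {"requests_robots_txt": 0.01, "high_head_ratio": 0.05,
                "empty_referrer": 0.2, "loads_images": 0.9},
}
PRIOR = {"crawler": 0.2, "human": 0.8}

def posterior(session):
    """Naive-Bayes posterior over {crawler, human} given binary evidence."""
    log_scores = {}
    for cls in PRIOR:
        logp = math.log(PRIOR[cls])
        for f in FEATURES:
            p = P_GIVEN[cls][f]
            logp += math.log(p if session[f] else 1.0 - p)
        log_scores[cls] = logp
    norm = math.log(sum(math.exp(s) for s in log_scores.values()))
    return {cls: math.exp(s - norm) for cls, s in log_scores.items()}

session = {"requests_robots_txt": True, "high_head_ratio": False,
           "empty_referrer": True, "loads_images": False}
print(posterior(session))   # classify by the maximum posterior probability
```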

20.

The problem of automatically discovering business process models from event logs has been intensely investigated in the past two decades, leading to a wide range of approaches that strike various trade-offs between accuracy, model complexity, and execution time. A few studies have suggested that the accuracy of automated process discovery approaches can be enhanced by means of metaheuristic optimization techniques. However, these studies have remained at the level of proposals without validation on real-life datasets or they have only considered one metaheuristic in isolation. This article presents a metaheuristic optimization framework for automated process discovery. The key idea of the framework is to construct a directly-follows graph (DFG) from the event log, to perturb this DFG so as to generate new candidate solutions, and to apply a DFG-based automated process discovery approach in order to derive a process model from each DFG. The framework can be instantiated by linking it to an automated process discovery approach, an optimization metaheuristic, and the quality measure to be optimized (e.g., fitness, precision, F-score). The article considers several instantiations of the framework corresponding to four optimization metaheuristics, three automated process discovery approaches (Inductive Miner—directly-follows, Fodina, and Split Miner), and one accuracy measure (Markovian F-score). These framework instances are compared using a set of 20 real-life event logs. The evaluation shows that metaheuristic optimization consistently yields visible improvements in F-score for all the three automated process discovery approaches, at the cost of execution times in the order of minutes, versus seconds for the baseline approaches.
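The framework's core loop can be sketched as follows. Note that the quality measure here is a crude stand-in for the Markovian F-score (which in the paper is computed on the model discovered from each perturbed DFG by a real discovery algorithm), and plain hill climbing stands in for the four metaheuristics; thresholds and the toy log are assumptions.

```python
import random
from collections import Counter

def build_dfg(log):
    """Directly-follows graph (DFG) with arc frequencies."""
    dfg = Counter()
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg

def f_score(kept, dfg, freq_threshold=2):
    """Stand-in quality measure: 'fitness' = share of observed arc frequency
    kept in the DFG, 'precision' = share of kept arcs that are frequent."""
    if not kept:
        return 0.0
    total = sum(dfg.values())
    fitness = sum(dfg[a] for a in kept) / total
    precision = sum(1 for a in kept if dfg[a] >= freq_threshold) / len(kept)
    return 0.0 if fitness + precision == 0 else 2 * fitness * precision / (fitness + precision)

def perturb(kept, all_arcs):
    """Perturbation operator: add or remove one arc of the DFG."""
    new = set(kept)
    new ^= {random.choice(all_arcs)}
    return new

def optimize(log, iterations=500):
    """Simple hill climbing over perturbed DFGs (the paper plugs several
    metaheuristics and a DFG-based discovery approach into this loop)."""
    dfg = build_dfg(log)
    all_arcs = list(dfg)
    best = set(all_arcs)
    for _ in range(iterations):
        candidate = perturb(best, all_arcs)
        if f_score(candidate, dfg) >= f_score(best, dfg):
            best = candidate
    return best, f_score(best, dfg)

log = [["a", "b", "d"]] * 5 + [["a", "c", "d"]] * 5 + [["a", "d"]]  # one rare variant
print(optimize(log))   # the noisy ('a', 'd') arc tends to be dropped
```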

