Repairing Outlier Behaviour in Event Logs using Contextual Behaviour
Keywords: Process Mining, Data Cleansing, Log Repair, Event Log Preprocessing, Outlier Detection, Conditional Probability
Real event data commonly contains outlier behavior, e.g., due to logging errors in information systems or the presence of exceptional process behavior. Such behavior often leads to incomprehensible, complex, and inaccurate analysis results and makes correct and/or important behavior undetectable. In this paper, we propose a novel data preprocessing method that detects and subsequently repairs outlier behavior in event data. The method is probabilistic: it detects outlier behavior based on the occurrence probability of a sequence of activities within its surrounding contextual behavior, and it replaces the outlier behavior with more probable behavior in that same behavioral context. Removing outlier behavior in this way yields a more global view of the process. The proposed method has been implemented in both the ProM and the RapidProM frameworks. Using these implementations, we conducted several experiments that show that most types of outlier behavior in event data are detectable and repairable via the proposed method. The evaluation demonstrates that repairing event logs upfront improves process discovery results: using the proposed method, we obtain more understandable process models with higher accuracy.
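The idea of context-based detection and repair can be illustrated with a minimal sketch. It is not the authors' algorithm or the ProM/RapidProM implementation: it assumes, purely for illustration, that the context of an activity is its immediate left and right neighbour, that the sequence under test is a single activity, and that a fixed probability threshold separates outliers from normal behavior. The function name `repair_log`, the `threshold` parameter, and the `<start>`/`<end>` markers are hypothetical.

```python
from collections import Counter, defaultdict


def repair_log(log, threshold=0.1):
    """Replace improbable activities with the most probable one in the same context.

    Assumption (illustrative only): the context of an activity is the pair
    (previous activity, next activity), and `threshold` is a hypothetical cut-off.
    """
    # Count how often each activity occurs between a given (left, right) pair.
    context_counts = defaultdict(Counter)
    for trace in log:
        padded = ["<start>"] + list(trace) + ["<end>"]
        for i in range(1, len(padded) - 1):
            context_counts[(padded[i - 1], padded[i + 1])][padded[i]] += 1

    repaired = []
    for trace in log:
        padded = ["<start>"] + list(trace) + ["<end>"]
        new_trace = list(trace)
        for i in range(1, len(padded) - 1):
            counts = context_counts[(padded[i - 1], padded[i + 1])]
            total = sum(counts.values())
            # Conditional probability of the activity given its context.
            if counts[padded[i]] / total < threshold:
                # Repair: substitute the most frequent activity in this context.
                new_trace[i - 1] = counts.most_common(1)[0][0]
        repaired.append(new_trace)
    return repaired


# Usage: 19 regular traces and one trace with an unexpected activity "x".
log = [["a", "b", "c"]] * 19 + [["a", "x", "c"]]
print(repair_log(log)[-1])  # ['a', 'b', 'c']
```

In this sketch, contexts are always read from the original trace rather than from the partially repaired one, so an earlier replacement does not influence later decisions within the same trace.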
Authors who publish with this journal agree to the following terms: Authors retain copyright and grant the journal 'Enterprise Modelling and Information Systems Architectures - International Journal of Conceptual Modeling' and the Gesellschaft für Informatik e.V. (GI) the permission of first publication, and the non-exclusive, irrevocable and non-time limited publication permission for the submitted work including the permissions to store, copy, distribute and reproduce their work in printed and electronic form for the duration of the legal copyright. This includes the right of translation. Authors grant the journal 'Enterprise Modelling and Information Systems Architectures - International Journal of Conceptual Modeling' and the Gesellschaft für Informatik e.V. (GI) the permission to license their work under a Creative Commons BY-SA 4.0 license that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book) given an acknowledgement of its initial publication in this journal.