Goliath National Bank

In this third assignment we work on a real-life event log taken from a Dutch financial institution. The log contains some 262,200 events in 13,087 cases. Apart from some anonymization, the log contains all data as it came from the financial institution. The process represented in the event log is an application process for a personal loan or overdraft within a global financing organization. Your task is to analyze, visualize and mine the loan application process using ProM.

Assignment format

This is Assignment 3 of the Data Science and Process Modelling course taught at Leiden University.

For each part of the assignment, the number of points awarded for a 100% perfect answer is listed between brackets. These sum to a total of 100 points. Answer each question as precisely as possible: parts of a question that are not addressed earn fewer points. Your assignment grade (between 1 and 10, bounds included) is computed by dividing your number of points by 10 and rounding it to the nearest half.

If you get an insufficient grade for the assignment, you can retake it by meeting the assignment retake deadline. Please hand in your work on time. If you hand in late, you have failed the assignment and automatically fall under the retake deadline. Retake assignment grades have 2 points subtracted from the total.

You are allowed to work in teams of exactly two people. For each question, clearly describe how you obtained your answer and write down any non-trivial assumptions. All practical exercises can be done on the student workstations. Be sure to hand in digitally:

  • Your final assignment report (in PDF, generated using LaTeX)
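
The grading rule described above divides the points by 10 and rounds to the nearest half, with the grade bounded by 1 and 10. It can be sketched as follows. The function name is hypothetical. The rule does not say how exact ties such as 6.75 are rounded, so this sketch assumes they are rounded up.

```python
import math


def assignment_grade(points: float) -> float:
    """Turn a point total (0-100) into a grade between 1 and 10.

    Divides by 10 and rounds to the nearest half.
    Assumption: exact ties (e.g. 67.5 points -> 6.75) are rounded up,
    which the grading rule itself does not specify.
    """
    # Doubling, rounding half up and halving again gives the nearest multiple of 0.5.
    rounded = math.floor(points / 10 * 2 + 0.5) / 2
    # Clamp to the allowed grade range (bounds included).
    return min(10.0, max(1.0, rounded))
```

For example, `assignment_grade(67)` returns 6.5 and `assignment_grade(68)` returns 7.0.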

Questions or remarks? Preferably ask them during one of the weekly lectures or lab sessions. In case of urgent questions outside these hours, contact one of the course assistants via e-mail, or ask the lecturer in person.


Warning: Before getting started with this assignment, it is highly recommended to:

  • Read van der Aalst Chapter 11
  • Do the ProM Getting Started Tutorial
  • Walk through the Exercises, focusing on the part about the discovery of Petri nets (and less on the other models and techniques).

To get ProM running under UNIX, edit the ProM start-up script and change JAVA=java to JAVA=/usr/lib/jvm/java-1.7.0-openjdk-amd64/jre/bin/java

Data for this assignment

The data file can be found here: BPI_Challenge_2012.xes.gz (unzip it and import it into ProM). The amount (size of the loan) requested by the customer is stored in the case attribute AMOUNT_REQ. This attribute is global, i.e. every case contains it. The event log is a merger of three intertwined sub processes, and the first letter of each task name identifies the sub process (source) it originates from. Feel free to run analyses on the process as a whole, on selections of it and/or on the individual sub processes. Event types are explained in the table below.
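
If you want to check the raw data outside ProM, the XES file can be read with Python's standard library alone. The sketch below is a minimal example under a few assumptions. The helper names (`read_xes`, `load_log`, `summarize`) are hypothetical. Attributes are read from the `key`/`value` pairs of the XES attribute elements. AMOUNT_REQ is assumed to hold a numeric string. It counts cases and events, tallies events per sub process by the first letter of the activity name, and collects the requested amounts.

```python
import gzip
import io
import xml.etree.ElementTree as ET
from collections import Counter


def local(tag):
    """Drop the XML namespace: '{http://www.xes-standard.org/}trace' -> 'trace'."""
    return tag.rsplit("}", 1)[-1]


def attributes(elem):
    """Key/value pairs of the direct XES attribute children (string, date, ...)."""
    return {c.get("key"): c.get("value") for c in elem if c.get("key") is not None}


def read_xes(fileobj):
    """Stream an XES log into a list of (case_attributes, [event_attributes, ...])."""
    cases = []
    for _, elem in ET.iterparse(fileobj, events=("end",)):
        if local(elem.tag) == "trace":
            events = [attributes(e) for e in elem if local(e.tag) == "event"]
            cases.append((attributes(elem), events))
            elem.clear()  # release the memory of traces that have been processed
    return cases


def load_log(path):
    """Read an XES file from disk, decompressing it on the fly if it ends in .gz."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as f:
        return read_xes(f)


def summarize(cases):
    """Number of cases, number of events, events per sub process, requested amounts."""
    n_events = sum(len(events) for _, events in cases)
    # The first letter of an activity name (A, O or W) identifies its sub process.
    per_subprocess = Counter((e.get("concept:name") or "?")[0]
                             for _, events in cases for e in events)
    # AMOUNT_REQ is a case (trace) attribute; its values are assumed to be numeric strings.
    amounts = [float(attrs["AMOUNT_REQ"]) for attrs, _ in cases if "AMOUNT_REQ" in attrs]
    return len(cases), n_events, per_subprocess, amounts


# Tiny synthetic two-case log in XES layout, only for trying the functions out.
SAMPLE_XES = b"""<log xmlns="http://www.xes-standard.org/">
  <trace>
    <string key="concept:name" value="1"/>
    <string key="AMOUNT_REQ" value="5000"/>
    <event><string key="concept:name" value="A_SUBMITTED"/></event>
    <event><string key="concept:name" value="O_CREATED"/></event>
  </trace>
  <trace>
    <string key="concept:name" value="2"/>
    <string key="AMOUNT_REQ" value="10000"/>
    <event><string key="concept:name" value="A_SUBMITTED"/></event>
  </trace>
</log>"""

sample_summary = summarize(read_xes(io.BytesIO(SAMPLE_XES)))
```

On the real data, `summarize(load_log("BPI_Challenge_2012.xes.gz"))` can be used to cross-check the case and event counts stated at the top of this assignment before you start mining.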

Informal process description: An application is submitted through a webpage. Then some automatic checks are performed, after which the application is complemented with additional information. This information is obtained by contacting the customer by phone. If an applicant is eligible, an offer is sent to the client by mail. After this offer has been received back, it is assessed. When it is incomplete, missing information is added by contacting the customer again. Then a final assessment is done, after which the application is approved and activated.

Event type                    Meaning
States starting with ‘A_’     States of the application
States starting with ‘O_’     States of the offer belonging to the application
States starting with ‘W_’     States of the work item belonging to the application
COMPLETE (‘A_’ or ‘O_’)       The task is completed
SCHEDULE (‘W_’)               The work item is created in the queue (automatic step following manual actions)
START (‘W_’)                  The work item is obtained by the resource
COMPLETE (‘W_’)               The work item is released by the resource and put back in the queue or transferred to another queue (SCHEDULE)
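
Because ‘W_’ work items carry SCHEDULE, START and COMPLETE events, you can separate the time a work item waits in a queue from the time a resource actually works on it. The sketch below shows this. It assumes the events of one case are already available as a time-ordered list of (activity, lifecycle, timestamp) tuples. The function name is hypothetical. The pairing rule is a simplification: waiting runs from a SCHEDULE to the next START, and processing runs from a START to the next COMPLETE. It does not describe how the bank itself measures these times.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Tuple


def work_item_times(events: List[Tuple[str, str, datetime]]) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Total waiting and processing hours per 'W_' work item within one case.

    `events` is a time-ordered list of (activity, lifecycle, timestamp) tuples.
    Simplifying assumption: waiting time runs from a SCHEDULE to the next START
    of the same work item, processing time from a START to the next COMPLETE.
    """
    waiting: Dict[str, float] = defaultdict(float)
    processing: Dict[str, float] = defaultdict(float)
    scheduled_at: Dict[str, datetime] = {}
    started_at: Dict[str, datetime] = {}
    for activity, lifecycle, ts in events:
        if not activity.startswith("W_"):
            continue  # per the table, 'A_' and 'O_' states only have COMPLETE events
        if lifecycle == "SCHEDULE":
            scheduled_at[activity] = ts
        elif lifecycle == "START":
            if activity in scheduled_at:
                waited = ts - scheduled_at.pop(activity)
                waiting[activity] += waited.total_seconds() / 3600
            started_at[activity] = ts
        elif lifecycle == "COMPLETE" and activity in started_at:
            worked = ts - started_at.pop(activity)
            processing[activity] += worked.total_seconds() / 3600
    return dict(waiting), dict(processing)
```

A work item that is released and picked up again (several START/COMPLETE pairs) accumulates all of its processing periods. Only the first START after a SCHEDULE counts towards waiting time.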

Your Assignment

The goal of the assignment is to become familiar with a tool such as ProM to analyze business processes based on event data. You will need to write a report on your activities, addressing both technical (Process Mining) and domain-specific (Process Analysis) aspects:

Process Mining [60p]
  1. Analyze the event log using ProM in at least four different ways:

    1. [10p] the plain "View" tab
    2. [10p] a Dotted Chart Analysis
    3. [10p] the Alpha Miner (Petri Nets)
    4. [10p] a miner of your choice (e.g., the Fuzzy Miner)

    For each technique, report the most important findings and results. Which steps did you take in the ProM tool to obtain the desired results, and why? What settings and parameters did you tune? Please do include plenty of screenshots and diagrams.

  2. [20p] Explain the differences between the analysis techniques in terms of the information they conceptually provide, and how they work in practice on larger datasets such as the one provided in this assignment.

Note that, as opposed to Assignment 1, the goal is not to visualize all data, but to show useful information extracted from the data that is relevant to the questions posed in this assignment.
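
When comparing the techniques, it helps to know what the Alpha Miner actually looks at. Its first step derives ordering relations between activities from the traces: directly follows, causality, parallelism and choice. These relations form a so-called footprint. The sketch below computes that footprint for traces given as plain lists of activity names. The function names are hypothetical. This is not ProM's implementation, and it stops before the Petri net places are constructed.

```python
from itertools import product


def footprint(traces):
    """Footprint of the Alpha algorithm for a list of activity sequences.

    Maps each ordered pair (a, b) to '->' (causality), '<-' (reverse causality),
    '||' (parallel) or '#' (never directly following each other).
    """
    follows = set()
    for trace in traces:
        follows.update(zip(trace, trace[1:]))  # directly-follows pairs a > b
    activities = sorted({a for trace in traces for a in trace})
    matrix = {}
    for a, b in product(activities, repeat=2):
        ab, ba = (a, b) in follows, (b, a) in follows
        if ab and not ba:
            matrix[a, b] = "->"
        elif ba and not ab:
            matrix[a, b] = "<-"
        elif ab and ba:
            # Also hit on the diagonal by a length-one loop (a > a),
            # a known blind spot of the basic algorithm.
            matrix[a, b] = "||"
        else:
            matrix[a, b] = "#"
    return matrix


def start_end_activities(traces):
    """Activities that start or end at least one trace."""
    return {t[0] for t in traces if t}, {t[-1] for t in traces if t}
```

Frequencies play no role here. A single trace in which b directly follows a already turns a -> b into a || b, so rare behaviour shapes the model as much as common behaviour. Miners such as the Fuzzy Miner instead weigh relations by significance and let you filter out infrequent ones.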

Process Analysis [40p]

The bank is interested in all valuable information hidden in the event data. The main question is: what does the process model look like, and what can we learn from it?

At a minimum, answer the questions below and explain how you obtained each answer (half of the points). Make sure to relate each finding to the financial domain (the other half of the points), explaining how the insights may improve the bank's business processes:

  • [5p+5p] What are the average, minimum and maximum throughput times of cases?
  • [10p+10p] Which paths take too much time on average? How many cases follow these routings? What are the critical sub-paths for these paths?
  • [5p+5p] Are there any non-intentional dead-ends? Are customers getting stuck somewhere in the process?
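
Besides ProM's own performance views, the first two questions can be cross-checked with a few lines of Python. The sketch below makes several assumptions. The log is already loaded into a dict that maps each case id to its (activity, timestamp) pairs, for example with timestamps parsed from `time:timestamp` via `datetime.fromisoformat`. Throughput time is taken as the last event timestamp minus the first. A "path" is the exact sequence of activities (variant). All function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean
from typing import Dict, List, Tuple

# Assumed input: case id -> list of (activity, timestamp) pairs.
Log = Dict[str, List[Tuple[str, datetime]]]


def throughput_hours(log: Log) -> Dict[str, float]:
    """Throughput time per case: last event timestamp minus first, in hours."""
    result = {}
    for case, events in log.items():
        if events:
            stamps = [ts for _, ts in events]
            result[case] = (max(stamps) - min(stamps)).total_seconds() / 3600
    return result


def throughput_summary(log: Log) -> Dict[str, float]:
    """Average, minimum and maximum throughput time over all cases (hours)."""
    times = list(throughput_hours(log).values())
    return {"average": mean(times), "minimum": min(times), "maximum": max(times)}


def slowest_variants(log: Log, top: int = 5) -> List[Tuple[Tuple[str, ...], int, float]]:
    """Rank variants (distinct activity sequences) by average throughput time.

    Returns (variant, number of cases, average hours) for the `top` slowest variants.
    """
    durations = throughput_hours(log)
    per_variant: Dict[Tuple[str, ...], List[float]] = defaultdict(list)
    for case, events in log.items():
        if events:
            path = tuple(activity for activity, _ in sorted(events, key=lambda e: e[1]))
            per_variant[path].append(durations[case])
    ranked = sorted(per_variant.items(), key=lambda kv: mean(kv[1]), reverse=True)
    return [(path, len(hours), mean(hours)) for path, hours in ranked[:top]]


def average_transition_hours(log: Log) -> Dict[Tuple[str, str], float]:
    """Average time between directly-following activities: a rough view of slow sub-paths."""
    gaps: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for events in log.values():
        ordered = sorted(events, key=lambda e: e[1])
        for (a, t1), (b, t2) in zip(ordered, ordered[1:]):
            gaps[a, b].append((t2 - t1).total_seconds() / 3600)
    return {pair: mean(hours) for pair, hours in gaps.items()}
```

On a log with many distinct variants, a slow variant followed by only one case says little. The number of cases returned next to each variant lets you weigh that. The transition averages point to the sub-paths where the time is lost.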

For each question, plot relevant values, averages and distributions that support your findings.
Remember to always provide an answer which is based on the data, and always explain how you obtained your answer.
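
For the dead-end question, a simple data-driven check is to look at which activity each case ends with. You can then flag cases that stop somewhere other than an expected end. The sketch below assumes the same case id -> time-ordered (activity, timestamp) layout. The set of expected end activities is something you have to choose and justify yourself from the data and the process description; it is not given here. The function names are hypothetical. Cases whose last event lies close to the end of the log may simply still have been running when the data was extracted, so the sketch can optionally skip them.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Set, Tuple

# Assumed input: case id -> time-ordered list of (activity, timestamp) pairs.
Log = Dict[str, List[Tuple[str, datetime]]]


def end_activity_counts(log: Log) -> Counter:
    """How often each activity is the final event of a case."""
    return Counter(events[-1][0] for events in log.values() if events)


def possible_dead_ends(log: Log, expected_ends: Set[str],
                       log_end: Optional[datetime] = None,
                       min_idle: Optional[timedelta] = None) -> List[str]:
    """Ids of cases whose last activity is not an expected end activity.

    `expected_ends` must be chosen by the analyst (an assumption to justify).
    If `log_end` and `min_idle` are given, cases whose last event lies less than
    `min_idle` before the end of the log are skipped, because they may simply
    still have been running when the log was extracted.
    """
    stuck = []
    for case, events in log.items():
        if not events:
            continue
        last_activity, last_ts = events[-1]
        if last_activity in expected_ends:
            continue
        if log_end is not None and min_idle is not None and log_end - last_ts < min_idle:
            continue
        stuck.append(case)
    return sorted(stuck)
```

Plotting `end_activity_counts` as a bar chart is one way to show where cases end. This gives you the kind of supporting distribution the questions ask for.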

Optional: convert the .xes data into .csv and experiment with exploring this data using EventPad (up to 10 bonus points).
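
The XES-to-CSV conversion can be done with Python's standard library. The sketch below streams the log and writes one row per event. The column layout is an assumption, so check which columns your EventPad import expects. The function names and the output file name are hypothetical.

```python
import csv
import gzip
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Drop the XML namespace from a tag name."""
    return tag.rsplit("}", 1)[-1]


def attrs(elem):
    """Key/value pairs of the direct XES attribute children of an element."""
    return {c.get("key"): c.get("value") for c in elem if c.get("key") is not None}


# Column layout is an assumption: check which columns your EventPad import expects.
COLUMNS = ["case", "activity", "lifecycle", "timestamp", "resource", "amount_req"]


def xes_to_csv(xes_file, csv_file):
    """Stream an XES log into CSV, one row per event; returns the number of rows."""
    writer = csv.writer(csv_file)
    writer.writerow(COLUMNS)
    rows = 0
    for _, elem in ET.iterparse(xes_file, events=("end",)):
        if local(elem.tag) != "trace":
            continue
        case = attrs(elem)
        for event in (e for e in elem if local(e.tag) == "event"):
            ev = attrs(event)
            writer.writerow([case.get("concept:name"), ev.get("concept:name"),
                             ev.get("lifecycle:transition"), ev.get("time:timestamp"),
                             ev.get("org:resource"), case.get("AMOUNT_REQ")])
            rows += 1
        elem.clear()  # free the memory of processed traces
    return rows


def convert(path_in, path_out):
    """Convert a gzip-compressed XES file on disk into a CSV file."""
    with gzip.open(path_in, "rb") as src, \
            open(path_out, "w", newline="", encoding="utf-8") as dst:
        return xes_to_csv(src, dst)


def xes_bytes_to_csv_text(data: bytes) -> str:
    """Convenience wrapper for small in-memory XES snippets, e.g. for testing."""
    out = io.StringIO()
    xes_to_csv(io.BytesIO(data), out)
    return out.getvalue()
```

For the assignment data this would be called as `convert("BPI_Challenge_2012.xes.gz", "bpi2012.csv")`. Missing attributes (e.g. events without a resource) end up as empty fields.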

Good luck with the assignment! Ask questions. A lot, if you have to. The deadline is posted on the course website.

Full credit for this exercise and the data goes to the BPI Challenge 2012, held at the 8th International Workshop on Business Process Intelligence.