liacs.leidenuniv.nl/~takesfw/BIPM/assignment1.html


Introduction

CookieDestroyer

Congratulations! For the next 3 weeks you are employed as all-round Business Intelligence expert of the largest Leiden-based gaming business "Thunderstorm Entertainment". One of their games is CookieDestroyer, a popular game played by people of all ages and genders worldwide, primarily making money from selling in-game "coins" that allow the gamer to progress in the game more quickly. Your job is to provide the company's management with information on the status, growth and trends with respect to the monetization of CookieDestroyer. To do this, you will use data from the company's sales logs, containing all purchases of coins made by the customers. Ultimately, your data analysis, visualization, interpretation and tools should allow company management to make decisions on future steps to expand the game.


Assignment format

This is Assignment 1 of the Business Intelligence and Process Modelling course taught at Leiden University in the spring semester of 2018.

For each part of the assignment, the number of points awarded for a 100% perfect answer is listed between brackets and sums to a total of 100 points. Your assignment grade (between 1 and 10, bounds included) is computed by dividing your number of points by 10 and rounding it to the nearest half. If you get an insufficient grade for the assignment, you can retake the assignment by meeting the assignment retake deadline. Please do not be late with handing in your work. If you are late with handing in your work, it means that you failed the assignment and that you are automatically using the retake deadline for the assignment. You are allowed to work in teams consisting of exactly two people. For each question, clearly describe how you obtained your answer, and write down any non-trivial assumptions. All practical exercises can be done on the student workstations. Be sure to hand in digitally:

  1. Your final assignment report (in PDF, generated using LaTeX)
  2. Your Dashboard URL
  3. All relevant server-side source code in one zip-file or tarball

Questions or remarks? Preferably ask them during one of the weekly lectures or lab sessions. In case of urgent questions outside these hours, contact one of the course assistants via e-mail, or ask the lecturer in person.


Assignment contents

Assignment goal

The goal of the assignment is to:

  1. Create a web-based dashboard which can be used by management to understand and interpret the sales data of the CookieDestroyer game.
  2. Answer a few strategic BI questions on the game using the data and/or the dashboard in a short assignment report.

The data

The data for this assignment comes in three files: sales, methods and countries. The main data table to be studied is sales, which contains a few hundred thousand sales records spanning a time period from 2010 to 2015. It has the following attributes:

  • saleId - the sale unique identifier
  • saleDateTime - date and time
  • accountName - name of the user, anonymized using one-way collision-free hashing
  • coins - the number of game coins given
  • currency - the currency of the user
  • priceInCurrency - the price in the user's currency
  • priceInEUR - the price converted to EUR (which is all the pricing information you need for this assignment)
  • methodId - unique ID of the payment method used for this sale
  • ip - the IP (again anonymized)
  • ipCountry - the ISO2 country code of the country corresponding to that IP

Furthermore, there are tables methods and countries which map the methodId and ipCountry fields in the sales table to (anonymized) payment method names and country names. Each table is available in .sql format and in .csv format. The files can be found here:

sales .csv .sql
methods .csv .sql
countries .csv .sql

The files can also be found in the shared UNIX folder /vol/share/groups/liacs/scratch/BIPM/.

A simple way to load data from file.sql into MySQL is via the command line, for example as follows:
mysql -h mysql.liacs.leidenuniv.nl -u username -p username < file.sql
Username is your ULCN username, password is your MySQL password. The second username is to select the database name, which is equal to your username.


Dashboard

The web-based dashboard consists of various (at least four) widgets that should each visualize (in a different way) the following aspects of the data:

  1. [10p] Sales Volume: Visualize the sales over time (for example, per month or per quarter) in terms of at least three measures: real money (price attribute), virtual money (amount attribute) and/or transaction count (row count).
  2. [10p] Attribute Segmentation and Filtering (OLAP-style): Present sales volume (as above) segmented per attribute: at least the methodId attribute should be included, but you can also think of the day of the week, time of the day or the country of the customer.
  3. [15p] Customer Behavior: Can you come up with some KPI's (metrics and scores) that help management understand the customer? Visualize them in a nice way. For example, the average number of transactions per customer, the average lifespan of a customer, etc. Bonus points if you can show a plot of a distribution of the (perhaps binned) values of the metrics in addition to an average.
  4. [10p] Geographical Development: visualize the sales volume (as above) in a geographical setting, for example the number of distinct customers per country over time. The goal is again to give management as much geographic insight as possible (see Strategic Questions later on).
  • [15p] You are very much encouraged to make the dashboard visually appealing, and to use non-traditional visualization techniques to give management an astonishing insight in the sales data of their game. In particular, the interaction and degrees of freedom in visualizing and exploring the sales data should be as large as possible. Try pushing this to the maximum, striving for as much "Codeless reporting" as possible. Important: Explain your toolstack in the report.

You can use any programming language, scripting language, markup language, framework or toolkit, but here are some hints.

  • Data storage: You can choose to store the data in a MySQL database, but you are also free to use and experiment with NoSQL databases such as ElasticSearch (try for example Kibana), Apache superset or any other framework of your choice.
  • Markup language: when needed, on the markup/client side you will want to use at least HTML, with proper CSS style classes assigned to elements. Try to stick to a modern presentation framework such as Twitter Bootstrap to prevent you from having to write custom CSS.
  • Scripting language: On the server side, you can use PHP to serve the data, and then use Javascript and in particular jQuery's interface for JSON on the client side. Depending on your type of data storage, you could also use Python on the server side. Or work with Node.js or Angular.js.
  • Visualization: For visually appealing charts you could for example use D3.js, Chart.js or Google Charts.

Not sure what to use? Go with what you learned in your first year the "easy mode":
- Data in the MySQL database (perhaps create some relevant indexes),
- One HTML page with the dashboard, styled using Twitter Bootstrap.
- One PHP script that serves relevant data in JSON-format.
- Some Javascript/jQuery code to put the data in relevant Google Charts diagrams.


Strategic BI Questions

The report that you hand in for this assignment should contain a short introduction to the data, the company and the dashboard.
Then, the following questions should be answered by querying the data or using the dashboard. For each question, always elaborately motivate your answers based on the data, for example by giving queries or instructions to use the dashboard.

  1. [10p] Visualization For each component of your dashboard, explain how you mapped data attributes to visual attributes, and why you made various visualization or interaction design choices. How do these choices help management understand the data, and to what extent have you implemented "Codeless reporting"?
  2. [10p] Decision Making: For next year, company management has budget to start expanding in three countries. Based on the data, what are the emerging countries in which expansion could be worthwile? State how you define "emerging". Try to do more than counting transactions. For example, look for a trend, or also incorporate volume or customer behavior.
  3. [10p] Cashflow Requirements Assume that it costs actual money to produce the coins (for example, because they are mined from a blockchain), and that one credit costs the company EUR 0.00025 to produce (and this production can be done instantly, real-time, as-a-service). Assume that all of a month's turnover in cash from payment methods provides arrives, at once, 30 days after the end of the respective month. Now calculate for 2013 how much cash the company should have at any point in time in order to pay its production costs. What metrics do you compute on a per-month basis, and how do you use them to find your answer?
  4. [10p] Fraud and Compliance: Gaming companies have a responsibility towards society to prevent addiction and excessive spending patterns. A customer that makes more than 50 orders within one day can be considered addicted, but (assuming that the coins have value and can be resold) could also be committing fraud (for example, with a stolen credit card or phone). How do you identify a customer when a customer can have multiple account names in the game? How do you propose to distinguish between addiction and fraud based on the data? What are the results of applying your metric to the data, i.e., who are potential fraudulent users?

Good luck with the assignment! Ask questions. A lot, if you have to. The deadline is posted on the course website.