liacs.leidenuniv.nl/~takesfw/DSPM/assignment1.html
Congratulations! For the next 3 weeks you are employed as all-round Data Science expert of the largest Leiden-based gaming business "Thunderstorm Entertainment". One of their games is CookieDestroyer, a popular game played by people of all ages and genders worldwide, primarily making money from selling in-game "coins" that allow the gamer to progress in the game more quickly. Your job is to provide the company's management with information on the status, growth and trends with respect to the monetization of CookieDestroyer. To do this, you will use data from the company's sales logs, containing all purchases of coins made by the customers. Ultimately, your data analysis, visualization, interpretation and tools should allow company management to make decisions on future steps to expand the game.
This is Assignment 1 of the Data Science and Process Modelling course taught at Leiden University. It is also the first assignment of the Data Science course.
For each part of the assignment, the number of points awarded for a 100% perfect answer is listed between brackets and sums to a total of 100 points. You should answer each question as precisely as possible; not addressing parts of the question means that fewer points are awarded. Your assignment grade (between 1 and 10, bounds included) is computed by dividing your number of points by 10 and rounding it to the nearest half. If you get an insufficient grade for the assignment, you can retake the assignment by meeting the assignment retake deadline. Please do not be late with handing in your work. If you are late with handing in your work, it means that you failed the assignment and that you are automatically using the retake deadline for the assignment. Retake assignment grades have 2 points subtracted from the total. You are allowed to work in teams consisting of exactly two people. For each question, clearly describe how you obtained your answer, and write down any non-trivial assumptions. All practical exercises can be done on the student workstations. Be sure to hand in digitally:
Questions or remarks? Preferably ask them during one of the weekly lectures or lab sessions. In case of urgent questions outside these hours, contact one of the course assistants via e-mail, or ask the lecturer in person.
The goal of the assignment is to:
The report that you hand in for this assignment should contain a short introduction to the data, the company and the dashboard, as well as the answers to the strategic questions.
The data for this assignment comes in three files: sales, methods and countries. The main data table to be studied is sales, which contains a few hundred thousand sales records spanning a time period from 2010 to 2015. It has the following attributes:
Furthermore, there are tables methods and countries which map the methodId and ipCountry fields in the sales table to (anonymized) payment method names and country names.
Each table is available in .sql format and in .csv format.
The files can be found here:
sales .csv .sql
methods .csv .sql
countries .csv .sql
For some additional instructions, see below.
The files can also be found in the Leiden University shared Linux folder /vol/share/groups/liacs/scratch/DSPM/.
The web-based dashboard consists of various (at least four) widgets that should each visualize (in a different way) the following aspects of the data:
You can use any programming language, scripting language, markup language, or framework (as long as it is open-source), but here are some hints.
Not sure what to use? Go with what you learned in your first year the "easy mode":
- Data in the MySQL database (perhaps create some relevant indexes),
- One HTML page with the dashboard, styled using Twitter Bootstrap.
- One PHP script that serves relevant data in JSON-format.
- Some Javascript/jQuery code to put the data in relevant Google Charts diagrams.
You could start with the skeletoncode.
The following questions should be answered by querying the data or using the dashboard. For each question, always elaborately motivate your answers based on the data, for example by giving queries or instructions to use the dashboard.
Good luck with the assignment! Ask questions. A lot, if you have to. The deadline is posted on the course website.
Webdevelopment know-how
A crash-course on webdevelopment is available. In addition, tutorials can be found at W3 Schools (HTML, CSS, javascript, jQuery, Bootstrap).
However, fluency in each of these languages is not per se required; some skeleton code is available, which may serve as a good starting point.
Webspace at LIACS
Each user should have SSH-access to the liacs.leidenuniv.nl webserver when connecting from the workstations in computer rooms, or via WiFi 'wlan 3'. Your can create your own website by placing files, e.g., an index.html file, in the ~/public_html folder on this server. Your website is visible at http://liacs.leidenuniv.nl/~s.... Also see generic instructions: LIACS-ISSC webserver
MySQL
A simple way to load data from file.sql into MySQL is via the command line while on the webserver or a LIACS desktop machine, for example as follows:
mysql -h mysql.liacs.leidenuniv.nl -u username -p username < file.sql
Username is your ULCN username, password is your MySQL password. The second username is to select the database name, which is equal to your username.
Mounting your webspace on your local machine
There is a possibility to conveniently mount the webserver's public_html folder on your own machine. This way, your webspace just looks like a regular folder on your computer.
From within the university this can be done by issuing the following commands:
mkdir public_html
sshfs s...@liacs.leidenuniv.nl:/home/s.../public_html ./public_html
The first command creates in your current working directory on your machine a folder public_html. This needs only to be done when this folder is not yet present. The second command mounts the remote folder ~/public_html on liacs.leidenuniv.nl in your local folder.
Note that this second command should be run each after each logout or reboot.
Mounting your webspace from outside the university
The ssh-server at liacs.leidenuniv.nl is only accessible from the computer workstations or from the ssh gateways sshgw.leidenuniv.nl or ssh.liacs.nl.
Starting from OpenSSH 7.3, released August 2016, the easiest way to configure this is using a SSH config file, which needs to be placed at ~/.ssh/config. Here is an example:
Host sshgw.leidenuniv.nl User s...... Host liacs.leidenuniv.nl User s...... ProxyJump sshgw.leidenuniv.nlGiven that you have an empty directory ./public_html, you can connect in the same way as you did at the local machines: