Marvin Meeng

Marvin has been a researcher at the Algorithms group of LIACS (Leiden Institute of Advanced Computer Science) since 2009, and has worked as an embedded scientist at various organisations other than LIACS. His main focus is on developing Data Mining algorithms.

His predominant research field is the data mining paradigm Subgroup Discovery; others include Multi-relational Data Mining, Equation Discovery, Time-series Analysis, Graph and Text Mining, and Fraud and Outlier Detection.
Application domains include bioinformatics (NBIC, NCSB, CMSB, LUMC), cheminformatics (LACDR), insurance and financial fraud detection (NZA, FSSC), population reconstruction (ChartEx), and sports data analytics (NOC*NSF, SDC).

Current Projects
Marvin currently works for the Focus On Emotions research group of Professor Carolien Rieffe, in the Developmental and Educational Psychology unit of the Institute of Psychology at Leiden University. For this project, children wear coin-sized sensors that continuously register spatial proximity to other sensors, allowing evaluation of social interactions. This technology was used successfully in a previous, prize-winning project by the same group and members of LIACS.

Marvin set up the original system, which consists of dozens of OpenBeacon Bluetooth RFID tags and a receiver, and has maintained it since. This renewed cooperation mainly aims to exploit more of the technical capabilities of the receiver, a BeagleBone Black, which is a Raspberry Pi-like mini-computer. New algorithms for data analysis and visualisation are being developed in parallel.

Software and Algorithms
Marvin created various novel data mining software tools and algorithms.

Cortana is the most versatile Data Mining software tool currently available for Subgroup Discovery and Exceptional Model Mining. These local, supervised, descriptive pattern mining paradigms aim to provide the analyst with readily interpretable descriptions of subsets of the data that show interesting behaviour (like 'Country = Finland', when looking for factors correlated with high levels of happiness). Marvin is the co-creator and lead developer of this tool, though individual model classes were developed in the past by other members of the Data Mining group and integrated with its generic search algorithm.
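To make the 'Country = Finland' example concrete, the following is a minimal sketch of how a single subgroup description can be scored. It is not Cortana's code: the toy data, the names Row, wracc and best_subgroup, and the choice of Weighted Relative Accuracy (a quality measure commonly used in the Subgroup Discovery literature) are all assumptions made for illustration.

```python
from collections import namedtuple

# Hypothetical record type: one row per person, with a binary target.
Row = namedtuple("Row", ["country", "happy"])

# Toy data, invented purely for illustration.
DATA = [
    Row("Finland", True),
    Row("Finland", True),
    Row("Finland", True),
    Row("Finland", False),
    Row("Other", True),
    Row("Other", False),
    Row("Other", False),
    Row("Other", False),
]


def wracc(rows, condition):
    """Weighted Relative Accuracy of the subgroup selected by `condition`.

    WRAcc = coverage * (target rate in subgroup - target rate overall).
    """
    subgroup = [r for r in rows if condition(r)]
    if not subgroup:
        return 0.0
    coverage = len(subgroup) / len(rows)
    rate_subgroup = sum(r.happy for r in subgroup) / len(subgroup)
    rate_overall = sum(r.happy for r in rows) / len(rows)
    return coverage * (rate_subgroup - rate_overall)


def best_subgroup(rows):
    """Score every description 'country = v' and return (quality, v) of the best."""
    values = sorted({r.country for r in rows})
    scored = [(wracc(rows, lambda r, v=v: r.country == v), v) for v in values]
    return max(scored)


if __name__ == "__main__":
    quality, value = best_subgroup(DATA)
    print(f"Country = {value} (WRAcc = {quality})")
```

A real Subgroup Discovery search would explore conjunctions of conditions over many attributes and support other target types; this sketch only ranks single-condition descriptions on one attribute.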

SDMM is a novel, state-of-the-art, generic Subgroup Discovery algorithm, scheduled for public release in 2020. It is three to five orders of magnitude faster than the original Cortana algorithm, thanks to different data structures that reduce computational complexity. SDMM also offers a superset of the model classes available in Cortana, and parts of it were introduced into Cortana in 2019. A DMKD paper has been accepted and is forthcoming.

QUICKIE is an Equation Discovery tool geared towards Time-series Analysis. It also allows straightforward manipulation of time-series data and offers visualisations that aid their interpretation. It was developed in the context of whole-body-metabolism modelling using differential equations and convolution kernels (hence, Quick User Interface for Convolution Kernel Involving Experiments). Marvin created this tool while working as an embedded scientist at the LUMC. Ricardo Cachucho later significantly extended the tool with methods for feature extraction while working on his PhD thesis.

FunViz computes and visualises so-called performance funnels, used in sports data analytics, based on Elo scores and related metrics. It was used during Marvin's first year as an embedded scientist at NOC*NSF. He then set up a new infrastructure based solely on SQL databases, SQL-based data analysis code, and QlikView (for visualisation only), which NOC*NSF uses to this day.
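For readers unfamiliar with Elo scores, the sketch below shows the standard Elo rating update for a single contest between two competitors. It does not reproduce FunViz or its performance funnels, which are not specified here; the function names, the K-factor of 32 and the 400-point scale are conventional values assumed for illustration.

```python
def expected_score(rating_a, rating_b):
    """Expected score of A against B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update(rating_a, rating_b, score_a, k=32):
    """Return the new ratings of A and B after one contest.

    score_a is 1.0 for a win by A, 0.5 for a draw, 0.0 for a loss.
    k (assumed 32 here) controls how strongly one result moves a rating.
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    # B's actual and expected scores are the complements of A's.
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b


if __name__ == "__main__":
    # Two equally rated competitors; A wins.
    print(update(1500, 1500, 1.0))
```

Tracking such ratings over an athlete's career yields a time series of scores, which is the kind of input a performance analysis could build on.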

last updated: 2020.04.20