Data Mining Group
The LIACS Data Mining group is concerned with fundamental and applied research in the areas of Data Mining and Knowledge Discovery. The theoretical research focuses on finding regularities in complex data such as large graphs, streams, time series and relational databases. The applied research is concerned with producing understandable and actionable insights from data provided by partners from the academic and commercial domains. Typically, these application areas require the development of new mining techniques that address the specific challenges of the data at hand. The research partners include the Leiden University Medical Center, T-Mobile, Strukton and the Netherlands Cancer Institute, to name but a few.
Data Mining Course
The Data Mining group is also responsible for the Data Mining course (code DaMi). The course webpages can be found here.
A complete list of members of the LIACS Data Mining group can be found here.
Below is a non-exhaustive list of projects run by the LIACS DM group:
- Annotated Graph Mining: This VIDI project aims at developing a new paradigm for data mining, one that is based on the analysis of annotated graphs. These are graphs where nodes and edges are annotated with extra information. The analysis of such graphs comprises both analysis of the graph structure and of the annotations. With this novel representation, data mining methods can be developed that strike an ideal balance between analysis of the graph structure, and analysis of the information in the annotations, and thus combine the advantages of the different approaches to relational mining that currently exist.
- Exceptional Model Mining: The EMM project focuses on extensions of Subgroup Discovery that allow for more complicated target concepts. Rather than finding subgroups based on the distribution of a single target attribute, EMM finds subgroups where a model fitted to that subgroup is somehow exceptional.
- InfraWatch: This recently started project is concerned with the monitoring and large-scale modeling of sensor data collected at the Dutch highway bridge "Hollandse Brug". 145 sensors plus a weather station and video camera produce a continuous stream of data under different traffic and weather situations. Our goal is to use this data to model the structural characteristics of the bridge over a long period.
- CATCH LINKS: This project is a collaboration on the border of computer science and history. In the early 1800s, municipalities in the Netherlands started to systematically record key population events, such as births, marriages and deaths. Recently, these data have been digitized to a considerable degree. Without unique identifiers, reconstructing relations in these data becomes a research problem. Besides relation discovery (known as ‘record linkage’), domain knowledge discovery and visualization are also interesting from a data mining point of view.
- COMPASS: The goal of this project is the development of stream mining techniques for complex patterns such as graphs. We will try to extend the existing state-of-the-art techniques in two orthogonal directions: on the one hand, the mining of more complex patterns in streams, such as sequential patterns and evolving graph patterns (for example social networks), and on the other hand, more natural stream support measures that take into account the temporal nature of most data streams.
- CORTANA: A Data Mining tool for discovering local patterns in data. Cortana features a generic Subgroup Discovery algorithm that can be configured in many ways, in order to implement various forms of local pattern discovery. The tool can deal with a range of data types, including nominal, numeric and binary, for both the input attributes and the target attributes. A unique feature of Cortana is its ability to deal with a range of Subgroup Discovery settings, determined by the type and number of target attributes. Where regular SD algorithms only consider a single target attribute, nominal or sometimes numeric, Cortana is able to deal with targets consisting of multiple attributes, in a setting called Exceptional Model Mining.
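
To make the idea of Exceptional Model Mining with multiple target attributes more concrete, the following is a minimal sketch, not Cortana's actual implementation. It assumes a toy dataset, two numeric targets, a correlation model, and a quality measure defined as the absolute difference between the correlation inside a subgroup and the correlation in its complement. The function names (`pearson`, `emm_depth1`), the attribute names and the `min_size` parameter are all hypothetical and chosen only for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation is undefined here; this sketch treats it as 0
    return sxy / sqrt(sxx * syy)


def emm_depth1(rows, descriptors, target_x, target_y, min_size=3):
    """Rank single-condition subgroups (attribute = value) by how much the
    correlation between the two targets differs from that of the complement."""
    results = []
    for attr in descriptors:
        for value in sorted({row[attr] for row in rows}):
            inside = [r for r in rows if r[attr] == value]
            outside = [r for r in rows if r[attr] != value]
            if len(inside) < min_size or len(outside) < min_size:
                continue  # skip subgroups too small to fit a model on
            r_in = pearson([r[target_x] for r in inside],
                           [r[target_y] for r in inside])
            r_out = pearson([r[target_x] for r in outside],
                            [r[target_y] for r in outside])
            results.append((abs(r_in - r_out), f"{attr} = {value}"))
    return sorted(results, reverse=True)


# Toy data (invented): in the north, price rises with size; in the south it falls.
rows = [
    {"region": "north", "type": "a", "size": 1, "price": 1},
    {"region": "north", "type": "b", "size": 2, "price": 2},
    {"region": "north", "type": "a", "size": 3, "price": 3},
    {"region": "north", "type": "b", "size": 4, "price": 4},
    {"region": "south", "type": "a", "size": 1, "price": 4},
    {"region": "south", "type": "b", "size": 2, "price": 3},
    {"region": "south", "type": "a", "size": 3, "price": 2},
    {"region": "south", "type": "b", "size": 4, "price": 1},
]

ranking = emm_depth1(rows, ["region", "type"], "size", "price")
for quality, description in ranking:
    print(f"{quality:.2f}  {description}")
```

In this toy example the subgroups defined by `region` rank highest, because the relation between the two targets reverses between regions, while the subgroups defined by `type` show no difference from their complements. A single-target Subgroup Discovery setting would instead score each subgroup on the distribution of one attribute only.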