Arno Knobbe is a senior researcher at LIACS and head of the Data Mining
group.
His current research revolves around the following topics:
Subgroup Discovery
Subgroup Discovery is the art of finding specific areas in the data that show a significant difference in behaviour, compared to the overall dataset,
and that are easily described in terms of the available attributes. By ‘behaviour’, we can mean many things, depending on the specific application.
The simplest setting would be where one would assign one of the attributes as the target, and then try to identify parts of the data,
so-called subgroups, where the average value of the target is notably different from that of the entire population. Over the last few years, I have
been pioneering a variation of Subgroup Discovery that considers multiple target attributes: a subgroup is deemed interesting if a model
fitted to its target attributes differs in some way (depending on the application) from the same model fitted to the targets for the entire data.
This approach, called Exceptional Model Mining (EMM), comes in many flavours, depending on the type of the target attributes and the nature of the models
fitted to the data. See this paper for an extensive survey of the work on EMM. My efforts on Subgroup Discovery are bundled in the Cortana and
Safarii packages.
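
As a minimal illustration of the simplest setting described above (one numeric target, subgroups compared against the whole population), the following Python sketch scores every subgroup of the form "attribute = value". The toy data, the attribute names, the min_size parameter and the quality measure sqrt(n) * (subgroup mean - overall mean) are all assumptions made for this example. It is not the algorithm used in Cortana or Safarii, and it ignores richer descriptions such as conjunctions of conditions.

```python
from math import sqrt
from statistics import mean


def find_subgroups(rows, target, min_size=2):
    """Score every single-condition subgroup 'attribute = value'.

    Quality (an assumed, commonly used choice) is
    sqrt(subgroup size) * (subgroup mean - overall mean),
    which trades off subgroup size against deviation of the target.
    """
    overall = mean(r[target] for r in rows)
    attributes = [a for a in rows[0] if a != target]
    results = []
    for attr in attributes:
        for value in sorted({r[attr] for r in rows}):
            members = [r[target] for r in rows if r[attr] == value]
            if len(members) < min_size:
                continue  # skip subgroups too small to be meaningful
            quality = sqrt(len(members)) * (mean(members) - overall)
            results.append((f"{attr} = {value}", len(members), quality))
    # Most deviating subgroups first, regardless of direction.
    return sorted(results, key=lambda s: abs(s[2]), reverse=True)


# Hypothetical toy dataset: 'bp' (blood pressure) is the target.
rows = [
    {"smoker": "yes", "sex": "m", "bp": 150},
    {"smoker": "yes", "sex": "f", "bp": 146},
    {"smoker": "yes", "sex": "m", "bp": 154},
    {"smoker": "no", "sex": "f", "bp": 120},
    {"smoker": "no", "sex": "m", "bp": 124},
    {"smoker": "no", "sex": "f", "bp": 122},
    {"smoker": "no", "sex": "m", "bp": 126},
    {"smoker": "no", "sex": "f", "bp": 118},
]

ranked = find_subgroups(rows, target="bp")
for description, size, quality in ranked:
    print(f"{description:14s} size={size} quality={quality:+.2f}")
```

On this toy data the subgroup "smoker = yes" ranks first, because its mean target value lies furthest from the overall mean relative to its size. In the multi-target setting of Exceptional Model Mining, the difference of means would be replaced by a measure of how much a model fitted to the subgroup differs from the model fitted to the whole dataset.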
Sensor Data Analysis
My second line of research deals with time series and the modelling of complex, dynamic systems, specifically where such time series are produced
by sensor systems, which are capable of producing data at an unprecedented scale. One of the key applications in this area has been the InfraWatch
project, of which I have been the project manager over the last five years. The project revolved around a highway bridge on the A6 between
Amsterdam and Almere, which was fitted with 145 sensors of various types, and has been producing data ever since their installation in 2008.
The sensor data we model tends to combine continuous elements (e.g. how the apparent strain on a bridge depends on the outside temperature and amount of
sunlight absorbed) and discrete elements (e.g. a heavy truck crossing the bridge, or traffic jams). Additionally, one tends to recognize multiple
intertwined effects in the data, delays and integration over time, and effects at different time scales (ranging from seconds to years). My group has been
pioneering the use of Minimum Description Length techniques to model complex sensor data, for example to separate sensor data that combines effects at
different time scales into its temporal components.
A full list of publications since 1995 can be found here.
News
I am currently on sabbatical at the Sports and Nutrition faculty of the Amsterdam University of Applied Sciences, and will return to Leiden in fall 2015.
Recently accepted papers: