Bachelor Class 2015-2016

Trust Management in Social Networks

Abstract:

Developing an effective trust propagation model for social networks.

Supervisors:

TBD

Contact:

Open Linked Data and Privacy

Abstract:

Developing solutions for privacy protection in Linked Open Data domains.

Supervisors:

TBD

Contact:

Secure Data Transmission for the Smart Grids

Abstract:

Developing a secure way to transfer electricity data over the smart grid.

Supervisors:

Amr Ali-eldin, TBD

Contact:

Custom algorithm scripting language

Abstract:

Browsers use HTML, an internal DOM and JavaScript to script web pages. A similar structure is desirable for software that reads node-edge oriented data structures (trees, graphs) into an internal representation that can be scripted with custom algorithms (traversals, shortest path, etc.). The scripting language, the internal representation and the file structure are all wanted, through new development, a literature study or a combination of the two. A prototype is also required.
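
As a rough illustration of the kind of internal representation and scriptable algorithm meant here, the Python sketch below reads an edge list into an adjacency map and runs a breadth-first shortest path over it. The function names `load_graph` and `shortest_path` and the "a b" edge-list format are assumptions made for this example; they are not part of the project description.

```python
from collections import deque

def load_graph(edge_lines):
    """Build an undirected adjacency map from lines of the form 'a b' (assumed format)."""
    graph = {}
    for line in edge_lines:
        a, b = line.split()
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def shortest_path(graph, start, goal):
    """Return a path with the fewest edges from start to goal, or None if unreachable."""
    if start not in graph or goal not in graph:
        return None
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the parent links back to the start and reverse them.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbour in sorted(graph[node]):
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    return None
```

In a real prototype the algorithm would be supplied by the user as a script rather than hard-coded, which is the part this project is meant to design.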

Supervisors:

Tim Cocx

Contact:

t.k.cocx@liacs.leidenuniv.nl

Data Mining Complex Workflows

Abstract:

Machine learning algorithms attempt to learn from past data and make predictions for the future. There are many such algorithms (see Weka). Choosing the right algorithm influences the quality of these predictions. However, recent insights suggest that choosing the right data representation influences this even more. We will use tools like KNIME and RapidMiner to learn from past experiments which algorithm, parameter setting and data representation should be used for a given dataset.

More information:

Supervisors:

Jan van Rijn

Contact:

j.n.van.rijn@liacs.leidenuniv.nl

Algorithm Selection using Landmarkers

Abstract:

A common data mining problem is the following: given a dataset, predict which algorithm will work best on it. This task is known as the algorithm selection problem. One approach is to use landmarkers: first measure how well some fast algorithms perform on the dataset, and based on that select a good algorithm. Many variants of landmarkers have been proposed in the past. However, most of these can be enhanced by choosing the right representation, which we will explore in this project.
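
To make the idea of landmarkers concrete, the Python sketch below scores two very cheap classifiers (majority class and 1-nearest-neighbour) on a dataset and returns their accuracies as a description of that dataset. The choice of these two landmarkers, the function names and the `(features, label)` row format are assumptions for this example; mapping the scores to a recommended algorithm is left out.

```python
from collections import Counter

def majority_accuracy(train, test):
    """Landmarker 1: always predict the most frequent training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return sum(y == label for _, y in test) / len(test)

def one_nn_accuracy(train, test):
    """Landmarker 2: predict the label of the nearest training point."""
    def predict(x):
        nearest = min(
            train,
            key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)),
        )
        return nearest[1]
    return sum(predict(x) == y for x, y in test) / len(test)

def landmark_features(train, test):
    """Describe a dataset by the scores of the fast landmarkers."""
    return {
        "majority": majority_accuracy(train, test),
        "one_nn": one_nn_accuracy(train, test),
    }
```

A selection step would then learn, from many datasets described this way, which slower algorithm tends to work best for which pattern of landmarker scores.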

More information:

Supervisors:

Jan van Rijn

Contact:

j.n.van.rijn@liacs.leidenuniv.nl

CUDA programming and optimization project

Abstract:

Optimization of algorithms for use on GPUs (for instance using CUDA) is still largely a manual task. In this project, we will select an algorithm (or set of closely related algorithms) that you will map to and optimize for GPUs. Example algorithms include (sparse) matrix computations, graph algorithms and sorting algorithms.

(If multiple students select this project, each student will work on a different algorithm.)

Supervisors:

Rietveld

Contact:

SIMD vectorization project

Abstract:

Because the vectorization capabilities of many optimizing compilers are weak, the development of vectorized software is still largely a manual task. In this project, we will select an algorithm (or set of closely related algorithms) that you will map to and optimize for modern SIMD instructions (AVX2). Example algorithms include (sparse) matrix computations, graph algorithms and sorting algorithms.

(If multiple students select this project, each student will work on a different algorithm.)

Supervisors:

Rietveld

Contact:

Distributed Hybrid Sort Algorithms

Abstract:

In a cluster computer, we can take advantage of many different levels of parallelism: vectorization (per-core), multi-core (per CPU), multi-CPU (per node), multi-node. What sort algorithm works best at which level and how do these sort algorithms work together? This leads to the development of hybrid sort algorithms. In this project you will implement two or three distributed hybrid sort algorithms and benchmark these algorithms on the DAS4 and DAS5 cluster computers.
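
One simple shape such a hybrid can take is sketched below in Python: the input is split into chunks that are sorted independently (standing in for the per-node or per-core level), and the sorted runs are then combined with a k-way merge. This runs in a single process and only illustrates the structure; the function name `hybrid_sort` and the `workers` parameter are assumptions for this example.

```python
import heapq

def hybrid_sort(values, workers=4):
    """Sort chunks independently, then combine the sorted runs with a k-way merge."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    # Ceiling division so that at most `workers` chunks are produced.
    chunk = max(1, -(-len(values) // workers))
    # Each chunk could be sorted on a different node or core.
    runs = [sorted(values[i:i + chunk]) for i in range(0, len(values), chunk)]
    # heapq.merge lazily merges already-sorted inputs.
    return list(heapq.merge(*runs))
```

In a distributed setting the interesting questions are which algorithm sorts each chunk and how the runs are exchanged between nodes, neither of which this sketch addresses.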

Supervisors:

Wijshoff, Rietveld

Contact:

Distributed Matrix Multiplication

Abstract:

In this project you will be working on an implementation of distributed matrix multiplication by starting with an elementary specification of matrix multiplication. Through a series of transformations you will be deriving distributed variants of matrix multiplication which you will be implementing, testing and benchmarking on the DAS4 and DAS5 cluster computers.
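
As an example of what one such transformation step could look like, the Python sketch below starts from the elementary triple-loop specification and derives a tiled variant in which each output tile could, in principle, be assigned to a different node. The function names and the `block` parameter are assumptions for this example, and no actual distribution is performed.

```python
def matmul(a, b):
    """Elementary specification: C[i][j] = sum over k of A[i][k] * B[k][j]."""
    n, m, p = len(a), len(b), len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)]
        for i in range(n)
    ]

def matmul_blocked(a, b, block=2):
    """Same result, computed tile by tile over the i, j and k dimensions."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for i0 in range(0, n, block):
        for j0 in range(0, p, block):
            # All work for output tile (i0, j0) happens inside this loop nest.
            for k0 in range(0, m, block):
                for i in range(i0, min(i0 + block, n)):
                    for j in range(j0, min(j0 + block, p)):
                        for k in range(k0, min(k0 + block, m)):
                            c[i][j] += a[i][k] * b[k][j]
    return c
```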

Supervisors:

Wijshoff, Rietveld

Contact:

Vectorized Generation of Index Sets

Abstract:

In previous work we have devised methods to optimize database (SQL) queries through the use of compiler transformations. Key to this approach is to express queries as loops in which iteration is controlled by "index sets" ("forelem" loops). During code generation, code is generated to compute these index sets. In this project, you will write a code generator that generates vectorized code (using SIMD instructions) instead. Experimental evaluation is also part of this project.

More information:

Previous publications:
http://link.springer.com/chapter/10.1007/978-3-319-17473-0_20
http://dl.acm.org/citation.cfm?id=2818180

Supervisors:

Rietveld

Contact:

Manual Vectorization Of Simple Functions

Abstract:

In the first part of this project you will investigate the latest compiler technology (latest gcc, clang and Intel compiler) and determine whether the compilers are capable of vectorizing entire functions. You will investigate the limitations of the compilers: when does the compiler's vectorizer break down? For a number of the identified cases, you will investigate whether it is possible to vectorize them by hand and, if so, whether a generic method for doing so can be devised.

Supervisors:

Rietveld

Contact:

Image or Video Search/Recommendation

Abstract:

New algorithms to browse, recommend or search image and video databases or collections

Supervisors:

Dr. Michael S. Lew

Contact:

mlew@liacs.nl

Social networks on the playground

Abstract:

Data has been collected from children playing on a playground. Obvious data analysis questions could be, e.g., in which 'communities' the children play and how this evolves over time.

More information:

http://liacs.leidenuniv.nl/~nijssensgr/bachelorklas-2015-2016/topic-slides/vanleeuwen.pdf

Supervisors:

Matthijs van Leeuwen

Contact:

Description-driven community detection

Abstract:

Many algorithms for community detection exist, but most of them only consider the network. In reality, much more data is often available, i.e., we often have an attributed graph rather than just a graph. How can we use this information to search for communities that have a description?

More information:

http://liacs.leidenuniv.nl/~nijssensgr/bachelorklas-2015-2016/topic-slides/vanleeuwen.pdf

Supervisors:

Matthijs van Leeuwen

Contact:

Metro maps for pattern visualisation

Abstract:

Mining patterns from data is easy, but how do you present these to a domain expert who is not a data mining expert? One idea is to connect patterns and data through metro maps, which have also been used to visualise other types of information.

More information:

http://liacs.leidenuniv.nl/~nijssensgr/bachelorklas-2015-2016/topic-slides/vanleeuwen.pdf

Supervisors:

Matthijs van Leeuwen

Contact:

Mine, interact, learn, repeat

Abstract:

Data mining algorithms do not always return models or patterns that a user finds interesting, but in some cases what the user finds interesting can be learned through interaction. In this project we investigate new instances of this overall approach.

More information:

http://liacs.leidenuniv.nl/~nijssensgr/bachelorklas-2015-2016/topic-slides/vanleeuwen.pdf

Supervisors:

Matthijs van Leeuwen

Contact:

Data Mining for Cyber Security

Abstract:

Recently, various machine learning algorithms were successfully deployed to increase the security of computer systems, networks and industrial control systems. The goal of this project is to provide an overview of the state of the art in one well-defined area (for example, mining system event logs, network access logs, monitoring network traffic, detection of DDoS attacks, malware detection, etc.), and to design, implement and experimentally validate new solutions.

Supervisors:

Dr. W. Kowalczyk

Contact:

w.j.kowalczyk@liacs.leidenuniv.nl

Auto-tuning of Locality Sensitive Hashing

Abstract:

LSH is a powerful (approximate) algorithm for finding similar objects in huge collections of texts, images, sound recordings, etc. LSH can reduce the search time from O(N) to nearly O(1). However, in order to achieve such performance, the algorithm must be tuned: there are several parameters that strongly influence the algorithm's speed and accuracy. The objective of this project is to study existing heuristics for tuning LSH, to invent new heuristics and to experimentally demonstrate their properties.
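
To show where such parameters come from, the Python sketch below implements one common LSH scheme, MinHash with banding, for sets of tokens. The abstract does not say which LSH variant the project targets, so this choice is an assumption, as are the function names and the default values of `bands` and `rows`. More bands make it more likely that similar items collide (fewer misses, more candidates to check); more rows per band make collisions stricter.

```python
import hashlib
from collections import defaultdict

def minhash_signature(tokens, num_hashes):
    """One minimum per seeded hash function; tokens must be a non-empty collection."""
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(hashlib.sha1(f"{seed}:{t}".encode()).digest()[:8], "big")
            for t in tokens
        ))
    return signature

def candidate_pairs(documents, bands=8, rows=2):
    """Bucket signatures band by band; items sharing any bucket become candidates."""
    buckets = defaultdict(set)
    for name, tokens in documents.items():
        sig = minhash_signature(tokens, bands * rows)
        for band in range(bands):
            key = (band, tuple(sig[band * rows:(band + 1) * rows]))
            buckets[key].add(name)
    pairs = set()
    for names in buckets.values():
        ordered = sorted(names)
        for i in range(len(ordered)):
            for j in range(i + 1, len(ordered)):
                pairs.add((ordered[i], ordered[j]))
    return pairs
```

Only the candidate pairs need an exact similarity check afterwards, which is where the speed-up over comparing all pairs comes from.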

Supervisors:

Dr. W. Kowalczyk

Contact:

w.j.kowalczyk@liacs.leidenuniv.nl

Data Mining for Marketing

Abstract:

Wolters Kluwer (Alphen aan den Rijn) has an interesting data mining project (or projects) in the field of direct marketing. They are looking for (Dutch-speaking) students who would like to dig into their marketing data and produce some predictive models with the help of R/RStudio or Python.

Supervisors:

Dr. W. Kowalczyk

Contact:

w.j.kowalczyk@liacs.leidenuniv.nl

RNomics: mining functional RNA structures encoded in genomic sequences

Abstract:

The functions of RNA molecules depend on their structures. Using algorithms that predict the structures into which RNA sequences fold, it is possible to identify functional RNA patterns and mine RNA genes in sequence databases.

Supervisors:

Alexander Gultyaev

Contact:

a.p.goultiaev@liacs.leidenuniv.nl

UML translation of Paradigm models

Abstract:

UML is the state-of-the-art modelling language for object-oriented systems; a well-known weakness of UML is its lack of support for consistency, particularly dynamic consistency. Paradigm is a non-standard coordination modelling language, also suited for modelling (unforeseen) self-adaptation. The Paradigm language syntactically guarantees vertical dynamic consistency for general Paradigm models. For a bachelor project, given Paradigm models should be translated into UML 2.0 models.

More information:

General UML documentation; for Paradigm papers, ask Luuk Groenewegen

Supervisors:

dr. Luuk Groenewegen

Contact:

luuk.and.liesje@gmail.com; office 132, on Mondays, Wednesdays and Fridays only

Biodiversity Oostvaardersplassen

Abstract:

At the Oostvaardersplassen, Staatsbosbeheer will conduct an experiment to control biodiversity in certain areas by influencing the water level. Many biodiversity measurements will be taken, and these need to be analysed.

More information:

TBA

Supervisors:

Joost Kok, Siegfried Nijssen, others

Contact:

Joost Kok

Mining Intensive Care Data

Abstract:

At the LUMC, large amounts of data are collected regarding patients in intensive care. In this data, researchers are interested in identifying different types of sepsis, a whole-body inflammatory response to an infection, as different types of sepsis may require different types of treatment. In this project, you will use predictive clustering techniques to cluster the patients and characterize the different forms of sepsis.

More information:

http://liacs.leidenuniv.nl/~nijssensgr/bachelorklas-2015-2016/intro.pdf

Supervisors:

Siegfried Nijssen, Aske Plaat

Contact:

http://liacs.leidenuniv.nl/~nijssensgr/bachelorklas-2015-2016/intro.pdf

Mining Molecular Data

Abstract:

Pharmaceutical chemists have collected large databases in which the 2D structure of small molecules is associated with properties of these molecules, such as whether the molecule is toxic or carcinogenic. Based on the 2D structures of these molecules, they wish to predict the properties of the molecules. In this project, you will develop and evaluate new algorithms for this task. Possible algorithms include deep learning algorithms and pattern mining algorithms.

More information:

Supervisors:

Siegfried Nijssen

Contact:

Mining Mental Health and Addiction Data

Abstract:

The Dutch association of mental health and addiction care collects data from all over the Netherlands that reflects the conditions of patients before and after they receive treatment. They are interested in discovering relationships between the conditions of patients and the effectiveness of treatments. You will develop machine learning algorithms that combine linear models and tree-based models to discover these relationships.

More information:

http://www.ggznederland.nl/

Supervisors:

Siegfried Nijssen, Aske Plaat

Contact:

Mining Expression and Mutation Data

Abstract:

Nowadays, large amounts of data are collected that reflect the expression of genes in patients or organisms, as well as the mutations present in the genome of these patients or organisms. In this project, you will develop a tool to analyse such data. Possibilities include searching for an evolutionary tree that is consistent with the mutations observed in organisms, or discovering different clusters of patients in terms of gene expression profiles.

More information:

http://liacs.leidenuniv.nl/~nijssensgr/bachelorklas-2015-2016/intro.pdf

Supervisors:

Siegfried Nijssen

Contact:

http://liacs.leidenuniv.nl/~nijssensgr/bachelorklas-2015-2016/intro.pdf

Probabilistic Inference

Abstract:

Probabilistic models, such as Bayesian networks, can be used to encode uncertain data, such as for instance social networks in which the strength of connections between individuals is not clear. On this model, new problems can then be solved such as: how likely is it that three people know each other? Which people should receive a targeted advertisement to influence the largest number of people in a social network? In this project you will develop a generic program to solve such problems.
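
The "three people know each other" question can be sketched in Python for a simplified model in which every connection exists independently with a given probability. This is a much simpler model than a general Bayesian network, and the function names and the edge-probability dictionary are assumptions for this example. The sketch enumerates all possible worlds, so it is only feasible for very small networks.

```python
from itertools import product

def query_probability(edge_probs, query):
    """Sum the probability of every possible world in which query(world) holds.

    edge_probs maps an edge (u, v) to the probability that it exists;
    edges are assumed independent. Exponential in the number of edges.
    """
    edges = list(edge_probs)
    total = 0.0
    for present in product([True, False], repeat=len(edges)):
        weight = 1.0
        world = set()
        for edge, exists in zip(edges, present):
            weight *= edge_probs[edge] if exists else 1.0 - edge_probs[edge]
            if exists:
                world.add(frozenset(edge))
        if query(world):
            total += weight
    return total

def is_triangle(a, b, c):
    """Build a query that holds when a, b and c all know each other."""
    needed = {frozenset((a, b)), frozenset((b, c)), frozenset((a, c))}
    return lambda world: needed <= world
```

Because the query is passed in as a function, the same enumeration can answer other questions about the network; a generic solver would replace the exhaustive loop with a more efficient inference method.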

More information:

Supervisors:

Siegfried Nijssen

Contact:

http://liacs.leidenuniv.nl/~nijssensgr/bachelorklas-2015-2016/intro.pdf

An SQL for Data Mining

Abstract:

While SQL is the commonly accepted declarative query language for databases, there is no such standard in data mining. However, in artificial intelligence numerous declarative programming systems have been developed that could allow data mining problems to be solved in a declarative manner as well. In this project, you will study how a pattern mining problem can be implemented in a declarative programming system.

More information:

Supervisors:

Siegfried Nijssen

Contact:

Comparison of Network Analysis Toolkits and Frameworks

Abstract:

The study of large graphs (networks) requires large volumes of data to be processed and analyzed in all kinds of different settings. Over the past 20 years, different toolkits, packages and other pieces of software have been developed to analyze large graphs. The student is expected to survey these software packages and compare the most popular ones, paying specific attention to how suitable they are for the analysis of large networks of millions of nodes and possibly billions of edges.

More information:

https://groups.google.com/forum/#!topic/liacs-thesis-projects/ouas3ZOTISA

Supervisors:

dr. F.W. Takes

Contact:

f.w.takes@liacs.leidenuniv.nl

Clustering presenters based on physiological response

Abstract:

Several physiological variables, such as heart rate, were measured while students were giving presentations. The research question is whether the resulting time series can be clustered in a meaningful way, such that different responses to presenting can be discovered.

Supervisors:

Shengfa Miao, Matthijs van Leeuwen

Contact:

Interactive web platform for network analysis

Abstract:

A modern, responsive web platform for the visualization of (social) networks, using for example sigma.js or D3.js, with an option to upload a network dataset and compute various network measures, algorithms and statistics online. The result is a service-oriented architecture that runs on specialized high-performance hardware (16-core, 1.5TB RAM, 12TB SSD). The challenges lie in usability, scalability and performance.

More information:

http://liacs.leidenuniv.nl/~takesfw/pdf/bachelor-projects2015.pdf

Supervisors:

dr. F.W. Takes

Contact:

f.w.takes@liacs.leidenuniv.nl

Tuning IMOD for use in a distributed environment

Abstract:

IMOD is a software package for analyzing electron tomographs that we would like to run on a cluster computer. This project aims at deploying IMOD in a parallel fashion on the available cluster. You will study the source code of IMOD's batch execution facility (written in Python), design how best to map IMOD's workflow to the available cluster, implement the design and run benchmarks.

More information:

http://liacs.leidenuniv.nl/assets/Bachelorscripties/Inf-studiejaar-2014-2015/2014-2015SimonRKlaver.pdf

Supervisors:

Verbeek, Rietveld

Contact:

http://liacs.leidenuniv.nl/assets/Bachelorscripties/Inf-studiejaar-2014-2015/2014-2015SimonRKlaver.pdf

IMOD as a Service

Abstract:

This project must be done together with the "Tuning IMOD for use in a distributed environment" project. The aim is to provide a clear web interface for scientists who want to use IMOD on the LLSC. In this web interface they can upload the source data and configure the computation that must be performed. Upon completion of the computation, the web interface shows the result and offers the option of downloading the resulting dataset.

More information:

Supervisors:

Verbeek, Rietveld

Contact:

Deploying PEET on the LLSC

Abstract:

PEET is a software package for analyzing particle electron tomographs that we would like to run on a cluster computer. This project aims at deploying PEET in a parallel fashion on the available cluster.

Supervisors:

Verbeek, Rietveld

Contact:

Python bindings for DIPLIB

Abstract:

DIPLIB is a collection of image filters for use in image processing. Currently, DIPLIB is integrated in MATLAB. In the last few years, the combination of IPython, NumPy and matplotlib has been gaining a lot of traction as an alternative to MATLAB. In this project, we will look into writing Python bindings for DIPLIB, such that DIPLIB can be used from within Python and IPython.

Supervisors:

Verbeek, Rietveld

Contact:

Distributed Sequence Analysis ("sequencing as a service")

Abstract:

Design and development of infrastructure and interface for scientists in Biology to easily submit sequencing jobs. The sequencing jobs will be automatically parallelized and distributed over multiple machines in the LLSC.

Collaboration will be sought with the department of Biology to determine the job characteristics and user interface requirements.

Supervisors:

Verbeek, Rietveld

Contact: