Seminar DNA String Algorithms (Fall 2008)
The Seminar DNA String Algorithms
is intended for Master students in Computer Science.
Lectures are in English.
The seminar is organised by
dr. H.J. (Hendrik Jan) Hoogeboom
dr. W.A. (Walter) Kosters.
The seminar examines algorithms that deal with DNA,
viewed as a string.
Students should perhaps write programs and certainly study recent literature,
and present their results. Meetings are on a weekly basis.
Students can also suggest subjects. For a "full" list, see below.
Dates: Monday September 8, 2008 — Monday November 24, 2008
(12 sessions, see below);
time: 11.15-13.00; room: 407, Snellius.
Credit points: 7 ECTS (at some other locations "6 ECTS"
are mentioned, but it is 7 (seven)).
An extra session has been added in December.
There is no written exam.
In this seminar we will discuss several issues dealing
with DNA and strings.
A typical question looks like this:
| || || ||teams||
|✓||Giovanni Bernardi||gbtito X gmail
|✓||Bertram Bourdrez||bertram X bourdrez.org
|✓||Ramon van Dam||ramonvandam X gmail
|✓||Bart van der Drift||bartvanderdrift X hotmail
|✓||Sjoerd van Egmond||sjoerdvanegmond X gmail
|✓||Johan Groenen||johan.groenen X gmail
|-||Sjoerd Henstra||shenstra X liacs.nl
|✓||Erik Jongsma||ejongsma X liacs.nl
|✓||Susan Laraghy||susan X running-gag.nl
|✓||Jori van Lier||jlier X liacs.nl
|✓||Edgar Reehuis||edgarreehuis X hotmail
|✓||Jan van Rijn||janvanrijn X gmail
| ||Johan de Ruiter||johan.de.ruiter X gmail
|✓||Marijn Swenne||marijn.swenne X gmail
|-||Frank Takes||ftakes X liacs.nl
Given a large piece of DNA, for example the (or a) human genome.
How many unique substrings of length 25 does
this string possess?
To get some ideas, see
The solutions may use sophisticated techniques like suffix arrays,
but can also be brute-force, or hybrid.
In scientific papers many algorithms are described for these problems.
Students are supposed to discuss these papers during class.
Results are presented by the students in one hour lectures
(i.e., 45 minutes),
where they could also contrast their own findings with
and research papers: see the list of references below.
Both members of the team seriously take part in the presentation.
Next to the oral presentations
(in PowerPoint or PDF) 10
page self-contained team essays on the subject must be written (in
it is also possible to give two 6 page individual essays.
Deadline: one month after the presentation.
(Each person writes on his/her own two presentations.)
Essays should be improved until students and lecturers
agree on the contents. Students must be present
during all sessions, and are supposed to actively
take part in the presentations, e.g., ask questions.
Immediately after each lecture there is a short
discussion between students and organizers,
where the presentation is evaluated.
Some of the subjects can perhaps be split into two, together
making a two hour lecture.
The first meeting is used to make a proper schedule.
Every student is supposed to choose and present at least two subjects,
one in the first half and one in the second half of the seminar.
The two-person teams split up after their first session, to make new teams.
You can use your own laptop for the presentations.
If you don't have one available, you can use ours
— let us know in advance!
In that case, please email the .ppt-
or .pdf-files a day before the presentation — at the latest.
It is wise to also put them on a memory-stick.
We take care of a beamer.
Algorithms, Artificial intelligence, Data mining, Complexity, Datastructures
(all at bachelor's level, see for
instance the Dutch course pages
And perhaps some Biology.
Students should register in advance, in August 2008,
by personal visit to the organizers.
The number of participants is at most 10 (approximately :).
|team ||date ||first ||second ||done ||topic
|D1 (Edgar & Marijn)||29.9||WAK||HJH||✓||Fast Algorithms for Selecting ...
WABI'07, Davila doi
|A1 (Giovanni & Jan)||6.10||HJH||WAK||✓||Fixed Parameter Tractable ...
CPM'08, Möhl doi
|C1 (Bertram & JohandR)||6.10||HJH||WAK||✓||Indexing a Dictionary ...
SPIRE'07, Landau doi
|E1 (SjoerdvE & JohanG)||13.10||WAK||HJH||✓||Jump-Matching with Errors
SPIRE'07, Butman doi
|F1 (Erik & Susan)||13.10||WAK||HJH||✓||More Efficient Algorithms ...
RECOMB'08, Ma doi
|G1 (Ramon & Bart)||20.10||WAK||HJH||✓||Optimal Self-adjusting Trees ...
SPIRE'07, Ko doi
|B1 (Jori (& Frank))||10.11||WAK||HJH||✓||Faster Algorithm for the Set ...
CPM'08, Gąsieniec doi
|B2 (Jan & Marijn)||10.11||HJH||WAK||✓||Efficient Text Proximity Search
SPIRE'07, Schenkel doi
|C2 (Giovanni & Edgar)||17.11||WAK||HJH||✓||A Faster Algorithm for RNA ...
WABI'08, Ziv-Ukelson doi
|G2 (Erik & JohanG)||17.11||WAK||HJH||✓||Searching for Gapped Palindromes
CPM 2008, Kolpakov doi
|D2 (Susan & Ramon)||24.11||WAK||HJH||✓||Tuning Approximate Boyer-Moore ...
SPIRE'07, Kalsi doi
|E2 (Bart & SjoerdvE)||24.11||WAK||HJH||✓||Minimum Common String Partition
WABI'08, Damaschke doi
|F2 (Jori & Bertram)||1.12||HJH||WAK||✓||CompostBin: A DNA Composition-Based ...
RECOMB 2008, Chatterji doi
|A2 ((Frank &) JohandR)||1.12||HJH||WAK||✓||Computation of Median Gene Clusters
RECOMB 2008, Böcker doi
The columns "first" and "second"
denote the person handling your paper.
A period "." means that that person is still busy reading ;-).
And what's "done"✓ is done.
References are either books or papers,
see the examples below.
Books are usually available in rooms 159 and 162.
More papers can be found in various conference proceedings,
we have provided some candidate conferences.
Remember that most doi's can only be used from computers within the university domain.
Many other books and the internet
may be helpful.
CPM (Combinatorial Pattern Matching)
RECOMB (Research in Computational Molecular Biology)
SPIRE (String Processing and Information Retrieval)
WABI (Workshop Algorithms in Bioinformatics)
M. Crochemore, C, Hancart and T. Lecroq,
Algorithms on Strings,
Cambridge University Press, 2007.
Algorithms on Strings, Trees and Sequences,
Cambridge University Press, 1997.
M. Kasahara and S. Morishita,
Large-scale Genome Sequence Processing,
Imperial College Press,
Computing Patterns in Strings,
ACM Press Books, 2003.
Jie Lin, Yue Jiang and Don Adjeroh,
The Virtual Suffix Tree: An Efficient Data Structure for Suffix Trees,
B. Phoophakdee and M.J. Zaki,
Genome-scale disk-based suffix tree indexing,
ACM SIGMOD International Conference on Management of Data,
Proceedings pp. 833-844;
S.J. Puglisi, W.F. Smyth and A.H. Turpin,
A taxonomy of suffix array construction algorithms,
ACM Computing Surveys 39 (2007) 2-4;
K.-B. Schürmann and J. Stoye,
An incomplex algorithm for fast suffix array construction,
Software: Practice and Experience 37 (2006) 309-329;
21 June 2010 — http://www.liacs.nl/home/kosters/semdna/index.html