In Spring 2011 the course is taught by Dr. Erwin Bakker.

IN4173 / MCB

Delft - Leiden BioInformatics Master track

Molecular Computational Biology 2009

shouldn't that be "Computational Molecular Biology"?

A.P. Gultyaev (Sacha)
Instituut voor Biologie Leiden (IBL)
a.p.gultyaev (a)

H.J. Hoogeboom (Hendrik Jan)
Institute for Advanced Computer Science (LIACS)
hoogeboom (a)

Schedule: Tuesdays, Feb 3 - Apr 28, 9.00-10.45, Snellius 174.

Exam: 2 June 2009, 14.00-17.00, Snellius 174
To practise: old exams.


Tentative! In principle we will follow the same plan as last year; the schedule will be updated during the semester.
date topic handout
  APG deadline exercises see below
03.02.09 1 APG Basic molecular biology
Lecture Notes (covering the full range of APG lectures)
With exercises!
10.02.09 2 HJH String alignment by Dynamic programming Copies of transparencies [pdf] 14.2'09
copies of Alignment notes (Ron Shamir Lecture 2)
Eddy S.R. (2004a, Dynamic programming)
(new, under development:) lecture notes 12.02'09
17.02.09 3 APG Alignment & database search  
24.02.09 4 HJH Hidden Markov Models Copies of transparencies [pdf] 10.3'09
copies of Hidden Markov Models notes (Ron Shamir Lecture 5)
Eddy S.R. (2004b, BLOSUM62).
Eddy S.R. (2004c, hidden Markov).
03.03.09 5 APG Multiple alignment, profiles,
Gene finding
10.03.09 6 HJH Applications of HMM Copies of transparencies [pdf] 10.3'09
copies of HMM application notes Gene Finding (Ron Shamir Lecture 7)
17.03.09 7 APG Structure prediction (1)  
24.03.09 8 APG Structure prediction (2)
31.03.09 9 HJH Phylogeny Copies of transparencies [pdf] 1MB 5.4'06
lecture notes 12.02'09
alternatively, e.g., copies of Phylogeny notes (Ron Shamir Lecture 8)
or Mona Singh
07.04.09 10 APG RNA  
14.04.09 11 HJH Physical Mapping (PQ trees), sequencing copies of Physical Mapping (Ron Shamir Lecture 9, sections 9.1, 9.2)
and DNA sequencing, shortest common superstring (Saad Mneimneh Lecture 15)
transparencies [pdf] 1MB 16.4'08 [see also Mneimneh]
(new, under development:) lecture notes 05.03'09
21.04.09 12 HJH Exact pattern matching:
Aho-Corasick, Suffix trees
transparencies [pdf] 206kB 21.4'09
28.04.09 - - available when needed  

Recommended reading and websites

Free full-text availability, e.g. via PubMedCentral (PMC), is indicated. Other articles are available via the Leiden University Digital Library or (within the University) at journal sites.
Altschul S.F. et al.
The statistics of sequence similarity scores. (BLAST tutorial).
Brudno M. et al. (2003).
LAGAN and Multi-LAGAN: efficient tools for large-scale multiple alignment of genomic DNA. Genome Res. 13:721-731.
doi: 10.1101/gr.926603 (free full-text at
Durbin R., Eddy S.R., Krogh A., Mitchison G. added (HJH)
Biological Sequence Analysis, Cambridge University Press, 1998. (see e.g.,
Eddy S.R. (2004a).
What is dynamic programming? Nature Biotechnology 22:909-910
(doi: 10.1038/nbt0704-909)
Eddy S.R. (2004b).
Where did the BLOSUM62 alignment score matrix come from? Nature Biotechnology 22:1035-1036.
(doi: 10.1038/nbt0804-1035)
Eddy S.R. (2004c). added (HJH)
What is a hidden Markov model? Nature Biotechnology 22:1315-1316
(doi: 10.1038/nbt1004-1315)
This and other introductions to computational biology appeared as Primers in Nature Biotechnology.
Gribskov M., McLachlan A.D. & Eisenberg D. (1987).
Profile analysis: detection of distantly related proteins. Proc. Natl. Acad. Sci. USA 84:4355-4358.
(PMCID: 305087)
Pearson W.R. & Lipman D.J. (1988).
Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. USA 85:2444-2448.
(PMCID: 280013)
Sippl M.J. (1993).
Boltzmann's principle, knowledge-based mean fields and protein folding. An approach to the computational determination of protein structures. Journal of Computer-Aided Molecular Design 7:473-501.
Schwartz S. et al. (2003).
Human-mouse alignments with BLASTZ. Genome Res. 13:103-107.
doi: 10.1101/gr.809403 (free full-text at
Thompson J.D., Higgins D.G. & Gibson T.J. (1994).
CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673-4680.
(PMCID: 308517)
Zuker M. (2000).
Calculating nucleic acid secondary structure. Curr. Opinion Struct. Biol. 10:303-310. doi: 10.1016/S0959-440X(00)00088-9 (free within the University under the Elsevier contract)


Databases publicly available on the web were covered in the first lecture. The following resources are from the handout.

Database Function Organization Address
GenBank Nucleotide National Center for Biotechnology Information BLAST!
EMBL Nucleotide European Bioinformatics Institute
SWISSPROT Amino acid Swiss Institute of Bioinformatics
ExPASy (Expert Protein Analysis System) proteomics server
PSIPRED Protein structure
Protein Structure Prediction Server
MFOLD RNA structure Bioinformatics Center at Rensselaer and Wadsworth

Some other addresses from the lecture notes:

Section 1.1: Sample GenBank Record

Section 5.2.1: For instance, SWISS-MODEL, one of the first servers for protein structure prediction, was initiated in 1993 and is accessible via the ExPASy web server. (

Section 6.1: Many of these parameters have been estimated from thermodynamic experiments (e.g. D. Turner et al.) and are available on the Internet.

The algorithm used for energy minimization by dynamic programming is somewhat similar to the algorithm used for the alignment problem. The program mfold (M. Zuker) is the most frequently used. Mfold web server:

Section 6.2: Kinefold server of H. Isambert allows one to visualise kinetics of RNA folding.

Section 6.4: An example of a user-friendly server is the GPRM (genetic programming for RNA motifs) server (Hu, 2003). (
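
As an illustration of the remark under Section 6.1 that RNA structure prediction by dynamic programming resembles alignment, here is a minimal sketch. It is not the mfold algorithm: instead of minimizing free energy with thermodynamic parameters, it simply maximizes the number of nested base pairs (a Nussinov-style recurrence). The function name `max_base_pairs`, the allowed pair set (including G-U wobble pairs) and the minimum hairpin loop size of 3 are assumptions made for this example only.

```python
def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in an RNA sequence.

    Simplified stand-in for energy minimization: every allowed pair
    scores 1, and a hairpin loop must contain at least `min_loop`
    unpaired bases (both choices are assumptions of this sketch).
    """
    allowed = {("A", "U"), ("U", "A"), ("G", "C"),
               ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] = maximum number of pairs within seq[i..j] (inclusive)
    best = [[0] * n for _ in range(n)]
    # Fill the table by increasing subsequence length, as in alignment DP.
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: base j stays unpaired.
            score = best[i][j - 1]
            # Case 2: base j pairs with some base k, splitting the problem
            # into the part left of k and the part enclosed by (k, j).
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in allowed:
                    left = best[i][k - 1] if k > i else 0
                    inner = best[k + 1][j - 1]
                    score = max(score, left + 1 + inner)
            best[i][j] = score
    return best[0][n - 1]


if __name__ == "__main__":
    print(max_base_pairs("GGGAAAUCC"))
```

As in string alignment, the table is filled from small subproblems to large ones and each cell takes the best of a few cases; the difference is that a cell here describes a subsequence of one molecule rather than a pair of prefixes of two sequences.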

Exercises/assignments 2009

Two out of ten points for the final grade are based on the exercises from the hand-out (in other words, without assignments the maximum mark is 8). The answers to the problems formulated in the assignments may be either sent to a.p.gultyaev /at/ or submitted after the lecture hours. The deadlines for submission in spring 2009:
exercises: deadline
1-6: March 10
7-10: March 20
11-12: April 20

To avoid tedious re-typing of the sequences, we made the assignments available on the web.

this page: (jan 09)
Comments to HJH