Delft - Leiden BioInformatics Master track
Molecular Computational Biology 2009
Exercises / Assignments
The answers to the problems formulated in the assignments may be either sent
to a.p.gultyaev /at/ biology.leidenuniv.nl or submitted after the lecture
hours. Deadlines for submissions in spring 2009 are indicated below.
Timely submissions of correct assignment solutions can contribute up to 2
points of the total exam mark (in other words, without assignments the
maximum mark is 8).
Please do not send attachments or hyperlinks
in your e-mails (they will not be considered): it is enough to describe
briefly what has been done and what kind of result was obtained.
Assignments 1-6: deadline March 10, 2009
1. The genome of human coronavirus NL63 has the accession number
NC_005831. How many nucleotides does it contain? How many amino
acids are in the protein annotated as “replicase
polyprotein 1ab”, encoded in this genome?
2. Using the protein-protein BLAST (blastp) program, determine the species
having the most similar homolog of human interferon-gamma
(NP_000610). Give the accession of this homolog and the number
of amino acid differences as compared to the human protein.
3. The following DNA sequence fragment, containing some mutation,
was isolated from a patient:
(a) In which gene is the mutation located? On which chromosome?
How many nucleotides are changed?
(b) Using the annotation given for the corresponding sequence
database entry, indicate possible diseases caused
by mutations in this gene.
4. What is the most efficient strategy to determine quickly the
difference (number of amino acid substitutions) between
homologous proteins from two strains of influenza virus? For
instance, determine the number of substitutions in the
polymerase PB1 from the strain resulting in the death of a
veterinarian during the outbreak of bird flu in 2003 in The
Netherlands (strain A/Netherlands/219/03) as compared to the
homologous protein from the strain isolated from a 1918
pandemic victim who had been interred in Alaska permafrost since
November 1918 (strain A/Brevig Mission/1/1918). How many of
these substitutions are conservative ones according to the
default substitution matrix (BLOSUM62) used in BLAST programs?
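Once two homologous proteins have been aligned, counting substitutions and conservative substitutions is a simple column-by-column comparison. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the sequences are toy strings rather than real PB1 fragments, the alignment is assumed to be gap-free, and a substitution is treated as conservative when its matrix score is positive (the convention behind the "Positives" count in BLAST output). Only a small excerpt of BLOSUM62 is included.

```python
def compare_aligned(seq_a, seq_b, score):
    """Count substitutions between two gap-free aligned protein sequences.

    Returns (substitutions, conservative); a substitution is counted as
    conservative when score(a, b) is positive.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    substitutions = 0
    conservative = 0
    for a, b in zip(seq_a, seq_b):
        if a == b:
            continue
        substitutions += 1
        if score(a, b) > 0:
            conservative += 1
    return substitutions, conservative


# Small excerpt of BLOSUM62 off-diagonal scores, for illustration only.
BLOSUM62_EXCERPT = {
    frozenset("IV"): 3,
    frozenset("KR"): 2,
    frozenset("DE"): 2,
    frozenset("ST"): 1,
    frozenset("AD"): -2,
    frozenset("GW"): -2,
}


def excerpt_score(a, b):
    # Pairs missing from the excerpt raise KeyError instead of being guessed.
    return BLOSUM62_EXCERPT[frozenset((a, b))]


# Toy sequences (assumption: not real influenza proteins).
print(compare_aligned("MKISTAG", "MRVSSDG", excerpt_score))  # (4, 3)
```

For real proteins the full matrix would be needed, and gapped columns would have to be handled separately.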
5. One of the 8 RNA segments of the influenza A genome codes for a
polymerase called PB1 of about 750 amino acids. It has recently been
determined that the 5'-proximal part of this RNA segment contains an
overlapping open reading frame (ORF) coding for another protein, PB1-F2,
of about 90 amino acids. However, for many influenza A virus strains
the information about this protein is still missing in GenBank. Using
the ORF Finder tool, determine the size
of the PB1-F2 protein encoded by the PB1 segment from the strain
A/Netherlands/219/03 (accession AY340083). Using one of the BLAST
versions provided by the ORF Finder, determine the strain that has
the most similar putative PB1-F2 to that from A/Netherlands/219/03.
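The idea of an overlapping ORF in a different reading frame can be illustrated with a minimal scan for ATG...stop stretches in the three forward frames. This is a sketch only, not the algorithm used by ORF Finder: the function name and the toy sequence are hypothetical, only the forward strand is scanned, and the standard stop codons are assumed.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=1):
    """Return (start, end, frame) for ATG..stop ORFs on the forward strand.

    start and end are 0-based with end exclusive; the stop codon is included.
    min_codons counts coding codons (ATG included, stop excluded).
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens an ORF in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ATG after the stop
    return orfs


# Toy sequence (assumption): a frame-0 ORF with a shorter ORF in frame 2.
toy = "ATGGCATGCCCTAATAGGTGA"
for start, end, frame in find_orfs(toy):
    protein_length = (end - start) // 3 - 1  # stop codon is not translated
    print(frame, start, end, protein_length)
```

In the toy sequence the second ORF lies entirely inside the first one but in another frame, which is the same arrangement as described for PB1-F2 within the PB1 segment.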
6. Using BLAST options and the amino acid sequence of the protein
Dicer from Arabidopsis thaliana (accession Q9SP32), retrieve
putative (partial) plant Dicer mRNAs from the database of
expressed sequence tags (EST). What organisms have the putative
Dicer proteins with the highest sequence similarities to that
from A. thaliana (give three names and accession numbers of BLAST
hits)? (NB. The EST database contains "raw" nucleotide sequences,
and its entries do not include features like coding sequences.)
Assignments 7-10: deadline March 20, 2009
7. Using the ClustalW program, available at the EBI website
(www.ebi.ac.uk/services/), calculate a multiple alignment
of five homologous Hfq proteins from the following organisms:
Escherichia coli (Accession NP_418593), Neisseria gonorrhoeae
(YP_207484), Nitrosomonas europaea (Q82V23), Legionella
pneumophila (Q5ZZK1) and Bacillus subtilis (NP_389616). How many
amino acid residues are completely conserved in all five
sequences? What is the length of the longest stretch of
conserved amino acids? Give this motif in single letter code.
(NB. The easiest input for ClustalW is a FASTA format file of
sequences, prepared in advance. For this assignment, default
parameters of ClustalW are sufficient.)
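ClustalW marks fully conserved columns with "*" under the alignment, so both questions can be answered by reading that line. The same counting can be sketched in code, assuming the aligned sequences are available as equal-length strings with "-" for gaps. The function names and the toy alignment below are hypothetical and are not the Hfq alignment.

```python
def conserved_columns(aligned):
    """Return a list of booleans, True where a column is identical in all rows."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("aligned sequences must have equal length")
    flags = []
    for column in zip(*aligned):
        first = column[0]
        # A column containing a gap is never counted as conserved.
        flags.append(first != "-" and all(c == first for c in column))
    return flags


def longest_conserved_run(aligned):
    """Return (number of conserved columns, longest contiguous conserved motif)."""
    flags = conserved_columns(aligned)
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for i, ok in enumerate(flags):
        if ok:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0
    motif = aligned[0][best_start:best_start + best_len]
    return sum(flags), motif


# Toy alignment (assumption: invented sequences).
toy_alignment = [
    "MAKGQSL-QD",
    "MSKGQSLPQE",
    "M-KGQSLAQD",
]
print(longest_conserved_run(toy_alignment))  # (7, 'KGQSL')
```

If two conserved stretches have the same length, this sketch reports the first one.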
8. Recently a so-called minor spliceosome (that catalyses the splicing of atypical introns)
has been identified in a number of organisms. In order to establish the evolutionary history
of the minor spliceosome, BLAST searches for minor spliceosome-specific proteins were
used. Explore the usefulness of the PSI-BLAST program for the search of (distant) homologs of
one of the human minor spliceosome-specific proteins (accession NP_078847):
- How many hits are yielded by PSI-BLAST iteration 1?
- How many hits are yielded by PSI-BLAST iteration 2? How many new hits with an E-value
better than the threshold are found? Give the accession number and organism name for
the best of these new hits.
- Does PSI-BLAST iteration 3 yield new hits with an E-value better than the threshold? Explain.
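"New hits" in an iteration are the hits that pass the E-value threshold now but did not pass it in the previous iteration. A minimal sketch of that bookkeeping follows, assuming the hits of each iteration have been collected as a mapping from accession to E-value. The function name, the hit labels and the E-values are hypothetical, and the default threshold of 0.005 is an assumption about the server's inclusion setting that should be checked against the search form.

```python
def new_hits(previous, current, threshold=0.005):
    """Return accessions passing the threshold in current but not in previous.

    previous and current map accession -> E-value for two iterations.
    The result is sorted from the best (lowest) E-value to the worst.
    """
    before = {acc for acc, e in previous.items() if e <= threshold}
    after = {acc: e for acc, e in current.items() if e <= threshold}
    fresh = [acc for acc in after if acc not in before]
    return sorted(fresh, key=lambda acc: after[acc])


# Invented labels and E-values, for illustration only.
iteration_1 = {"hitA": 1e-50, "hitB": 1e-10, "hitC": 0.2}
iteration_2 = {"hitA": 1e-60, "hitB": 1e-20, "hitC": 1e-4, "hitD": 1e-8, "hitE": 3.0}
print(new_hits(iteration_1, iteration_2))  # ['hitD', 'hitC']
```

When an iteration returns an empty list, the search has converged with respect to the chosen threshold.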
9. Using the database of protein profiles PROSITE,
determine whether the amino acid sequence
contains some consensus pattern. Give the description of this consensus pattern.
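A PROSITE consensus pattern is written in a compact syntax: elements are separated by "-", "x" stands for any residue, "[...]" lists allowed residues, "{...}" lists forbidden residues, "(n)" or "(n,m)" gives a repeat count, and "<" or ">" anchors the pattern to a sequence terminus. A minimal sketch of translating such a pattern into a regular expression is given below. The function name and the toy sequence are hypothetical, and rarer syntax (such as ">" inside square brackets) is not handled.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE pattern string into a Python regular expression."""
    pattern = pattern.strip().rstrip(".")
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]
        if element.endswith(">"):
            suffix, element = "$", element[:-1]
        # Split off a repetition count such as x(2) or x(2,4).
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = match.group(1), match.group(2)
        if core == "x":
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"
        # Bracketed sets like [ST] and single residues are already valid regex.
        if repeat:
            core += "{" + repeat + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


# The N-glycosylation site pattern, scanned against a toy sequence.
regex = prosite_to_regex("N-{P}-[ST]-{P}.")
print(regex)  # N[^P][ST][^P]
for m in re.finditer(regex, "AANGSAANPTA"):
    print(m.start() + 1, m.group())  # 1-based position and matched residues
```

Note that `re.finditer` reports non-overlapping matches only, so overlapping sites would need a lookahead-based scan.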
10. Using the database of protein profiles PROSITE,
determine the positions of conserved motifs (profiles) in one of the
human proteins involved in RNA splicing, 9G8 (Accession NP_001026854).
Using the resources of ENTREZ system, determine how many exons are in
the 9G8 gene, and which of these exons contain the sequences encoding
the found motifs.
Assignments 11-12: deadline April 20, 2009
IMPORTANT: Exercise 11 should preferably be performed on one of the
University computers (in any case, a University e-mail address should be
entered in the server input form), because the PSIPRED server
description distinguishes between academic and commercial users.
For academic users there are no complications (no registration,
passwords, etc.): the result is simply sent to your e-mail address. Other
e-mail addresses (e.g. private accounts, especially Hotmail etc.) may be
treated as commercial ones, with a more complicated procedure for
using the server.
11. It is known that the secondary structures of the RNA-binding
proteins Hfq contain a specific structural motif: an N-terminal
alpha helix followed by several beta strand regions. Using the
PSIPRED protein secondary structure prediction program,
predict the secondary structure of the Hfq
protein from Mesorhizobium loti (accession NP_102205).
How many helices and beta strand regions are predicted? What are
their lengths? Is the predicted secondary structure consistent
with the motif description given above?
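PSIPRED reports its prediction as a per-residue string of states: H for helix, E for strand and C for coil. Counting helices and strands and measuring their lengths then amounts to collapsing runs of identical letters. A minimal sketch follows; the function name is hypothetical and the prediction string is invented, not the result for the M. loti protein.

```python
from itertools import groupby


def segments(ss_string):
    """Return (state, start, length) for each helix (H) or strand (E) run.

    start is a 1-based residue position; coil (C) runs are skipped.
    """
    result = []
    position = 1
    for state, run in groupby(ss_string):
        length = len(list(run))
        if state in "HE":
            result.append((state, position, length))
        position += length
    return result


# Invented prediction string, for illustration only.
toy_prediction = "CCHHHHHHHHCCEEEEECCCEEEECC"
for state, start, length in segments(toy_prediction):
    print(state, start, length)
# H 3 8
# E 13 5
# E 21 4
```

A prediction consistent with the motif described above would give one H segment near the N-terminus followed only by E segments.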
12. Some RNA molecules have alternative secondary structures with
close values of free energy. For the sequence given below,
predict the folding using the program mfold (www.bioinfo.rpi.edu/applications/mfold).
What is the free energy of the lowest energy conformation? What is the free
energy of structure 2? What are the main secondary structure
elements in these structures?
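Why close free energies imply genuinely alternative structures can be seen from the Boltzmann relation: at equilibrium the population ratio of two conformations is exp(-ΔΔG/RT). The sketch below evaluates this ratio, assuming free energies in kcal/mol and a temperature of 37 °C. The function name and the example energies are hypothetical and are not mfold results for the assignment sequence.

```python
import math

GAS_CONSTANT = 1.98720e-3  # kcal/(mol*K)


def population_ratio(dg_lowest, dg_other, temperature_c=37.0):
    """Return the equilibrium ratio [other]/[lowest] for two conformations.

    Free energies are in kcal/mol; the ratio is exp(-(dg_other - dg_lowest)/RT).
    """
    rt = GAS_CONSTANT * (temperature_c + 273.15)
    return math.exp(-(dg_other - dg_lowest) / rt)


# Invented energies: two structures that differ by 0.5 kcal/mol.
print(round(population_ratio(-10.0, -9.5), 2))
# A difference of 5 kcal/mol makes the second structure very rare.
print(population_ratio(-10.0, -5.0))
```

With a difference of only a few tenths of a kcal/mol both structures are substantially populated, whereas a difference of several kcal/mol leaves essentially one structure.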