
3 Heuristic Alignment Procedures and Examples

 

Here is what you will learn in the following sections: You will understand how the most popular Multiple Alignment heuristic works, and following an example, you will investigate optimal, heuristic, and structurally verified multiple alignments obtained from WWW servers, recapitulating results from an original paper.

3.1 Alignment along a Tree.

[ AliAlongTree ] For more than approximately 8 sequences of average size and similarity, even employing Carrillo-Lipman bounds may not result in a manageable demand on time and memory space, so that an optimal alignment cannot be obtained. (This is the state-of-the-art in 1996.) In such cases, alignment along a tree can be the alternative of choice.


Figure 13: Phylogenetic Tree.

Imagine that you have obtained a phylogenetic tree for the sequences (Fig. 13). This tree may be the result of morphological studies, or it may be obtained from the sequences themselves by one of the methods described in chapter 4. One popular approach (employed by the Clustal software package, http://dot.imgen.bcm.tmc.edu:9331/multi-align/multi-align-vsns.html) is to generate all optimal pairwise alignments, the costs of which form the estimated distances between the sequences. From these distances, a tree can then be obtained.
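As a rough illustration of this first step, here is a minimal Python sketch that computes one distance for every pair of sequences. It is not what Clustal does internally: the unit-cost edit distance and the names edit_distance and distance_table are assumptions made for this example, whereas a real run would use optimal pairwise alignment costs under a matrix such as PAM250 plus gap penalties. The three fragments are taken from the introduction.

```python
from itertools import combinations


def edit_distance(a, b):
    """Unit-cost edit distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # gap in b
                            curr[j - 1] + 1,            # gap in a
                            prev[j - 1] + (ca != cb)))  # match or substitution
        prev = curr
    return prev[-1]


def distance_table(seqs):
    """Map every unordered pair of sequence names to its distance."""
    return {(x, y): edit_distance(seqs[x], seqs[y])
            for x, y in combinations(sorted(seqs), 2)}


# Fragments from the introduction; any other sequences would do as well.
fragments = {
    "BS1": "VTISCTGSSSNIGAGNHVKWYQQLPG",
    "BS2": "VTISCTGTSSNIGSITVNWYQQLPG",
    "BS6": "ATLVCLISDFYPGAVTVAWKADS",
}

for pair, dist in distance_table(fragments).items():
    print(pair, dist)
```

The resulting table is the kind of input a tree-building method would start from; which method is then applied is a separate choice.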

Exercise [02] How many pairwise comparisons need to be done for n sequences ?

Alignment along a tree is just this: the tree is used to decide which sequences shall be aligned first, because they are closely related. After the first step, more sequences are added by aligning them to the existing alignment; we may also align an alignment to an alignment. Alignment along a tree does not necessarily yield an optimal alignment, even if the tree is "perfect". For example, errors may be made in the very first pairwise alignment, and they do not get corrected because information from the other sequences is overshadowed during the later steps.
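The order of the alignment steps can be read off the tree mechanically. The following Python sketch walks a tree, written as nested pairs of sequence names, from the leaves to the root and lists the steps in the order in which they would be carried out. The function name alignment_steps and the example topology are hypothetical; no actual aligning takes place here.

```python
def alignment_steps(tree):
    """Return (label, steps) for a tree given as nested 2-tuples of names.

    Each step is (left label, right label, kind of alignment needed).
    """
    if isinstance(tree, str):          # a leaf: a single sequence, nothing to do
        return tree, []
    left, right = tree
    l_label, l_steps = alignment_steps(left)
    r_label, r_steps = alignment_steps(right)
    # Count how many of the two children are already alignments (subtrees).
    groups = sum(not isinstance(side, str) for side in (left, right))
    kind = ("sequence-sequence", "sequence-alignment", "alignment-alignment")[groups]
    label = "(" + l_label + "," + r_label + ")"
    return label, l_steps + r_steps + [(l_label, r_label, kind)]


# Hypothetical topology, for illustration only.
tree = (("BS1", "BS2"), ("BS3", "BS4"))
for left, right, kind in alignment_steps(tree)[1]:
    print(left, "+", right, "->", kind)
```

The closely related pairs at the bottom of the tree come first; every later step has to live with whatever those early steps produced.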

Exercise [02, opt] For which kind of trees may you need to align an alignment to an alignment ? Or, alternatively, for which kind of trees do you not need to bother with this ?

The technique for aligning alignments is to simulate standard pairwise alignment, but use profiles instead of sequences. For each position, a profile holds a list of the relative frequencies (i.e. values between 0 and 1) of the 20 amino acids (and gap), and the cost of matching a position in profile A with a position in B is calculated by multiplying the (mis)match scores, for each pair of amino acids, by the said amino acids' frequencies at these positions, and summing up.
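The verbal description above translates almost literally into code. The sketch below scores one profile position against another; a profile position is represented as a dictionary from symbol to relative frequency. The function toy_score is a made-up stand-in for a real matrix such as PAM250 (its values are assumptions chosen for readability only), and the name column_score is likewise hypothetical.

```python
def toy_score(a, b):
    """Hypothetical (mis)match scores; a real run would look up PAM250."""
    if a == "-" or b == "-":
        return -2.0                    # anything paired with a gap
    return 2.0 if a == b else -1.0     # identical residues vs. mismatch


def column_score(col_a, col_b, score=toy_score):
    """Sum of score(a, b), weighted by the frequencies of a and b."""
    return sum(freq_a * freq_b * score(a, b)
               for a, freq_a in col_a.items()
               for b, freq_b in col_b.items())


# Position in profile A: always Cys. Position in profile B: half Cys, half Ser.
pos_a = {"C": 1.0}
pos_b = {"C": 0.5, "S": 0.5}
print(column_score(pos_a, pos_b))      # 1.0*0.5*2.0 + 1.0*0.5*(-1.0) = 0.5
```

Matching a sequence against a profile is the special case in which one of the two dictionaries holds a single symbol with frequency 1.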

Exercise [05A] Calculate the score of matching the following two positions in profiles A, and B, respectively:

Use the PAM250 similarity matrix, yielding a similarity score.

Exercise [10M, opt.] Develop the mathematical formula for the alignment of profiles. If you like, begin with the formula for aligning one sequence to a profile. To this end, you need to introduce frequency vectors of length 21, one vector per position of the profile.

Normally, the alignments obtained thus far are fixed; gaps may only be added. Then, we follow the rule "Once a gap, always a gap" [FeD87], also known as "Progressive Alignment".
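In terms of data, "Once a gap, always a gap" means that an existing alignment is only ever widened: a new gap is inserted as a whole column, in every row at once, and nothing that is already there gets moved. A minimal sketch, with a hypothetical helper name and toy rows:

```python
def insert_gap_column(alignment, col):
    """Insert a gap into every row of a fixed alignment at column index col."""
    return [row[:col] + "-" + row[col:] for row in alignment]


fixed = ["ACGT",
         "A-GT"]                       # toy alignment with one existing gap
widened = insert_gap_column(fixed, 2)
for row in widened:
    print(row)
# AC-GT
# A--GT
```

The existing gap in the second row is still there after the operation, and the residues of each row keep their relative order.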

Our technique is illustrated by Fig. 14, adapted with permission from [Bar95], the original of which is available at http://geoff.biop.ox.ac.uk/papers/rev93_1/Figure5.ps.

Figure 14: Progressive Alignment.

Some methods (e.g. [BaS87]) do an iterative refinement of the alignment after the initial pass; now gaps may move.

The following concepts may easily be confused:

Scoring along a tree is the main alternative to the simple "sum-of-pairs" cost model; only pairs of sequences that are adjacent (neighboring) in the tree are taken into consideration (or, at least they're weighted higher). Indeed, by weighting the pairs differently, we can score along a tree, yet employ Carrillo-Lipman and try out all possibly optimal alignment paths in the hyperlattice, see [AlL89] !

"Tree Alignment" subsumes methods that involve reconstructing ancestral sequences, too.
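The difference between plain sum-of-pairs and scoring along a tree is only a matter of pair weights, as the following sketch tries to show. The pair cost used here (number of columns in which two rows differ) is a deliberately crude assumption, and the set of "neighboring" pairs is invented for the example; both would come from the real cost model and the real tree.

```python
from itertools import combinations


def pair_cost(row_a, row_b):
    """Toy cost: number of columns in which the two rows differ."""
    return sum(1 for x, y in zip(row_a, row_b) if x != y)


def weighted_sp(alignment, weight):
    """Weighted sum-of-pairs cost; weight(i, j) is the weight of pair (i, j)."""
    return sum(weight(i, j) * pair_cost(alignment[i], alignment[j])
               for i, j in combinations(range(len(alignment)), 2))


alignment = ["ACGT",
             "AC-T",
             "TCGT"]

# Plain sum-of-pairs: every pair counts once.
print(weighted_sp(alignment, lambda i, j: 1))                          # 4

# Scoring along a (hypothetical) tree: only neighboring pairs count.
neighbors = {(0, 1), (1, 2)}
print(weighted_sp(alignment, lambda i, j: 1 if (i, j) in neighbors else 0))  # 3
```

Intermediate weightings, where distant pairs count a little rather than not at all, fit the same function.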

3.2 A Hands-On Example: Aligning Immunoglobulin Sequences.

[ ImmIntro ] We will now apply our knowledge about heuristic and optimal alignment methods to a real-life example. The example is more real-life than usual for a textbook; we will deal with a lot of problems you may face in your own investigations, like hard-to-find sequences, inconsistent data, etc. The author hopes that this has got some advantages, too :-)

Our example is taken from the paper "A Strategy for the Rapid Multiple Alignment of Protein Sequences. Confidence Levels from Tertiary Structure Comparisons." by G.J. Barton and M.J.E. Sternberg, J Mol Biol 1987;198:327-337.

We will discuss alignments of the immunoglobulin sequences they are using; fragments of these sequences have already been featured in the introduction.

Exercise [05*] Get the paper ! J Mol Biol, the Journal of Molecular Biology, is an absolute "must" for any university library. Students of the GNA-VSNS Biocomputing Course may receive a copy from the instructors/organizers, if needed. Nevertheless, care has been taken to ensure that the following section is self-contained.

Exercise [10*] Inform yourself about the molecular biology of immunoglobulins; light chain, heavy chain, disulphide bridges, constant region, so-called variable region, and how they fit together. (See also Fig. 15, below.)

3.3 Getting the Immunoglobulin Sequences from the Internet.

[ ImmDescr ] The Barton & Sternberg paper is now 9 years old; it's from the early days of Multiple Alignment ! 9 years can be a long time for sequences, too, as we will find out really soon.

The authors write the following about their selection of sequences; formatting their description was done by the textbook author. "Eight domains were selected (Brookhaven Data Bank codes).

The chains from FAB and FC1 make up one of the identical halves of an antibody; one light ("") chain, and one heavy ("") chain, the heavy chain consisting of 3, and the light chain consisting of 1 constant region, see Fig. 15. For more details, please try Kevin Shreders's Antibody Resource Page, http://www.antibodyresource.com/, in particular the link to Mike Clark's page featuring Images of Immunoglobulin Molecules. As an example of a relevant database, you may explore the Kabat Database of Sequences of Proteins of Immunological Interest, http://immuno.bme.nwu.edu/.

Figure 15: Schematic Structure of an Antibody (Immunoglobulin).

The FB4 regions are added to the collection in order to have an equal number of variable and constant regions. Let me stress that the "variable" regions get their name from the antigen-binding subregions ("CDRs", complementarity-determining regions), which are composed of just a few amino acids each, and give the antibody its specificity. Most of the variable region of an antibody is about as conserved as the constant regions are !

Exercise [5, opt.] Using the Molecules R Us server, http://molbio.info.nih.gov/cgi-bin/pdb, get some images of the 1FC1 immunoglobulin.

Exercise [15, opt.] Using technology from the VSNS-PPS course, http://www.cryst.bbk.ac.uk/PPS/index.html, you can take a closer look at 1FC1. (This may take some time, though, if you need to install software, etc. Right now, the GNA-VSNS Biocomputing Course organizers have not got enough time resources to help you intensively.)

In the alignments from the paper and from our introduction, the sequences are arranged as follows:

The arrangement of the variable and constant sets is done to maximize similarity of adjacent sequences: both FB4 variable regions go together, and both FAB constant regions go together. We will use this numbering (BS1-BS8) throughout.

Up until the beginning of the next subsection (3.4) the following is an optional part of the chapter, in which you will retrieve the sequences from the net, and check your results.

[ ImmRetrieval ] Exercise [15, opt.] For this exercise, note that there are quite a lot of differences between the sequences you retrieve and the sequences from the paper. What's more, the sequences will be different depending on the data bank you searched ! But don't despair, you will have a scout with you !
Obtain the 8 immunoglobulin sequences, using what you learned in chapter 2. If you've not read chapter 2 (What a shame !), start with Pedro's list, http://www.public.iastate.edu/~pedro/research_tools.html and try out the various PDB resources. Hint: 2 of the entries have been superseded, and once you know the new entry IDs, you can search via SRS-WWW, http://www.embl-heidelberg.de/srs/srsc. If you'd like to obtain sequences with the one-letter code directly, (and you want to end up with exactly the same sequences as the author), you can access a nice databank for this via SRS: PDBFINDER. PDBFINDER however does not distinguish variable and constant regions; they are just concatenated ! (But you don't need to worry about this.)

Let's take a look at the 3 PDBFINDER files you retrieved:
http://www.embl-heidelberg.de/srs/srsc?[PDBFINDER-id:7FAB]
http://www.embl-heidelberg.de/srs/srsc?[PDBFINDER-id:1FC1]
http://www.embl-heidelberg.de/srs/srsc?[PDBFINDER-id:2FB4]

They're quite regular, 7FAB and 2FB4 listing one heavy and one light chain each, and 1FC1 listing 2 identical chains A and B.

Exercise [05B*] Why are chains A and B identical ?

Exercise [05B*] "Why do light and heavy chains suddenly have the same length in 7FAB and 2FB4 ? I thought, the heavy chain is twice as long ?!"

[ ImmRetrievalVerif ] We will now do a plausibility check on whether we've retrieved the right sequences. To this end, we'll align the fragments from the introduction (they are listed in the order BS1-BS8, taken directly from the paper) to the retrieved sequences. Variable and constant regions are still stuck together !

Exercise [10] Using the Clustal Query Form, http://dot.imgen.bcm.tmc.edu:9331/multi-align/multi-align-vsns.html, align the fragments with the chains. Note that the above query provides a Clustal Interface with the 1995 default parameters, so that your alignments match exactly the ones cited in this text ! If you use the standard BCM Launcher page, you will get different results. Your Query, in Fasta-Format, should look like:

>7FAB_light_chain
ASVLTQPPSVSGAPGQRVTISCTGSSSNIGAGHNVKWYQQLPGTAPKLLIFHNNARFSVSKSGTSATLAITGLQAEDEAD
YYCQSYDRSLRVFGGGTKLTVLRQPKAAPSVTLFPPSSEELQANKATLVCLISDFYPGAVTVAWKADGSPVKAGVETTTP
SKQSNNKYAASSYLSLTPEQWKSHKSYSCQVTHEGSTVEKTVAP
>2FB4_light_chain
QSVLTQPPSASGTPGQRVTISCSGTSSNIGSSTVNWYQQLPGMAPKLLIYRDAMRPSGVPDRFSGSKSGASASLAIGGLQ
SEDETDYYCAAWDVSLNAYVFGTGTKVTVLGQPKANPTVTLFPPSSEELQANKATLVCLISDFYPGAVTVAWKADGSPVK
AGVETTKPSKQSNNKYAASSYLSLTPEQWKSHRSYSCQVTHEGSTVEKTVAPTECS
>2FB4_heavy_chain
EVQLVQSGGGVVQPGRSLRLSCSSSGFIFSSYAMYWVRQAPGKGLEWVAIIWDDGSDQHYADSVKGRFTISRNDSKNTLF
LQMDSLRPEDTGVYFCARDGGHGFCSSASCFGPDYWGQGTPVTVSSASTKGPSVFPLAPSSKSTSGGTAALGCLVKDYFP
QPVTVSWNSGALTSGVHTFPAVLQSSGLYSLSSVVTVPSSSLGTQTYICNVNHKPSNTKVDKRVEPKSC
>7FAB_heavy_chain
AVQLEQSGPGLVRPSQTLSLTCTVSGTSFDDYYWTWVRQPPGRGLEWIGYVFYTGTTLLDPSLRGRVTMLVNTSKNQFSL
RLSSVTAADTAVYYCARNLIAGGIDVWGQGSLVTVSSASTKGPSVFPLAPTAALGCLVKDYFPEPVTVSWNSGALTSGVH
TFPAVLQSSGLYSLSSVVTVPSSSLGTQTYICNVNHKPSNTKVDKKVEP
>1FC1
PSVFLFPPKPKDTLMISRTPEVTCVVVDVSHEDPQVKFNWYVDGVQVHNAKTKPREQQYNSTYRVVSVLTVLHQNWLDGK
EYKCKVSNKALPAPIEKTISKAKGQPREPQVYTLPPSREEMTKNQVSLTCLVKGFYPSDIAVEWESNGQPENNYKTTPPV
LDSDGSFFLYSKLTVDKSRWQQGNVFSCSVMHEALHNHYTQKSLSLS
>BS1-fragment
VTISCTGSSSNIGAGNHVKWYQQLPG
>BS2-fragment
VTISCTGTSSNIGSITVNWYQQLPG
>BS3-fragment
LRLSCSSSGFIFSSYAMYWVRQAPG
>BS4-fragment
LSLTCTVSGTSFDDYYSTWVRQPPG
>BS5-fragment
PEVTCVVVDVSHEDPQVKFNWYVDG
>BS6-fragment
ATLVCLISDFYPGAVTVAWKADS
>BS7-fragment
AALGCLVKDYFPEPVTVSWNSG
>BS8-fragment
VSLTCLVKGFYPSDIAVEWESNG

Here is the result you will get:

Page 1.1
              1            15 16           30 31           45 46           60 61           75 76           90 
 1 7FAB_light --------------- --------------- --------------- --------------- --------------- ---------------      0
 2 2FB4_light QSVLTQPPSASGTPG QRVTISCSGTSSNIG SSTVNWYQQLPGMAP KLLIYRDAMRPSGVP DRFSGSKSGASASLA IGGLQSEDETDYYCA     90
 3 2FB4_heavy --------------- --------------- --------------- --------------- --------------- ---------------      0
 4 7FAB_heavy --------------- --------------- --------------- --------------- --------------- ---------------      0
 
 5 1FC1       --------------- --------------- --------------- --------------- --------------- ---------------      0
 6 BS1-fragme --------------- --------------- --------------- --------------- --------------- ---------------      0
 7 BS2-fragme --------------- --------------- --------------- --------------- --------------- ---------------      0
 8 BS3-fragme --------------- --------------- --------------- --------------- --------------- ---------------      0

 9 BS4-fragme --------------- --------------- --------------- --------------- --------------- ---------------      0
10 BS5-fragme --------------- --------------- --------------- --------------- --------------- ---------------      0
11 BS6-fragme --------------- --------------- --------------- --------------- --------------- ---------------      0
12 BS7-fragme --------------- --------------- --------------- --------------- --------------- ---------------      0

13 BS8-fragme --------------- --------------- --------------- --------------- --------------- ---------------      0
Page 2.1
              91          105 106         120 121         135 136         150 151         165 166         180 
 1 7FAB_light --------------- -----------ASVL TQPPSVSGAPGQRVT ISCTGSSSNIGAG-H NVKWYQQLPGTAPKL LIFHNNARFSVSKSG     63
 2 2FB4_light AWDVSLNAYVFGTGT KVTVLGQPKANPTVT LFPPSSEELQANKAT LVCLISDFYPGA--V TVAWKADGSPVKAGV ETTKPSKQSNNKYAA    178
 3 2FB4_heavy --------------- -----------EVQL VQSGGGVVQPGRSLR LSCS-SSGFIFSS-Y AMYWVRQAPGKGLEW VAIIWDDGSDQHYAD     62
 4 7FAB_heavy --------------- -----------AVQL EQSGPGLVRPSQTLS LTCT-VSGTSFDD-Y YWTWVRQPPGRGLEW IGYVFYTG-------     55

 5 1FC1       --------------- ---------PSVFLF PPKPKDTLMISRTPE VTCVVVDVSHEDPQV KFNWYVDGVQVHNAK TKPREQQYNSTYRVV     66
 6 BS1-fragme --------------- --------------- -------------VT ISCTGSSSNIGAG-N HVKWYQQLPG----- ---------------     26
 7 BS2-fragme --------------- --------------- -------------VT ISCTGTSSNIGS--I TVNWYQQLPG----- ---------------     25
 8 BS3-fragme --------------- --------------- -------------LR LSCS-SSGFIFSS-Y AMYWVRQAPG----- ---------------     25

 9 BS4-fragme --------------- --------------- -------------LS LTCT-VSGTSFDD-Y YSTWVRQPPG----- ---------------     25
10 BS5-fragme --------------- --------------- -------------PE VTCVVVDVSHEDPQV KFNWYVDG------- ---------------     25
11 BS6-fragme --------------- --------------- -------------AT LVCLISDFYPGA--V TVAWKADS------- ---------------     23
12 BS7-fragme --------------- --------------- -------------AA LGCL-VKDYFPEP-V TVSWN---SG----- ---------------     22

13 BS8-fragme --------------- --------------- -------------VS LTCLVKGFYPSD--I AVEWESNG------- ---------------     23

(continues alignment of full chains)

Exercise [00B] Why did we not just use our text editor to find the fragments in the sequences ?

Exercise [00] Why did we not use a local alignment technique like the one presented in chapter 1 ? That would have worked much better !
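If no alignment server is at hand, a very crude plausibility check can be done locally. The sketch below slides a fragment along a chain without gaps and reports the placement with the fewest mismatches. It is not a local alignment in the sense of chapter 1 (no gaps, no scoring matrix), and the function name best_window is made up for this example; the two sequences are copied from the query above.

```python
def best_window(fragment, chain):
    """Return (offset, mismatches) of the best ungapped placement.

    The offset is 0-based; the first best placement wins in case of ties.
    Returns None if the fragment is longer than the chain.
    """
    best = None
    for off in range(len(chain) - len(fragment) + 1):
        window = chain[off:off + len(fragment)]
        mismatches = sum(1 for f, c in zip(fragment, window) if f != c)
        if best is None or mismatches < best[1]:
            best = (off, mismatches)
    return best


# First line of the 7FAB light chain and the BS1 fragment, as listed above.
chain_7fab_light = ("ASVLTQPPSVSGAPGQRVTISCTGSSSNIGAGHNVKWYQQLPGTAPKLLIF"
                    "HNNARFSVSKSGTSATLAITGLQAEDEAD")
bs1 = "VTISCTGSSSNIGAGNHVKWYQQLPG"

print(bs1 in chain_7fab_light)             # False: no exact match
print(best_window(bs1, chain_7fab_light))  # (17, 2)
```

An exact text search fails for this fragment, while the best ungapped placement differs from the chain in just two residues.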

[ ImmRetrievalVerifDisc ] Let us interpret the Clustal Alignment. First of all, 2FB4 light got shifted; its constant region seems to be very similar to the variable regions of other chains !

Exercise [02B] How do we know that the PDBFINDER files list the variable region followed by the constant region, and not vice versa ?

BS1, BS3 and BS4 align as expected to the variable regions of the 7FAB / 2FB4 chains. (There is an HN/NH difference between fragment BS1 and 7FAB at positions 150-151. Also, position 152 is inconsistent for BS4.) BS2 is supposed to align with the variable region of 2FB4 light, but it doesn't ! Indeed, taking a look at the tree used by Clustal (Fig. 16), we see that it aligns the profiles containing BS2 and 2FB4 light at a rather late stage, so that BS2's high similarity (not identity, due to whatever errors) with the subsequence VTISCSGTSSNIGSSTVNWYQQLPG in the 2FB4 light variable region (pos. 18-42) has been overshadowed during profile alignment.


Figure 16: Phylogenetic Tree used by Clustal.

Next, observe that BS5 and BS6 are aligned properly to 1FC1 and 2FB4 light, respectively. For BS5 that's OK (1FC1, as we've said in the beginning, is indeed the concatenation of the heavy chain's second and third constant region, and BS5 is a fragment from the second constant region; see also Fig. 15.) For BS6, this is a little miracle; after all, it aligns to the constant region of 2FB4 light, which is not in our collection of 8 immunoglobulin sequences !

Exercise [02] So which two constant regions have an identical (sub)-sequence ? If you're not sure, use your text editor, searching this text for even smaller fragments like "LVCL". Can you find a reason for this identity ?

BS7 and BS8 are obviously misaligned; you will find copies of them at the end of 7FAB heavy, and 1FC1, in the constant regions (these ends are cut away in the Clustal Alignment shown.)

The following 3 exercises are concerned with some problems you encounter when doing databank retrieval.

Exercise [05, opt.] Find sequence 7FAB light (variable region) in SwissProt and confirm that it's got NH again, just as in fragment BS1, pos. 150-151. (I'm not exactly sure about the difference between "IG LAMBDA CHAIN V-VI REGION (NIG-48)" and "IG LAMBDA CHAIN V-I REGION (NEWM)". The latter is the correct one. If you've got the paper handy, you will note that this sequence is exactly the one from the paper, whereas the one we obtained from PDBFINDER differs in 4 positions !) If you want to detect further problems, you can go on and retrieve what seem to be the SwissProt equivalents of the 2FB4 heavy chain variable region (BS3) and 7FAB heavy chain variable region (BS4). Now they're farther away; I guess this must have got something to do with the labels "V-III(KOL)" and "V-II(NEWM)" of the SwissProt sequences. Can an immunologist help out ?

Exercise [25, opt.] Try to find the 2FB4 light variable region in another databank, i.e. not in PDB/PDBFINDER. This seems to be a challenge, and I couldn't find it, not even using Blast/Fasta searches. If you find it, or know why it's not in SwissProt, the author will email you a beer !

Exercise [15, opt.] Try to find the 7FAB light constant region in another databank, i.e. not in PDB/PDBFINDER. Another challenge !

If you've done the last few exercises, you have got some justification to cite for your cutting point between the variable and constant regions of our sequences; equivalently you could have searched for the respective constant regions in SwissProt. This would give you the "exact" cutting points between the constant regions, too; BS7, BS5, and BS8 are in one SwissProt file (why ?), and the headers list the exact cut-points ! Or you can be as lazy as the author and use the cut-marks employed in the Barton & Sternberg paper, as follows. (For perfectionists, BS6 and BS7 got a few residues added, and BS8 got one residue deleted. Now they've got the same length as the ones in the original paper.)

>BS1, 7FAB light chain variable region
ASVLTQPPSVSGAPGQRVTISCTGSSSNIGAGHNVKWYQQLPGTAPKLLIFHNNARFSVSKSGTSATLAITGLQAEDEAD
YYCQSYDRSLRVFGGGTKLTVLR
>BS2, 2FB4 light chain variable region
QSVLTQPPSASGTPGQRVTISCSGTSSNIGSSTVNWYQQLPGMAPKLLIYRDAMRPSGVPDRFSGSKSGASASLAIGGLQ
SEDETDYYCAAWDVSLNAYVFGTGTKVTVLGQ
>BS3, 2FB4 heavy chain variable region
EVQLVQSGGGVVQPGRSLRLSCSSSGFIFSSYAMYWVRQAPGKGLEWVAIIWDDGSDQHYADSVKGRFTISRNDSKNTLF
LQMDSLRPEDTGVYFCARDGGHGFCSSASCFGPDYWGQGTPVTVSS
>BS4, 7FAB heavy chain variable region
AVQLEQSGPGLVRPSQTLSLTCTVSGTSFDDYYWTWVRQPPGRGLEWIGYVFYTGTTLLDPSLRGRVTMLVNTSKNQFSL
RLSSVTAADTAVYYCARNLIAGGIDVWGQGSLVTVSS
>BS5, 1FC1 heavy chain constant region 
PSVFLFPPKPKDTLMISRTPEVTCVVVDVSHEDPQVKFNWYVDGVQVHNAKTKPREQQYNSTYRVVSVLTVLHQNWLDGK
EYKCKVSNKALPAPIEKTISKAKG
>BS6, 7FAB light chain constant region
QPKAAPSVTLFPPSSEELQANKATLVCLISDFYPGAVTVAWKADGSPVKAGVETTTPSKQSNNKYAASSYLSLTPEQWKS
HKSYSCQVTHEGSTVEKTVAPtscs
>BS7, 7FAB heavy chain constant region 
ASTKGPSVFPLAPTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQSSGLYSLSSVVTVPSSSLGTQTYICNVNHK
PSNTKVDKKVEPksa
>BS8, 1FC1 heavy chain constant region
QPREPQVYTLPPSREEMTKNQVSLTCLVKGFYPSDIAVEWESNGQPENNYKTTPPVLDSDGSFFLYSKLTVDKSRWQQGN
VFSCSVMHEALHNHYTQKSLSL

3.4 Optimal, Heuristic, and Structurally Verified Alignments of the Immunoglobulin Sequences.

  [ ImmMsa ] Now we've finally retrieved and confirmed the data (see the listing above, i.e. at the end of the previous section) and can start aligning ! These data are not exactly the ones from the paper, due to whatever errors, but they're close enough so that we can "reproduce" a few results from the Barton & Sternberg paper. This "reproduction" will be qualitative, because we've mainly got Clustal at our disposal. Clustal aligns along a tree, whereas Barton & Sternberg add one sequence at a time to a "growing" profile.

Let's start with an MSA alignment, since we cannot do any better than optimal if the underlying cost model is appropriate. However, we cannot get an MSA alignment on the net at the Washington University MSA server, http://ibc.wustl.edu/msa.html; the process gets killed after some time because we're using up too many resources ! All we can get is the heuristic alignment calculated by MSA (see 3.6).

Do not overload the Washington University MSA server by trying the MSA alignment yourself ! This time, only use the form with the "optimal alignment" option set to "off". If you've got MSA 1.0 or the newly released MSA 2.1 at your computer, and nobody's watching, you can try it out. Even with MSA 2.1, the author's workstation couldn't finish the job. I guess a supercomputer is needed ?!

Exercise [10*] Nevertheless, the author has obtained an alignment from the MSA server that he believes is "optimal". Very simple trick, explained a little later ;-)

So, here's the optimal MSA alignment (well, sort-of...).

-----asVLTQPPsvsgapgqrvTISCTGsssnigag-hNVKWYqqlpgtapk--llifhnn----------arf
-----qsVLTQPPsasgtpgqrvTISCSGtssnigs--sTVNWYqqlpgmapk--lliyrda---mrpsgvpdrf
-----evQLVQSGggvvqpgrslRLSCSSsgfifss--yAMYWVrqapgkglewvaiiwddgsdqhyadsvkgrf
-----avQLEQSGpglvrpsqtlSLTCTVsgtsfdd--yYWTWVrqppgrglewigyvfytg-ttlldpslrgrv
-p--SVFLFPpkpkdtlmisrtpEVTCVVvdvshedpqvKFNWYvd--gvqvh--naKTKPR----------eqq
qpkaapSVTLFPpsseelqankaTLVCLIsdfypga--vTVAWKadg-spvka--GVETTtp----------skq
--------astkgpSVFPLAptaALGCLVkdyfpep--vTVSWNs---galts--GVHTFpa----------vlq
qpr-epQVYTLPpsreemtknqvSLTCLVkgfypsd--iAVEWEsn--gqpen--NYKTTpp----------vld
                            
SVSKSgTSAT--LAItglqaedeadYYC--QSYdr--------slr--VFGggtkltvlr-   
SGSKSgASAS--LAIgglqsedetdYYC--AAWdv--------slnayVFGtgtkvtvlgq
TISRNdskNTLFLQMdslrpedtgvYFCARDgghgfcssascfgpd--YWGqgtpvtvss-
TMLVNtskNQFSLRLssvtaadtavYYCARNliag--------gid--VWGqgslvtvss-
ynstyrVVSV--LTVlhqnwldgkeYKC--KVSnk--------alp--apIEKtiskakg-
snnkyaASSY--LSLtpeqwkshksYSC--QVThe--------gst----VEKtvaptscs
ssglysLSSV--VTV-pssslgtqtYIC--NVNhk--------psn--tkVDKkvepksa-
sdgsffLYSK--LTVdksrwqqgnvFSC--SVMhe--------alh--nhyTQKslsl---

We can easily recognize the correct alignment of the two Cysteine residues, and the Tryptophan. So this alignment is at least not completely off, i.e. it reproduces some features that a working immunologist easily recognizes. (Capitalized residues are part of the structurally verified alignment, see below.)

Exercise [05, opt.] Get a colorful visualisation of the alignment, by using the Weblogo server, http://www.bio.cam.ac.uk/cgi-bin/seqlogo/logo.cgi. For your convenience, the Fasta format of our alignment is available, see 3.7.

Although this does not do justice to Tom Schneider's "Sequence Logo" theory, we just note that the large characters in the Weblogo output denote the conserved residues.

[ ImmMsaEpsAll ] The "optimal" alignment is pretty much different from the heuristic one calculated by MSA before bogging down (see 3.6). Indeed, the polyhedron that needs to be explored is huge, as you can see from looking at the differences between the projected heuristic and the optimal pairwise alignments. (These differences give rise to the "compensation term" that is used to establish the Carrillo-Lipman bounds that in turn influence the polyhedron. See section 2.1 on the theory of the Carrillo-Lipman Bound). They are called "epsilon" in the following MSA 2.0 printout (this was printed out before the author's computer started the insurmountable task of exploring the polyhedron). "I" and "J" are, of course, the direction of the projection for which the difference is given.

----Estimated epsilons----
I = 1  J = 2  epsilon =  8
I = 1  J = 3  epsilon = 50
I = 1  J = 4  epsilon = 34
I = 1  J = 5  epsilon = 50
I = 1  J = 6  epsilon = 50
I = 1  J = 7  epsilon = 28
I = 1  J = 8  epsilon = 50
I = 2  J = 3  epsilon = 50
I = 2  J = 4  epsilon = 26
I = 2  J = 5  epsilon = 50
I = 2  J = 6  epsilon = 50
I = 2  J = 7  epsilon = 34
I = 2  J = 8  epsilon = 50
I = 3  J = 4  epsilon =  5
I = 3  J = 5  epsilon = 50
I = 3  J = 6  epsilon = 50
I = 3  J = 7  epsilon = 50
I = 3  J = 8  epsilon = 50
I = 4  J = 5  epsilon = 50
I = 4  J = 6  epsilon = 50
I = 4  J = 7  epsilon = 50
I = 4  J = 8  epsilon = 50
I = 5  J = 6  epsilon =  5
I = 5  J = 7  epsilon = 50
I = 5  J = 8  epsilon = 25
I = 6  J = 7  epsilon =  9
I = 6  J = 8  epsilon = 22
I = 7  J = 8  epsilon = 43

Exercise [05, opt.] What does epsilon = 5 mean ? 5 units of what ? How has this been standardized ? (Hint: See the next exercise.)

Epsilon = 50 is a threshold; larger values are simply cut down to 50 ! Therefore, it is possible that even if the computer were not bogged down, the full-size polyhedron would not have been explored, and the alignment would not necessarily have been optimal. The reason for all our trouble is now becoming clear: Our immunoglobulin sequences are too dissimilar to even suggest a heuristic alignment that is indeed close to the optimal one; "expert knowledge" at least about the Cys and Trp (W) residues is needed. We will soon see that the "optimal" MSA alignment (which the author obtained by cutting all sequences in two parts, and piecing the alignments together :-) is approximately as far away from the "biological truth" as MSA's heuristic one, and Clustal's (see below).

[ ImmMsaEpsPartial ] Although the MSA server does not inform you about the "epsilon"- values if the process gets killed, you can still get an idea of these values yourself, by submitting subsets. For example, aligning the constant regions only, the following information is returned (the alignment will be displayed and discussed below.)

Costfile:                   pam250
Alignment cost:   13103     Lower bound:   12933
Delta:              170     Max. Delta:      199

Sequences  Proj. Cost  Pair. Cost  Epsilon  Max. Epsi.  Weight  Weight*Cost
  1   2        1672        1670        2        19         1        1672
  1   3        1624        1622        2         5         1        1624
  1   4        1656        1656        0         8         2        3312
  2   3        1633        1586       47        41         2        3266
  2   4        1608        1592       16        27         1        1608
  3   4        1621        1565       56        50         1        1621
Elapsed time =   1.469

The quantity called epsilon in the last table is now called Max. Epsi. !

Exercise [05A] Given the information above, it's now easier to answer the question ``What does epsilon = 5 mean ?'' (see the last exercise).

Exercise [05*A] Which quantities do Lower bound, Proj. Cost, Pair. Cost, Delta and Max. Delta represent ? Hint: The Delta's have to do with the epsilons, just looking at their size.

Exercise [10A] Calculate the Carrillo-Lipman bound for pair (1,2), under the assumption that the difference between the costs of the projected heuristic and the pairwise optimal alignment for pair (3,4) is indeed 50 (i.e. that no cutting down to 50 took place). Pairs (1,4) and (2,3) have weight 2 !
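For readers who would like to check their answers to the preceding exercises, the following sketch redoes the arithmetic of the MSA printout for the constant regions. The rows are copied from the table above; the reading of the columns (projected cost, optimal pairwise cost, their difference, and a weight per pair) is an interpretation that happens to reproduce the printed totals, not something taken from the MSA documentation.

```python
# Rows copied from the printout:
# (i, j, projected cost, pairwise cost, epsilon, max. epsilon, weight)
rows = [
    (1, 2, 1672, 1670,  2, 19, 1),
    (1, 3, 1624, 1622,  2,  5, 1),
    (1, 4, 1656, 1656,  0,  8, 2),
    (2, 3, 1633, 1586, 47, 41, 2),
    (2, 4, 1608, 1592, 16, 27, 1),
    (3, 4, 1621, 1565, 56, 50, 1),
]

# Epsilon is the difference between projected and pairwise cost in every row.
assert all(proj - pair == eps for _, _, proj, pair, eps, _, _ in rows)

alignment_cost = sum(w * proj for _, _, proj, _, _, _, w in rows)
lower_bound = sum(w * pair for _, _, _, pair, _, _, w in rows)
delta = sum(w * eps for _, _, _, _, eps, _, w in rows)
max_delta = sum(w * max_eps for _, _, _, _, _, max_eps, w in rows)

print(alignment_cost, lower_bound, delta, max_delta)   # 13103 12933 170 199
```

All four totals agree with the header of the printout, and Delta is simply the alignment cost minus the lower bound.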

[ ImmStructVerif ] Looking at the 3-dimensional structures of our protein domains, experts have derived so-called "structurally verified alignments" for parts of them (called "motifs" hereafter). The following are listed in the Barton & Sternberg paper; they correspond to the different β-chains of the immunoglobulin domains, and will be taken as the "standard of truth".

  A       B       C      D       E       F      G

VLTQPP  TISCTG  NVKWY  SVSKS  TSATLAI  YYCQSY  VFG
VLTQPP  TISCSG  TVNWY  SGSKS  ASASLAI  YYCAAW  VFG
QLVQSG  RLSCSS  AMYWV  TISRN  NTLFLQM  YFCARD  YWG
QLEQSG  SLTCTV  YWTWV  TMLVN  NQFSLRL  YYCARN  VWG
SVFLFP  EVTCVV  KFNWY  KTKPR  VVSVLTV  YKCKVS  IEK
SVTLFP  TLVCLI  TVAWK  GVETT  ASSYLSL  YSCQVT  VEK
SVFPLA  ALGCLV  TVSWN  GVHTF  LSSVVTV  YICNVN  VDK
QVYTLP  SLTCLV  AVEWE  NYKTT  LYSKLTV  FSCSVM  TQK
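
The table can be searched mechanically for positions that carry the same residue in all 8 sequences. A small sketch follows; the motif rows are copied from the table above, and the helper name conserved_positions is made up for this example.

```python
motifs = {
    "A": ["VLTQPP", "VLTQPP", "QLVQSG", "QLEQSG",
          "SVFLFP", "SVTLFP", "SVFPLA", "QVYTLP"],
    "B": ["TISCTG", "TISCSG", "RLSCSS", "SLTCTV",
          "EVTCVV", "TLVCLI", "ALGCLV", "SLTCLV"],
    "C": ["NVKWY", "TVNWY", "AMYWV", "YWTWV",
          "KFNWY", "TVAWK", "TVSWN", "AVEWE"],
    "D": ["SVSKS", "SGSKS", "TISRN", "TMLVN",
          "KTKPR", "GVETT", "GVHTF", "NYKTT"],
    "E": ["TSATLAI", "ASASLAI", "NTLFLQM", "NQFSLRL",
          "VVSVLTV", "ASSYLSL", "LSSVVTV", "LYSKLTV"],
    "F": ["YYCQSY", "YYCAAW", "YFCARD", "YYCARN",
          "YKCKVS", "YSCQVT", "YICNVN", "FSCSVM"],
    "G": ["VFG", "VFG", "YWG", "VWG", "IEK", "VEK", "VDK", "TQK"],
}


def conserved_positions(rows):
    """Return (0-based position, residue) for every fully conserved column."""
    return [(i, col[0]) for i, col in enumerate(zip(*rows)) if len(set(col)) == 1]


for name, rows in motifs.items():
    print(name, conserved_positions(rows))

# Number of motif columns in total
print(sum(len(rows[0]) for rows in motifs.values()))   # 38
```

Only three columns turn out to be fully conserved: a Cys in motif B, the Trp in motif C, and a Cys in motif F.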

Exercise [10*] Looking at the "optimal" MSA alignment, which β-chains were aligned correctly ? How many residues were misaligned ? For the latter, count residues as misaligned if they don't align with the majority of residues that are following the column of a motif, and count all residues if the column got completely scrambled (i.e. if there are no two residues that are aligned according to the motif). In other words, for each of the 38 columns displayed above, check whether you can at least identify a relative majority of residues aligned in the same way, and count those residues that are not aligned to them.
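The counting rule of this exercise can be written down for a single motif column as follows. The input is, for each sequence, the index of the alignment column in which its residue of that motif column ended up; finding these indices in a given alignment is left to the reader. The function name misaligned and the way ties are handled (any largest group counts as "the" majority) are assumptions of this sketch.

```python
from collections import Counter


def misaligned(columns):
    """Count misaligned residues for one motif column.

    columns[k] is the alignment column index holding sequence k's residue.
    Residues outside the largest group count as misaligned; if no two
    residues share a column, the whole motif column counts as scrambled.
    """
    largest_group = Counter(columns).most_common(1)[0][1]
    if largest_group < 2:
        return len(columns)
    return len(columns) - largest_group


print(misaligned([23, 23, 23, 23, 23, 23, 23, 23]))   # 0: all eight agree
print(misaligned([23, 23, 23, 23, 23, 23, 21, 23]))   # 1: one residue is off
print(misaligned([1, 2, 3, 4]))                       # 4: completely scrambled
```

Summing this count over all 38 motif columns gives the number of misaligned residues for one alignment.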

Exercise [05] Take a look at MSA's heuristic alignment (see 3.6), and/or its Weblogo diagram. Compared to the "standard-of-truth" data, which seemingly conserved residue is just an artifact, i.e. the result of misalignments ?

[ ImmClustal ] Here is the Clustal alignment (again using the BCM Search Launcher, http://dot.imgen.bcm.tmc.edu:9331/multi-align/multi-align-vsns.html, 1995 default settings), for comparison. The motifs are given in capital letters, but it's nevertheless a good idea to print out the alignment, and put the beta-sheets into boxes in the same way as in the Barton & Sternberg paper (p.331).

-----asVLTQPPsv--sgapgqrvTISCTGsssnigag-hNVKWYqqlpg--tapkllifhnnar---------
-----qsVLTQPPsa--sgtpgqrvTISCSGtssnigs--sTVNWYqqlpg--mapklliyrdamrpsgvpdr--
-----evQLVQSGgg--vvqpgrslRLSCS-Ssgfifss-yAMYWVrqapgkglewvaiiwddgsdqhyadsvkg
-----avQLEQSGpg--lvrpsqtlSLTCT-Vsgtsfdd-yYWTWVrqppgrglewigyvfytg-ttlldpslrg
-----pSVFLFPpkpkdtlmisrtpEVTCVVvdvshedpqvKFNWYvd--g----vqvhnaKTKPReqq------
qpkaapSVTLFPpss--eelqankaTLVCL-Isdfypga-vTVAWKad--g---spvkaGVETTtpsk-------
-------astkgpS---VFPLAptaALGCL-Vkdyfpep-vTVSWNsg-------altsGVHTFpavlq------
qpre-pQVYTLPpsr--eemtknqvSLTCL-Vkgfypsd-iAVEWEsn--g----qpenNYKTTppvld------
-fSVSK--SgTSATLAItglqaedeadYYCQ--------SYdrslr--VFGggtkltvlr-
-fSGSK--SgASASLAIgglqsedetdYYCA--------AWdvslnayVFGtgtkvtvlgq
rfTISRNdskNTLFLQMdslrpedtgvYFCARDgghgfcssascfgpdYWGqgtpvtvss-
rvTMLVNtskNQFSLRLssvtaadtavYYCARN--------liaggidVWGqgslvtvss-
--ynst--yrVVSVLTVlhqnwldgkeYKCK----------VSnkalpapIEKtiskakg-
-qsnnk--yaASSYLSLtpeqwkshksYSCQ----------VThegstVEKtvaptscs--
---ssg--lysLSSVVTVpssslgtqtYICN----------VNhkpsntkVDKkvepksa-
--sdgs--ffLYSKLTVdksrwqqgnvFSCS----------VMhea--lhnhyTQKslsl-

Exercise [05, opt.] Obtain the Clustal alignment from the WWW.

Exercise [05A] In the Clustal alignment, which motifs were aligned correctly ? How many residues were misaligned ? (See Exercise 53.)

Exercise [10, opt.] Find out about the tree along which Clustal did the alignment. (Unfortunately, the WWW Forms I know do not return a picture of the tree along which Clustal aligned; you need to interpret or convert the text description of the tree returned by the Washington University server, unless you have Clustal/Phylip on your computer. Alternatively, you may look at the tree from ETH Zurich's All-All service, http://cbrg.inf.ethz.ch/subsection3_1_1.html, i.e. Fig. 13. Topologically, it's the same as the Clustal tree.)

[ ImmMsaPartial ] Let us now align the variable and the constant regions separately.

asVLTQPPsvsgapgqrvTISCTGsssnigaghNVKWYqqlpgtapkll--ifhnn----------arfSVSKSg
qsVLTQPPsasgtpgqrvTISCSGtssnig-ssTVNWYqqlpgmapkll--iyrda---mrpsgvpdrfSGSKSg
evQLVQSGggvvqpgrslRLSCSSsgfifs-syAMYWVrqapgkglewvaiiwddgsdqhyadsvkgrfTISRNd
avQLEQSGpglvrpsqtlSLTCTVsgtsfd-dyYWTWVrqppgrglewigyvfytg-ttlldpslrgrvTMLVNt
                            
TSAT--LAItglqaedeadYYCQS--------Ydrslr--VFGggtkltvlr-
ASAS--LAIgglqsedetdYYCAA--------WdvslnayVFGtgtkvtvlgq
skNTLFLQMdslrpedtgvYFCARDgghgfcssascfgpdYWGqgtpvtvss-
skNQFSLRLssvtaadtavYYCAR--------NliaggidVWGqgslvtvss-

is the optimal MSA alignment of the variable regions, and

-----pSVFLFPpkpkdtlmisrtpEVTCVVvdvshedpqvKFNWYvdgvqv-hnaKTKPReqqynstyrVVSVL
qpkaapSVTLFPpssee--lqankaTLVCLIsdfypga--vTVAWKadgspvkaGVETTtpskqsnnkyaASSYL
----------astkgpSVFPLAptaALGCLVkdyfpep--vTVSWNsgalt--sGVHTFpavlqssglysLSSVV
qpre-pQVYTLPpsree--mtknqvSLTCLVkgfypsd--iAVEWEsngqpe-nNYKTTppvldsdgsffLYSKL
                            
TVlhqnwldgkeYKCKVSnkalpapIEKtiskakg-
SLtpeqwkshksYSCQVTheg--stVEKtvaptscs
TV-pssslgtqtYICNVNhkpsntkVDKkvepksa-
TVdksrwqqgnvFSCSVMhealhnhyTQKslsl---

is the optimal MSA alignment of the constant regions. Calculating the accuracy for these two alignments separately, we count 9 misaligned residues in the variable regions and 14 in the constant regions, out of a total of 152 each. If we instead rescore the 8-sequence alignments (see exercise 53), counting only misaligned amino acids within one group (either constant or variable), we obtain 14 and 20 errors, respectively. The 8-sequence alignments were made with the help of the other group of sequences, so in this case multiple alignment actually lowers the accuracy scores! Barton & Sternberg perform a more detailed analysis: they compare the scores of all pairwise alignments within one group (made without taking the other sequences into consideration) with the accuracy obtained from the 8-sequence alignment, and observe the same deterioration.
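
To put these counts on a common scale, they can be converted to error rates. The sketch below only redoes the arithmetic with the numbers quoted above (152 scored residues per group); the variable names are illustrative, and the two groups are pooled so that no assumption is needed about which of the two 8-sequence counts belongs to which group.

```python
TOTAL_PER_GROUP = 152        # scored residues per group, as stated in the text
separate_errors = [9, 14]    # variable and constant regions aligned on their own
joint_errors = [14, 20]      # within-group errors in the 8-sequence alignments

def error_rate(misaligned, total):
    """Fraction of scored residues that are misaligned."""
    return misaligned / total

pooled_total = 2 * TOTAL_PER_GROUP
separate_rate = error_rate(sum(separate_errors), pooled_total)
joint_rate = error_rate(sum(joint_errors), pooled_total)

print(f"aligned separately:    {separate_rate:.1%}")
print(f"8-sequence alignments: {joint_rate:.1%}")
```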

[ ImmClustalPartial ] The same phenomenon can be observed using Clustal alignments, viz.

asVLTQPPsvsgapgqrvTISCTGsssnigaghNVKWYqqlpg--tapkllifhnnar----------fSVSK--
qsVLTQPPsasgtpgqrvTISCSGtssnigs-sTVNWYqqlpg--mapklliyrdamrpsgvpdr---fSGSK--
evQLVQSGggvvqpgrslRLSCS-SsgfifssyAMYWVrqapgkglewvaiiwddgsdqhyadsvkgrfTISRNd
avQLEQSGpglvrpsqtlSLTCT-VsgtsfddyYWTWVrqppgrglewigyvfytg-ttlldpslrgrvTMLVNt
                            
SgTSATLAItglqaedeadYYCQSY--------drslr--VFGggtkltvlr-
SgASASLAIgglqsedetdYYCAAW--------dvslnayVFGtgtkvtvlgq
skNTLFLQMdslrpedtgvYFCARDgghgfcssascfgpdYWGqgtpvtvss-
skNQFSLRLssvtaadtavYYCARN--------liaggidVWGqgslvtvss-

is the Clustal alignment of the variable regions, and

-----pSVFLFPpkpkdtlmisrtpEVTCVVvdvshedpqvKFNWYvdgvqvhn-aKTKPReqqynstyrVVSVL
qpkaapSVTLFPpsse--elqankaTLVCLIsdfypg--avTVAWKadgspvkaGVETTtpskqsnnkyaASSYL
astkgpSVFPLApt----------aALGCLVkdyfpe--pvTVSWN-sgaltsG-VHTFpavlqssglysLSSVV
qpre-pQVYTLPpsre--emtknqvSLTCLVkgfyps--diAVEWEsngqpenN-YKTTppvldsdgsffLYSKL
                            
TVlhqnwldgkeYKCKVSnkalpapIEKtiskakg-
SLtpeqwkshksYSCQVTheg--stVEKtvaptscs
TVpssslgt-qtYICNVNhkpsntkVDKkvepksa-
TVdksrwqqgnvFSCSVMhea--lhnhyTQKslsl-

is the Clustal alignment of the constant regions. Calculating accuracy for this case, we observe only 4 misaligned residues in the variable regions, and only 9 misaligned residues in the constant regions.

[ ImmQualityAndRelatedness ] If multiple alignment gives us worse results, why bother with it? As the previous examples show, distant sequences can have a malign influence on the alignment of more closely related sequences, but we may hope that adding related sequences improves the alignment of distant sequences.

Indeed, the Clustal alignment of BS3 and BS8 is as follows,

evQLVQSGggvvqpgr------slRLSCSSsgfifssyAMYWVr-qapgkglewvaiiwd-dgsdqhyadsvkgr
--qprepQVYTLPpsreemtknqvSLTCLVkgfypsdiAVEWEsngqpenNYKTTppvldsdgs----------f
                            
fTISRNdskNTLFLQMdslrpedtgvYFCARDgghgfcssascfgpdYWGqgtpvtvss
fLYSK--------LTVdksr---------wqqgnvFSCSVMhealhnhyTQKslsl---

It contains 24 misaligned residues (out of 38), and it's obvious that adding related sequences here improves the alignment significantly. Using their own alignment method, Barton & Sternberg perform all pairwise alignments of one variable with one constant sequence, and note that adding the remaining 6 sequences and aligning all 8 together improves accuracy from 41 to 63 percent, on average.

Exercise [15, opt.] Use Geoffrey Barton's AMAS utility, http://geoff.biop.ox.ac.uk/servers/amas_server.html, to analyse the multiple alignments from this section. Start with the "optimal" MSA alignment we pieced together. AMAS will give you an idea of the physical properties that are conserved at various positions. Can you find residues with hydrophobic properties at alternating positions, separated by unconserved or hydrophilic residues at the positions in between? Such a pattern is typical for a surface strand. AMAS currently accepts FASTA format, provided that you add the character "*" to the end of each sequence, like this:

>BS1, 7FAB light chain variable region
-----ASVLTQPPSVSGAPGQRVTISCTGSSSNIGAG-HNVKWYQQLPGTAPK--LLIFHNN----------ARF
SVSKSGTSAT--LAITGLQAEDEADYYC--QSYDR--------SLR--VFGGGTKLTVLR-*
>BS2, 2FB4 light chain variable region
-----QSVLTQPPSASGTPGQRVTISCSGTSSNIGS--STVNWYQQLPGMAPK--LLIYRDA---MRPSGVPDRF
SGSKSGASAS--LAIGGLQSEDETDYYC--AAWDV--------SLNAYVFGTGTKVTVLGQ*
[...]
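
Adding the "*" by hand to eight sequences is error-prone, so here is a minimal Python sketch of the conversion. It assumes the alignment is held as FASTA text in a string; the function name `add_terminators` is made up for this illustration and has nothing to do with AMAS itself.

```python
def add_terminators(fasta_text, terminator="*"):
    """Append `terminator` to the last sequence line of every FASTA record."""
    records = []  # each record is a list of lines, header first
    for line in fasta_text.splitlines():
        line = line.rstrip()
        if not line:
            continue                      # skip blank lines between records
        if line.startswith(">"):
            records.append([line])        # a header opens a new record
        elif records:
            records[-1].append(line)      # sequence line of the current record
    out = []
    for rec in records:
        # Only touch records that have sequence lines and are not yet terminated.
        if len(rec) > 1 and not rec[-1].endswith(terminator):
            rec[-1] += terminator
        out.extend(rec)
    return "\n".join(out) + "\n"

# Tiny made-up input, just to show the effect.
example = ">s1 demo\nAC-GT\nTT--A\n>s2 demo\nACCGT\nTTGGA\n"
print(add_terminators(example), end="")
```

Because already terminated records are left alone, the function can be applied twice without harm.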

3.5 Some Bibliographic Hints.

Review papers with an emphasis on heuristic multiple alignment are [CWC92] and [MVF94], the latter comparing the results of various implementations on 4 standard datasets. ClustalW is described in [THG94]. For MSA references, see the theory part of this chapter. A general survey on the sequence analysis of immunoglobulins is given in [Wil87]. Some papers dealing with the alignment of immunological sequences are [Tay86], [BaS87] (of course!), and [ViA91].

3.6 Appendix 1. Another Heuristic Alignment of the Immunoglobulin Sequences.

 

Here's the heuristic alignment that is calculated by the MSA preprocessing; the author is currently looking for some exact documentation. [LAK89] write about the MSA 1.0 implementation that they use "a progressive alignment strategy similar to those described by Waterman and Perlwitz [WaP84], Feng and Doolittle [FeD87] and Taylor [Tay87]". "Progressive alignment" obviously refers to the "Once a gap, always a gap" rule mentioned above. However, the MSA 2.0 paper [GKS95] offers a different description.

ASVLTQPPSVSGAPG--------QRVTISCTGSSSNIGAGHNV--KWYQQLPGTAPK---LLIFHNN--------
QSVLTQPPSASGTPG--------QRVTISCSGTSSNIGSS-TV--NWYQQLPGMAPK---LLIYRDAM--RPSGV
EVQLVQSGGGVVQPG--------RSLRLSCSSSGFIFSSY-AM--YWVRQAPGKGLEWVAIIWDDGSDQHYADSV
AVQLEQSGPGLVRPS--------QTLSLTCTVSGTSFDDY-YW--TWVRQPPGRGLEWIGYVFYTGTT-LLDPSL
------PSVFLFPPKPKDTLMISRTPEVTCVVVDVSHEDP-QVKFNWYVDGVQVHNA----KTKPREQ-------
-QPKAAPSVTLFPPSSEE--LQANKATLVCLISDFYPGAV-TV--AWKADGSPVKAG---VETTTPSK-------
-ASTKGPSVFPLAPT----------AALGCLVKDYFPEPV-TV--SW--NSGALTSG---VHTFPAVL-------
--QPREPQVYTLPPSREE--MTKNQVSLTCLVKGFYPSDI-AV--EW-ESNGQPENN---YKTTPPVL-------
-ARFS--VSKSGTSATLAITGLQAEDEADYYCQSYDRSL--------R--VFGGGTKLTVLR--
PDRFS--GSKSGASASLAIGGLQSEDETDYYCAAWDVSL--------NAYVFGTGTKVTVLGQ-
KGRFTISRNDSKNTLFLQMDSLRPEDTGVYFCARDGGHGFCSSASCFGPDYWGQGTPVTVSS--
RGRVTMLVNTSKNQFSLRLSSVTAADTAVYYCARNLIAG--------GIDVWGQGSLVTVSS--
-QYNS--TYRVVSVLTVLHQNWLDGK--EYKCKVSNKAL--------P---APIEKTISKAKG-
-QSNN--KYAASSYLSLTPEQWKSHK--SYSCQVTHEG-------------STVEKTVAPTSCS
-QSSG--LYSLSSVVTVPSSSLGTQ---TYICNVNHKPS--------N---TKVDKKVEPKSA-
-DSDG--SFFLYSKLTVDKSRWQQGN--VFSCSVMHEAL--------H---NHYTQKSLSL---

3.7 Appendix 2. "Optimal" MSA-Alignment in Fasta-Format.

For your convenience in solving some of the exercises, the following is the "Optimal" MSA-Alignment, in FASTA format.

>BS1, 7FAB light chain variable region
-----ASVLTQPPSVSGAPGQRVTISCTGSSSNIGAG-HNVKWYQQLPGTAPK--LLIFHNN----------ARF
SVSKSGTSAT--LAITGLQAEDEADYYC--QSYDR--------SLR--VFGGGTKLTVLR-
>BS2, 2FB4 light chain variable region
-----QSVLTQPPSASGTPGQRVTISCSGTSSNIGS--STVNWYQQLPGMAPK--LLIYRDA---MRPSGVPDRF
SGSKSGASAS--LAIGGLQSEDETDYYC--AAWDV--------SLNAYVFGTGTKVTVLGQ
>BS3, 2FB4 heavy chain variable region
-----EVQLVQSGGGVVQPGRSLRLSCSSSGFIFSS--YAMYWVRQAPGKGLEWVAIIWDDGSDQHYADSVKGRF
TISRNDSKNTLFLQMDSLRPEDTGVYFCARDGGHGFCSSASCFGPD--YWGQGTPVTVSS-
>BS4, 7FAB heavy chain variable region
-----AVQLEQSGPGLVRPSQTLSLTCTVSGTSFDD--YYWTWVRQPPGRGLEWIGYVFYTG-TTLLDPSLRGRV
TMLVNTSKNQFSLRLSSVTAADTAVYYCARNLIAG--------GID--VWGQGSLVTVSS-
>BS5, 1FC1 heavy chain constant region 
-P--SVFLFPPKPKDTLMISRTPEVTCVVVDVSHEDPQVKFNWYVD--GVQVH--NAKTKPR----------EQQ
YNSTYRVVSV--LTVLHQNWLDGKEYKC--KVSNK--------ALP--APIEKTISKAKG-
>BS6, 7FAB light chain constant region
QPKAAPSVTLFPPSSEELQANKATLVCLISDFYPGA--VTVAWKADG-SPVKA--GVETTTP----------SKQ
SNNKYAASSY--LSLTPEQWKSHKSYSC--QVTHE--------GST----VEKTVAPTSCS
>BS7, 7FAB heavy chain constant region 
--------ASTKGPSVFPLAPTAALGCLVKDYFPEP--VTVSWNSG--GALTS--GVHTFPA----------VLQ
SSGLYSLSSV--VTV-PSSSLGTQTYIC--NVNHK--------PSN--TKVDKKVEPKSA-
>BS8, 1FC1 heavy chain constant region
QPR-EPQVYTLPPSREEMTKNQVSLTCLVKGFYPSD--IAVEWESN--GQPEN--NYKTTPP----------VLD
SDGSFFLYSK--LTVDKSRWQQGNVFSC--SVMHE--------ALH--NHYTQKSLSL---
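
After copying a block like this from a web page, it is worth checking that nothing was lost on the way. The following Python sketch reads a FASTA alignment from a string, reports the length of each aligned row (rows of unequal length point to lost characters), and recovers the ungapped sequences. All function names are illustrative, and taking the header text up to the first comma as the sequence name is merely an assumption that matches the headers used here.

```python
def read_fasta_alignment(text):
    """Return {name: aligned_row}; the name is the header up to the first comma."""
    rows = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split(",")[0].strip()
            rows[name] = ""
        elif name is not None:
            rows[name] += line.rstrip("*")   # tolerate a trailing "*" terminator
    return rows

def row_lengths(rows):
    """Length of every aligned row, gaps included."""
    return {name: len(row) for name, row in rows.items()}

def ungapped(row):
    """The sequence with all gap characters removed."""
    return row.replace("-", "")

# Tiny made-up input; paste the real block in its place.
demo = ">A, demo\nAC-G\nT-\n>B, demo\nACGG\nTT*\n"
rows = read_fasta_alignment(demo)
print(row_lengths(rows))
print({name: ungapped(row) for name, row in rows.items()})
```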

3.8 Acknowledgement.

This work was supported by the Association for the Promotion of Science and Humanities in Germany (Stifterverband für die Deutsche Wissenschaft). Peter Serocka from the Visualization Laboratory of the Research Center for Studies on Structure Formation has been very helpful with the preparation of several figures. Mandy Caird of the University of Colorado, and Chris Kiesewetter have offered invaluable technical assistance. I'd like to thank Hershel Safer, Rolf Engstrand, Gerard Pujadas, Robert Giegerich, Geoff Barton, Rebecca Parsons, Andrea Schafferhans, Peter Hjelmstrom, Jotun Hein, Jürgen Frey, Wolfram Altenhofen, Fredj Tekaia, Christian Büschking [more to follow] for valuable comments on the manuscript.

VSNS-BCD Copyright 1995, 1996.
Georg Fuellen







Fri Jul 26 16:26:10 MET DST 1996