Dr. Steffen Schulze-Kremer, 1995.
steffen@chemie.fu-berlin.de
Further instructions for obtaining the Solution Sheet are available by sending email to majordomo@lists.uni-bielefeld.de, with no subject line, and the following message body: subscribe vsns-bcd-solutions
EXERCISES (Submit these to your instructor !).
a) Use your knowledge in Networking to find the "International Interactive Genetic Art" WWW pages, and play around with them ! Tell your instructor about the URL.
b) What can be done to make a genetic algorithm run faster ?
c) Which schemata does the bitstring 000 contain ?
d) In Table 2, the "Expected Count" of schema H2 is quoted as 1.83. How does this value derive from Table 1 ?
e) Describe at least four different representation formalisms for a protein folding application with a genetic algorithm! What are the strengths and weaknesses of each?
f) Extra credit: What fitness components would you expect to guide the genetic algorithm towards native-like conformations and why?
g) i) Which genetic operators were used in the protein folding application in this chapter? Explain each of them briefly in your own words. ii) What is their effect on the search behaviour of the genetic algorithm?
h) Extra credit: Can you think of other genetic operators? Why would they be beneficial in the context of the protein folding application?
TEXT COMPREHENSION.
Section 1.1: Genetic Algorithm
1. How does, in principle, the genetic algorithm work?
2. What is the central statement of the schemata theorem by J. H. Holland?
3. What are the main differences between genetic algorithms and evolution strategies?
4. What are "genetic operators" in the context of genetic algorithms?
5. What is the meaning of a "fitness function" in the context of genetic algorithms?
6. What does "generation replacement" refer to in the context of genetic algorithms and which are the three most widely used modes of generation replacement?
7. What is the mathematical argument for genetic algorithms to converge to a solution?
8. Are genetic algorithms function optimisers? If not, what do they actually achieve?
Section 1.2.1.1: Protein Folding Application
9. What are the objectives of the protein folding application and the side chain placement application in this chapter?
10. Explain "representation formalism" and its relevance for a genetic algorithm application!
11. What does "dynamic parameterization" refer to, how does it work and why is it beneficial to the search performance of a GA?
Sections 1.2.1.2, 1.2.2.1: Fitness Function
12. What are the requirements for a suitable fitness function for the protein folding application? How can these requirements be met in reality?
13. What is the particular use of the r.m.s.-fitness function?
14. List all fitness function components in this application! What is the effect of each on the search behaviour of the genetic algorithm (in theory and in practice)?
15. Compare and contrast a scalar fitness function and a vector fitness function in the context of the protein folding application in this chapter! What problems arise with a scalar fitness function of multiple terms?
Section 1.2.2.3: Results
16. Evaluate the results of the protein folding application (ab initio and side chain placement) in this chapter with respect to the prediction performance. How would you rate the search performance of the genetic algorithm?
17. What are the main problems limiting the success of this application and how can they be overcome?
OPTIONAL Programming Exercises
1. Implement a simple genetic algorithm (GA) that solves the problem of optimising the function y=f(x) with f(x)=x*x. Let your GA work on bitstrings as the original algorithm by J. H. Holland did.
2. Implement a conversion program to transform a protein represented in Cartesian coordinates into a torsion angle representation with constant binding geometry.
3. Implement a conversion program to transform a protein represented by torsion angles with constant binding geometry into Cartesian coordinates.
4. Implement a fitness function on simple 2-dimensional proteins. 2-Dimensional proteins consist of only two types of "residues" (e.g. hydrophobic and hydrophilic) and are laid out on a planar, rectangular grid. Only if two hydrophobic residues (which must not be connected by a bond) are direct orthogonal neighbours, the energy (i.e. the fitness to be minimised) is decremented by -1. Be aware of invalid conformations where more than one atom is found on any grid point. Any such collision results in a penalty energy of +100 units.