In this section we introduce additional fitness criteria for the protein folding application with genetic algorithms. The rationale is that more information about genuine protein conformations should improve the fitness function and so guide the genetic algorithm towards native-like conformations. Some properties of protein conformations can be used as additional fitness components, whereas others can be incorporated into the genetic operators (e.g. constraints from the Ramachandran plot). Such an extended fitness function has to combine several incommensurable quantities: energy, preferred torsion angles, secondary structure propensities, or distributions of polar and hydrophobic residues. This raises the problem of how to combine the different contributions into the total fitness of a single individual. Simple summation has the disadvantage that components with numerically larger values dominate the fitness function whether or not they are of any significance for a particular conformation. Individual weights for each component could be introduced to cope with this difficulty, but this creates another problem: how should useful values for these weights be determined? As no general theory is known for the proper weighting of each fitness component, the only way is to try different combinations of values and evaluate them by the performance of the genetic algorithm on test proteins with known conformations. However, even a small number of fitness components gives rise to a large number of weight combinations, each of which requires its own test run. In addition, "expensive" fitness components such as the van der Waals energy need considerable computation time. In this work [33] two measures were taken to deal with this situation:
In this application two versions of a fitness function are used. One version is a scalar fitness function that calculates the r.m.s. deviation of a newly generated individual from the known conformation of the test protein. This geometric measure should guide the genetic algorithm directly to the desired solution, but it is only available for proteins with a known conformation. The r.m.s. deviation is calculated as follows:

$$rms = \sqrt{\frac{1}{N}\sum_{i=1}^{N} |u_i - v_i|^2}$$

Here i is the index over all N corresponding atoms in the two structures to be compared, in this case the conformation of an individual ($u_i$) in the current population and the known, actual structure ($v_i$) of the test protein. The squared distances between the position vectors $u_i$ and $v_i$ of corresponding atoms are summed, divided by N, and the square root is taken. The result is a measure of how much each atom in the individual deviates on average from its true position. R.m.s. values of 0 - 3 Å signify strong structural similarity; values of 4 - 6 Å denote weak structural similarity, whereas for small proteins r.m.s. values over 6 Å mean that probably not even the backbone folding pattern is similar in both conformations.
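The r.m.s. deviation can be illustrated with a few lines of Python. This is only a minimal sketch: the function name `rms_deviation` is hypothetical, and it assumes that both structures are given as equally long lists of (x, y, z) coordinates in Å that are already superimposed, since the text does not say how the two structures are aligned.

```python
import math


def rms_deviation(u, v):
    """Root mean square deviation between corresponding atom positions.

    u, v: sequences of (x, y, z) tuples of equal length.
    Assumption: the two structures are already superimposed.
    """
    if len(u) != len(v) or len(u) == 0:
        raise ValueError("structures must be non-empty and of equal length")
    # Sum of squared distances between corresponding atoms.
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(u, v))
    # Average over all N atoms, then take the square root.
    return math.sqrt(squared / len(u))


individual = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
native = [(0.0, 0.0, 3.0), (1.0, 4.0, 0.0)]
deviation = rms_deviation(individual, native)
```

In this toy case the two atoms are displaced by 3 Å and 4 Å, so the squared distances are 9 and 16 and the result is the square root of 12.5.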
The other version of the fitness function is a vector of several fitness components, which are explained in the following paragraphs. The vector fitness function is used as follows to determine the candidates for the next generation. If there is an individual that has better (i.e. lower) values in every fitness component, it is taken. If another individual then has the same property over the remaining set of individuals, it is taken as well, and so on until no unambiguously better individuals are found. Then the worst individuals are removed, i.e. those with higher values in every fitness component than any other individual. The remaining set of individuals is reduced heuristically until the exact number of individuals for the next generation is reached. This is done by iteratively removing an individual with the worst fitness value in a randomly selected fitness component. This multi-value vector fitness function includes the following components:

$$fitness = (rms, E_{tor}, E_{vdW}, E_{el}, E_{pe}, polar, hydro, scatter, solvent, Crippen, clash)$$
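The selection procedure described above can be sketched in Python as follows. This is one possible reading, not the implementation used in this work: the names `dominates` and `select_next_generation` are hypothetical, individuals are represented only by their fitness tuples, "worst" is interpreted as "strictly worse in every component than at least one other individual", and dominated individuals are only discarded if enough candidates remain to fill the next generation.

```python
import random


def dominates(a, b):
    """True if a is strictly lower (better) than b in every component."""
    return all(x < y for x, y in zip(a, b))


def select_next_generation(candidates, size, rng=random):
    pool = list(candidates)
    chosen = []
    # Step 1: repeatedly take an individual that beats all others
    # in every fitness component, as long as one exists.
    while pool and len(chosen) < size:
        best = next(
            (c for c in pool
             if all(dominates(c, o) for o in pool if o is not c)),
            None,
        )
        if best is None:
            break
        chosen.append(best)
        pool.remove(best)
    needed = size - len(chosen)
    # Step 2 (assumed reading): drop individuals that are worse in every
    # component than some other individual, if enough candidates remain.
    survivors = [c for c in pool if not any(dominates(o, c) for o in pool)]
    if len(survivors) >= needed:
        pool = survivors
    # Step 3: heuristic reduction; remove the individual with the worst
    # value in a randomly selected fitness component.
    while len(pool) > needed:
        k = rng.randrange(len(pool[0]))
        pool.remove(max(pool, key=lambda c: c[k]))
    return chosen + pool


population = [(1, 1), (2, 5), (5, 2), (6, 6), (3, 3)]
next_gen = select_next_generation(population, 3, random.Random(0))
```

In this toy population (1, 1) is better than every other individual in both components and is taken first, (6, 6) is worse than (2, 5) in both components and is discarded, and the random reduction step then removes either (2, 5) or (5, 2).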

R.m.s. is the r.m.s. deviation as described above. It can only be calculated in test runs where the protein conformation is known beforehand. For the multi-value vector fitness function this measure was calculated for each individual to see how close the genetic algorithm came to the known structure. In these runs, however, the r.m.s. measure was not used in the offspring selection process. Selection was based only on the remaining eight fitness components and a Pareto selection algorithm which will be explained shortly.
$E_{tor}$ is the torsion energy of a conformation based on the force field data of the CHARMM force field v.21, with $k_\phi$ and n as force field constants depending on the type of atom and $\phi$ as the torsion angle:

$$E_{tor} = |k_\phi| - k_\phi \cos(n\phi)$$

$E_{vdW}$ is the van der Waals energy (also called Lennard-Jones potential) with A and B as force field constants depending on the type of atom and r as the distance between two atoms in one molecule. The indices i and j of the two atoms must not be identical, and each pair is counted only once:

$$E_{vdW} = \sum_{i<j} \left( \frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} \right)$$
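A pairwise van der Waals term of this kind can be sketched as follows. The function name `van_der_waals_energy` and the representation of the force field constants as matrices `a[i][j]` and `b[i][j]` are assumptions made for this illustration; a real force field would look the constants up by atom type and would usually skip directly bonded atoms.

```python
import math
from itertools import combinations


def van_der_waals_energy(coords, a, b):
    """Sum of A/r^12 - B/r^6 over all atom pairs, each pair counted once.

    coords: list of (x, y, z) tuples.
    a, b:   hypothetical per-pair constant matrices (a[i][j], b[i][j]).
    """
    energy = 0.0
    # combinations() yields each unordered pair (i < j) exactly once.
    for i, j in combinations(range(len(coords)), 2):
        r = math.dist(coords[i], coords[j])
        energy += a[i][j] / r ** 12 - b[i][j] / r ** 6
    return energy


atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
a_const = [[0.0, 1.0], [1.0, 0.0]]
b_const = [[0.0, 2.0], [2.0, 0.0]]
e_vdw = van_der_waals_energy(atoms, a_const, b_const)
```

The double loop over all pairs makes the cost grow quadratically with the number of atoms, which is why this component counts among the "expensive" ones.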

$E_{el}$ is the electrostatic energy between two atoms, with $q_i$ and $q_j$ as the partial charges of the two atoms i and j and r as the distance between them:

$E_{pe}$ is a measure to promote compact folding patterns. The expected diameter of a protein can be estimated by a number of techniques. A penalty energy term is then calculated as follows:
Polar is a measure that favours polar residues on the protein surface rather than in the core. Because all fitness contributions are to be minimised, a factor of minus one is required before the sum. The larger the distances of polar residues from the centre of the protein, the better the conformation and the more negative the value of polar. If residue i is one of k polar residues (any of Arg, Lys, Asn, Asp, Glu, or Gln) in a protein of N residues and s is the centre of gravity, then the polar fitness contribution of residue i at position $u_i$ is calculated as follows:
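A minimal sketch of such a term is given below. It rests on several assumptions that are not taken from the text: each residue is represented by a single position (for example its $C_\alpha$ atom), the centre of gravity is the unweighted mean of these positions, and the distances are summed without any normalisation. The names `centre_of_gravity` and `polar_term` are hypothetical.

```python
import math

# Polar residue types as listed in the text (three-letter codes).
POLAR = {"ARG", "LYS", "ASN", "ASP", "GLU", "GLN"}


def centre_of_gravity(coords):
    """Unweighted mean position (assumption: equal mass per residue)."""
    n = len(coords)
    return tuple(sum(c[axis] for c in coords) / n for axis in range(3))


def polar_term(residues, coords):
    """Negative sum of the distances of polar residues from the centre."""
    s = centre_of_gravity(coords)
    return -sum(
        math.dist(u, s)
        for name, u in zip(residues, coords)
        if name in POLAR
    )


sequence = ["LYS", "ALA", "ASP"]
compact = [(-2.0, 0.0, 0.0), (0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
extended = [(-3.0, 0.0, 0.0), (0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
polar_compact = polar_term(sequence, compact)
polar_extended = polar_term(sequence, extended)
```

Moving the two polar residues further from the centre makes the value more negative, i.e. better under minimisation. Under the same assumptions, a hydrophobic counterpart would use a different residue set and a positive sign.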

Hydro is a similar measure that favours hydrophobic residues (Ala, Val, Ile, Leu, Phe, Pro, Trp) in the core of a protein, whereas scatter promotes compact folds as it adds up the distances over all $C_\alpha$ atoms irrespective of amino acid type:

Solvent is the solvent accessible surface of a conformation in Å². It is calculated by a surface triangulation method.
Crippen is an empirical, statistical potential developed by G. Crippen [34]. It is summed over all pairs of atoms that interact within a certain distance.
Clash is a term that counts the number of atomic collisions where any two atoms come closer than 3.8 Å to each other. This fitness term can be used to approximate the effect of the van der Waals energy at small distances but at only a fraction of the computational cost:
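Counting such collisions can be sketched as follows. The function name `clash_count` is hypothetical, and the sketch assumes that every unordered pair of the supplied atoms is tested; which atoms are included, and whether directly bonded neighbours are excluded, is not specified in the text.

```python
import math
from itertools import combinations


def clash_count(coords, threshold=3.8):
    """Number of atom pairs closer than `threshold` (in Å), each pair once."""
    return sum(
        1
        for a, b in combinations(coords, 2)
        if math.dist(a, b) < threshold
    )


atoms = [
    (0.0, 0.0, 0.0),
    (3.0, 0.0, 0.0),   # 3.0 Å from the first atom: counted as a clash
    (10.0, 0.0, 0.0),
    (10.0, 0.0, 3.8),  # exactly 3.8 Å from the third atom: not counted
]
clashes = clash_count(atoms)
```

Compared with a Lennard-Jones sum, each pair costs only a distance comparison instead of two divisions by high powers of r.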
