The purpose of this thread is to discuss recent findings on the optimality of the genetic code.
Selected articles in the OP include the following:
- Early Fixation of an Optimal Genetic Code
- Evolution of the genetic code: partial optimization of a random code for robustness to translation error in a rugged fitness landscape
- The genetic code is nearly optimal for allowing additional information within protein-coding sequences
- An extension of the coevolution theory of the origin of the genetic code
- Can the genetic code be mathematically described?
- On the Hypercube Structure of the Genetic Code
- Topological structure of the triplet genetic code
- A Neutral Origin for Error Minimization in the Genetic Code
- Does codon bias have an evolutionary origin?
- A chemical toolkit for proteins — an expanded genetic code
- Evolution and multilevel optimization of the genetic code
Article 1
Thus, to begin, the researchers in the first article determined that no better code was found among a million biosynthetically restricted codes:
The Best of All Possible Codes?
When the error value of the standard code is compared with the lowest error value of any code found in an extensive search of parameter space, results are somewhat more variable. Estimates based on PAM data for the restricted set of codes indicate that the canonical code achieves between 96% and 100% optimization relative to the best possible code configuration (fig. 2c). If our definition of biosynthetic restrictions are a good approximation of the possible variation from which the canonical code emerged, then it appears at or very close to a global optimum for error minimization: the best of all possible codes.
This conclusion might be misleading, though (addressed here), since the paper states that the tested codes came from a biosynthetically restricted set based on the current hypothesis of how the genetic code evolved from prebiotic scenarios. Outside this restricted set, other, more optimized codes are possible.
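For readers wondering how a figure such as "between 96% and 100% optimization" can be expressed, one common way to scale it (assumed here; the paper's own Methods give its exact definition) places the canonical code's error value between the mean error of the random sample and the lowest error found in the search:

```latex
\text{optimization} = \frac{\Delta_{\text{mean}} - \Delta_{\text{canonical}}}{\Delta_{\text{mean}} - \Delta_{\text{best}}} \times 100\%
```

Here each Δ is an error value (lower is better): Δ_mean is the mean over the random codes, Δ_canonical is the standard code's value, and Δ_best is the lowest value found. A score of 100% means no better code was found; 0% means the canonical code is no better than an average random code.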
The next article (nr 2) shows that:
Thus, the standard genetic code appears to be a point on an evolutionary trajectory from a random point (code) about half the way to the summit of the local peak. The fitness landscape of code evolution appears to be extremely rugged, containing numerous peaks with a broad distribution of heights, and the standard code is relatively unremarkable, being located on the slope of a moderate-height peak.
This shows that, in an analysis which includes all possible codes (not only biosynthetically restricted ones), the genetic code is only partially optimized for error minimization. It should be noted, though, that the analysis covered only a subset of the code's possible optimality features.
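To make the "rugged fitness landscape" picture concrete, here is a minimal Python sketch, not the paper's model. It scores a code by the mean squared change in Kyte-Doolittle hydropathy over all single-base substitutions between sense codons (the property and cost function are assumptions), keeps the standard code's codon blocks and stop codons fixed, and climbs from a random assignment of amino acids to blocks by accepting only swaps that lower the cost. Lower cost stands in for higher fitness, so a "peak" here is a local minimum of the cost.

```python
import random

BASES = "TCAG"
# Standard code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...; "*" marks stops.
AA_STRING = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
STANDARD = dict(zip(CODONS, AA_STRING))
AMINO_ACIDS = sorted(set(AA_STRING) - {"*"})

# Kyte-Doolittle hydropathy: an assumed stand-in for the physicochemical property.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def error_cost(code):
    """Mean squared hydropathy change over single-base substitutions between sense codons."""
    total, count = 0.0, 0
    for codon, aa in code.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                other = code[codon[:pos] + base + codon[pos + 1:]]
                if other != "*":  # substitutions into stop codons are ignored
                    total += (HYDROPATHY[aa] - HYDROPATHY[other]) ** 2
                    count += 1
    return total / count


def relabel(mapping):
    """Keep the standard block structure but reassign amino acids via `mapping`."""
    return {c: (aa if aa == "*" else mapping[aa]) for c, aa in STANDARD.items()}


def hill_climb(seed, steps=2000):
    """Start from a random code; accept pairwise swaps only when they lower the cost."""
    rng = random.Random(seed)
    shuffled = AMINO_ACIDS[:]
    rng.shuffle(shuffled)
    mapping = dict(zip(AMINO_ACIDS, shuffled))
    start = cost = error_cost(relabel(mapping))
    for _ in range(steps):
        a, b = rng.sample(AMINO_ACIDS, 2)
        mapping[a], mapping[b] = mapping[b], mapping[a]
        new_cost = error_cost(relabel(mapping))
        if new_cost < cost:
            cost = new_cost
        else:
            mapping[a], mapping[b] = mapping[b], mapping[a]  # undo the swap
    return start, cost


if __name__ == "__main__":
    print(f"standard code cost: {error_cost(STANDARD):.2f}")
    for seed in range(3):
        start, end = hill_climb(seed)
        print(f"seed {seed}: random code {start:.2f} -> local peak {end:.2f}")
```

Because the climber only accepts improvements, each run stops on whatever local optimum it reaches, and runs from different seeds can end at different costs, which is one simple way to see what a rugged landscape means.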
The analysis above did not include other nearly optimal features of the genetic code. From article 3, these include:
A) The actual code is far better than other possible codes in minimizing the number of amino acids incorporated until translation is interrupted after a frameshift error occurred.
B) The code is highly optimal for encoding arbitrary additional information, i.e., information other than the amino acid sequence in protein-coding sequences.
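Point A can be illustrated with a short sketch. It assumes, for simplicity, that the shift happens at the very start of a hypothetical sequence, and it only counts codons in that one sequence rather than reproducing the paper's comparison across alternative codes. After a +1 or +2 frameshift the ribosome reads the shifted frame until it meets a stop codon, so the number of codons read before the first off-frame stop is the number of wrong amino acids incorporated.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard code, DNA alphabet (T for U)


def codons_until_stop(seq, offset):
    """Count codons read from `offset` before the first stop codon; None if no stop is reached."""
    for index, start in enumerate(range(offset, len(seq) - 2, 3)):
        if seq[start:start + 3] in STOP_CODONS:
            return index
    return None


if __name__ == "__main__":
    orf = "ATGGTGACTAAATAA"  # hypothetical open reading frame ending in TAA
    for offset in (0, 1, 2):
        print(offset, codons_until_stop(orf, offset))
```

For this toy sequence the normal reading frame (offset 0) reaches its stop after 4 codons, while the +1 and +2 frames hit a stop after 1 and 2 codons respectively.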
Thus, there are two more features for which the code is close to optimal. What is interesting about these two features is that they may facilitate evolution, i.e. the code is primed for the future by being optimal for the later incorporation of additional information.
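The room for "additional information" in point B comes from degeneracy: an amino acid with k synonymous codons leaves log2(k) bits of free choice at that position. The sketch below only computes this raw upper bound for a hypothetical peptide under the standard code; it does not reproduce the paper's comparison of the standard code against alternative codes.

```python
import math
from collections import Counter

# Standard code (NCBI table 1), codons ordered TTT, TTC, TTA, ...; "*" marks stops.
AA_STRING = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Number of synonymous codons per amino acid (stop codons excluded).
DEGENERACY = Counter(aa for aa in AA_STRING if aa != "*")


def synonymous_capacity_bits(protein):
    """Upper bound, in bits, on what synonymous codon choice alone could encode for `protein`."""
    return sum(math.log2(DEGENERACY[aa]) for aa in protein)


if __name__ == "__main__":
    peptide = "MKTAYIAKQR"  # hypothetical peptide
    print(f"{synonymous_capacity_bits(peptide):.2f} bits")
```

Met and Trp contribute nothing (one codon each), while the six-fold degenerate Leu, Ser and Arg contribute about 2.58 bits each; the toy peptide above comes to about 14.17 bits.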
In article nr. 4:
The coevolution theory of the origin of the genetic code is discussed. The theory suggests that the genetic code is an imprint of the biosynthetic (biosynthetically restricted) relationships between amino acids.
A few interesting observations can be made:
Firstly, from the article:
As will become clear in the following, I maintain that these amino acid-pre-tRNAs came directly from the biosynthetic pathways of the first six amino acids evolving along the biosynthetic pathways of energetic metabolism and that they were the first amino acids to be codified on these still evolving mRNAs.
It should be noted that other exotic amino acids are also used by a few other codes (derived from the original). E.g. selenocysteine is encoded in organisms from all three domains of life, including vertebrates, and pyrrolysine in some archaea and bacteria. Archaea, however, seem to be among the most primitive organisms, so these encoded amino acids must have been fixed early on.
Thus, an interesting question can be asked of an "evolving" code as posited in the above quote:
Are these "still evolving" mRNAs still evolving? Or did the code hit an inevitable global optimum?
Secondly, from the article:
While Wong [9] highlighted the precursor-product relationships between amino acids and their crucial role in defining the organisation of the genetic code, Miseta [10] clearly identified that the non-amino acid molecules that were precursors of amino acids might have been able to play an important role in organising the genetic code. Miseta [10] suggested the idea of an intimate relationship between molecules, the intermediates of glucose degradation, as precursors of precursor amino acids, and the organisation of the genetic code. This observation is also analysed by Taylor and Coates [11] who showed the relationship between the glycolytic pathway, the citric acid cycle, the biosyntheses of amino acids and the genetic code (Fig. 1) and, in particular, they point out that (i) all the amino acids that are members of a biosynthetic family tend to have codons with the same first base (Fig. 1) and (ii) that the five amino acids codified by GNN codons are found in four biosynthetic pathways close to or at the beginning of the pathway head (Fig. 1)[11]. More recently, Davis [12,13] has provided evidence that tRNAs descending from a common ancestor were adaptors of amino acids synthesised by a common precursor and he also discusses the biosynthetic families of amino acids, suggesting their importance in genetic code origin.
Is it correct to assume that, in the presence of the precursors of the standard genetic code (e.g. intermediates of glucose degradation and the citric acid cycle), the intimate relationship between these molecules resulted in the inevitable organization of the genetic code (a global optimum of the system)?
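Observation (i) of Taylor and Coates can be checked roughly against the standard codon table. The sketch below uses a common textbook grouping of amino acids into biosynthetic families by precursor; that grouping is an assumption and may not match Fig. 1 of the article exactly. It simply lists the first codon bases used by each member (DNA alphabet, so T stands for U).

```python
from collections import defaultdict

BASES = "TCAG"
# Standard code (NCBI table 1), codons ordered TTT, TTC, TTA, ...; "*" marks stops.
AA_STRING = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]

# Textbook biosynthetic families keyed by precursor (assumed grouping, one-letter codes).
FAMILIES = {
    "alpha-ketoglutarate": "EQPR",
    "oxaloacetate": "DNKTMI",
    "pyruvate": "AVL",
    "3-phosphoglycerate": "SGC",
    "PEP + erythrose 4-phosphate": "FYW",
    "ribose 5-phosphate": "H",
}


def first_bases_by_amino_acid():
    """Map each amino acid to the set of first codon bases it uses in the standard code."""
    result = defaultdict(set)
    for codon, aa in zip(CODONS, AA_STRING):
        if aa != "*":
            result[aa].add(codon[0])
    return result


if __name__ == "__main__":
    first = first_bases_by_amino_acid()
    for precursor, members in FAMILIES.items():
        per_aa = {aa: "".join(sorted(first[aa])) for aa in members}
        print(f"{precursor:28} {per_aa}")
```

The aromatic family (Phe, Tyr, Trp), for example, uses only T/U as first base, whereas Arg spreads over C and A; how strongly the "same first base" tendency holds therefore depends on the family and on the grouping chosen.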
Articles 5-7
These articles discuss fascinating mathematical representations of the genetic code. For example, article 6 proposes representing the genetic code as a six-dimensional Boolean hypercube. Abstract:
It is assumed here that this structure is the result of the hierarchical order of the interaction energies of the bases in codon–anticodon recognition. The proposed structure demonstrates that in the genetic code there is a balance between conservatism and innovation. Comparing aligned positions in homologous protein sequences two different behaviors are found:
a) There are sites in which the different amino acids present may be explained by one or two “attractor nodes” (coding for the dominating amino acid(s)) and their one–bit neighbors in the codon hypercube, and
b) There are sites in which the amino acids present correspond to codons located in closed paths in the hypercube. The structure of the code facilitates evolution: the variation found at the variable positions of proteins do not corresponds to random jumps at the codon level, but to well defined regions of the hypercube.
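A six-dimensional Boolean hypercube arises when each of the three bases in a codon is written as two bits, so every codon becomes a 6-bit vertex and its "one-bit neighbors" are the six codons differing in a single bit. The abstract does not spell out the bit assignment; the encoding below (first bit: purine vs. pyrimidine, second bit: G/C "strong" vs. A/T "weak") and the first-to-third ordering of codon positions are assumptions made for illustration only.

```python
BASES = "TCAG"
# Standard code (NCBI table 1), codons ordered TTT, TTC, TTA, ...; "*" marks stops.
AA_STRING = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
STANDARD = dict(zip(CODONS, AA_STRING))

# Assumed 2-bit encoding per base: (purine bit, strong/GC bit). Not necessarily the paper's.
BITS = {"T": (0, 0), "C": (0, 1), "A": (1, 0), "G": (1, 1)}
BASE_OF = {bits: base for base, bits in BITS.items()}


def encode(codon):
    """Pack a codon into a 6-bit integer, first base in the most significant bits."""
    value = 0
    for base in codon:
        purine, strong = BITS[base]
        value = (value << 2) | (purine << 1) | strong
    return value


def decode(value):
    """Inverse of encode()."""
    return "".join(
        BASE_OF[((value >> (shift + 1)) & 1, (value >> shift) & 1)] for shift in (4, 2, 0)
    )


def one_bit_neighbors(codon):
    """The six codons at Hamming distance 1 from `codon` on the 6-bit hypercube."""
    v = encode(codon)
    return [decode(v ^ (1 << k)) for k in range(6)]


if __name__ == "__main__":
    for n in one_bit_neighbors("GAA"):  # GAA codes for glutamate
        print(n, STANDARD[n])
```

Under this encoding, flipping a "strong" bit is a transition (A/G or C/T) and flipping a "purine" bit swaps A/T or G/C; the other transversions (A/C, G/T) are two bits apart. For GAA (Glu) the sketch prints GAG E, GAT D, GGA G, GTA V, AAA K and CAA Q.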
Article 8:
This article once again discusses the optimality of the code, and a few fascinating conclusions are drawn. Abstract:
The genetic code has the remarkable property of error minimization, whereby the arrangement of amino acids to codons is highly efficient at reducing the deleterious effects of random point mutations and transcriptional and translational errors. Whether this property has been explicitly selected for is unclear. Here, three scenarios of genetic code evolution are examined, and their effects on error minimization assessed. First, a simple model of random stepwise addition of physicochemically similar amino acids to the code is demonstrated to result in substantial error minimization. Second, a model of random addition of physicochemically similar amino acids in a codon expansion scheme derived from the Ambiguity Reduction Model results in improved error minimization over the first model. Finally, a recently introduced 213 Model of genetic code evolution is examined by the random addition of physicochemically similar amino acids to a primordial core of four amino acids. Under certain conditions, 22% of the resulting codes produced according to the latter model possess equivalent or superior error minimization to the standard genetic code. These analyses demonstrate that a substantial proportion of error minimization is likely to have arisen neutrally, simply as a consequence of code expansion, facilitated by duplication of the genes encoding adaptor molecules and charging enzymes. This implies that selection is at best only partly responsible for the property of error minimization. These results caution against assuming that selection is responsible for every beneficial trait observed in living organisms.
For example, from the article:
The SGC (Standard Genetic Code) has an EM (Error Minimization) value (see Methods for calculation) of 60.7. Ten thousand random codes have an average EM value of 74.5, and only 0.03% of these have equal or greater optimality than the SGC. These calculations once again illustrate the remarkable ‘optimization’ of the genetic code for EM.
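The shape of this comparison (one code's error value against a sample of random codes) can be sketched as follows, though not its numbers. The error measure here is a stand-in (mean squared Kyte-Doolittle hydropathy change over single-base substitutions between sense codons), and random codes are generated by shuffling the 20 amino acids among the standard code's synonymous blocks with stop codons fixed. Both choices are assumptions; the article's EM calculation (see its Methods) differs, so the values printed here are not on the same scale as 60.7 and 74.5. Lower cost means better error minimization.

```python
import random

BASES = "TCAG"
# Standard code (NCBI table 1), codons ordered TTT, TTC, TTA, ...; "*" marks stops.
AA_STRING = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
STANDARD = dict(zip(CODONS, AA_STRING))
AMINO_ACIDS = sorted(set(AA_STRING) - {"*"})

# Kyte-Doolittle hydropathy: an assumed stand-in for the article's amino acid property.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def error_cost(code):
    """Mean squared hydropathy change over single-base substitutions between sense codons."""
    total, count = 0.0, 0
    for codon, aa in code.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                other = code[codon[:pos] + base + codon[pos + 1:]]
                if other != "*":
                    total += (HYDROPATHY[aa] - HYDROPATHY[other]) ** 2
                    count += 1
    return total / count


def random_code(rng):
    """Shuffle amino acids among the standard code's synonymous blocks; stops stay put."""
    shuffled = AMINO_ACIDS[:]
    rng.shuffle(shuffled)
    mapping = dict(zip(AMINO_ACIDS, shuffled))
    return {c: (aa if aa == "*" else mapping[aa]) for c, aa in STANDARD.items()}


def compare_with_random(n=2000, seed=1):
    """Return (standard cost, mean random cost, fraction of random codes at least as good)."""
    rng = random.Random(seed)
    sgc = error_cost(STANDARD)
    costs = [error_cost(random_code(rng)) for _ in range(n)]
    at_least_as_good = sum(cost <= sgc for cost in costs)
    return sgc, sum(costs) / n, at_least_as_good / n


if __name__ == "__main__":
    sgc, mean, fraction = compare_with_random()
    print(f"standard: {sgc:.2f}  random mean: {mean:.2f}  fraction <= standard: {fraction:.4f}")
```

The article sampled ten thousand random codes; n defaults to 2000 here only to keep the run short.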
Thus, an important point is raised:
The article cautions against blithely invoking natural selection as an explanation for the features of the genetic code. From the article:
The point should be made that explicit selection for EM seems to necessitate both the occurrence of codon reassignments and group selection to generate and select alternate codes. The proposal that explicit selection for the EM did not occur, and that EM arose neutrally from the addition of similar amino acids to similar codons, may be termed the ‘Nonadaptive Code’ Hypothesis, in contrast to the Adaptive Code Hypothesis. Finally, on a fundamental level, as a result of the analyses presented here, the presence of EM in the SGC may be used as evidence that enzymes, whether partially proteinaceous, RNA based, or based on some other macromolecule, were already extant during the evolution of the SGC.
Article 9:
This article discusses the functional integrity of the code and how the code's architecture relates to it. From the article:
The results put the concept of "codon bias" into a novel perspective. The internal connectivity of codons indicates that all synonymous codons might be integrated parts of the Genetic Code with equal importance in maintaining its functional integrity.
Thus, the properties of the code allow it to maintain its own functional integrity.
Also from the article:
The cumulative Codon Usage Frequency of any codon is strongly dependent on the cumulative Codon Usage Frequency of other codons belonging to the same species. The rules of this codon dependency are the same for all species and reflect WC base pair complementarity. This internal connectivity of codons indicates that all synonymous codons are integrated parts of the Genetic Code with equal importance in maintaining its functional integrity. The so-called codon bias is a bias caused by the protein-centric view of the genome.
The maintenance of the code's integrity is thus not dependent on selection but on internal variables (a feedback system) that maintain functional integrity. Again, this shows another form of optimality.
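The article's statistical analysis is not reproduced here. As a rough sketch of the quantities involved, assuming in-frame coding sequences in the DNA alphabet and reading "WC base pair complementarity" as pairing each codon with its reverse complement (an assumption about the article's rule), one could tabulate cumulative codon usage frequencies over a genome's genes and line up complementary codons:

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(codon):
    """Watson-Crick reverse complement of a DNA codon, e.g. ATG -> CAT."""
    return codon.translate(COMPLEMENT)[::-1]


def cumulative_codon_usage(coding_sequences):
    """Codon frequencies summed over all complete in-frame codons of all given sequences."""
    counts = Counter()
    for cds in coding_sequences:
        usable = len(cds) - len(cds) % 3  # drop a trailing partial codon
        counts.update(cds[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


def complementary_pairs(usage):
    """Pair each observed codon with its reverse complement and report both frequencies."""
    pairs = {}
    for codon in usage:
        key = tuple(sorted((codon, reverse_complement(codon))))
        pairs[key] = (usage.get(key[0], 0.0), usage.get(key[1], 0.0))
    return pairs


if __name__ == "__main__":
    genes = ["ATGGCTGCAAAGTAA", "ATGTTTGCTTGCTAA"]  # hypothetical coding sequences
    usage = cumulative_codon_usage(genes)
    for (c1, c2), (f1, f2) in sorted(complementary_pairs(usage).items()):
        print(f"{c1} {f1:.3f}  <->  {c2} {f2:.3f}")
```

No codon is its own reverse complement (the middle base would have to pair with itself), so every codon falls into exactly one pair; testing the article's dependency rules would require real genome-wide data and its own statistics.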
In article 10:
Fascinating research has been conducted in which a variety of unnatural amino acids, assigned to novel three- and four-base codons, have been selectively incorporated (engineered) into proteins in viable organisms.
An intriguing question arises from this research. It is easy to imagine such amino acids (e.g. amino acids with photoaffinity) arising through chance and selection and then being incorporated into the standard code. Yet the code seems to remain stagnant: for billions of years after fixation, little evolution happened in the code. Why?
Did it arrive at a global optimum in a pre-existing fitness landscape, with a pre-existing fitness function?
Finally, Article 11:
The article cites many of the optimal properties above and concludes:
As we learn more about the functions of the genetic code, it becomes ever clearer that the degeneracy in the genetic code is not exploited in such a way as to optimize one function, but rather to optimize a combination of several different functions simultaneously. Looking deeper into the structure of the code, we wonder what other remarkable properties it may bear. While our understanding of the genetic code has increased substantially over the last decades, it seems that exciting discoveries are waiting to be made.
Indeed, the genetic code sure is interesting. Whatever the explanation for the origins of the code, whether intentional agency, random variation and natural selection (RV+NS) alone, self-organization, or a combination of these, the fact that these processes converged on a single, reasonably optimal code that is able to facilitate evolution makes it look like an inevitable result of the system. The system seems to be rigged and biased towards certain outcomes, similar to the evolution of life.