http://www.sciencemag.org/content/342/6164/1325
"Despite redundancy in the genetic code, the choice of codons used is highly biased in some proteins, suggesting that additional constraints operate in certain protein-coding regions of the genome. This suggests that the preference for particular codons, and therefore amino acids in specific regions of the protein, is often determined by factors unrelated to protein structure or function. On page 1367 in this issue, Stergachis et al. reveal that transcription factors bind within protein-coding regions (in addition to nearby noncoding regions) in a large number of human genes. Thus, a transcription factor “binding code” may influence codon choice and, consequently, protein evolution. This “binding” code joins other “regulatory” codes that govern chromatin organization, enhancers, mRNA structure, mRNA splicing, microRNA target sites, translational efficiency, and cotranslational folding, all of which have been proposed to constrain codon choice, and thus protein evolution (see the figure). "
http://www.sciencemag.org/content/342/6164/1367.abstract
"Genomes contain both a genetic code specifying amino acids and a regulatory code specifying transcription factor (TF) recognition sequences. We used genomic deoxyribonuclease I footprinting to map nucleotide resolution TF occupancy across the human exome in 81 diverse cell types. We found that ~15% of human codons are dual-use codons (“duons”) that simultaneously specify both amino acids and TF recognition sites. Duons are highly conserved and have shaped protein evolution, and TF-imposed constraint appears to be a major driver of codon usage bias. Conversely, the regulatory code has been selectively depleted of TFs that recognize stop codons. More than 17% of single-nucleotide variants within duons directly alter TF binding. Pervasive dual encoding of amino acid and regulatory information appears to be a fundamental feature of genome evolution."
Main takeaway: DNA is a mess; past successes constrain future development into a hodgepodge of "good-enough" solutions and accidental piles of complexity that just manage to get the job done, with no evidence of a plan or central organizing principle.
http://www.theguardian.com/science/2012/sep/05/genes-genome-junk-dna-encode
Long stretches of DNA previously dismissed as "junk" are in fact crucial to the way our genome works, an international team of researchers said on Wednesday.
It is the most significant shift in scientists' understanding of the way our DNA operates since the sequencing of the human genome in 2000, when it was discovered that our bodies are built and controlled by far fewer genes than expected. Now the next generation of geneticists have updated that picture.
Except that by going back to the ENCODE results of last year, you ignore that they vastly overstated their findings, and so you have lost the thread.
http://en.wikipedia.org/wiki/Noncoding_DNA
"The Encyclopedia of DNA Elements (ENCODE) project suggested in September 2012 that over 80% of DNA in the human genome "serves some purpose, biochemically speaking". This conclusion however is strongly criticized by other scientists. The general consensus among knowledgeable scientists is that a large percentage of the human genome is junk DNA. Naturally, this junk DNA is all noncoding DNA but that does not mean that all noncoding DNA is junk.
...
The term "junk DNA" became popular in the 1960s. It was formalized in 1972 by Susumu Ohno, who noted that the mutational load from deleterious mutations placed an upper limit on the number of functional loci that could be expected given a typical mutation rate. Ohno predicted that mammal genomes could not have more than 30,000 loci under selection before the "cost" from the mutational load would cause an inescapable decline in fitness, and eventually extinction. This prediction remains robust, with the human genome containing approximately 20,000 genes. Another source for Ohno's theory was the observation that even closely related species can have widely (orders-of-magnitude) different genome sizes, which had been dubbed the C value paradox in 1971."