January 05, 2007

Forbidden DNA sequences

Via the New Scientist:
Could there be forbidden sequences in the genome - ones so harmful that they are not compatible with life? One group of researchers thinks so. Unlike most genome sequencing projects which set out to search for genes that are conserved within and between species, their goal is to identify "primes": DNA sequences and chains of amino acids so dangerous to life that they do not exist.


To do this, Hampikian and his colleague Tim Anderson, also at Boise, have developed software that calculates all the possible sequences of nucleotides - the "letters" of DNA - up to a certain length, and then scans sequence databases such as the US National Institutes of Health's Genbank to identify the smallest sequences that aren't present. Those that don't occur in one species but do in others are termed "nullomers", while those that aren't found in any species are termed primes.

Hampikian's team is deliberately searching for the shortest absent sequences in order to minimise the possibility that absent sequences are missing simply due to chance. So far they have found 86 sequences of 11 nucleotides long that have never been reported in humans.
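The search described in the article can be illustrated with a toy sketch. This is not Hampikian and Anderson's software, which is not shown in the article; the function names `absent_kmers` and `shortest_absent` are made up for illustration, and the sketch assumes the sequences fit in memory and scans only the given strand (no reverse complement).

```python
from itertools import product

ALPHABET = "ACGT"

def absent_kmers(sequences, k):
    """Return the sorted length-k DNA words that occur in none of the sequences."""
    seen = set()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            # Skip windows containing ambiguous bases such as N.
            if set(word) <= set(ALPHABET):
                seen.add(word)
    # Enumerate all 4**k possible words and keep those never observed.
    all_words = ("".join(p) for p in product(ALPHABET, repeat=k))
    return sorted(w for w in all_words if w not in seen)

def shortest_absent(sequences, max_k):
    """Find the smallest k (up to max_k) that has at least one absent word."""
    for k in range(1, max_k + 1):
        missing = absent_kmers(sequences, k)
        if missing:
            return k, missing
    return None, []

# Toy input standing in for a sequence database.
k, missing = shortest_absent(["ACGTACGT"], max_k=5)
print(k, len(missing))
```

Enumerating every word is fine for short lengths (there are about 4.2 million possible 11-nucleotide words), but a real genome-scale scan would need a more compact representation than a Python set of strings.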
In a back-of-the-envelope calculation, a particular sequence of 11 nucleotides has a 4^-11 chance of occurring at any given position, but there are around 3 x 10^9 11-nucleotide sequences in the human genome. A random genome of that size would therefore be expected to contain each such sequence roughly 700 times, so it is very unlikely that one would be missing purely by accident in a human. Of course there are repetitive elements in the human genome, as well as functional elements that have been conserved intact across our species and even across the millions of years of evolution that separate us from other taxa. Thus, our genomes are not 3-billion-long random strings of CAGT. However, the principle is valid that short sequences have a higher chance of occurring by accident in longer genomes. Thus, the fact that some sequences may not occur at all is fairly good evidence that they "do something bad" and hence nature avoids them. From the article:
Whether these sequences have any biological significance in living organisms is not yet known - the next step is to test 20 of the peptoprimes in bacteria and human cells to see whether they have any effect such as causing death or provoking an immune reaction.
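The back-of-the-envelope estimate above can be written out as a short calculation. This is only a sketch under the same simplifying assumption the estimate makes, namely that the genome is a string of independent, equally likely bases, which real genomes are not. The function names are hypothetical, and the absence probability uses a Poisson approximation.

```python
import math

def expected_occurrences(genome_length, k):
    """Expected count of one specific k-mer in a uniformly random genome."""
    positions = genome_length - k + 1   # number of k-long windows
    return positions / 4 ** k           # each window matches with probability 4**-k

def log10_prob_absent(genome_length, k):
    """log10 of the chance that the k-mer never occurs (Poisson approximation)."""
    # P(absent) is approximately exp(-expected); convert to a base-10 exponent.
    return -expected_occurrences(genome_length, k) / math.log(10)

genome_length = 3_000_000_000  # about 3 x 10^9 nucleotides, as in the text
for k in (11, 13, 15, 17):
    print(k, expected_occurrences(genome_length, k), log10_prob_absent(genome_length, k))
```

For 11 nucleotides the expected count is in the hundreds, so absence by chance is vanishingly unlikely under this model; as the length grows the expected count falls below one and absent sequences stop being surprising, which is why the shortest absent sequences are the interesting ones.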
