Ari Löytynoja and Nick Goldman have developed a new method that detects and distinguishes insertions and deletions in genomes. Their work was published in the most recent issue of Science. While Löytynoja and Goldman didn’t explicitly write about how their new algorithm, described in “Phylogeny-Aware Gap Placement Prevents Errors in Sequence Alignment and Evolutionary Analysis,” impacts our understanding of human evolution and how we compare primate genomes, it is important to understand what they’ve accomplished.
Up until now, people compared the sequence similarities of multiple genomes using a tool that performs a multiple sequence alignment. A commonly used tool is called CLUSTALW, and I’ve used it a lot. CLUSTAL will take long strings of DNA sequence and align them based upon their shared similarities. Where a position is the same between the samples, it is matched… Where the sequences differ in length, the missing stretch is marked as a gap. Every pairwise match between two or more sequences is given a score, and every gap is given a penalty.
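To make the scoring idea concrete, here is a minimal sketch in Python of how one already-aligned pair of sequences could be scored. The function name `score_alignment` and the values for match, mismatch, and gap are my own illustrative assumptions, not CLUSTALW's actual parameters; real aligners use substitution matrices and more elaborate gap penalties.

```python
def score_alignment(seq_a, seq_b, match=1, mismatch=-1, gap=-2):
    """Score two aligned sequences of equal length.

    '-' marks a gap. The scoring values are illustrative assumptions,
    not the defaults of any real alignment tool.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")

    score = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            score += gap        # every gap position is penalized
        elif a == b:
            score += match      # identical characters are rewarded
        else:
            score += mismatch   # differing characters cost a little
    return score


# Two candidate alignments of the same pair of sequences:
print(score_alignment("ACGT", "ACGT"))  # perfect match
print(score_alignment("AC-T", "ACGT"))  # one gap
```

An aligner would compare scores like these across candidate alignments and keep the highest one. Notice that nothing in the score says whether the gap came from an insertion or a deletion.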
Many different alignments are computed and the one with the best score is presented. Phylogenetic trees are drawn from these sequence alignments. The problem is that this method never asks whether a length difference between two sequences is a deletion in one or an insertion in the other. This systematically creates errors in comparisons of genetic sequences of different species… check it out for yourself, the image below shows the traditional alignment on the left and the new alignment algorithm on the right:
This is where Löytynoja and Goldman’s new phylogeny-aware algorithm, PRANK, shines. The phylogeny-aware approach,
“flags the gaps made in previous alignments and, using evolutionary information from related sequences to indicate whether each gap has been created by an insertion or a deletion, permits their “reuse” for inserted characters without further penalty in the next stage of the progressive alignment. In addition, information from closely related sequences can be used to infer sites as “permanent” insertions that cannot be matched in subsequent alignments, so that distinct insertion events are correctly kept separate even when they occur at exactly the same position. If related sequences indicate that a gap is caused by a deletion, flags are removed and no further free gaps at that position are permitted, and the effect is correctly targeted on insertions only.”
“Say we are comparing the DNA of human and chimp and can’t tell if a deletion or an insertion happened. To solve this our tool automatically invokes information about the corresponding sequences in closely related species, such as gorilla or macaque. If they show the same gap as the chimp, this suggests an insertion in humans.”
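The human and chimp example in that quote can be sketched as a simple majority vote over outgroup sequences. This is only my toy illustration of the reasoning, not PRANK's actual algorithm, which works with a guide tree and flagged gaps during progressive alignment. The function name `classify_indel` and the short sequences are made up for the example.

```python
def classify_indel(human, chimp, outgroups, start, end):
    """Guess whether a human/chimp length difference in columns
    [start, end) of an alignment is an insertion or a deletion,
    using a majority vote among outgroup sequences.

    A toy illustration only; all names here are hypothetical.
    """
    def is_gap(seq):
        # True when every column in the region is a gap character
        return set(seq[start:end]) == {"-"}

    human_gap, chimp_gap = is_gap(human), is_gap(chimp)
    if human_gap == chimp_gap:
        return "no length difference"

    gapped = "human" if human_gap else "chimp"
    ungapped = "chimp" if human_gap else "human"

    votes_gap = sum(is_gap(seq) for seq in outgroups)
    votes_bases = len(outgroups) - votes_gap

    if votes_gap > votes_bases:
        # Relatives lack the stretch too, so it was likely added once
        return f"insertion in {ungapped}"
    if votes_bases > votes_gap:
        # Relatives carry the stretch, so it was likely lost once
        return f"deletion in {gapped}"
    return "ambiguous"


human   = "ACGTTTAC"
chimp   = "ACG---AC"
gorilla = "ACG---AC"
macaque = "ACG---AC"

print(classify_indel(human, chimp, [gorilla, macaque], 3, 6))
# prints "insertion in human"
```

If gorilla and macaque carried the `TTT` stretch instead, the same call would report a deletion in chimp.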
In their sample set, they compared sequences of primates to primates, primates to rodents, and primates to all mammals, and they were able to identify that insertions are far more common in primate evolution than deletions. Furthermore, the frequency of deletions has been exaggerated because previous tools could not reliably tell deletions apart from insertions… which makes me wonder if primates, relatively recent in evolutionary time, have been under a relaxed, diversifying level of positive selection? Like some sort of explosion of adaptive radiation of the taxon… I haven’t completely thought this through, just something that popped into my mind while writing this.
- Löytynoja, A., Goldman, N. (2008). Phylogeny-Aware Gap Placement Prevents Errors in Sequence Alignment and Evolutionary Analysis. Science, 320(5883), 1632-1635. DOI: 10.1126/science.1158395