Improving Multiple Sequence Alignments with a Phylogeny-Aware Algorithm

Ari Löytynoja and Nick Goldman have developed a new method that detects and distinguishes insertions and deletions in genomes. Their work was published in the most recent issue of Science. While Löytynoja and Goldman didn’t explicitly write about how their new algorithm, described in “Phylogeny-Aware Gap Placement Prevents Errors in Sequence Alignment and Evolutionary Analysis,” impacts our understanding of human evolution and how we compare primate genomes, it is important to understand what they’ve accomplished.

Up until now, people have compared the sequence similarities of multiple genomes using tools that perform a multiple sequence alignment. A commonly used tool is CLUSTALW, and I’ve used it a lot. CLUSTAL takes long strings of DNA sequence and aligns them based on their shared similarities. Where the sequences are the same across samples, the positions are matched. Where one sequence has no counterpart for a stretch of another, a gap is placed. Every matched position between two or more sequences is given a score, and every gap is given a penalty.
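To make the score-and-penalty idea concrete, here is a minimal sketch of a sum-of-pairs score for an alignment that has already been built. The numbers (`MATCH`, `MISMATCH`, `GAP`) and the function names are my own illustrative assumptions, not CLUSTALW’s actual parameters; the real tool uses substitution matrices and separate gap-opening and gap-extension penalties.

```python
from itertools import combinations

# Illustrative values only (assumed for this sketch, not CLUSTALW defaults).
MATCH = 1
MISMATCH = -1
GAP = -2


def pair_score(a, b):
    """Score two aligned sequences column by column."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue  # both gapped: nothing to compare for this pair
        if x == "-" or y == "-":
            score += GAP       # one sequence has a gap here
        elif x == y:
            score += MATCH     # identical bases
        else:
            score += MISMATCH  # different bases
    return score


def sum_of_pairs(alignment):
    """Add up the pairwise scores over every pair of rows in the alignment."""
    return sum(pair_score(a, b) for a, b in combinations(alignment, 2))


alignment = ["ACGT", "AC-T", "ACGT"]
print(sum_of_pairs(alignment))
```

Notice that the score only sees that a gap exists. It has no notion of whether the gap came from an insertion or a deletion.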

Many different alignments are computed and the one with the best score is presented. Phylogenetic trees are then drawn from these sequence alignments. The problem is that this method does not consider whether a length difference between two sequences is a deletion in one or an insertion in the other. This systematically creates errors in comparisons of genetic sequences from different species. Check it out for yourself: the image below shows the traditional alignment on the left and the new alignment algorithm on the right:

This is where Löytynoja and Goldman’s new phylogeny-aware algorithm, PRANK, shines. The phylogeny-aware approach,

“flags the gaps made in previous alignments and, using evolutionary information from related sequences to indicate whether each gap has been created by an insertion or a deletion, permits their “reuse” for inserted characters without further penalty in the next stage of the progressive alignment. In addition, information from closely related sequences can be used to infer sites as “permanent” insertions that cannot be matched in subsequent alignments, so that distinct insertion events are correctly kept separate even when they occur at exactly the same position. If related sequences indicate that a gap is caused by a deletion, flags are removed and no further free gaps at that position are permitted, and the effect is correctly targeted on insertions only.”

Löytynoja explains,

“Say we are comparing the DNA of human and chimp and can’t tell if a deletion or an insertion happened. To solve this our tool automatically invokes information about the corresponding sequences in closely related species, such as gorilla or macaque. If they show the same gap as the chimp, this suggests an insertion in humans.”
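Löytynoja’s human and chimp example can be sketched as a toy majority vote. This is only an illustration of the reasoning in the quote, not PRANK’s actual algorithm, which makes these decisions along a guide tree during progressive alignment. The function name `classify_indel` and the true/false gap table are hypothetical.

```python
def classify_indel(has_gap, pair, outgroups):
    """Guess whether a length difference is an insertion or a deletion.

    has_gap   -- dict mapping species name to True if it shows the gap
    pair      -- the two species being compared (exactly one has the gap)
    outgroups -- related species consulted to break the tie
    """
    a, b = pair
    if has_gap[a] == has_gap[b]:
        raise ValueError("exactly one species of the pair must show the gap")
    gapped, ungapped = (a, b) if has_gap[a] else (b, a)

    votes_gap = sum(1 for s in outgroups if has_gap[s])
    votes_bases = len(outgroups) - votes_gap

    if votes_gap > votes_bases:
        # Relatives lack the bases too, so they were likely added in one lineage.
        return f"insertion in {ungapped}"
    if votes_bases > votes_gap:
        # Relatives carry the bases, so they were likely lost in one lineage.
        return f"deletion in {gapped}"
    return "ambiguous"


observed = {"human": False, "chimp": True, "gorilla": True, "macaque": True}
print(classify_indel(observed, ("human", "chimp"), ["gorilla", "macaque"]))
# prints "insertion in human"
```

If gorilla and macaque instead carried the same bases as human, the same call would report a deletion in chimp.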

In their sample set, they compared sequences of primates to primates, primates to rodents, and primates to all mammals, and they were able to identify that insertions are far more common in primate evolution than deletions. Furthermore, the frequency of deletions has been exaggerated because of the inability of previous tools to effectively tell insertions and deletions apart… which makes me wonder whether primates, relatively recent in evolutionary time, have been under a relaxed, diversifying level of positive selection? Like some sort of explosion of adaptive radiation in the taxon… I haven’t completely thought this through; it’s just something that popped into my mind while writing this.

    Löytynoja, A., Goldman, N. (2008). Phylogeny-Aware Gap Placement Prevents Errors in Sequence Alignment and Evolutionary Analysis. Science, 320(5883), 1632-1635. DOI: 10.1126/science.1158395

2 thoughts on “Improving Multiple Sequence Alignments with a Phylogeny-Aware Algorithm”

  1. Actually, in this study, they did not compare sequences of primates to primates, primates to rodents, and primates to all mammals. Rather, they simulated synthetic DNA sequence data.

    “For the 16-taxon tree, we set evolutionary relationships close, intermediate, and distant, approximately representing comparisons of primates, primates and rodents, and mammals, respectively.”

    This is a difference. I think one cannot learn something about the frequency of indels from this study, because the frequency of indels actually went into their model as a parameter and was not a result.

    They were not able to “identify that insertions are far more common in primate evolution than deletions” — this is an assumption they made!
