The last time I did a little tutorial on how to use bioinformatic tools in anthropological research was last October. I’ve had some ideas since then and have decided to restart this project. The biggest change is the screencast format, rather than a set of static instructions.
Today, I’d like to introduce the first installment in this series of tutorials on commonly used bioinformatic techniques, such as running a multiple sequence alignment and drawing a phylogenetic tree. Multiple sequence alignments and phylogenetic trees are used in evolutionary analyses to understand the similarities and differences in sequences of DNA, RNA, or amino acids. The basic premise is that more similar sequences are more closely related than dissimilar ones.
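To make the "more similar means more closely related" premise concrete, here is a minimal Python sketch of one simple similarity measure, the p-distance (the proportion of aligned sites that differ). It is only an illustration: the function name `p_distance` and the toy sequences are my own assumptions, not data from this post and not necessarily the measure Swami uses internally.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two already-aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore gapped columns
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return mismatches / compared


# Hypothetical, already-aligned toy sequences (NOT the D-Loop data from the post).
aligned = {
    "taxon_A": "ACGTACGTAC",
    "taxon_B": "ACGTACGTAT",
    "taxon_C": "ACGAAC-TTT",
}

names = sorted(aligned)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        d = p_distance(aligned[first], aligned[second])
        print(f"{first} vs {second}: {d:.3f}")
```

A smaller value means the two sequences differ at fewer sites. Tree-building methods such as neighbor joining start from a table of pairwise distances like this one, usually after correcting for multiple substitutions at the same site.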
In this episode, I compare the D-Loop sequence of the mitochondrial genome of two Neandertals, one modern human, a chimpanzee, a gorilla, and an orangutan using Swami, a cohesive collection of commonly used tools. Swami allows us to do a multiple sequence alignment and generate a phylogenetic tree. The results are displayed above and to the right. I’ve recorded this 7 minute 30 second screencast for you to follow. If you’d like to give it a run for yourself, here’s the set of primate D-Loop sequences I’ve used:
>Neandertal-1 (AF254446.1)
CCAAGTATTGACTCACCCATCAACAACCGCCATGTATTTCGTACATTACTGCCAGCCACCATGAATATTG
TACAGTACCATAATTACTTGACTACCTGTAATACATAAAAACCTAATCCACATCAACCCCCCCCCCCCAT
GCTTACAAGCAAGCACAGCAATCAACCTTCAACTGTCATACATCAACTACAACTCCAAAGACACCCTTAC
ACCCACTAGGATATCAACAAACCTACCCACCCTTGACAGTACATAGCACATAAAGTCATTTACCGTACAT
AGCACATTATAGTCAAATCCCTTCTCGCCCCCATGGATGACCCCCCTCAGATAGGGGTCCCTTGA
>Neandertal-2 (AF011222.1)
GTTCTTTCATGGGGGAGCAGATTTGGGTACCACCCAAGTATTGACTCACCCATCAGCAACCGCTATGTAT
CTCGTACATTACTGTTAGTTACCATGAATATTGTACAGTACCATAATTACTTGACTACCTGCAGTACATA
AAAACCTAATCCACATCAAACCCCCCCCCCCATGCTTACAAGCAAGCACAGCAATCAACCTTCAACTGTC
ATACATCAACTACAACTCCAAAGACGCCCTTACACCCACTAGGATATCAACAAACCTACCCACCCTTGAC
AGTACATAGCACATAAAGTCATTTACCGTACATAGCACATTACAGTCAAATCCCTTCTCGCCCCCATGGA
TGACCCCCCTCAGATAGGGGTCCCTTGAT
>Human (X90314.1)
TTCTTTCATGGGGAAGCAGATTTGGGTACCACCCAAGTATTGACTTACCCATCAACAACCGCTATGTATT
TCGTACATTACTGCCAGCCACCATGAATATTGCACGGTACCATAAATACTTGACCACCTGTAGTACATAA
AAACCCAATCCACATCAAAACCCCCTCCCCATGCTTACAAGCAAGTACAGCAATCAACCCTCAACTATCA
CACATCAACTGCAACTCCAAAGCCACCCCTCACCCACTAGGATACCAACAAACCTACCCACCCTTAACAG
TACATAGTACATAAAGCCATTTACCGTACATAGCACATTACAGTCAAATCCCTTCTCGTCCCCATGGATG
ACCCCCCTCA
>Chimpanzee (AF176766.1)
GTACCACCTAAGTATTGGCCTATTCATTACAACCGCTATGTATTTCGTACATTACTGCCAGCCACCATGA
ATATTGTACAGTACTATAACCACTCAACTACCTATAATACATTAAGCCCACCCCCACATTACAACCTCCA
CCCTATGCTTACAAGCACGCACAACAATCAACCCCCAACTGTCACACATAAAATGCAACTCCAAAGACAC
CCCTCTCCCACCCCGATACCAACAAACCTATGCCCTTTTAACAGTACATAGTACATACAGCCGTACATCG
CACATAGCACATTACAGTCAAATCCATCCTTGCCCCCACGGATGCCCCCCCTCAGATAGG
>Gorilla (AF089820.1)
TTCTTTCATGGGGAGACGAATTTGGGTGCCACCCAAGTATTAGTTAACCCACCAATAATTGTCATGTATG
TCGTGCATTACTGCCAGCCACCATGAATAATGTACAGTACCACAAACACTCCCCCACCTATAATACATTA
CCCCCCCTCACCCCCCATTCCCTGCTCACCCCAACGGCATACCAACCAACCTATCCCCTCACAAAAGTAC
ATAATACATAAAATCATTTACCGTCCATAGTACATTCCAGTTAAACCATCCTCGCCCCCACGGATGCCCC
CCTTCAGATAGGGATCCCTTAAA
>Orangutan (X97708.1)
TTCTTTCATGGGGGACCAGATTTGGGTGCCACCCCAGTACTGACCCATTTCTAACGGCCTATGTATTTCG
TACATTCCTGCTAGCCAACATGAATATCACCCAACACAACAATCGCTTAACCAACTATAATGCATACAAA
ACTCCAACCACACTCGACCTCCACACCCCGCTTACAAGCAAGTACCCCCCCATGCCCCCCCACCCAAACA
CATACACCGATCTCTCCACATAACCCCTCAACCCCCAGCATATCAACAGACCAAACAAACCTTAAAGTAC
ATAGCACATACTATCCTAACCGCACATAGCACATCCCGTTAAAACCCTGCTCATCCCCACGGATGCCCCC
CCTCAGTTAGTAATCCCTTACT
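Before pasting sequences like these into an alignment tool, it can help to sanity-check them, for example by comparing their lengths. The Python sketch below is one possible way to do that. The helper names `parse_fasta` and `length_outliers`, the 1.5x-median threshold, and the toy records are all my own assumptions for illustration; they are not part of Swami.

```python
from statistics import median


def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header: sequence}, joining wrapped lines."""
    records: dict[str, str] = {}
    header = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = ""
        elif header is not None:
            # Sequence lines may be wrapped or contain stray spaces.
            records[header] += line.replace(" ", "").upper()
    return records


def length_outliers(records: dict[str, str], ratio: float = 1.5) -> list[str]:
    """Return headers whose sequence is longer than `ratio` times the median.

    The default ratio of 1.5 is an arbitrary, assumed threshold.
    """
    mid = median(len(seq) for seq in records.values())
    return [name for name, seq in records.items() if len(seq) > ratio * mid]


# Hypothetical toy records (NOT the D-Loop sequences from the post).
toy = """>sample_1
ACGTACGTAC
ACGTA
>sample_2
ACGTACGTACGTAC
>sample_3
ACGTACGTACGTACGTACGTACGTACGTACGTACGTAC
"""

records = parse_fasta(toy)
for name, seq in records.items():
    print(f"{name}: {len(seq)} bases")
print("Unusually long:", length_outliers(records))
```

To run it on the real data, save the sequences above to a text file and pass the file's contents to `parse_fasta`. A record that is much longer than the rest may cover extra regions that the other sequences lack, which is worth resolving before aligning.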
Please check it out and let me know what you think of it. Do you like this format? Did you find it useful? Was I moving too fast? Did I explain what I was doing thoroughly? And lastly, what would you like to see?
Hmmm… I’m no expert in this bioinformatic tech, but I find it quite odd that gorillas look more distant than orangutans in that NJ tree (normally the gorilla would be closer to chimp and human than the orangutan is). I also find it odd that the orangutan’s sequence is more than twice as long as any of the others, and I wonder if that may have introduced errors.
Hi Luis,
Thanks for your comment. You raise some valid points that I overlooked. I’ve taken out the ‘extra’ sequences in the orangutan sample and updated the sample sequences as well as the image. The gorilla is now slightly closer than the orangutan, but Neandertal-1 is now more dissimilar than Neandertal-2.
Kambiz
Glad to be of some use. It looked strange, just that.
“The gorilla is now slightly closer than the orangutan, but Neandertal-1 is now more dissimilar than Neandertal-2.”
That’s pretty interesting. I wonder if that means something or is just “noise” introduced by the small sample. I can only assume that humans, chimps, etc. are not single points either, but are each distributed over some small range. Still, the distance between the two Neandertals seems pretty large. N-1 is almost twice as distant from the split with sapiens as his cousin N-2.
It looks as if they had a large genetic diversity, and also as if N-2 were more “archaic” and N-1 more “evolved”. Are they from very different dates? Or maybe geographies? (I think I know which two specimens these are, but I’m too tired to search for them now.)
It cannot mean hybridization in any case, because that would not affect the mtDNA at all.
Looking in more detail, it looks like chimpanzees (or at least the dot representing them) would be less than 50% more distant from us than N-1. That is even more intriguing.
It seems to mean that, if 2+2=4 and that program works fine (and there are no errors in the sequences), N-1 is almost as distant from us as chimps, which goes against everything we think we know about Homo sp. and especially about our close cousins the Neanderthals.
It makes little sense. I’d check for any possible input error first of all.
If there are no such errors… then is it a discovery or just a misunderstanding?
Erratum:
I just wrote … would be less than 50% more distant…
Actually I forgot to measure the smaller segment. It’s more like 55, maybe 60%.
But the head scratching remains.