Anthropology.net

Beyond bones & stones

Posts Tagged ‘human genome’

Are YOU a Neandertal?

In 2010 the draft genome for Neandertals was released by Svante Pääbo and colleagues. It was reported that European and Asian populations are between 1 and 4% Neandertal, but what percentage Neandertal are you?

Researcher extracts DNA from a Neandertal specimen

The company 23andMe recently released an analysis that claims to answer precisely this question. While personal genome sequencing has not yet hit the mainstream market, 23andMe looks at SNPs (single nucleotide polymorphisms), positions in the genome where a single nucleotide varies between individuals. By comparing your SNPs with those found in the Neandertal genome draft, the company will give you a percentage for a couple hundred dollars. The service is called the “Neanderthal Ancestry Estimator.”

Computational biologist Eric Durand developed the project, and has previously worked on both the Neandertal genome draft and Denisova genetics.

I encourage you to take a look at an outline of the methodology, online in a white paper. Are we really at the point where a private company can tell us a likely percentage of our Neandertal ancestry for $207? I’ll let you be the judge.

By Matthew Magnani

Are Rapidly Evolving Human Promoter Regions Due To Higher Rates Of Neutral Substitution Or Positive Selection?

Nature Genetics just published a brief correspondence on the evolution of promoter regions in the human genome. The study rests on the observation that 46% of promoter regions in the human genome have a higher number of nucleotide substitutions than corresponding introns. The authors don’t determine whether positive selection, relaxed constraint or a higher mutation rate causes this pattern, but they suggest that these processes have been important to hominid evolution and the genetic diversity of humans.

If you don’t know what a promoter is, I’ll give you a quick rundown. Promoters are regions of the genome that are upstream from genes. Regulatory proteins such as transcription factors bind to these regions and either start the expression of the gene that is downstream or regulate its expression. Any changes to these regions can affect phenotypes related to the gene downstream. Between two populations with the exact same intronic sequence of a gene, a difference in the promoter region can have dramatic effects.

In this current paper, “Rapidly evolving human promoter regions,” the authors respond to a previous paper on the subject. They reanalyzed the alignments used by the previous paper. They asked whether each promoter region as a whole is evolving more rapidly than local intronic sequences. They find that almost all (569/575; 99%) of the promoter regions identified by the previous paper as containing positively selected sites have a higher average substitution rate than their paired intronic regions.
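To make the promoter-versus-intron comparison concrete, here is a minimal Python sketch. It assumes a deliberately simple measure, the fraction of aligned sites that differ between two sequences, which is not the substitution model the authors actually used. The function names and the toy alignments are made up for illustration.

```python
def substitution_rate(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned, ungapped sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip alignment gaps
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped sites to compare")
    return differing / compared


def promoter_faster_than_intron(promoter_pair, intron_pair) -> bool:
    """True when the promoter alignment shows the higher substitution rate."""
    return substitution_rate(*promoter_pair) > substitution_rate(*intron_pair)


# Made-up toy alignments (human vs. another primate), not real data.
promoter = ("ACGTACGTAC", "ACGAACGTTC")  # 2 differences in 10 sites
intron = ("GGCTAAGCTA", "GGCTAAGCTT")    # 1 difference in 10 sites

print(substitution_rate(*promoter))                   # 0.2
print(substitution_rate(*intron))                     # 0.1
print(promoter_faster_than_intron(promoter, intron))  # True
print(round(569 / 575 * 100))                         # 99, the share quoted above
```

A real analysis would correct for multiple substitutions at the same site and work across a phylogeny, so treat this only as a picture of what "a higher average substitution rate than the paired intron" means.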

The previous authors say that positive selection is at play. They based this conclusion on a comparison between promoters and introns. But the current authors caution that promoters are unusual genomic regions and cannot be judged against introns in this way. They simply conclude that promoters have higher neutral substitution rates. The previous authors respond in this same issue of Nature Genetics. They argue that the current authors’ methodology “do[es] not affirm their contention that mutation is generally accelerated in primate promoters.” Either way, both teams have identified that promoter regions of the human genome are highly diversified. Why they are is still unresolved, but these conclusions do fall in line with previous ones that I’ve covered here on Anthropology.net.

    Martin S Taylor, Tim Massingham, Yoshihide Hayashizaki, Piero Carninci, Nick Goldman, Colin A M Semple (2008). Rapidly evolving human promoter regions. Nature Genetics, 40(11), 1262-1263. DOI: 10.1038/ng1108-1262
    Ralph Haygood, Olivier Fedrigo, Gregory A Wray (2008). Reply to “Rapidly evolving human promoter regions”. Nature Genetics, 40(11), 1263-1264. DOI: 10.1038/ng1108-1263

Written by Kambiz Kamrani

October 29, 2008 at 11:05 am

Missing Pieces to the Human Genome Project

Scientific American has a news piece explaining the implications of one of the new studies on the human genome that I reported on last week. In a nutshell, the news piece explains how the identification of 525 new regions throughout the genome impacts the current human reference genome… raising concerns that the reference genome may be faulty, and that there may actually be yet-to-be-uncovered genes missing from it.

The Human Genome Project assembled the reference genome I am referring to in 2003. The reference genome is an amalgamation of sequences from four people (two men and two women) and still has gaps in it. I look forward to seeing if amendments will be made to the reference genome based upon these findings. If you think about it, it is gonna be a really big challenge to assemble a more complete reference genome. To recap the conclusions of the study,

“The researchers identified 1,695 instances of structural variations, 800 of which had not been previously reported. Fifty percent of the regions affected by these mutations showed up in more than one of the people studied. Forty percent of the 525 regions found to be missing from the reference genome were due to copy number variations, which means that a crop of yet-to-be-discovered genes may be hiding within them.”

With so many variants floating around, both large ones like copy number variations and small ones like SNPs, the reference genome must be assembled with the most common sets of alleles. That’s gonna take a lot of work: the genomes of many people from different ethnic backgrounds will need to be sequenced, assembled and folded into the current model.

Written by Kambiz Kamrani

May 5, 2008 at 12:16 pm

Two new studies on exploring methods to study the structure of the human genome

Two similar papers published in the latest issues of Nature and Genome Research present high-resolution analyses of the structure of the human genome. They differ in methodology, but have some cool conclusions. The Nature paper, “Mapping and sequencing of structural variation from eight human genomes,” created libraries from 4 African, 2 Asian, and 2 European genomes. From these libraries the authors created thousands of clones to figure out whether there are structural variations among the genomes of these eight individuals of diverse geographic ancestry.

The Genome Research paper, “Scanning the human genome at kilobase resolution,” used ditag genome scanning (DGS) to analyze the human genome in high resolution. This method is really similar to serial analysis of gene expression (SAGE), in that the genome is fragmented, each tag is ligated with a marker, and a sequencing technique (454 in this particular study) is ultimately used to determine the origin of the fragment in the genome. The authors of this paper report that their method was strong enough to provide kilobase resolution for studying genome structure. DGS is also highly specific and can cover a lot of the genome. Downstream applications of DGS include validating assembled genomes as well as comparing genome similarity and variation in normal populations.

Both methods are able to identify genomic abnormalities like insertions, inversions, deletions, and translocations much better than current technologies. But why is this all important to anthropology? The Nature paper shows how they were able to find 525 new insertion sequences that are not present in the human reference genome. These new insertion sequences are shown to be variable in copy number between individuals, which ultimately makes for 525 new ancestry informative markers. Furthermore, when the authors of the Nature paper sequenced their clones, they were able to find an additional 261 structural variants, which reveal considerable locus complexity and provide insights into the different mutational processes that have shaped the human genome.

One last point: most ancestry informative markers have been SNPs, but more recent research on the human genome has shown that larger-scale differences, like the copy number variations (CNVs) and others screened in these two papers, may account for a great deal of genetic variation among individuals.

Got race?

    Kidd, J.M., Cooper, G.M., Donahue, W.F., Hayden, H.S., Sampas, N., Graves, T., Hansen, N., Teague, B., Alkan, C., Antonacci, F., Haugen, E., Zerr, T., Yamada, N.A., Tsang, P., Newman, T.L., Tüzün, E., Cheng, Z., Ebling, H.M., Tusneem, N., David, R., Gillett, W., Phelps, K.A., Weaver, M., Saranga, D., Brand, A., Tao, W., Gustafson, E., McKernan, K., Chen, L., Malig, M., Smith, J.D., Korn, J.M., McCarroll, S.A., Altshuler, D.A., Peiffer, D.A., Dorschner, M., Stamatoyannopoulos, J., Schwartz, D., Nickerson, D.A., Mullikin, J.C., Wilson, R.K., Bruhn, L., Olson, M.V., Kaul, R., Smith, D.R., Eichler, E.E. (2008). Mapping and sequencing of structural variation from eight human genomes. Nature, 453(7191), 56-64. DOI: 10.1038/nature06862
    Chen, J., Kim, Y.C., Jung, Y., Xuan, Z., Dworkin, G., Zhang, Y., Zhang, M.Q., Wang, S.M. (2008). Scanning the human genome at kilobase resolution. Genome Research DOI: 10.1101/gr.068304.107

Written by Kambiz Kamrani

May 1, 2008 at 2:54 pm

The 20,500 Protein-Encoding Genome We Call Our Own

Earlier this week, news of a new paper about the number of protein-encoding genes surfaced on Sandwalk and Henry. The paper’s title is straightforward, “Distinguishing protein-coding and noncoding genes in the human genome,” but the concepts behind it may not be.

As mentioned in Sandwalk, the initial estimate of the number of genes in the human genome was about 30,000. That was when the first drafts of the human genome became available in June of 2000. Since then the numbers have been fluctuating, and for many it may seem like geneticists and molecular biologists working on annotating the human genome are riding a roller coaster of indecision. In reality, it is not easy to calculate the exact number of genes in any genome.

Why is it not easy to calculate the number of genes? The human genome is around 3,000,000,000 bases long. That’s three thousand million, and the average human gene is only 12,000 bases long! Finding a gene is almost like finding a needle in a haystack, but thankfully there is some organization in the genome that helps us find genes faster. Large deserts of junk DNA exist, and ruling them out narrows down where genes can be. And since a gene has a start and a stop, we can harness the power of computers to scan for these signals.

See, the current workflow to estimate the number of genes is to first isolate genomic DNA from the organism. The DNA is then sheared into many fragments and, depending on the cloning mechanism, the fragments are amplified by PCR, in bacteria carrying a cloning vector, or both! Once amplified, the fragments are sequenced. This is called shotgun sequencing, the method that Craig Venter deployed to help accelerate the sequencing of the human genome. Since the fragments vary in size and overlap one another, it is possible to join them by their shared sequences into longer stretches called contigs, and then to build scaffolding from the contigs to figure out the order in which the fragments fall. This is called the assembly of the genome.
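Here is a toy Python sketch of that assembly step, assuming error-free fragments from a single strand and a simple greedy strategy (always merge the two pieces with the longest overlap). Real assemblers are far more sophisticated, and the function names and the example fragments are invented for illustration.

```python
def overlap(left: str, right: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the longest overlap."""
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, -1, -1
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                size = overlap(left, right, min_len)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing overlaps any more: a gap remains in the assembly
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Three made-up fragments cut from the sequence ATGCGTACGTTAGC.
reads = ["GTTAGC", "ATGCGTAC", "GTACGTTA"]
print(greedy_assemble(reads))  # ['ATGCGTACGTTAGC']
```

When no overlap of at least `min_len` bases is left, the function returns several contigs instead of one, which is the toy version of the gaps that remain in an assembled genome.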

Once most of the fragments are assembled, it is also possible to annotate the genome. To annotate means to explain what the nucleotide sequence means. If a nucleotide sequence begins with a start codon and ends with a stop codon in frame, it raises a big flag that this sequence may be a gene. There are a lot of definitions of a gene, and for the sake of this post, let’s run with the one that defines a gene as any sequence of DNA that is transcribed. This segment of the genome is further scrutinized for splice sites and any other regions, such as regulatory sequences, to help figure out if it’s really a gene. The sequence is also compared to other known sequences using BLAST, a tool that compares the sequence to a massive database of sequences. If any significant matches to already known genes come up, the possibility that the unknown sequence is a gene increases, based on the observation that genes are generally highly conserved throughout evolutionary time.
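The start-to-stop scan can be sketched in a few lines of Python. This is a bare-bones illustration that assumes the standard start codon ATG and the three standard stop codons, and it looks only at the forward strand; it ignores splicing, the reverse strand and everything else a real gene finder checks. The function name and the example sequence are made up.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 3) -> list[tuple[int, int, str]]:
    """Return (start, end, sequence) for each ATG...stop stretch, forward strand only.

    `min_codons` counts every codon in the stretch, including start and stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):  # three possible reading frames
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # first start codon opens a candidate
            elif start is not None and codon in STOP_CODONS:
                end = pos + 3  # in-frame stop codon closes it
                if (end - start) // 3 >= min_codons:
                    orfs.append((start, end, dna[start:end]))
                start = None
    return orfs


print(find_orfs("CCATGAAATTTTAGGG"))  # [(2, 14, 'ATGAAATTTTAG')]
```

In practice `min_codons` would be set much higher, because short start-to-stop stretches turn up by chance all over a genome of three billion bases.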

If the sequence meets all the criteria of a gene, it is labeled an open reading frame or ORF. ORFs are putative genes. In order to confirm an ORF, researchers often need to turn to the wet-lab to find the gene expressed as either an RNA or a protein in an organism. With 30,000 or so ORFs, the process of validating each gene is enormous and time consuming. Not every research lab is working on confirming whether an ORF is really a gene, so that also slows down the process.

The research conducted in the paper above involved scrutinizing 22,000 ORFs from the Ensembl database. The analysis revealed a lot of orphan DNA sequences. Orphan sequences look like they encode proteins because of their open reading frames, but they are not present in the mouse and dog genomes. Just because dogs and mice don’t have the ORFs doesn’t mean the ORFs aren’t real genes. They could be unique primate genes, arising during or after the primate lineage split from the rest of the mammals. Or, the genes could be more ancient creations that were lost in the mouse and dog lineages. Either way, if the ORFs are real genes, they should also turn up when compared to other primate genomes.
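The reasoning in this paragraph boils down to a small decision rule, sketched below in Python. Everything here is a simplification for illustration: the ORF names are hypothetical, the "found in" sets are invented, the labels are mine, and the rule of treating a match in any one genome as enough is my assumption, not the paper's actual criteria.

```python
NON_PRIMATES = {"mouse", "dog"}
PRIMATES = {"chimpanzee", "macaque"}


def classify_orf(found_in: set[str]) -> str:
    """Label a human ORF by the other genomes in which a counterpart was found."""
    if found_in & NON_PRIMATES:
        return "conserved"                   # has a counterpart outside the primates
    if found_in & PRIMATES:
        return "primate-specific candidate"  # an orphan, but backed by another primate
    return "likely spurious"                 # an orphan with no support anywhere


# Hypothetical ORFs with made-up comparison results.
orfs = {
    "ORF_A": {"mouse", "dog", "chimpanzee", "macaque"},
    "ORF_B": {"chimpanzee", "macaque"},
    "ORF_C": set(),
}

for name, found_in in orfs.items():
    print(name, classify_orf(found_in))
# ORF_A conserved
# ORF_B primate-specific candidate
# ORF_C likely spurious
```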

Comparing the ORFs to the chimpanzee and macaque genomes invalidated a total of about 5,000 ORFs that had been incorrectly added to the lists of protein-coding genes. This reduces the current estimate to roughly 20,500 genes that encode proteins in the human genome. That’s not much, but evolution isn’t a numbers game. Some of the variation in these genes, along with the patterns of their regulation and expression, is what makes us human. So if you’re thinking, “Why do humans have so few genes?” don’t fret, size doesn’t matter in this case.

Written by Kambiz Kamrani

January 17, 2008 at 11:55 am
