Anthropology.net

Beyond bones & stones

Archive for January 2008

1,000 Genomes Project announced, but is it really 1,000 genomes?


I eagerly welcome the news of the 1,000 Genomes Project, which was announced a couple of days ago. It is a really ambitious effort that will involve sequencing (parts of) the genomes of at least a thousand people from around the world to create the most detailed and useful picture to date of human genetic variation.

As sweep of Gene Expression points out, the project won’t actually be sequencing the entire genomes of 1,000 people. Rather, six individuals from two families will get their entire genomes sequenced. Another 180 people from European, Chinese, Japanese, and Nigerian populations will get shallower sequencing. Finally, the protein-coding regions of 1,000 to 2,000 genes will be sequenced in over 1,000 people. Here are the populations that will be sampled:

“Yoruba in Ibadan, Nigeria; Japanese in Tokyo; Chinese in Beijing; Utah residents with ancestry from northern and western Europe; Luhya in Webuye, Kenya; Maasai in Kinyawa, Kenya; Toscani in Italy; Gujarati Indians in Houston; Chinese in metropolitan Denver; people of Mexican ancestry in Los Angeles; and people of African ancestry in the southwestern United States.”

Much of the press is misinforming people by saying that the actual genomes of 1,000 people will be sequenced. That would be a monumental and costly effort, even with the advances we have in sequencing technology. But the data will still be invaluable, because they will provide high-resolution information on genetic variation from over 1,000 people. I can’t wait to see what they figure out in three years’ time. In the meantime, bookmarking the project’s page, 1000genomes.org, and checking in once in a while to see the progress wouldn’t be a bad idea!

Written by Kambiz Kamrani

January 24, 2008 at 12:05 pm

100,000-year-old human skull found in Henan, China


If you’ve reached your saturation point with my coverage of population genetics, news of a 100,000-year-old Homo from Henan, China should be welcome. All I have so far is this press release, which is confusing. A quote from Li Zhanyang, a lead excavator with the Henan cultural relics and archaeology research institute, raises some questions:

“The fossil consisted of 16 pieces of the skull with protruding eyebrows and a small forehead. More astonishing than the completeness of the skull is that it still has a fossilized membrane on the inner side, so scientists can track the nerves of the Paleolithic ancestors.”

I suspect some things were lost in translation. First, fossils do not have eyebrows. Maybe they mean a protruding brow ridge? That could be what they are trying to get at. Also, the article doesn’t say what membrane was fossilized within the braincase. My best guess is the dura mater, the tough outer layer of the meninges that surrounds the brain. Even with the dura mater, it is hard to trace nerves in a Paleolithic fossil. What one can probably trace are the sulci, gyri, and overall gyrification pattern imprinted by the brain, which would be useful for comparing variation in the degree of folding of archaic Homo brains.

Anyway, I’m super excited about this fossil find. There’s no word on whether this fossil is Homo erectus or Homo sapiens. For this time period, two finds come to mind that complicate taxonomic assignment. First is Jinniushan man, a 300,000- to 200,000-year-old specimen that shows some features of H. erectus. However, its endocranial volume, similar to that of H. sapiens, along with its overall thin cranial vault, the expansion and rounding of the occipital and parietal regions, the position of maximum cranial breadth, and its overall facial morphology have led to Jinniushan being assigned to archaic Homo sapiens. Contrasting with Jinniushan are the Homo erectus-looking Peking Man fossils, which were lost in transit during World War II. These fossils are also from a similar time period.

This new Henan specimen is around 100,000 years old. Because of this date, I will be very surprised if it is assigned to Homo erectus. Trinkaus’ 40,000-year-old Tianyuan human, also from Zhoukoudian, where the Peking fossils were found, would be awesome to compare. But the Tianyuan fossils can’t be compared directly with the new Henan fossils, because the Henan collection consists only of cranial bones, and the closest thing to a cranial bone among the Tianyuan remains is a mandible.

All hope is not lost: the endocranial volume of these new fossils can probably be estimated, which will provide very good data for classifying this specimen. If the specimen turns out to be an adult with a brain volume of around 1,100 cc or less, that would be really interesting!

Written by Kambiz Kamrani

January 22, 2008 at 8:11 pm

Genetic Relationships of Semitic and Indo-Iranian speaking groups in Iran


If you don’t know already, I’m of Iranian descent. I was born in Tehran, but because of persistent socio-political instability in that region of the world, my family and I emigrated from the country about 20 years ago. But just because I live somewhere else doesn’t mean I’m not interested in my background. I’ve always been curious about my heritage, and I’ve come to understand that my mother’s and father’s lineages come from very different cultural backgrounds.

My mother’s family has been established in Tehran for quite some time, and because of the nature of big-city life, their heritage has been mixed and lost. But if you look at members of my mother’s family, they are fair-skinned and have blond hair with green or blue eyes. They often get mistaken for Europeans, which leads me to think they have a different heritage from my father’s side of the family. I’ve sequenced a short bit of my mtDNA and can only figure out that my maternal lineage carries the haplogroup H4 signature, which is very frequent in Middle Eastern populations and not resolving enough to support any strong conclusions about where that half of me comes from.

What we know of my father’s family differs greatly. My dad’s parents moved from Lorestan to Tehran. Lorestan is a western Iranian province smack dab in the Zagros Mountains. It is at times home to the Bakhtiari, a nomadic pastoralist group that you may have been introduced to in your cultural anthropology classes. The Bakhtiari speak Luri, a language that’s classified as Indo-Iranian. Indo-Iranian languages are distinct from languages spoken by Semitic peoples, such as Arabic and Hebrew; if you want more information about this distinction, check out Ethnologue.com.

Suffice it to say, I was really interested to stumble upon an early online release paper from the Annals of Human Genetics that investigates the “Close Genetic Relationship Between Semitic-speaking and Indo-European-speaking Groups in Iran,” because it touches on at least half of my known heritage. Academics from the Max Planck Institute for Evolutionary Anthropology and Tehran University collaborated on figuring out who the Bakhtiari are related to.

To carry out the study, 99 people were sampled from a different province, Khuzestan, split roughly 50/50 between the two ethnic groups. The authors homed in on comparing mtDNA HV1 sequences, eleven Y-chromosome biallelic markers, and nine Y-STR loci. STRs (short tandem repeats) are another name for microsatellites: a short pattern of nucleotides repeated over and over, with the copies directly adjacent to each other. The repeat unit is only a few base pairs long, and most STRs sit in non-coding DNA, such as introns.
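To make the “repeated pattern” idea concrete, here is a minimal Python sketch that counts how many back-to-back copies of a known motif occur in a sequence, which is the kind of repeat count Y-STR alleles are usually reported as. It assumes a perfect, uninterrupted repeat and a motif you already know; the function name and the example read are hypothetical, and in practice repeat counts are usually inferred from the length of an amplified fragment rather than by scanning a string like this.

```python
def count_tandem_repeats(seq: str, motif: str) -> int:
    """Return the longest run of back-to-back copies of `motif` in `seq`.

    Assumes a perfect repeat (no interruptions); a toy illustration only.
    """
    seq, motif = seq.upper(), motif.upper()
    if not motif:
        raise ValueError("motif must be non-empty")
    best = 0
    for start in range(len(seq)):
        copies, pos = 0, start
        # Step forward one motif at a time while the next chunk still matches.
        while seq.startswith(motif, pos):
            copies += 1
            pos += len(motif)
        best = max(best, copies)
    return best


# Hypothetical read containing a GATA repeat (not real Y-STR data).
read = "TTGATAGATAGATAGATACC"
print(count_tandem_repeats(read, "GATA"))  # 4
```

Two people whose reads gave 4 and 5 copies at the same locus would carry different alleles, which is what makes these loci informative.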

Anyway, all these different loci show that the Iranian-Arabs are closely related to the Bakhtiari, as well as to neighboring geographic groups, irrespective of the language spoken. Haplogroups J2 and G are especially intriguing because they are found at really high frequencies in both Bakhtiari and Iranian-Arab populations. As I mentioned above, the Bakhtiari are a distinctly different cultural group that speaks an Indo-Iranian language, which does not belong to the Afro-Asiatic language family that includes the Semitic language of the Iranian-Arabs. Many cultural barriers keep the Bakhtiari way of life unique, and language is one of them. So it is surprising that these two linguistically separate groups share two haplogroup signatures at such disproportionately high frequencies.

mtDNA haplogroups in Indo-European-speaking groups and in Semitic-speaking groups

A comparison of Iranian-Arabs to other Semitic-speaking groups showed that Semitic-speaking North African groups are much more distant genetically from the Semitic-speaking groups of the Near East and Iran. The above illustration documents this. Haplogroup L is almost nonexistent east of Iraq, despite the presence of Semitic-speaking populations in the foothills of the Zagros Mountains in Iran.

I called this surprising because language is often a big barrier to gene flow, as Razib recently pointed out using the Slavs as an example. In Iran, however, the situation is different. The lack of significant differentiation between west Asian Semitic-speaking and Indo-European-speaking groups indicates that language has not been a substantial barrier to gene flow in this part of the world. But this leads me to wonder about the origins of Iranian-Arabs: if they are genetically less similar to other Semitic speakers, doesn’t that imply they were ‘cultural converts’?
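Comparisons like “closer to” or “more distant from” ultimately rest on differences in haplogroup frequencies between populations. As a rough illustration only, here is a Python sketch that turns per-person haplogroup labels into frequencies and measures a simple Euclidean distance between two frequency profiles. This is not the statistic the paper used; the function names are hypothetical and the sample labels are invented toy data, not results from the study.

```python
from collections import Counter
from math import sqrt


def haplogroup_frequencies(labels):
    """Turn one haplogroup label per person into relative frequencies."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {hg: n / total for hg, n in counts.items()}


def frequency_distance(freq_a, freq_b):
    """Euclidean distance between two haplogroup-frequency profiles.

    A haplogroup missing from one population counts as frequency 0 there.
    """
    haplogroups = set(freq_a) | set(freq_b)
    return sqrt(sum((freq_a.get(h, 0.0) - freq_b.get(h, 0.0)) ** 2 for h in haplogroups))


# Invented toy samples, NOT data from the paper.
group_1 = haplogroup_frequencies(["J2", "J2", "G", "H"])
group_2 = haplogroup_frequencies(["J2", "G", "G", "H"])
group_3 = haplogroup_frequencies(["L", "L", "L", "H"])
print(frequency_distance(group_1, group_2) < frequency_distance(group_1, group_3))  # True
```

Two groups sharing J2 and G at high frequency end up close, while a group dominated by a haplogroup the others lack (like L in the toy data) ends up far away, regardless of what language anyone speaks.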

P.S. If you do read the paper, take note of the disclaimer the authors include about inscribing identity.

Investigating microsatellite mutagenesis in human–chimpanzee orthologs


I’ve been discussing genetic markers a lot this last week. SNPs are the type of genetic marker I usually refer to in population genetic studies, but microsatellites are also very informative. Microsatellites are stretches of DNA in which a short unit of 1-4 base pairs is repeated back to back, and like SNPs, they are polymorphic. A paper published in Genome Research investigates the mutability of microsatellites in human–chimpanzee orthologs.

The paper, “The genome-wide determinants of human and chimpanzee microsatellite evolution,” first homed in on the number of repeats, the length, and the motif size of the microsatellites. These factors influence the probability of replication slippage, the process that generates the length polymorphisms. The length and motif composition of each microsatellite were also scrutinized, because mutability increases nonuniformly with length and because, depending on the motif composition, secondary DNA structures can create break points.
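Before any of those factors can be measured, microsatellites first have to be found. Here is a minimal Python sketch, assuming perfect (uninterrupted) repeats of 1-4 bp motifs and an arbitrary cutoff of five copies; it reports each tract’s motif, repeat number, and length. The function names and thresholds are hypothetical choices for illustration, not the paper’s pipeline.

```python
import re


def is_primitive(motif: str) -> bool:
    """True if motif is not itself a repeat of a shorter unit (e.g. 'ACAC' is not)."""
    k = len(motif)
    return not any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def find_microsatellites(seq: str, motif_sizes=(1, 2, 3, 4), min_repeats: int = 5):
    """Report perfect tandem repeats as dicts of start, motif, repeats, and length."""
    seq = seq.upper()
    hits = []
    for k in motif_sizes:
        # Group 2 captures a k-base motif; \2{n,} demands it repeat immediately after.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(2)
            if not is_primitive(motif):
                continue  # e.g. 'AA' or 'ACAC' really belong to a shorter motif size
            tract = m.group(1)
            hits.append({"start": m.start(), "motif": motif,
                         "repeats": len(tract) // k, "length": len(tract)})
    return hits


# Hypothetical toy sequence with a (CA)6 dinucleotide repeat.
print(find_microsatellites("GGCACACACACACATTT"))
# [{'start': 2, 'motif': 'CA', 'repeats': 6, 'length': 12}]
```

Repeat number, length, and motif size from output like this are the kinds of variables that feed into a mutability model.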

Their regression models, which estimate how much of the variation in one variable (here, microsatellite mutability) can be predicted from one or more other variables, explained about 90% of the variation in microsatellite mutability. Not bad!
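“Explained about 90% of the variation” refers to the R² of a regression: one minus the share of variation left in the residuals. The sketch below fits an ordinary least-squares model with NumPy and reports R². It is a generic illustration, not the paper’s actual model; the predictors and the toy data are invented and labeled as such.

```python
import numpy as np


def fit_and_r_squared(X, y):
    """Ordinary least squares with an intercept; returns (coefficients, R^2)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    design = np.column_stack([np.ones(len(y)), X])  # intercept column + predictors
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = float(residuals @ residuals)            # variation left unexplained
    ss_tot = float(((y - y.mean()) ** 2).sum())      # total variation around the mean
    return coef, 1.0 - ss_res / ss_tot


# Invented toy data: "mutability" driven by repeat number and motif size plus noise.
rng = np.random.default_rng(0)
repeat_number = rng.integers(5, 30, size=200)
motif_size = rng.integers(1, 5, size=200)
mutability = 0.2 * repeat_number - 0.5 * motif_size + rng.normal(scale=0.5, size=200)
coef, r2 = fit_and_r_squared(np.column_stack([repeat_number, motif_size]), mutability)
print(coef, r2)
```

An R² of about 0.9 means the predictors account for roughly 90% of the spread in the outcome, and the remaining 10% is left to noise or to factors the model doesn’t include.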

Written by Kambiz Kamrani

January 21, 2008 at 7:31 pm

Dopamine Transporter Gene and Primate Social Behavior


Dopamine is a fundamental neurotransmitter and hormone. You may know it as one of the neurotransmitters associated with the limbic system, released during eating and sex and producing a sensation of pleasure. But it is more than just a hedonistic chemical; many of the brain’s functions depend on dopamine. Memory, attention and problem solving rely on dopamine to control the flow of information from other areas of the brain to the frontal lobes. As a hormone, dopamine acts as a precursor to noradrenaline and adrenaline, which increase heart rate and blood pressure during the sympathetic nervous system response.

For the purposes of this post, dopamine is an important neurotransmitter that regulates behavioral responses. In people with dopamine deficiencies, attention deficit hyperactivity disorder is an all too common diagnosis. Low levels of dopamine are also associated with social withdrawal, apathy, and anhedonia. Furthermore, social anxiety is associated with neurons that are unable to bind dopamine. When dopamine is unregulated and in excess, extraverted (gregarious and assertive) behaviors are observed.

Before I dive deep into the post, let me first run down some neuron physiology. Without an understanding of how neurons and their associated chemicals function, it’s hard to comprehend how a mutation in any one of the components leads to neurological, cognitive and behavioral disorders. Neurons are specialized cells of the nervous system that can be depolarized, and this ability allows signals to be transmitted. Signals come in two forms: graded potentials and action potentials. I won’t get into the nitty-gritty of how potentials are formed, but just know that once graded potentials add up to a threshold, an action potential is generated that rushes down the axon of the neuron. Action potentials are an all-or-none response.
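The threshold and all-or-none behavior can be captured in a few lines. Here is a deliberately oversimplified Python sketch: it adds up graded potentials on top of a resting potential and fires a fixed-size spike only if the total reaches threshold. The millivolt values are approximate textbook figures used as assumptions, the function name is hypothetical, and real neurons also involve decay over time, inhibitory inputs, and refractory periods, none of which are modeled here.

```python
RESTING_MV = -70.0     # approximate textbook resting potential (assumption)
THRESHOLD_MV = -55.0   # approximate textbook firing threshold (assumption)
SPIKE_PEAK_MV = 30.0   # approximate peak of an action potential (assumption)


def axon_hillock(graded_inputs_mv):
    """Sum graded potentials; fire a fixed-size spike only if threshold is reached.

    Returns the spike peak in mV, or None if the neuron stays below threshold.
    """
    membrane_mv = RESTING_MV + sum(graded_inputs_mv)  # graded potentials add up
    if membrane_mv >= THRESHOLD_MV:
        return SPIKE_PEAK_MV  # all-or-none: same size however far past threshold
    return None


print(axon_hillock([5.0, 5.0]))    # None (sums to -60 mV, below threshold)
print(axon_hillock([10.0, 10.0]))  # 30.0
print(axon_hillock([20.0, 20.0]))  # 30.0, not a bigger spike
```

Notice that doubling the input past threshold does not produce a bigger spike; that is what “all or none” means.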

The action potential travels down the axon to the presynaptic terminal, where it causes channel proteins to open. The presynaptic terminals contain vesicles chock full of neurotransmitters, and the opening of these channels triggers the vesicles to fuse with the presynaptic membrane. The neurotransmitter is released into the gap between the presynaptic and postsynaptic membranes, called the synaptic cleft, and binds to its corresponding receptor on the postsynaptic membrane of the next neuron. The effect of the neurotransmitter on the postsynaptic membrane will depend on the nature of the neurotransmitter, the nature of the postsynaptic receptors, and whether the postsynaptic ion channels are voltage-gated or chemically gated. In dopamine’s case, it is hypothesized to provide a teaching signal to parts of the brain responsible for acquiring new behavior.

To clean up the neurotransmitters, specialized proteins called transporters carry them from the synaptic cleft back into the presynaptic neuron, a process called reuptake. I imagine them as vacuums. In dopamine’s case, a specific transporter exists, called the dopamine transporter… but we’ll be calling it DAT. Since DAT clears dopamine from the synapse and so ends its action, it is critical in regulating (stopping) the network of effects dopamine is responsible for. Any mutation in the DAT gene that changes the amino acid sequence of the transporter can ultimately affect the protein’s ability to stop dopamine’s effects.

In humans, the DAT gene is fairly large: around 64,000 base pairs long, consisting of 15 exons. Evidence for associations between DAT and dopamine-related disorders has come from studies of genetic polymorphisms in the DAT gene. Currently, variants in DAT are implicated in a number of dopamine-related disorders, such as attention deficit hyperactivity disorder, bipolar disorder, clinical depression, and alcoholism.

Because DAT modulates the extent of dopamine activity at the receptor, it is an excellent candidate for studying how variants of DAT affect behavior and, ultimately, whether the variants offer a selective advantage. In what I consider a really awesome paper in the journal Molecular Biology and Evolution, a half dozen geneticists at the University of Pittsburgh studied sequence variation in DAT in populations of two different macaque species and in humans. They measured how much more or less often different combinations of DAT alleles occurred in their populations than would be expected if haplotypes formed at random. This non-random association between alleles at different loci is called linkage disequilibrium: the difference between how often a set of alleles is found together on the same haplotype and how often it would be found together if the loci were independent. The key word here is non-random. To study whether a mutation in the DAT gene has any effect on survivability, we need to separate the random variants from the ones that are seemingly selected for.
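Linkage disequilibrium for two biallelic loci is commonly summarized with three standard statistics: D (observed haplotype frequency minus the frequency expected if the loci were independent), D′ (D scaled by its maximum possible value), and r². Here is a minimal Python sketch of those textbook formulas. It assumes the haplotypes are already known (phased) and coded 0/1; the function name and toy haplotypes are hypothetical, and the paper may well have used different software or statistics.

```python
def linkage_disequilibrium(haplotypes):
    """Compute D, D' and r^2 for two biallelic loci.

    `haplotypes` is a list of (allele_at_locus_1, allele_at_locus_2) pairs,
    each allele coded 0 or 1. Assumes phased haplotypes.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n  # freq of allele 1 at locus 1
    p_b = sum(b for _, b in haplotypes) / n  # freq of allele 1 at locus 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n  # observed 1-1 haplotype
    if p_a in (0.0, 1.0) or p_b in (0.0, 1.0):
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b  # observed minus expected-if-random
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Toy haplotypes: alleles always travel together (complete LD) ...
print(linkage_disequilibrium([(1, 1)] * 50 + [(0, 0)] * 50))  # (0.25, 1.0, 1.0)
# ... versus every combination equally common (no LD).
print(linkage_disequilibrium([(1, 1), (1, 0), (0, 1), (0, 0)] * 25))  # (0.0, 0.0, 0.0)
```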

The paper, “Sequence Variation in the Primate Dopamine Transporter Gene and Its Relationship to Social Dominance,” tells us how they went about doing that. First, they sampled about 760 monkeys but only 23 humans. They designed primers for the DAT gene, and each exon was sequenced. That’s a lot of sequencing: assuming one reaction per exon per individual, roughly 783 individuals × 15 exons comes to about 11,700 reactions. But I don’t know that for sure. Either way, 78 polymorphisms were identified, but only two functional variants were linked to high social rank. Social rank was assessed through the level of dominance: dominant individuals are aggressive, use attack gestures, actions, and vocalizations more frequently, and consistently defeat individuals of lower rank.

What I’m kinda iffy on is how they identified the variants located in the 5′ UTR if they only sequenced the exons. Regardless, they found that heterozygous individuals, with one copy of the minor 5′ UTR allele, were more likely to be of subordinate rank than those who were homozygous for the major allele. In other words,

“the odds that a subordinate individual possesses at least one copy of the minor allele… are one and a half to nearly twice the odds of it being homozygous… In contrast, subordinates were significantly less likely to be heterozygous than homozygous.”
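An odds ratio like the “one and a half to nearly twice the odds” in that quote comes from a 2×2 table of genotype against rank. Here is a minimal Python sketch that computes the odds ratio and an approximate 95% Wald confidence interval. The cell counts in the example are invented, not the paper’s, and the Wald interval is a common textbook choice that may differ from the statistics the authors actually ran.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% Wald interval for a 2x2 table.

    Table layout (hypothetical labels):
                        subordinate  dominant
        heterozygous         a           b
        homozygous           c           d
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: the simple Wald interval is undefined")
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    low = math.exp(math.log(or_) - z * se)
    high = math.exp(math.log(or_) + z * se)
    return or_, (low, high)


# Invented counts for illustration only.
ratio, (low, high) = odds_ratio(30, 20, 40, 50)
print(ratio)  # 1.875
```

An odds ratio above 1 here would mean heterozygotes have higher odds of being subordinate than homozygotes; if the interval also stays above 1, the association is unlikely to be a sampling fluke.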

The two DAT 5′ UTR variants fall at a putative transcription factor–binding site. They don’t go deep into how the variants affect the transcription factor–binding site (other than noting that the minor allele abolishes the core sequence), nor into which transcription factor binds to it; either would be a really cool study in itself. If one could take the two variants and compare levels of gene expression, we could get an idea of whether the homozygous alleles allow less DAT to be transcribed and ultimately allow more dopamine to float around, causing extraversion and socially dominant behavior. But they do suggest NFAT as a possible regulator:

“…these transcription factors thus play a crucial role in shaping long-term changes in neuronal function. They are also sensitive to secondary messenger systems activated by brain-derived neurotrophic factor (BDNF) which regulates expression of the dopamine D3 receptor. It is thus possible that NFAT and/or BDNF also modulates expression of DAT. “

I consider this study extremely enlightening in understanding the biological mechanisms behind primate social behavior and ultimately evolution. See, we have a behavior, social dominance, that for the most part we think has evolutionary significance and has something to do with dopamine and the regulation of this neurotransmitter. To figure out whether the two are linked, one needs to show that the heterozygous or homozygous state of a variant in the gene that regulates dopamine activity (DAT) correlates with the behavior.

In this case, Robert Ferrell and his lab identified a difference in a region slightly upstream of the DAT coding sequence that may control the rate at which DAT is transcribed in macaques. By looking at the variants in each individual alongside observations of their social behavior, his lab figured out that heterozygous individuals weren’t as bossy. Pretty amazing, if you ask me. But this putative binding site is not found in the homologous region of human DAT, which is also really interesting! Was social dominance via the dopamine network never positively selected in the human lineage, or was it lost? Or have we humans found another biochemical pathway to influence dominant behavior?

Fighting the mantra, “People vary more within the groups than vary between groups”


In light of a discussion between Razib and Martin, I recently took up arms and battled the concepts behind race and identity and how human genetic variation plays a role in forming them. In the comments, I was disgusted to see Martin throw in this rhetorical line:

“Genetics have long ago shown that people vary more within the major racial groups than these groups vary among themselves.”

I’ve heard this so many times that I want to puke. It means nothing to me. Based on this line and others he’s said in his discussions with Razib, I’ve picked up that he’s a big proponent of the “if it sounds like a mantra, then it must be true” mentality. In this post, I will debunk that mentality. Several new publications have just come out in PLoS Genetics that show exactly how genetics can help identify groups, especially groups that are not demarcated by major social and phenotypic differences.

All of these publications, three in total, focus on identifying genetic markers that help identify populations of Europeans (there’s a bonus one on the genetic structure of Polynesians). They are open access, so you can follow along with freely readable, first-hand literature. As you know, Europeans are often viewed as a homogeneous category. We have a wealth of evidence that tells us of cultural admixture, wars, migrations, the formation and decline of states, etc. over thousands of years of European history. All of these social mechanisms have left an imprint on the cultural, biological, and linguistic composition of Europe. To further complicate things, and addressing the identity crisis Martin brought up when he stated that he’s a Swede, I’ve seen US census reports simply lump everyone of European descent together as ‘white.’ Such a label groups together multiple populations, which have diverse origins due to the complex history of Europeans.

The first genetic dissection of the population structure of European Americans that I will share with you involved many researchers collaborating together. They focused their work on identifying the contributions from different genetic ancestries. Why did they do it? As I’ve said before, the primary motivation was to identify genes that can be associated with disease. Here’s an excerpt of the abstract that is useful in demonstrating how genetic markers can help identify groups of people:

“Here, we investigate empirical patterns of population structure in European Americans, analyzing 4,198 samples from four genome-wide association studies to show that components roughly corresponding to northwest European, southeast European, and Ashkenazi Jewish ancestry are the main sources of European American population structure. Building on this insight, we constructed a panel of 300 validated markers that are highly informative for distinguishing these ancestries.”

A sample size of 4,200 is large, folks. They were able to find and validate 300 markers that distinguish regional ancestries, which helps narrow down ethnicity. But that’s not good enough; with more markers, we could classify more people who carry similar markers into finer groups. Well, a second publication from a collaborating group did exactly that:

“European population genetic substructure was examined in a diverse set of >1,000 individuals of European descent, each genotyped with >300 K SNPs. Both STRUCTURE and principal component analyses (PCA) showed the largest division/principal component (PC) differentiated northern from southern European ancestry. A second PC further separated Italian, Spanish, and Greek individuals from those of Ashkenazi Jewish ancestry as well as distinguishing among northern European populations. In separate analyses of northern European participants other substructure relationships were discerned showing a west to east gradient.”
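Principal component analysis is the workhorse behind results like these. Here is a bare-bones Python sketch of PCA as it is commonly applied to SNP data: code each genotype as 0, 1, or 2 copies of an allele, center and scale each SNP, and take a singular value decomposition. This is a simplified illustration, not the pipeline used in these papers, and the two “populations” are simulated with invented allele frequencies.

```python
import numpy as np


def genotype_pca(genotypes, n_components=2):
    """Project individuals onto the top principal components of a genotype matrix.

    `genotypes` is (individuals x SNPs), each entry the count (0, 1 or 2) of one allele.
    """
    G = np.asarray(genotypes, dtype=float)
    G = G - G.mean(axis=0)              # center each SNP
    G = G[:, G.std(axis=0) > 0]         # drop SNPs with no variation
    G = G / G.std(axis=0)               # put every SNP on a comparable scale
    U, S, _ = np.linalg.svd(G, full_matrices=False)
    return U[:, :n_components] * S[:n_components]  # PC scores per individual


# Simulated toy data: two populations with different allele frequencies.
rng = np.random.default_rng(0)
n_snps = 50
pop_a = rng.binomial(2, 0.1, size=(20, n_snps))
pop_b = rng.binomial(2, 0.9, size=(20, n_snps))
scores = genotype_pca(np.vstack([pop_a, pop_b]))
print(scores[:3, 0], scores[-3:, 0])  # PC1 scores fall on opposite sides for the two groups
```

With real data, the first component often lines up with the strongest ancestry gradient, which is how a north–south European axis can show up without anyone labeling the samples in advance.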

So moving from 300 markers to more than 300,000 SNPs increased the resolution immensely. Now people could be classified as having Italian, Spanish, yada yada (you read the list) ancestry. Not bad at all. These markers are called ancestry informative markers, or AIMs. These AIMs have been mixed and matched because doing so helps…

“.. distinguish the ancestries of these genetically very similar populations, whose real or perceived group differences may often be dominated by environmental, social, and cultural factors. Below, we outline the possible choices of marker sets for inferring various ancestries. In each case, a method such as structured association or principal components analysis can be applied to genotype data to correct for stratification.

To correct for stratification along the north–south (or northwest–southeast) cline, either the Price100 or Tian192 marker sets can be used. (The Tian192 markers, which were ascertained using northern European versus Ashkenazi Jewish ancestry, are effective in distinguishing north–south ancestry because southern Europeans attain intermediate ancestry values as compared to values at one extreme for northern Europeans.) To correct for stratification involving both north–south and Ashkenazi Jewish ancestry, one option is to use the Price100+Price200 marker sets, which together separate north, south, and Ashkenazi ancestry into three distinct clusters. Another option is to use the Tian192 marker set, which models these three ancestries along a single axis and will be sufficient in the case that the phenotype being analyzed has intermediate values for southern European as compared to northern European versus Ashkenazi Jewish ancestry. Finally, to correct for stratification involving a west–east gradient within northern Europe (e.g., Irish versus other northern European ancestry), the Tian1211 marker set is the only set of AIMs available.”

So there you have it: by screening for more markers and using different combinations of markers, the researchers were able to identify genetic similarities and differences between groups of genetically similar people. I don’t know how anyone can go about saying something like “people vary more within the major racial groups than these groups vary among themselves.” If it can be done in perceived homogeneous European groups, it can be done elsewhere too. It is probably being done elsewhere…
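Why do more markers help so much? The toy simulation below, written in Python under loudly stated assumptions, makes the point. Two hypothetical groups differ only slightly in allele frequency at every locus (0.45 versus 0.55, numbers invented for illustration), so at any single locus about 99% of the variance lies within groups. Yet adding up the evidence across many loci with a simple likelihood ratio assigns individuals to the right group almost perfectly. This is not the method used in the papers; it assumes independent loci, Hardy-Weinberg proportions, and known allele frequencies.

```python
import numpy as np


def simulate_genotypes(rng, freq, n_people, n_loci):
    """Each person gets 0, 1 or 2 copies of an allele at every locus (Hardy-Weinberg)."""
    return rng.binomial(2, freq, size=(n_people, n_loci))


def classify_accuracy(n_loci, p_a=0.45, p_b=0.55, n_people=500, seed=0):
    """Fraction of individuals assigned to the right group by a likelihood ratio.

    Allele frequencies are assumed known; groups differ only slightly at each locus.
    """
    rng = np.random.default_rng(seed)
    group_a = simulate_genotypes(rng, p_a, n_people, n_loci)
    group_b = simulate_genotypes(rng, p_b, n_people, n_loci)
    # Per-locus log-likelihood ratio, log P(g | B) - log P(g | A), for g copies of the allele.
    w_allele = np.log(p_b / p_a)
    w_other = np.log((1 - p_b) / (1 - p_a))

    def llr(g):
        return (g * w_allele + (2 - g) * w_other).sum(axis=1)

    correct = (llr(group_a) < 0).sum() + (llr(group_b) > 0).sum()  # ties count as wrong
    return correct / (2 * n_people)


def within_group_share(p_a=0.45, p_b=0.55):
    """Share of allele-frequency variance lying within groups (1 - F_ST) at one locus."""
    p_bar = (p_a + p_b) / 2
    between = ((p_a - p_bar) ** 2 + (p_b - p_bar) ** 2) / 2
    return 1 - between / (p_bar * (1 - p_bar))


print(within_group_share())      # about 0.99 of the variance is within groups
print(classify_accuracy(10))     # only modestly better than a coin flip
print(classify_accuracy(1000))   # close to perfect
```

Both statements can be true at once: most of the variation at any single locus is within groups, and individuals can still be assigned to groups reliably once enough loci are combined.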

Maybe the reason it hasn’t been done before is that we just didn’t have as many SNPs to screen in the past. I see this shortcoming as one of the sources of the misinformation Martin reiterated. With more and more people studying human variation, more and more people will be sampled. As sample sizes increase, and as projects like HapMap and the Genographic Project expand, I imagine we’ll identify tons of SNPs and markers. Also, in the past, I imagine mtDNA was the only easily researchable locus for screening genetic variation and diversity… and mtDNA is small and does not store nearly as much variation as nuclear DNA. That could be where Martin picked up the “long ago” portion of his statement. “Long ago” in genetics was when mtDNA was the only accessible and reliable thing to study, and that wasn’t that long ago. But we now look for variation in the nuclear genome of humans, which contains many more base pairs and many more heritable markers.

As icing on the cake, I want to move away from European populations and shine the spotlight on human genetic diversity in the Pacific, because, as the authors write in their abstract,

“Human genetic diversity in the Pacific has not been adequately sampled, particularly in Melanesia. As a result, population relationships there have been open to debate.”

This is a cool study that’s made the rounds on a couple of blogs. In case you haven’t caught it, let me summarize it for you. The study was just as comprehensive as the European genetic studies. It involved almost 1,000 individuals from 41 populations. Using more than 800 genetic markers, the researchers found that Polynesians and Micronesians have almost no genetic relation to Melanesians; instead, Polynesians’ and Micronesians’ closest relationships are to Taiwan Aborigines and East Asians. They also found that groups living in the islands of Melanesia are remarkably diverse. The research also suggests that the ancestors of Polynesians moved through Island Melanesia relatively rapidly.

In conclusion, very recent papers are telling us exactly the opposite of what Martin said. Furthermore, I brought this up in my previous post on human genetic variation, but I gotta bring it up again because it happened again: I really don’t appreciate the ‘long ago’ academic arrogance expressed when people say “long ago, we anthropologists decided race was a social construct,” or “long ago, genetics confirmed humans don’t vary much.” Phrasing statements like that implies that if I think otherwise, I’m dated and not with the times. How unscientific. It is actually those who do not refresh their knowledge and keep current with advances in population studies who look dated and uneducated.

Christopher Columbus’ Package of Love, Syphilis


In my life, I’ve seen Christopher Columbus’ reputation spiral downward from hero to villain. It has even affected me in a very superficial way. See, even though the US government commemorates the day he reached the Americas as a holiday, I remember being devastated the year my school district decided to nix the day off I was so looking forward to.

Since then, my education in anthropology hasn’t held him in high regard either. He’s often vilified for setting in motion the decline of Native American societies. And now a new study in PLoS Neglected Tropical Diseases traces the emergence of syphilis in Europe to the time when Columbus returned from the Americas. Furthermore, a phylogenetic comparison of the syphilis-causing bacterium, Treponema pallidum, shows that it is a close cousin of the South American strains that cause the tropical disease yaws.

Actually, Columbus has been blamed for some time for bringing syphilis to the Old World. But the skeleton of a man found in northeastern Britain showed bone lesions similar to those caused by syphilis. Preliminary dates for this skeleton suggested that the man had died around 1442, exonerating Columbus for a bit. Here’s that paper: “‘The syphilis enigma’: the riddle resolved?”

Since then, anthropologists have re-evaluated the date and suggested that the fish-heavy diet of the region skewed the dating technique, making the skeleton seem older than it is. With all this confusion over paleopathology and dating techniques, a genetic analysis of the Treponema bacterium seemed much more logical.

To conduct the study, 22 human samples, including two yaws samples, were compared. Even though yaws is not sexually transmitted (it spreads through skin contact), it was included because the South American variety is thought to be a good candidate for the source of the venereal disease. Since the bacteria are so fragile, only some sections of the genome could be recovered, including 17 base pairs that ended up being diagnostic of the different Treponema. Through a SNP analysis, it was found that syphilis and South American yaws shared 4 identical base pairs. Not entirely convincing, but the overlap with any other kind of Treponema was almost nonexistent… A phylogenetic tree of the Treponema samples also showed that syphilis was the most recently evolved of the bacterial strains studied, and by most recently we’re talking about 500 or so years ago.

A network path for four informative substitutions shows that New World subsp. pertenue, or yaws-causing strains, are the closest relatives of modern subsp. pallidum strains.
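The core logic of “which strain is syphilis closest to?” can be sketched by counting differences at aligned diagnostic sites. The Python below does only that; the study itself built a phylogenetic tree and a network, which this sketch does not reproduce. The strain names and bases are hypothetical placeholders, not the study’s sequences.

```python
def count_differences(seq_a: str, seq_b: str) -> int:
    """Number of aligned sites at which two sequences carry different bases."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def closest_relative(query: str, references: dict) -> str:
    """Name of the reference strain with the fewest differences from `query`."""
    return min(references, key=lambda name: count_differences(query, references[name]))


# Hypothetical bases at a handful of diagnostic sites (NOT the study's data).
strains = {
    "new_world_yaws": "ACGTTGCA",
    "old_world_yaws": "ACCTAGGA",
    "other_treponeme": "TCCAAGGA",
}
syphilis = "ACGTTGCT"
print(closest_relative(syphilis, strains))  # new_world_yaws
```

With only a few informative sites, one or two differences can change the answer, which is part of why a 4-base-pair overlap leaves room for doubt.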

Columbus’ crew is the only one known to have voyaged to the Americas during that time. And the first recorded epidemic of syphilis in Europe broke out among French troops in 1495, two years after Columbus returned from his first voyage across the Atlantic, which points the finger at him and his crew. But given what we know about how bacteria can share genes, I’m not entirely convinced it was him. Furthermore, a 4-base-pair homology isn’t a lot.

Written by Kambiz Kamrani

January 17, 2008 at 12:58 pm

The 20,500 Protein-Encoding Genome We Call Our Own


Earlier this week, news of a new paper about the number of protein-encoding genes surfaced on Sandwalk and Henry. The paper’s title is straightforward, “Distinguishing protein-coding and noncoding genes in the human genome” but the concepts behind it may not be.

As mentioned in Sandwalk, the initial estimate of the number of genes in the human genome was about 30,000. That was when the first drafts of the human genome became available in June of 2000. Since then the numbers have fluctuated, and to many it may seem like the geneticists and molecular biologists annotating the human genome are riding a roller coaster of indecision. In reality, it is not easy to count the genes in any genome exactly.

Why is it so hard to count genes? The human genome is around 3,000,000,000 bases long. That’s three billion bases, and the average human gene is only 12,000 bases long! It is almost like finding a needle in a haystack, but thankfully the genome has some organization that helps us find genes faster. Large deserts of junk DNA exist, which helps narrow down where genes are likely to be. And since a gene has a start signal and a stop signal, we can harness the power of computers to scan for these signals.

See, the current workflow for estimating the number of genes starts with isolating genomic DNA from the organism. The DNA is then sheared into many fragments, and depending on the cloning method, the fragments are amplified by PCR, in bacteria carrying cloning vectors, or both! Once amplified, the fragments are sequenced. This is called shotgun sequencing, the method Craig Venter deployed to help accelerate the sequencing of the human genome. Because the fragments overlap, their shared sequences can be used to join them into longer continuous stretches called contigs, and the contigs can in turn be ordered into scaffolds to figure out where the fragments fall. This is called the assembly of the genome.
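The assembly step can be illustrated with a toy greedy overlap merge: repeatedly join the two fragments whose end and beginning overlap the most. This is a minimal sketch for intuition only; it is not how Celera's or any production assembler works, and it ignores sequencing errors, repeats, reverse complements, and scaffolding. The function names and the `min_len` threshold are illustrative choices.

```python
# Toy greedy assembler: repeatedly merge the pair of fragments with the
# longest suffix/prefix overlap until no overlap of at least min_len remains.
# Real assemblers handle sequencing errors, repeats and reverse complements;
# this sketch ignores all of those.

def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b (>= min_len), else 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Merge overlapping fragments into contigs and return the remaining contigs."""
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)


reads = ["ATGGCGT", "GCGTACG", "TACGTTAG"]
print(greedy_assemble(reads))
```

Here the three short "reads" overlap by four bases each and collapse into one contig; with real data, repeats longer than the reads make this greedy approach fail, which is one reason assembly is hard.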

Once most of the fragments are assembled, the genome can be annotated. To annotate means to explain what the nucleotide sequence means. If a nucleotide sequence begins with a start codon and ends with an in-frame stop codon, that is a big flag that the sequence may be a gene. There are many definitions of a gene, and for the sake of this post, let’s go with the one that calls any transcribed sequence of DNA a gene. This segment of the genome is further scrutinized for splice sites and other features, such as regulatory sequences, to help figure out whether it’s really a gene. The sequence is also compared to other known sequences using BLAST, a tool that searches a massive database of sequences for similar ones. If significant matches to already known genes come up, the likelihood that the unknown sequence is a gene increases, because genes are generally highly conserved over evolutionary time.
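The "start codon, then an in-frame stop codon" signal described above is easy to scan for by computer. The sketch below is a minimal forward-strand ORF scan using the standard start codon ATG and stop codons TAA, TAG, and TGA; real gene finders also scan the reverse complement, model splice sites, and use statistical evidence, none of which is attempted here. The `min_codons` cutoff is an illustrative assumption.

```python
# Minimal ORF scan on the forward strand only: find ATG ... stop runs that
# stay in the same reading frame. Real gene finders also scan the reverse
# complement, model splice sites, and use statistical signals; this does not.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return 0-based, half-open (start, end) coordinates of ORFs, stop codon included."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):  # the three forward reading frames
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # close it and look for the next ATG
    return orfs


dna = "CCATGAAATTTTAGGG"
for start, end in find_orfs(dna):
    print(start, end, dna[start:end])
```

Note that finding an ORF is only the first flag: in a 3-billion-base genome, many ORFs arise by chance, which is why the extra checks (splice sites, BLAST hits, wet-lab confirmation) matter.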

If a sequence meets all these criteria, its open reading frame, or ORF, is treated as a putative gene. To confirm an ORF, researchers often need to turn to the wet lab to find the gene expressed as RNA or protein in an organism. With 30,000 or so ORFs, validating each one is an enormous, time-consuming task. Not every research lab is working on confirming whether ORFs are really genes, which also slows down the process.

The research in the paper above involved scrutinizing 22,000 ORFs from the Ensembl database. The analysis revealed many orphan DNA sequences. Orphans look like they encode proteins because of their open reading frames, but they are not present in the mouse and dog genomes. Just because dogs and mice lack these ORFs doesn’t mean the ORFs aren’t real genes. They could be uniquely primate genes that arose during or after the split of the primate lineage from the rest of the mammals. Or they could be older genes that were lost in the mouse and dog lineages. Either way, if these ORFs are real genes, they should also appear in other primate genomes.
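The comparative reasoning above can be written as a simple decision rule. This is a hypothetical sketch, not the paper's actual procedure: each candidate ORF maps to the set of genomes where a matching, intact reading frame was found, and the rule that an orphan must be present in both chimpanzee and macaque to survive is an assumption made for illustration. The ORF IDs and their memberships are invented.

```python
# Hypothetical sketch of the comparative filter described above.
# Each candidate ORF maps to the set of other genomes in which a matching,
# intact reading frame was found. IDs and memberships are invented.

candidates = {
    "orf_A": {"mouse", "dog", "chimpanzee", "macaque"},
    "orf_B": {"chimpanzee", "macaque"},  # orphan, but conserved in primates
    "orf_C": set(),                      # orphan, not conserved anywhere
    "orf_D": {"chimpanzee"},             # orphan, missing in macaque
}

NON_PRIMATES = {"mouse", "dog"}
PRIMATES = {"chimpanzee", "macaque"}


def classify(hits: set[str]) -> str:
    """Label a candidate ORF using a simplified presence/absence orphan test."""
    if hits & NON_PRIMATES:
        return "conserved"          # found outside primates: keep
    if PRIMATES <= hits:
        return "primate-specific"   # orphan, but present in both primates: keep (assumed rule)
    return "rejected"               # orphan lacking primate support: drop


labels = {orf: classify(hits) for orf, hits in candidates.items()}
kept = sorted(orf for orf, label in labels.items() if label != "rejected")
print(labels)
print(kept)
```

Swapping in a stricter or looser rule (for example, accepting presence in either primate) only changes the `classify` function, which is the point of keeping the decision in one place.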

Comparing the ORFs to the chimpanzee and macaque genomes invalidated a total of about 5,000 ORFs that had been incorrectly added to the lists of protein-coding genes. This reduces the current estimate to roughly 20,500 protein-coding genes in the human genome. That’s not many, but evolution isn’t a numbers game. Variation in these genes, along with the patterns in which they are regulated and expressed, is part of what makes us human. So if you’re thinking, “Why do humans have so few genes?” don’t fret; size doesn’t matter in this case.

Written by Kambiz Kamrani

January 17, 2008 at 11:55 am

On Human Genetic Variation and Human Identity


The breakthrough of 2007, as announced by AAAS, the nonprofit organization that publishes Science, was human genetic variation. Human genetic variation has been studied for quite some time, and the primary reason to study it in humans is to discover and describe the links between genes and many human diseases. This motivation grows stronger as we better understand how genes contribute to the development of diseases such as cancer and heart disease.

In 2007, we saw many publications in very prestigious journals that used genome-wide association studies (GWAS) to identify common genetic factors that influence health and disease, and I think that is what motivated AAAS to name human genetic variation the breakthrough of the year. In May 2007, Nature ran a paper with a sample size of around 17,000, and in June, Science published a paper with around 13,000 samples!

But the limelight isn’t all on medicine. The impact of human genetic variation on anthropology is just as fascinating. For the last 30 or so years, we have gotten a glimpse of what genetic variation has to offer anthropology, and I hope we can all appreciate (ahem, Martin) that we have another tool in our belt to help us understand human migrations and population structure. With some of the first theoretical papers coming out in the ’70s, I’ve appreciated how fast the field has progressed and how much resolution we now have to understand migratory patterns and the genetic composition of ethnic groups.

For example, in the late ’90s, we read how Y chromosome sequences tend to be more geographically localized than mtDNA sequences. This is an important finding for anthropology. Why? The difference between female and male migration rates tells us about human behavior: women move away from their homes and into the male mate’s natal household far more often than the reverse. Genetic data like this document that many human populations practiced patrilocal residence, a term anyone well versed in sociocultural anthropology knows by heart. A few populations operate(d) with the opposite, matrilocal system, such as the Chaco Canyon people, the Nair in Kerala, India, and the Mosuo in southern China. But these populations aren’t the norm. For the most part, human populations have been patrilocal, and this genetic data supplements the ethnographic, archaeological, and historical data.

In 2007, several human genetic studies told us about how the Americas were peopled. One study analyzed single nucleotide polymorphisms (SNPs) from around 46 different populations within the Americas and Asia and concluded that people camped out in Beringia for a while before making their way down into North America and then South America. SNPs are one form of genetic variation: a SNP is a single-base-pair difference at the same position between two or more alleles.
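The SNP definition above translates directly into a scan over aligned sequences: any position where more than one base is observed is a candidate SNP. This is a minimal sketch that assumes the sequences are already aligned and contain no gaps; some definitions also require a minimum allele frequency in the population, which this sketch does not apply. The example alleles are made up.

```python
# Find SNP positions among aligned sequences: columns where more than one
# base occurs. Assumes the sequences are already aligned and gap-free, and
# applies no minimum allele-frequency threshold.

def snp_positions(aligned: list[str]) -> dict[int, set[str]]:
    """Map each 0-based variable position to the set of bases observed there."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all sequences must have the same aligned length")
    snps = {}
    for pos in range(length):
        bases = {s[pos] for s in aligned}
        if len(bases) > 1:  # more than one base at this position => SNP
            snps[pos] = bases
    return snps


alleles = ["ACGTACGT", "ACGTACGT", "ACCTACGA"]  # hypothetical alleles
print(snp_positions(alleles))
```

Genotyping chips, like the ones behind the consumer kits mentioned below, work the other way around: they check known SNP positions rather than discovering new ones.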

In late 2007, we also saw a boom in personal genome products from the corporate sector. These products, such as kits from 23andMe, deCODEme, Family Tree DNA, dna.ancestry.com, etc., screen for SNPs to tell us not only our propensity for some heritable diseases but also our ethnic background. If you’re curious, Mark Fletcher of Wingedpig discussed his results from 23andMe, Myles Axton of Nature Genetics shared his deCODEme results at Free Association, and Megan Smolenyak put up a 17-minute screencast discussing her husband’s deCODEme results.

Not everyone trusts these sorts of studies. For example, Meredith Small said DNA testing is a scam… and, like Martin, argues that our genetic composition/variation cannot ultimately tell us who we are. They say that personal identity is not found between the base pairs of long sequences. Rather, identity is more effectively found in whom humans have associated with, what culture(s) they followed and are following, and what they do in their daily lives. Since Martin is trained in archaeology, I can see why he sees it this way; he even says his primary expertise is in studying material culture, a product of human society and culture. And since Meredith Small is a cultural anthropologist, I can also see why she is touting the idea that we are people… social beings. With their foundations in the social and cultural side of anthropology, I’m not surprised that they are defending identity and ethnicity as social.

But ethnicity is not entirely social; there definitely is a biological component to ethnicity. That biological component can be seen in the shared phenotype of an ethnic group. Without meaning to offend, I think we can all agree that people from China or Japan have a characteristic eye shape that differs from that of other ethnic groups. Furthermore, to anyone with a background in forensic anthropology, shovel-shaped incisors indicate Native American descent. Homing in on the genetic side of race, in 1997 a Nature paper demonstrated that about 40% of Jewish males who shared an oral tradition of being Cohanim also shared a distinctive set of Y-chromosome markers called the Cohen Modal Haplotype. Tracing this haplotype, we see it is also present in Italian, Lemba, and Kurdish populations, which tells us about patterns of integration.

There are plenty of other haplotypes and genetic markers that help identify ancestral backgrounds. Large projects such as HapMap and National Geographic’s Genographic Project are collecting data and resolving more haplotypes and genetic markers that can be used in a wide variety of applications. One case that comes to mind is that of a man named Yu Hong, whose unique mtDNA composition matched the typical composition of Europeans. Combined with the archaeological evidence, this genetic evidence helped round out our understanding of Yu Hong’s identity.

Ultimately, humans can be classified into groups based on our biological traits, both phenotypic and genetic, just as we can classify groups based on their cultural traits, such as the style of an artifact or linguistic similarities. It is not wrong to classify people into groups. What is wrong is how some people interpret and apply these classifications, and it is just as wrong for people to shun any discussion of classifying people!

So I am surprised that Martin said ethnic essentialism is a thing of the past. It is not. Despite the reductionism shown by the American Anthropological Association’s Understanding Race project, discussions of ethnic essentialism remain a very active and modern part of anthropology. Recently, we had our own lively and very informative discussion on race. I hope it always remains an active discussion. I hope it never stops, because being inquisitive and open is a very important part of practicing science. We should always be questioning and investigating. And just because there was a popular movement to eradicate the concept of race decades ago, I don’t think we should dust off our hands and call the debate resolved. I hope that he will one day recognize that his comment reflects a degree of ignorance; in their own way, Martin and Small are effectively bullying away any re-evaluation of the biological basis for race because they live by the mantra that “it is an outdated way of thinking.”

Hat tip to Razib.

Written by Kambiz Kamrani

January 16, 2008 at 1:27 pm

How much do anthropologists make?

Aside from “What college/university has the best anthropology program?”, I often get asked, “How much do anthropologists make?” I will never be able to answer either question honestly and thoroughly, but thanks to John Hawks, I think we all have a better idea of the national average salary for an anthropologist in the United States.

According to data collected from three major sites that specialize in job listings and salaries, the average salary of someone employed in anthropology is $66,861. Based on the same data, you can expect that salary to grow about 4.9% a year. Unfortunately, they only define ‘anthropologists’ as people who,
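To see what a 4.9% annual growth rate means in dollars, here is a small compound-growth calculation. It assumes the rate stays constant every year and compounds annually, which the salary data do not guarantee; the function name is just for illustration.

```python
# Project a starting salary forward at a constant annual growth rate.
# Assumes the quoted 4.9% rate holds every year and compounds annually,
# which the salary data do not guarantee.

def projected_salary(start: float, annual_rate: float, years: int) -> float:
    """Compound start by annual_rate for the given number of years."""
    return start * (1 + annual_rate) ** years


for years in (1, 5, 10):
    print(years, round(projected_salary(66_861, 0.049, years)))
```

Under that assumption, the $66,861 average would reach roughly $85,000 after five years.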

“study the origin, cultural development and behavior of humans, [and] recover artifacts to gather information about humans.”

I really don’t know how large the salary difference is for anthropologists who work in other areas, such as population genetics or medical anthropology. Nevertheless, I think you can now see that, for the most part, anthropologists aren’t making the megabucks… well, at least not in the Bay Area.

Written by Kambiz Kamrani

January 11, 2008 at 10:45 am

