Fighting the mantra, “People vary more within the groups than vary between groups”

In light of a discussion between Razib and Martin, I recently took up arms and battled the concepts behind race and identity and how human genetic variation plays a role in forming these concepts. In the comments, I was disgusted to read Martin throw in this rhetorical line,

“Genetics have long ago shown that people vary more within the major racial groups than these groups vary among themselves.”

I’ve heard this so many times that I want to puke. It means nothing to me. Based on this line and others he’s said in his discussions with Razib, I’ve picked up that he’s a big proponent of the “if it sounds like a mantra, then it must be true” school of thought. In the following post, I will debunk this mentality. Several new publications have just come out in PLoS Genetics that show exactly how genetics can help identify groups, especially groups that are not demarcated by major social and phenotypic differences.

All of these publications, three in total actually, focus on finding genetic markers that help identify populations of Europeans (there’s a bonus one on the genetic structure of Polynesians). They are open access, so you have freely readable first-hand literature to follow along. As you know, Europeans are often viewed as a homogeneous category of classification. We have a wealth of evidence that tells us of cultural admixture, wars, migrations, formations and declines of new states, etc. over thousands of years of European history. All of these social mechanisms have left an imprint on the cultural, biological, and linguistic composition of Europe. To further complicate things, and addressing the identity crisis Martin brought up when he stated that he’s a Swede, I’ve seen US census reports just lump everyone of European descent together as ‘white.’ Such a label actually groups together multiple populations, which have diverse origins due to the complex history of Europeans.

The first genetic dissection of the population structure of European Americans that I will share with you involved a lot of researchers collaborating together. They focused their work on identifying the contributions from different genetic ancestries. Why did they do it? Like I said earlier, the primary motivation was to identify genes that can be associated with disease. Here’s an excerpt of the abstract that is useful in demonstrating how genetic markers can help identify groups of people,

“Here, we investigate empirical patterns of population structure in European Americans, analyzing 4,198 samples from four genome-wide association studies to show that components roughly corresponding to northwest European, southeast European, and Ashkenazi Jewish ancestry are the main sources of European American population structure. Building on this insight, we constructed a panel of 300 validated markers that are highly informative for distinguishing these ancestries.”

A sample size of roughly 4,200 is large, folks. They could find and validate 300 markers that can distinguish regional ancestries, which helps narrow down ethnicity. But that’s not good enough. If we had more markers, we could classify more people who carry similar markers into ethnic groups. Well, a second publication from a collaborating group did exactly that,

“European population genetic substructure was examined in a diverse set of >1,000 individuals of European descent, each genotyped with >300 K SNPs. Both STRUCTURE and principal component analyses (PCA) showed the largest division/principal component (PC) differentiated northern from southern European ancestry. A second PC further separated Italian, Spanish, and Greek individuals from those of Ashkenazi Jewish ancestry as well as distinguishing among northern European populations. In separate analyses of northern European participants other substructure relationships were discerned showing a west to east gradient.”

So moving from 300 to 300,000 SNPs increased the resolution immensely. Now people could be classified as having Italian, Spanish, Greek, or Ashkenazi Jewish ancestry, among others on the list. Not bad at all. These markers are labeled ancestry informative markers, or AIMs. These AIMs have been mixed and matched because doing so helps…

“.. distinguish the ancestries of these genetically very similar populations, whose real or perceived group differences may often be dominated by environmental, social, and cultural factors. Below, we outline the possible choices of marker sets for inferring various ancestries. In each case, a method such as structured association or principal components analysis can be applied to genotype data to correct for stratification.

To correct for stratification along the north–south (or northwest–southeast) cline, either the Price100 or Tian192 marker sets can be used. (The Tian192 markers, which were ascertained using northern European versus Ashkenazi Jewish ancestry, are effective in distinguishing north–south ancestry because southern Europeans attain intermediate ancestry values as compared to values at one extreme for northern Europeans.) To correct for stratification involving both north–south and Ashkenazi Jewish ancestry, one option is to use the Price100+Price200 marker sets, which together separate north, south, and Ashkenazi ancestry into three distinct clusters. Another option is to use the Tian192 marker set, which models these three ancestries along a single axis and will be sufficient in the case that the phenotype being analyzed has intermediate values for southern European as compared to northern European versus Ashkenazi Jewish ancestry. Finally, to correct for stratification involving a west–east gradient within northern Europe (e.g., Irish versus other northern European ancestry), the Tian1211 marker set is the only set of AIMs available.”

So there you have it, by increasing the screen to search for more markers and using different combinations of markers, the researchers were able to identify genetic similarities and differences between groups of genetically similar people. I don’t know how anyone can go about saying something like “people vary more within the major racial groups than these groups vary among themselves.” If it can be done in perceived homogeneous European groups, it can be done elsewhere too. It is probably being done elsewhere…
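To make the "more markers, more resolution" idea concrete, here is a minimal simulation sketch in Python. It is not the method used in any of the papers above. The two groups, the allele frequencies, the 0.10 frequency gap per marker, and the function name `simulate_accuracy` are all hypothetical choices made for illustration. Each simulated person is assigned to whichever group makes their genotype more likely.

```python
import math
import random


def simulate_accuracy(n_markers, n_people=200, freq_gap=0.10, seed=0):
    """Return the fraction of simulated people assigned to their true group."""
    rng = random.Random(seed)
    # Hypothetical allele frequencies: at every marker, group B differs from
    # group A by only `freq_gap`, in a randomly chosen direction.
    freqs_a = [rng.uniform(0.3, 0.7) for _ in range(n_markers)]
    freqs_b = [p + rng.choice((-freq_gap, freq_gap)) for p in freqs_a]

    def genotype(freqs):
        # Diploid genotype: 0, 1, or 2 copies of the allele at each marker.
        return [(rng.random() < p) + (rng.random() < p) for p in freqs]

    def log_likelihood(geno, freqs):
        # Log-probability of the genotype under a group's frequencies. The
        # binomial coefficient is omitted because it is the same for both groups.
        return sum(g * math.log(p) + (2 - g) * math.log(1 - p)
                   for g, p in zip(geno, freqs))

    correct = 0
    for i in range(n_people):
        truth_is_a = i % 2 == 0  # alternate between the two groups
        geno = genotype(freqs_a if truth_is_a else freqs_b)
        guess_is_a = log_likelihood(geno, freqs_a) > log_likelihood(geno, freqs_b)
        correct += guess_is_a == truth_is_a
    return correct / n_people


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, "markers -> accuracy", simulate_accuracy(n))
```

Under these assumptions a handful of markers classifies people only somewhat better than a coin flip, while a thousand markers, each just as weakly informative on its own, classifies nearly everyone correctly.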

Maybe the reason why it hasn’t been done before is because we just haven’t had as many SNPs to screen for in the past. I attribute this shortcoming as one of the sources of the misinformation Martin reiterated. With more and more people studying human variation, more and more people will be sampled. As sample sizes increase, and as projects like the HapMap and Genographic projects expand, I imagine we’ll identify tons of SNPs and markers. Also, in the past, I imagine the mtDNA was the only easily researchable locus to screen for genetic variation and diversity… and mtDNA is small and does not store nearly as much variation as nuclear DNA. That could be where Martin picked up the “long ago” portion of his statement. Long ago in genetics was when mtDNA was the only accessible and reliable thing to study and that wasn’t that long ago. But we now look for variation in the nuclear genome of humans, which contains many more base pairs and many more heritable markers.

As kinda icing on the cake, I want to move away from European populations and shine the spotlight on human genetic diversity of the Pacific because as the authors write in their abstract,

“Human genetic diversity in the Pacific has not been adequately sampled, particularly in Melanesia. As a result, population relationships there have been open to debate.”

This is a cool study that’s made the rounds on a couple of blogs. If you haven’t caught it before, let me summarize it for you. The study was just as comprehensive as the European genetic studies. It involved almost 1,000 individuals from 41 populations. Using more than 800 genetic markers, the results revealed that Polynesians and Micronesians have almost no genetic relation to Melanesians; rather, their closest relationships are to Taiwan Aborigines and East Asians. The results also showed that groups living in the islands of Melanesia are remarkably diverse. The research also suggests that the ancestors of Polynesians moved through Island Melanesia relatively rapidly.

In conclusion, very recent papers are telling us exactly the opposite of what Martin said. Furthermore, I brought this up in my previous human genetic variation post, but I gotta bring it up again because it happened again: I really don’t appreciate this ‘long ago’ academic arrogance expressed when people say “long ago, we anthropologists decided race was a social construct,” or “long ago, genetics confirmed humans don’t vary much.” Phrasing statements like that implies that if I think otherwise then I’m dated, I’m not with the times. How unscientific. It is actually those who do not refresh their knowledge and keep current with advances in population studies who look like the dated and uneducated ones.

12 thoughts on “Fighting the mantra, “People vary more within the groups than vary between groups”

  1. Ok, please be patient with me as I am just a lay-reader with no specialized knowledge.

    It seems to me that your post does not contradict the mantra of “more variation within a broad category than between categories”.

    What I get out of your article (and please provide correction if you’d like) is this: If you take, say, a group of Europeans, you can keep refining your test so that you can make finer distinctions between subgroups of Europeans. That is, you can find stuff that, say southern Italians and Finns have in common, but stuff that distinguishes them too.

    Then presumably, you can do the same with, say, Africans.

    So to me, you are demonstrating that there are large “variations” within the “broadly defined” groups (white, black), no?

    Am I missing something?

  2. I recently addressed this same point on my blog:

    The 85% truism

    How much of the genome varies within our species? The question remained unanswered in my last post. Hawks et al (2007) have recently estimated that at least 7% of our genome has changed over the last 40,000 years—a period that has seen humans move into diverse environments with different selection pressures. Yet this is a minimal estimate that excludes much variation that may or may not be due to natural selection. The real figure could be higher. Much higher.

    How is this genetic variation distributed among humans? Is it evenly scattered? Or does it form geographic clusters? Intuitively, the second answer seems more correct: This variation should be very unevenly distributed if it is due to humans settling in diverse environments with different selection pressures. It should occur primarily at the transition from one ecological zone to another or from one cultural zone to another (e.g., from agriculturalists to hunter-gatherers).

    Yet this is not what we see in the data. If we look at genetic markers (blood types, serum proteins, enzymes, etc.), we consistently find far more variation within human populations than between them. And this is true not only for large ‘continental’ groups but also for smaller local populations. In a landmark paper, Richard Lewontin (1972, p. 397) concluded that 85% of human genetic variation exists only between individuals and not between populations:

    “It is clear that our perception of relatively large differences between human races and subgroups, as compared to the variation within these groups, is indeed a biased perception and that, based on randomly chosen genetic differences, human races and populations are remarkably similar to each other, with the largest part by far of human variation being accounted for by the differences between individuals.”
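As a rough illustration of what "apportioning" diversity means, the Python sketch below splits the diversity at a single two-allele locus into a within-group share and a between-group share. It uses expected heterozygosity and Wright's F_ST as a stand-in, which is not necessarily the diversity measure used in the 1972 paper. The allele frequencies (0.4 and 0.6), the equal weighting of groups, and the function names are hypothetical.

```python
def heterozygosity(p):
    """Expected heterozygosity 2p(1-p) at a two-allele locus."""
    return 2 * p * (1 - p)


def apportion(freqs):
    """Split diversity at one locus into (within-group, between-group) shares.

    `freqs` holds the frequency of one allele in each group. Groups are
    weighted equally, which is a simplifying assumption.
    """
    mean_p = sum(freqs) / len(freqs)
    h_total = heterozygosity(mean_p)  # diversity of the pooled population
    h_within = sum(heterozygosity(p) for p in freqs) / len(freqs)
    between = (h_total - h_within) / h_total  # Wright's F_ST for this locus
    return 1 - between, between


if __name__ == "__main__":
    # Hypothetical locus: the allele is at 40% in one group and 60% in the other.
    within, between = apportion([0.4, 0.6])
    print(f"within-group share:  {within:.2%}")
    print(f"between-group share: {between:.2%}")
```

A per-locus figure like this describes how diversity is distributed at individual markers, one marker at a time.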

    This finding is true. Like many findings, however, it does not necessarily mean what we think it means. This became apparent when geneticists looked at genetic markers in other animals, such as dogs:

    “… genetic and biochemical methods … have shown domestic dogs to be virtually identical in many respects to other members of the genus. … Greater mtDNA differences appeared within the single breeds of Doberman pinscher or poodle than between dogs and wolves. Eighteen breeds, which included dachshunds, dingoes, and Great Danes, shared a common haplotype and were no closer to wolves than poodles and bulldogs. These data make wolves resemble another breed of dog.

    … there is less mtDNA difference between dogs, wolves, and coyotes than there is between the various ethnic groups of human beings, which are recognized as a single species.” (Coppinger & Schneider, 1995)

    One could object that humans have created dog breeds using a limited set of criteria that reflect a limited set of genes. Therefore, all other criteria, especially those not visible to the eye, should vary independently of breed. The category ‘breed’ is thus an artificial construct that human selection, and not natural selection, has imposed on canine genetic variability.

    This objection is not wholly true. Many breeds, such as dingoes, originated in prehistory long before kennel clubs. More to the point, if one argues that human selection acts on a limited set of genes, the implication is that natural selection acts on the entire genome. It doesn’t. Natural selection also acts on a limited set of genes, often a larger set than the one used by dog breeders, but still much smaller than the entire genome.

    This point can be illustrated with non-canine examples. Considerable genetic overlap exists not only between breeds of dogs but also between many anatomically and behaviorally distinct species. In the deer family, genetic variability is greater within some species than between some genera (Cronin, 1991). Some masked shrew populations are genetically closer to prairie shrews than they are to other masked shrews (Stewart et al., 1993). Only a minority of mallards cluster together on an mtDNA tree, the rest being scattered among black ducks (Avise et al., 1990). All six species of Darwin’s ground finches seem to form a genetically homogeneous genus with very little concordance between mtDNA, nuclear DNA, and morphology (Freeland & Boag, 1999). In terms of genetic distance, redpoll finches from the same species are not significantly closer to each other than redpolls from different species (Seutin et al., 1995). Among the haplochromine cichlids of Lake Victoria, it is extremely difficult to find interspecies differences in either nuclear or mitochondrial genes, even though these fishes are well differentiated morphologically and behaviorally (Klein et al., 1998). Neither mtDNA nor allozyme alleles can distinguish the various species of Lycaeides butterflies, despite clear differences in morphology (Nice & Shapiro, 1999). An extreme example is a dog tumor that has developed the ability to spread to other dogs through sexual contact: canine transmissible venereal sarcoma (CTVS). It looks and acts like an infectious microbe, yet its genes would show it to be a canid and, conceivably, some beagles may be genetically more similar to it than they are to Great Danes (Cochran, 2001; Yang, 1996).

    Does this seem paradoxical? Let’s review how organisms become different from each other through natural selection. This typically happens when a group buds off from its parent population and colonizes a new environment. The environment may be another ecosystem, another mode of subsistence or even, as with CTVS, another form of existence. As the group adapts to its new environment, it will begin to diverge anatomically and behaviorally from its parent population, in part because the environmental boundary hinders gene flow between them but more importantly because the pressures of natural selection are no longer the same. The two populations will evolve differently because what is useful in one environment may not be in the other. And vice versa.

    Will these differences in selection affect the entire genome? No. For one thing, most genes have low selective value, some being little more than junk DNA. For another, many genes code for traits that are equally useful in a wide range of environments. The ‘building block’ proteins of human flesh and blood are largely identical to those of non-human primates and sometimes even non-primate mammals (King & Wilson, 1975).

    Thus, only a fraction of the genome changes when one population differentiates from another in response to differences in natural selection. The rest remains unchanged, either because the genes have little selective value or because they handle adaptive problems that are common to both populations. Over most of the genome, then, variability is due not to adaptive differences created by different selection pressures but rather to non-adaptive variations that similar selection pressures have left in place.

    Of course, once the two populations have become reproductively isolated, they will no longer accumulate the same non-adaptive variations and their entire genomes will drift steadily apart. But this takes time. Redpoll finches diverged into two species some 50,000 years ago and have distinct phenotypes, yet their mitochondrial DNA reveals a single undifferentiated gene pool (Seutin et al., 1995). It’s no surprise, then, that human populations exhibit so much genetic overlap. They began to move apart only 40,000 or so years ago (Pritchard et al., 1999).


    Avise, J.C., C.D. Ankney, and W.S. Nelson. (1990). Mitochondrial gene trees and the evolutionary relationship of mallard and black ducks. Evolution, 44, 1109-1119.

    Cochran, G. (2001). Personal communication.

    Coppinger, R. and R. Schneider (1995). Evolution of working dogs. In J. Serpell (ed.), The Domestic Dog: Its Evolution, Behaviour and Interactions with People. Cambridge: Cambridge University Press, pp. 21-47.

    Cronin, M. (1991). Mitochondrial-DNA phylogeny of deer (Cervidae). Journal of Mammalogy, 72, 533-566.

    Freeland, J.R. and P.T. Boag. (1999). The mitochondrial and nuclear genetic homogeneity of the phenotypically diverse Darwin’s ground finches. Evolution, 53, 1553-1563.

    Hawks, J., E.T. Wang, G.M. Cochran, H.C. Harpending, and R.K. Moyzis. (2007). Recent acceleration of human adaptive evolution. Proceedings of the National Academy of Sciences (USA) early view.

    King, M-C. and A.C. Wilson. (1975). Evolution at two levels in humans and chimpanzees. Science, 188, 107-116.

    Klein, J., A. Sato, S. Nagl, and C. O’hUigin. (1998). Molecular trans-species polymorphism. Annual Review of Ecology and Systematics, 29, 1-21.

    Lewontin, R.C. (1972). The apportionment of human diversity. Evolutionary Biology, 6, 381-398.

    Nice, C.C. and A.M. Shapiro. (1999). Molecular and morphological divergence in the butterfly genus Lycaeides (Lepidoptera: Lycaenidae) in North America: evidence of recent speciation. Journal of Evolutionary Biology, 12, 936-950.

    Pritchard, J.K., M.T. Seielstad, A. Perez-Lezaun, and M.W. Feldman. (1999). Population growth of human Y chromosomes: A study of Y chromosome microsatellites. Molecular Biology and Evolution, 16, 1791-1798.

    Seutin, G., L.M. Ratcliffe, and P.T. Boag. (1995). Mitochondrial DNA homogeneity in the phenotypically diverse redpoll finch complex (Aves: Carduelinae: Carduelis flammea-hornemanni). Evolution, 49, 962-973.

    Stewart, D.T., A.J. Baker, and S.P. Hindocha. (1993). Genetic differentiation and population structure in Sorex haydeni and S. cinereus. Journal of Mammalogy, 74, 21-32.

    Yang, T.J. (1996). Parasitic protist of metazoan origin. Evolutionary Theory, 11, 99-103.

  3. Ollie, you asked an excellent question that I have a hard time explaining. So if I confuse you, I apologize in advance.

    In regard to your conclusion, this data shows the opposite. The whole “people are more genetically varied within a group than between groups” idea comes from the notion that there’s a lot of gray area in defining what genetically makes up a group.

    Within any group, it has been presumed that little to no clear genetic boundaries exist. That’s because past comparisons used far fewer loci and genetic markers than we now have available. When only a few markers are compared, the probability that a marker is shared with people outside the group is higher, leading to the idea that no clear group can be defined since outsiders share similar markers with assumed insiders.

    As more and more genetic markers are identified, the probability that an outsider shares the full set of markers is reduced. The more markers identified, the more there is to identify a group as a group. So, these reports show that if we expand the array of genetic markers, we’ll see that people within a group share more genetic markers, and the gray area that obstructed defining the group is reduced.

    Alternatively, let’s drop the genetic jargon cause sometimes it makes my own mind dizzy. Let’s consider this hypothetical but analogous alternative:

    Many individuals in population A are observed with a distinctive skeletal trait, an extremely long and robust right humerus. Comparing this trait with other populations shows that some individuals in populations B and C also exhibit it. The trait isn’t as uniformly distributed in B and C as it is in A, but since it exists in B and C we can’t define population A as a unique group on that trait alone. Reanalyzing the skeletal traits of population A, however, we see that a long and robust right humerus is also linked with a short and gracile left femur, an extra rib, and a fusion in the carpal bones. Comparing these new traits to populations B and C, we now see that this combination of traits exists exclusively in population A, and thus we can define population A as a unique population with more confidence.

    Now imagine this scenario in genetic terms: rather than finding 3 more linked traits, we find 1,000 more genetic variations that link a group.

    Does that clarify it for you?


  4. Kambiz: thank you; I wasn’t understanding what you meant by variation.

    Yes, your explanation was very clear.

    This was the point of my confusion: let’s say that you were going to compare, say, the mean height of the adult male.

    Now if you took Africans, you’d see a huge statistical variation (due to, say, the Bushmen) within the group.

    If you took Europeans, you might also see some variation.

    But if you compared the means of the Africans to the means of the Europeans, you wouldn’t see much difference; the intragroup variation would be far greater.

    Now as far as what your article says: it seemed to tell me that if you analyze more data (the genetic markers), you can make sharper distinctions between ever smaller subgroups (e. g., distinguish northern Italians from the southern ones); that means, to me, that within the block of Europeans, you can see great variation if you take in more variables (the variables being the genetic markers).

    I understood that to be variation within the European subset.

    You can do the same within the African subset.

    But if you managed to take some sort of “average” (say, stuff that the smaller European groups had in common) and compared it with the genetic markers that the African subgroups had in common, you’d see less of a difference.

    So, in my case, I had a misunderstanding of the mantra that you were debunking to begin with. :)

    Thanks for your patience!

  5. “Yet this is not what we see in the data. If we look at genetic markers (blood types, serum proteins, enzymes, etc.), we consistently find far more variation within human populations than between them.”

    This is true too if you do the same test with apes. There is far more variation within Apes AND Humans populations than between Apes AND human populations

  6. My immediate reaction to this would be to say that the fact that I can find genetic markers that predictably correlate to races or ethnic groups within races does not say anything about how much variation there is between those races/groups, as opposed to the variation within those races/groups. It says that enough variation exists to differentiate between groups, and that, taken together, it’s predictable. I’ll leave to one side the fact that the “how much difference” that matters to people (including biologists, in the real world) is not something that can be quantitatively demonstrated — at least not yet. Of course this would also mean that the mantra objected to is wrong and objectionable. But being able to tell me whether I am Spanish on the basis of a DNA sample does not necessarily speak to this question, as far as I can see.

  7. The thing I don’t understand about the argument that “new breakthroughs in genetics imply race is not a social construct” is what justifies the use of categories of modern national identity to identify the “races” whose genetic basis is to be revealed. That is, while I might be persuaded to believe that a certain amount of information concerning the geography of human migration over the past 200,000 years is recoverable from DNA analysis, I remain pretty skeptical that the geography of, say, modern Spain is somehow genetic rather than an outcome of purely contingent historical processes.

  8. “This is true too if you do the same test with apes. There is far more variation within Apes AND Humans populations than between Apes AND human populations”

    Is it possible to find a person and a chimp that are, overall, more genetically similar than two chimps or two persons?

  9. It all depends on which markers are chosen for the comparison, and the nature of the populations. Under the generally accepted OOA model, Africa has the greatest genetic diversity within and between regions, and peoples, all built-in. Other populations worldwide are derived as sub-sets of this diversity. (Tishkoff 2000, 2009) So to say that greater difference within groups versus between groups is a myth is patently false, Africa being a case in point.

    A great deal depends on how the categories are manipulated and who is manipulating the definitions. True red hair for example is comparatively rare worldwide, and is primarily confined to Northern Europe, particularly the British Isles. Yet few claim that the peoples of the British Isles are “another race” as a result of this trait. Unfortunately however, there is a double standard when it comes to African populations, and various “marker” traits are used to claim “different” races or what have you to an extent not seen when European populations are being dealt with. Arbitrarily defining particular traits or DNA elements as “European” or “Asian” is all part of this game.

    Now of course DNA analysis can be broken down to find ever finer distinctions within groups, but it can also find more overlap as well. The PN2 clade of Haplogroup E for example links numerous African populations together from Cape to Cairo, shattering the boundaries of phenotypically defined “races.” The PN2 transition of course is not the last word. It is an important part in the genetic mix alongside others, nevertheless it does illustrate that DNA analysis can demonstrate broader unity and overlap.

    Same goes for cranio-facial analysis often used in conjunction with DNA analysis. Again, Africans not only show greater intra-regional and inter-regional diversity, but there is substantial overlap as well between data derived from African populations and others. This is expected and can be predicted under the OOA model. Narrow noses for example appear among the oldest populations of Africa, among people with tropical body proportions, showing that such traits, often arbitrarily defined as “Caucasian” do not depend on any “race mix” to occur. At the same time people a few dozen miles away can be found with broad noses. The point again is that greater variation WITHIN groups is by no means a “myth.” It is confirmed by both DNA and skeletal analyses.
