Genetic, Geographic, And Linguistic Structure Of European Populations

Both Razib and Dienekes have posted about this new Current Biology paper, “Correlation between Genetic and Geographic Structure in Europe.” The authors compare the genetic makeup of 2,514 individuals from Europe using the Affymetrix GeneChip Human Mapping 500K Array Set.

Correlation between Genetic and Geographic Structure in Europe

Always the overachiever of science blogging, Razib has dutifully labeled the populations on the graph. His modifications make it easier to see the genetic similarities and differences among the European populations tested. And there are some interesting patterns: northern European populations resemble one another, as do southern European populations.

The Finns tested are the group least similar to the other European populations. Swedes and Spaniards are clearly distinct, while the Irish and British overlap heavily across the 500,000 SNPs tested. So what does it all mean? The result indicates that there is a genetic component to European ethnic groups.

This is not entirely surprising. In 2006, the open access journal PLoS Genetics published a typing of 5,000 SNPs in about 1,000 Europeans and European Americans, and the researchers were able to resolve the genetic differences between northern and southern European groups (image below). Also, in January of this year I read and reviewed two papers that ran similar tests, comparing 300,000 SNPs across 4,198 European Americans. Principal component analysis (PCA) showed a clear distinction between individuals of northern and southern European ancestry, as well as a separation of Italian, Spanish, and Greek individuals from those of Ashkenazi Jewish ancestry.

European Population Substructure
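The PCA step that separates northern from southern ancestry in these studies can be illustrated with a small sketch. This is a toy example, not the pipeline used in any of these papers: the genotypes are simulated, and the "north"/"south" labels, the allele frequencies of 0.15 and 0.85, and the sample sizes are all hypothetical assumptions. Each SNP is centered and scaled by sqrt(2p(1-p)), a common convention, and only the first principal component is found, by power iteration rather than a full eigendecomposition.

```python
import math
import random


def simulate(n_per_group=8, n_snps=100, seed=1):
    """Hypothetical toy data: two populations with very different allele frequencies."""
    rng = random.Random(seed)
    rows, labels = [], []
    for label, freq in (("north", 0.15), ("south", 0.85)):
        for _ in range(n_per_group):
            # Genotype = copies of the counted allele (0, 1 or 2) at each SNP.
            rows.append([sum(rng.random() < freq for _ in range(2)) for _ in range(n_snps)])
            labels.append(label)
    return rows, labels


def standardize(genotypes):
    """Center each SNP and scale by sqrt(2p(1-p)), p = estimated allele frequency."""
    n, m = len(genotypes), len(genotypes[0])
    out = [[0.0] * m for _ in range(n)]
    for j in range(m):
        mean = sum(row[j] for row in genotypes) / n
        p = mean / 2
        sd = math.sqrt(2 * p * (1 - p)) or 1.0  # avoid dividing by zero at monomorphic SNPs
        for i in range(n):
            out[i][j] = (genotypes[i][j] - mean) / sd
    return out


def first_pc(x, iters=200):
    """PC1 scores (unit vector, arbitrary sign) and its share of total variance."""
    n, m = len(x), len(x[0])
    # n x n covariance between individuals, computed across SNPs.
    c = [[sum(x[a][k] * x[b][k] for k in range(m)) / m for b in range(n)] for a in range(n)]
    v = [1.0 + 0.01 * i for i in range(n)]  # deterministic, non-degenerate start vector
    for _ in range(iters):  # power iteration converges to the top eigenvector
        w = [sum(c[a][b] * v[b] for b in range(n)) for a in range(n)]
        norm = math.sqrt(sum(t * t for t in w))
        v = [t / norm for t in w]
    eigval = sum(v[a] * sum(c[a][b] * v[b] for b in range(n)) for a in range(n))
    total = sum(c[a][a] for a in range(n))  # trace = sum of all eigenvalues
    return v, eigval / total


if __name__ == "__main__":
    rows, labels = simulate()
    scores, share = first_pc(standardize(rows))
    for label, s in zip(labels, scores):
        print(f"{label:5s} {s:+.3f}")
    print(f"PC1 explains {share:.1%} of the variance")
```

With two populations this differentiated, the two groups should land on opposite sides of zero on PC1; which side is which is arbitrary. Real studies work with hundreds of thousands of SNPs and optimized linear-algebra libraries rather than pure Python loops.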

PLoS Genetics has also recently published a similar paper, “Tracing Sub-Structure in the European American Population with PCA-Informative Markers,” which describes a purely computational method of identifying ancestry, one that doesn’t rely on individuals’ self-reported ethnic background. The researchers analyzed 1,521 individuals at more than 300,000 SNPs across the entire genome.

While the data set is not as large as the Current Biology paper’s, the authors were able to pick out 200 ancestry-informative SNPs that accurately predict fine structure in European American data sets, as identified by PCA. They did so by removing redundant SNPs uncovered during the modeling process. Moreover, much of the genetic variation identified was between the northern and southern European ancestry groups.
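The selection idea described here (keep SNPs that track the main axis of variation, drop ones that duplicate each other) can be sketched as follows. This is a simplified stand-in, not the algorithm Paschou et al. actually used: SNPs are ranked by the absolute Pearson correlation between genotype and PC1 score, and a SNP is skipped if its correlation with an already-kept SNP reaches a threshold. The function names, the `max_r=0.9` cutoff, and the toy genotypes and PC1 scores are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_informative(genotypes, pc_scores, k, max_r=0.9):
    """Pick up to k SNPs most correlated with a PC, skipping redundant ones."""
    m = len(genotypes[0])
    cols = [[row[j] for row in genotypes] for j in range(m)]
    # Rank SNP indices by |correlation with the PC scores|, strongest first (stable on ties).
    ranked = sorted(range(m), key=lambda j: abs(pearson(cols[j], pc_scores)), reverse=True)
    kept = []
    for j in ranked:
        if len(kept) == k:
            break
        # Keep a SNP only if it is not nearly collinear with one already kept.
        if all(abs(pearson(cols[j], cols[i])) < max_r for i in kept):
            kept.append(j)
    return kept


# Hypothetical toy data: 6 individuals (rows) x 4 SNPs (columns).
genotypes = [
    [0, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [2, 2, 1, 0],
    [2, 2, 2, 1],
    [2, 2, 2, 0],
]
pc1 = [-1.0, -0.9, -0.8, 0.8, 0.9, 1.0]  # hypothetical PC1 scores
print(select_informative(genotypes, pc1, k=2))  # -> [0, 2]
```

SNP 1 is an exact copy of SNP 0, so it is dropped as redundant even though it tracks PC1 just as well; SNP 2 is informative but different enough to keep, and SNP 3 barely tracks PC1, so it ranks last.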

Going back to the ‘is this surprising?’ point: in 1990, Barbujani et al., in “Zones of sharp genetic change in Europe are also linguistic boundaries,” noted a delineation between northern and southern Europeans in the distribution of 63 allele frequencies, and argued that the language affiliation of European populations played a major role in maintaining, and probably causing, genetic differences. Makes sense.
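The notion of a “zone of sharp genetic change” can be illustrated in one dimension: order populations along a geographic transect and look for the neighboring pair whose allele frequencies differ most. This is a deliberately simplified sketch, not the method used in that 1990 paper; the four populations, their frequencies, and the choice to average absolute frequency differences across loci are hypothetical.

```python
def sharpest_boundary(freqs):
    """Return (index, jumps): jumps[i] is the mean absolute allele-frequency
    change between neighboring populations i and i+1 on a transect."""
    jumps = []
    for a, b in zip(freqs, freqs[1:]):
        jumps.append(sum(abs(x - y) for x, y in zip(a, b)) / len(a))
    best = max(range(len(jumps)), key=jumps.__getitem__)
    return best, jumps


# Hypothetical transect of four populations, three loci each.
names = ["A", "B", "C", "D"]
freqs = [
    [0.10, 0.20, 0.30],
    [0.12, 0.22, 0.28],
    [0.50, 0.60, 0.70],
    [0.52, 0.61, 0.69],
]
i, jumps = sharpest_boundary(freqs)
print(f"sharpest change between {names[i]} and {names[i + 1]}")  # -> between B and C
```

Here the jump between B and C (about 0.39) dwarfs the other two (0.02 and about 0.013), so that pair is flagged as the boundary.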

    Lao, O., Lu, T., Nothnagel, M., Junge, O., Freitag-Wolf, S., Caliebe, A., Balascakova, M., Bertranpetit, J., Bindoff, L., Comas, D. (2008). Correlation between Genetic and Geographic Structure in Europe. Current Biology. DOI: 10.1016/j.cub.2008.07.049
    Paschou, P., Drineas, P., Lewis, J., Nievergelt, C.M., Nickerson, D.A., Smith, J.D., Ridker, P.M., Chasman, D.I., Krauss, R.M., Ziv, E., Pritchard, J.K. (2008). Tracing Sub-Structure in the European American Population with PCA-Informative Markers. PLoS Genetics, 4(7), e1000114. DOI: 10.1371/journal.pgen.1000114

7 thoughts on “Genetic, Geographic, And Linguistic Structure Of European Populations”

  1. The most interesting conclusion I draw from that graph (which, btw, should be stretched 20% along the horizontal axis to be proportional) is that an E-W axis exists in central/northern Europe as well.

    In contrast, the Bauchet et al. (2007) results suggested (in the comparable PC graph) a Mediterranean E-W axis, with northern Europeans appearing much more homogeneous. I’m not sure why the results differ, but it may be due to sampling bias.

    The best thing about this study is that intermediate populations (French, Austrians, Hungarians) are no longer ignored, which makes the overall result much more clearly clinal.

    Nevertheless, I believe sampling should be concerned less with state boundaries and more with actual regional historical differences, without ignoring population sizes either. Hence large, heterogeneous countries like France, Spain or Britain should probably get regionally representative samples. No single French sample can represent the complex genetic landscape of that state, for instance. Another confounding factor is the lack of sampling in Eastern Europe and West Asia.

  2. First of all, the first PC accounts for 31.6% of the variation, so there’s no need to expand the graph.

    Secondly, the north here shows an E-W cline because of the greater number of markers used. At the same time, unlike in Bauchet, we’ve got here Northern Greeks as opposed to Greeks, and no Basques or Armenians. Thus the E-W cline in the south looks smaller in this paper.

    Btw, here’s a figure I found on the net from the paper that does a better labelling job than Razib…

  3. …unlike in Bauchet, we’ve got here Northern Greeks as opposed to Greeks, and no Basques or Armenians. Thus the E-W cline in the south looks smaller in this paper.

    In Bauchet-2007 the southern cline was mostly caused by the Greek vs. Spanish duality, which was very clear in the K-means clustering. Basques, instead, appeared oddly scattered in the PC map, because their specificity did not seem to weigh on either PC and they were therefore subject to the randomness of minority elements, which made them appear intermediate between Spanish and Brits. Basque specificity (very clear) could only be detected in the K-means clustering, and only because they were a sizeable sample (smaller samples are likely to have any specificity go undetected).

    In this paper, instead, the Greek-Spanish duality seems almost totally suppressed, and the Spanish-Italian difference (considerable in Bauchet) has vanished entirely.

    I find these differences most suspicious, especially since Bauchet’s study so clearly showed that the main component of Iberians was very distinct from that of Greeks.

    PC statistics are only somewhat meaningful and depend heavily on the samples and their relative weight. When you sample a large region, only the components that are widespread at the regional level are likely to show up, while local specificities, even if very dominant locally, won’t.

    For that reason I strongly prefer Bayesian K-means clustering, especially when it reaches enough depth. For Europe, Bauchet 2007 is the best I have seen so far (and it has many shortcomings). But much greater depth has been achieved, for instance, among Native Americans. I hope to see a K=16 or greater Bayesian structure analysis of Europe or (better) West Eurasia soon, and I hope it includes some Eastern European and Balkan samples.

  4. Luis: “Nevertheless, I believe sampling should be more concerned not about state boundaries but about actual regional historical differences, not ignoring population sizes either. Hence large heterogenous countries like France, Spain or Britain should probably get regionally representative samples.”

    I completely agree. From the perspective of deep history, there is no such thing as France, Spain, Italy, Britain, or even Albania, Bulgaria, Georgia, etc. From a musical perspective, we see a clear correlation between singing style and residence in “refuge” areas, such as mountains, islands, and forests, a correlation that might well distinguish “Old Europe” from Indo-European Europe, to use Gimbutas’ terminology. Future genetic research in Europe (and elsewhere) will need to focus on such distinctions if we are to expect a meaningful reconstruction of the earliest phase of “modern” human history.

  5. Former Yugoslavia??? What does this mean?? If they are referring to FYROM or Serbia, then these people are genetically different from Croats, Bosnians and Slovenians. I don’t know how they could be lumped into one group as the study suggests. There is no mention of what proportion of each “former Yugoslav country” was sampled.
