Last month, I read and reviewed two papers that correlated the genetic and linguistic structure of European populations with their geographic locations. A new paper, published several days ago in Nature, presents a model that uses the genetics of 3,200 Europeans to predict where they come from, to within a few hundred kilometers.
The paper, “Genes mirror geography within Europe,” comes from John Novembre and crew. They got their data from GlaxoSmithKline. The authors compared 500,000 SNPs, and when they plotted each individual along the first two principal components of that data, they saw distinct clustering of geographically distinct populations. This is the image they provided.
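For readers who have not run a principal component analysis before, here is a minimal sketch of the general idea, not the authors' actual pipeline. It assumes genotypes are coded as 0/1/2 allele counts per SNP, and it uses simulated toy data; the function name `top_two_pcs`, the two toy populations, and the allele frequencies are all made up for illustration.

```python
import numpy as np

def top_two_pcs(genotypes):
    """Return each individual's scores on the first two principal components.

    genotypes: array of shape (n_individuals, n_snps) holding 0/1/2
    allele counts (an assumed coding, chosen for this sketch).
    """
    X = np.asarray(genotypes, dtype=float)
    X = X - X.mean(axis=0)  # center each SNP column on zero
    # SVD of the centered matrix: U * S gives the PC scores, and the
    # singular values in S come back sorted from largest to smallest.
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :2] * S[:2]

# Toy data (hypothetical): two populations whose allele frequencies differ slightly.
rng = np.random.default_rng(0)
freq_a = rng.uniform(0.1, 0.9, size=200)
freq_b = np.clip(freq_a + rng.normal(0, 0.15, size=200), 0.05, 0.95)
pop_a = rng.binomial(2, freq_a, size=(30, 200))
pop_b = rng.binomial(2, freq_b, size=(30, 200))

scores = top_two_pcs(np.vstack([pop_a, pop_b]))
# scores[:, 0] and scores[:, 1] are the x and y coordinates one would plot.
```

Plotting the two columns of `scores` against each other gives the kind of scatter the paper shows, with one point per individual.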
Placing a map under this distribution shows how distinct the Iberian peninsula is from the Italian peninsula, and that even within the ‘mixing pot’ of Switzerland, French-, German- and Italian-speaking individuals are genetically separated. To touch on the accuracy issue again: for countries with more individuals represented in the sample, this new method could pinpoint origins to within 310 kilometers. On average, it placed an individual’s origins to within 540 km.
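To make the kilometer figures concrete, here is a small sketch of how the gap between a predicted and a reported origin could be measured. It uses the haversine great-circle formula with a mean Earth radius of 6371 km; whether the authors measured error this way is an assumption on my part, and the function name and the Paris and Madrid coordinates are only illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Illustrative only: a reported origin of Paris and a predicted origin of Madrid.
error_km = haversine_km(48.8566, 2.3522, 40.4168, -3.7038)
```

An error of 310 or 540 km is therefore well under the distance between these two capitals.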
As P-ter of Gene Expression points out, the biggest restrictions on this study are the array of SNPs on the GeneChip and the number of individuals sampled. With higher-resolution GeneChips, ideally full genomes, and larger samples, we’ll be able to see much more accurate genetic-geographic separations of populations.
- John Novembre, Toby Johnson, Katarzyna Bryc, Zoltán Kutalik, Adam R. Boyko, Adam Auton, Amit Indap, Karen S. King, Sven Bergmann, Matthew R. Nelson, Matthew Stephens, Carlos D. Bustamante (2008). Genes mirror geography within Europe. Nature. DOI: 10.1038/nature07331