
I’ve read the short article on punctuated equilibrium in language evolution and have discussed it with some colleagues of mine. I’m assuming they don’t want me to publicize their names, since this blog’s reputation for being coy and graceful isn’t what most would consider honorable. For that reason, I’m omitting their names, even though they didn’t explicitly ask me to do so… But I want to emphasize that the questions and concerns I’ll be discussing aren’t all originally mine. And to Simon and the crew, don’t sweat it: what we’re wondering about doesn’t amount to a slam, just curiosity about how they went about this problem.

Here we go. In the supplemental materials associated with the paper, Simon and the other authors write that they tested for punctuational effects by looking at lexical divergence. Lexical divergence, in my understanding, is when a word on the Swadesh list is completely different, in phonology and syntax, from its counterparts in the other languages in the comparison. The authors even mention that lexical divergence is a replacement of words. This is a very important distinction that should be clear and simple: if a word differs completely from its counterpart in another language, that counts as evidence the two languages are less related.

But the framework, cladistics, that Atkinson, Meade, Venditti, Greenhill, and Pagel used to extract a pattern from the data has flaws. See, they write that they generated phylogenetic trees to,

“…describe the separated paths of evolution leading from a common ancestral language to a set of observed extant languages…”

Phylogenetic trees are constructed based on the amount of difference. In biology, phylogenetic trees or cladograms can be made in many different ways. One way is to construct trees based upon the amount of genetic sequence difference. This method ran into a major roadblock when it was observed that organisms can inherit genes in two ways. One is the traditional transfer from parent to offspring, which is called vertical gene transfer. The other, known as horizontal or lateral gene transfer, is when genes jump between unrelated organisms.
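To make "amount of difference" concrete for word lists, here is a minimal sketch of a pairwise distance calculation. It assumes each language is coded as one cognate-class label per meaning (same label means cognate). The labels, the language names, and the `lexical_distance` helper are all invented for illustration; this is not the authors' method or data.

```python
from itertools import combinations

# Hypothetical cognate-class codes for four meanings; same letter = cognate.
# These labels are made up for illustration, not taken from any real data set.
languages = {
    "lang_A": ["a", "a", "a", "a"],
    "lang_B": ["a", "a", "b", "a"],
    "lang_C": ["a", "b", "b", "c"],
}


def lexical_distance(x, y):
    """Fraction of meanings whose cognate classes differ between two lists."""
    if len(x) != len(y):
        raise ValueError("word lists must cover the same meanings")
    return sum(1 for p, q in zip(x, y) if p != q) / len(x)


# Distance for every pair of languages; a tree-building method would
# start from a table like this one.
distances = {
    (m, n): lexical_distance(languages[m], languages[n])
    for m, n in combinations(languages, 2)
}

for pair, d in distances.items():
    print(pair, d)
```

The smaller the distance, the more closely related two languages look. Note that the lists have to cover the same meanings for the comparison to work at all.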

Lateral gene transfer is a very common phenomenon in bacteria. These microbes are able to export sequences of genes to other ‘species’ of bacteria, which then incorporate them. Gene transfer isn’t just documented among bacteria; transfer of genes from bacteria to yeast has been well documented too. More complex organisms such as the adzuki bean beetle and other arthropods (as well as nematodes) have somehow imported genetic material from an endosymbiont microbe, Wolbachia. It also happens in plants.

Suffice it to say, it has made classifying organisms based upon sequence difference a serious pain. Thankfully, an alternative has been found for drawing up phylogenetic trees using differences in genetic sequence: use ribosomal RNA, it (mostly) ain’t transferred! But in the case of analyzing linguistic data, how can one screen out the ‘lateral transfer’ situation?

Here’s just one example of lateral transfer in linguistic data, and how it can mess things up. I speak Farsi. Farsi is a very old Indo-Iranian language that is understood as a foundation for many descendant languages. I gave an example in my previous post of how the Farsi word for father relates to the Spanish and English words. Farsi speakers have always tried to keep their linguistic identity cohesive… but Farsi has many influences. For example, the Arab conquests of Persia shifted the linguistic ‘purity’ of Farsi… many Arabic words are now integral parts of Farsi. Likewise, a long-standing history of French influence in Iran has brought many ‘borrowed’ words, such as merci. There are many ways to say thank you in Farsi, but the most common and casual way is merci… just like in French.
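Here is a toy sketch of how borrowing like this can mess up a difference-based comparison. It assumes two completely unrelated languages (no shared cognates) and lets one borrow each word from the other with some fixed probability. The word labels, the 20% rate, and the `borrow` and `lexical_distance` helpers are all hypothetical; this is not TraitLab and not the authors' procedure.

```python
import random


def lexical_distance(x, y):
    """Fraction of meanings whose word labels differ between two lists."""
    return sum(1 for p, q in zip(x, y) if p != q) / len(x)


def borrow(recipient, donor, rate, rng):
    """Copy of recipient where each word is replaced by the donor's
    word for the same meaning with probability `rate`."""
    return [d if rng.random() < rate else r for r, d in zip(recipient, donor)]


rng = random.Random(0)  # fixed seed so the run is repeatable
n_meanings = 200

# Two unrelated languages: no label in one ever appears in the other.
donor = [f"d{i}" for i in range(n_meanings)]
recipient = [f"r{i}" for i in range(n_meanings)]

before = lexical_distance(recipient, donor)
after = lexical_distance(borrow(recipient, donor, 0.2, rng), donor)

print("distance before borrowing:", before)
print("distance after borrowing: ", after)
```

Before borrowing the two lists differ on every meaning. After borrowing, the distance drops even though nothing about the languages' ancestry has changed, so a method that only counts differences would see them as closer relatives than they are.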

Lateral transfer of genes created a conundrum in classifying living organisms; it can likewise create a conundrum in classifying and understanding patterns in languages. I’ve shown how this can be the case in Farsi, and thankfully the historical records can inform us of when external influences changed language and culture. So how did Atkinson et al. screen for lateral transfer of words?

They defined lateral transfer of words as borrowing, which is what it is. And,

“to determine whether the punctuational effects we observe could be attributable to borrowing between languages, we repeated the test procedures… using simulated lexical data derived from the programme TraitLab. TraitLab uses a stochastic-dollo model of cognate gain and loss and allows words to be borrowed between languages either globally (languages can borrow cognates from any other language) or locally (languages only borrow cognates from languages with which they share a most recent common ancestor within a specified time span). We simulated global and local borrowing allowing the chance of a word being borrowed to vary between 0% (no borrowing) and 50% (high rates of borrowing). None of the simulated data sets produced the positive values of β expected of a punctuational effect. The punctuational effects we observe in the real language data are thus unlikely to be caused by borrowing of lexical terms between languages.”

I’m surprised that a positive value wasn’t seen in the case where no word could be borrowed. Could it be that the Swadesh lists weren’t thorough enough? I think so. I don’t want this to be a ‘you need a consistent sample’ argument, but quite frankly it is. More words would increase the chance of seeing a borrowed word or two.
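A back-of-the-envelope way to see the sample size point: if each word on a list had the same small chance of being a borrowing, independently of the others, the chance of catching at least one borrowed word would be 1 - (1 - p)^n for a list of n words. Both the independence assumption and the 1% figure below are mine, purely for illustration; real borrowing won't be this tidy.

```python
def prob_at_least_one_borrowed(n_words, p_borrowed):
    """Chance that a list of n_words contains at least one borrowed word,
    assuming each word is borrowed independently with probability p_borrowed."""
    return 1.0 - (1.0 - p_borrowed) ** n_words


# Hypothetical per-word borrowing chance of 1%, for lists of growing size.
for n in (100, 200, 400):
    print(n, round(prob_at_least_one_borrowed(n, 0.01), 3))
```

Under these assumptions, the longer the list, the more likely it is that borrowing shows up in the data at all.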

Furthermore, the three different language families are pretty distinct and geographically separate, but the Swadesh lists aren’t consistent in number. My estimates for the Bantu family show that about 120 words were analyzed, with some overlap with the Swadesh lists of the other language families. But many of the same words aren’t represented in each Swadesh list. The Indo-European language group has 4 times as many words. The Austronesian language group has just as many, if not more. In order to do both a global and a local comparison of language differences, a consistent, large set of words must be used. That expands the project of course, but it raises an eyebrow when we consider that Rwanda has a notable history of colonization by Germany, the Maori have a horrifically vivid history with the British, etc.

I’ll wrap this post up by confirming that I see and understand the point that,

“Punctuational language change may thus reflect a human capacity to rapidly adjust languages at critical times of cultural evolution, such as during the emergence of new and rival groups.”

All I’m saying is: what if we had more words? Would things be more punctuational, or would there be more noise because of borrowed words?