General questions on assessing language evolution

I’ve read the short article on punctuated equilibrium in language evolution and have discussed it with some colleagues of mine. I’m assuming they don’t want me to publicize their names, since this blog’s reputation for being coy and graceful isn’t what most would consider honorable. For that reason, I’m omitting their names, even though they didn’t explicitly ask me to do so… But I want to emphasize that the questions and concerns I’ll be discussing aren’t all originally mine. And to Simon and the crew: don’t sweat it. What we’re wondering doesn’t amount to a slam, just curiosity about how they went about this problem.

Here we go. In the supplemental materials associated with the paper, Simon and the other authors write that they tested for punctuational effects by looking at lexical divergence. Lexical divergence, as I understand it, is when the word for a given Swadesh-list meaning in one language is completely different in form from the corresponding word in the other languages in the comparison. The authors even mention that lexical divergence is a replacement of words. This is an important distinction that should be clear and simple: if a word completely differs from its counterpart in another language, that counts as evidence the languages are less closely related.

But the framework, cladistics, that Atkinson, Meade, Venditti, Greenhill, and Pagel used to extract a pattern from the data has flaws. See, they write that they generated phylogenetic trees to,

“…describe the separated paths of evolution leading from a common ancestral language to a set of observed extant languages…”

Phylogenetic trees are constructed based on the amount of difference. In biology, phylogenetic trees or cladograms can be made in many different ways. One way is to construct trees based upon the amount of genetic sequence difference. This method ran into a major roadblock when it was observed that organisms can generally inherit genes in two ways. One is the traditional transfer from parent to offspring, which is called vertical gene transfer. The other, known as horizontal or lateral gene transfer, is when genes jump between unrelated organisms.
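To make "amount of difference" concrete for word lists, here is a minimal sketch of a pairwise lexical distance. Everything in it is an illustrative assumption: the languages A, B and C, the cognate class labels, and the function names are hypothetical, and this is a simple distance calculation, not the method Atkinson et al. used to build their trees.

```python
from itertools import combinations

# Hypothetical toy data. For each meaning, the number is the cognate class
# of that language's word: same number = judged to share an ancestral word.
COGNATES = {
    "A": {"father": 1, "water": 1, "hand": 1, "two": 1},
    "B": {"father": 1, "water": 1, "hand": 2, "two": 1},
    "C": {"father": 1, "water": 2, "hand": 3, "two": 1},
}


def lexical_distance(a, b):
    """Fraction of shared meanings whose words fall in different cognate classes."""
    shared = set(a) & set(b)
    if not shared:
        raise ValueError("the two word lists have no meanings in common")
    differing = sum(1 for meaning in shared if a[meaning] != b[meaning])
    return differing / len(shared)


def distance_matrix(data):
    """Pairwise distances keyed by alphabetically ordered language pairs."""
    return {
        (x, y): lexical_distance(data[x], data[y])
        for x, y in combinations(sorted(data), 2)
    }


if __name__ == "__main__":
    for pair, dist in distance_matrix(COGNATES).items():
        print(pair, dist)
```

Note that only meanings present in both lists are compared, so two lists that cover different sets of meanings are being measured on a smaller, possibly unrepresentative sample. A borrowed word that gets coded as a cognate would also shrink the distance between two languages without any shared ancestry.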

Lateral gene transfer is a very common phenomenon in bacteria. These microbes are able to export sequences of genes to other ‘species’ of bacteria, which then incorporate them. Gene transfer is not just documented among bacteria; transfer of genes from bacteria to yeast has also been well documented. More complex organisms such as the adzuki bean beetle and other arthropods (as well as nematodes) have somehow imported genetic material from an endosymbiont microbe, Wolbachia. It also happens in plants.

Suffice it to say, it has made classifying organisms based upon sequence difference a serious pain. Thankfully, an alternative has been found for drawing up phylogenetic trees using differences in genetic sequence: use ribosomal RNA, it ain’t transferred! But in the case of analyzing linguistic data, how can one screen out the ‘lateral transfer’ situation?

Here’s just one example of lateral transfer in linguistic data, and how it can mess things up. I speak Farsi. Farsi is a very old Indo-Iranian language that is understood as a foundation to many descending languages. I gave an example in my previous post of how the Farsi word for father relates to its Spanish and English counterparts. Farsi speakers have always tried to keep their linguistic identity cohesive… but Farsi has many influences. For example, the Arab conquests of Persia shifted the linguistic ‘purity’ of Farsi… many Arabic words are now integral parts of Farsi. Likewise, a long-standing history of French influence in Iran has brought many ‘borrowed’ words, such as merci. There are many ways to say thank you in Farsi, but the most common and casual way is merci… just like in French.

Understanding lateral transfer of genes created a conundrum in classifying living organisms; it can likewise create a conundrum in classifying and understanding patterns in languages. I’ve shown how this can be the case in Farsi, and thankfully the historical records can inform us of when external influences changed language and culture. So how did Atkinson et al. screen for lateral transfer of words?

They defined lateral transfer of words as borrowing, which is what it is. And,

“to determine whether the punctuational effects we observe could be attributable to borrowing between languages, we repeated the test procedures… using simulated lexical data derived from the programme TraitLab. TraitLab uses a stochastic-dollo model of cognate gain and loss and allows words to be borrowed between languages either globally (languages can borrow cognates from any other language) or locally (languages only borrow cognates from languages with which they share a most recent common ancestor within a specified time span). We simulated global and local borrowing allowing the chance of a word being borrowed to vary between 0% (no borrowing) and 50% (high rates of borrowing). None of the simulated data sets produced the positive values of β expected of a punctuational effect. The punctuational effects we observe in the real language data are thus unlikely to be caused by borrowing of lexical terms between languages.”
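For readers wondering what a "positive value of β" refers to, here is a rough sketch. It assumes that β is the slope obtained when each language's root-to-tip path length (total inferred change) is regressed on the number of splitting events (nodes) along that path. The function name and the numbers are hypothetical, and the plain least-squares fit is a simplification: it ignores the shared ancestry between languages that a real phylogenetic analysis has to account for.

```python
def fit_punctuation(nodes, path_lengths):
    """Ordinary least-squares fit of path_length = alpha + beta * nodes.

    nodes        -- number of splitting events between the root and each language
    path_lengths -- total inferred lexical change between the root and each language
    Returns (alpha, beta). Under the assumption above, beta > 0 would mean
    that lineages with more splits have accumulated more change.
    """
    if len(nodes) != len(path_lengths) or len(nodes) < 2:
        raise ValueError("need two equally long lists with at least two points")
    n = len(nodes)
    mean_x = sum(nodes) / n
    mean_y = sum(path_lengths) / n
    sxx = sum((x - mean_x) ** 2 for x in nodes)
    if sxx == 0:
        raise ValueError("all node counts are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(nodes, path_lengths))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta


if __name__ == "__main__":
    # Hypothetical numbers: each extra split adds 0.1 units of change.
    alpha, beta = fit_punctuation([1, 2, 3, 4], [0.6, 0.7, 0.8, 0.9])
    print(f"alpha={alpha:.3f} beta={beta:.3f}")
```

Read this way, the quoted result says that simulated borrowing alone did not make path length climb with the number of splits, which is the pattern the authors report in the real data.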

I’m surprised that a positive value wasn’t seen in the chance of no word being borrowed. Could it be that the Swadesh lists weren’t thorough enough? I think so. I don’t want this to be a ‘you need a consistent sample’ argument, but quite frankly it is. More words would increase the chance of seeing a borrowed word or two.

Furthermore, the three different language families are pretty distinct, and geographically separate, but the Swadesh lists aren’t consistent in number. My estimates for the Bantu family show that about 120 words were analyzed, with some overlap with the Swadesh lists of the other language families. But many of the same words aren’t represented in each Swadesh list. The Indo-European language group has 4 times as many words. The Austronesian language group has just as many, if not more. In order to do both a global and a local comparison of language differences, a consistent, large set of words must be used. That expands the project of course, but it raises an eyebrow when we consider that Rwanda has a notable history of colonization by Germany, the Maori have a horrifically vivid history with the British, etc.

I’ll wrap this post up by confirming that I see and understand the point that,

“Punctuational language change may thus reflect a human capacity to rapidly adjust languages at critical times of cultural evolution, such as during the emergence of new and rival groups.”

All I’m saying is: what if we had more words? Would things be more punctuational, or would there be more noise because of borrowed words?

4 thoughts on “General questions on assessing language evolution”

  1. Kambiz,

    Interesting post, thanks!

    First of all, we (Mark Pagel’s lab, Ruth Mace’s research group, as well as my lab under Russell Gray) have been exploring these methods on languages and cultures for, oh, about a decade now. We’re very well aware of the problems caused by horizontal transmission of things between languages and cultures.

    However, in many cases it’s not a major problem. In this study, we used data from word lists (“Swadesh lists”) which contain 200 items of basic vocabulary. There are two reasons for this – first, this information is easy to get (I’ve collected over 500 on the site linked above). Second, and more importantly, these basic vocabulary items are thought to be more stable over time. For example, English is a mongrel of a language that has over 60% of its TOTAL lexicon borrowed from French or Romance languages. BUT, only around 6% of the English BASIC vocabulary is borrowed from French/Romance, whilst another ~10% comes from other languages. We also removed any identified loans before analysing the data.

    So – even in this really stable vocabulary, we see a lot of punctuated evolution. I would suspect that we’d see even MORE punctualism in extended vocabulary, for the reasons you point to.

    Finally, as hinted by the paper by Q. that you cite above, we’ve done quite a lot of simulation work with horizontal transmission – I’m about to submit two more detailed papers on this. My results show that phylogenetic methods aren’t actually that messed up by horizontal transmission. If it’s NOT systematic, then it’s just noise. If it’s systematic (i.e. lots of vocab from one language shifting into another) then it’s more problematic, BUT can be identified and dealt with.

    You may be interested in this paper which we just finished which covers this debate in more detail.


  2. This discussion presents an interesting problem which scientists encounter in virtually any study: how much can we limit and focus our data pool before the results skew themselves to the desired outcome? Linguists often point to the shortcomings of Greenberg’s (not Greenhill’s) work in Macro-Historical Linguistics because he picks and chooses from data sets that have the desired result while omitting ones that contradict his theories. Schliemann went looking for Troy; when he found something, it was Troy.
    The criticism of this study, however, may be unfounded, as borrowed words would only appear as a form of punctuated equilibrium in the data. How much faster can a word take precedence than one that comes with a little imperialism? The removal of possible false positives in this case appears to be just, especially considering the correlation observed in the true cognates.
    If you were aiming to criticize the methodology, perhaps you might take a stab at Glottochronology, as Lyle Campbell has in chapter 6 of Historical Linguistics.
