Languages, like genes, can tell evolutionary tales
Talk is cheap, but scientific value lurks in all that gab. Words cascading out of countless flapping gums contain secrets about the evolution of language that a new breed of researchers plans to expose with statistical tools borrowed from genetics.
For more than a century, traditional linguists have spent much of their time doing fieldwork — listening to native speakers to pick up on words with similar sounds, such as mother in English and madre in Spanish, and comparing how various tongues arrange subjects, verbs, objects and other grammatical elements into sentences. Such information has allowed investigators to group related languages into families and reconstruct ancestral forms of talk. But linguists generally agree that their methods can revive languages from no more than 10,000 years ago. Borrowing of words and grammar by speakers of neighboring languages, the researchers say, erases evolutionary signals from before that time.
Now a small contingent of researchers, many of them evolutionary biologists who typically have nothing to do with linguistics, are looking at language from in front of their computers, using mathematical techniques imported from the study of DNA to wring scenarios of language evolution out of huge amounts of comparative speech data.
These data analyzers assume that words and other language units change systematically as they are passed from one generation to the next, much the way genes do. Charles Darwin similarly argued in 1871 that languages, like biological species, have evolved into a series of related forms.
And in the same way that geneticists use computerized statistical approaches to put together humankind’s family tree from the DNA of living people and a few long-dead individuals, these newcomers can generate family trees, called phylogenies, for languages. From existing data on numbers of speech sounds and types of grammatical structure, these phylogenies can point to ancient root languages and trace a path to today’s tongues.
The new approach is making a splash — some would say a splatter — among mainstream linguists, who haven’t exactly been anxiously waiting for advice from the fossils-and-genes crowd.
One recent study upends the traditional view that ancient languages did not evolve neatly, one into another and so on, arguing that modern tongues indeed contain telltale marks of how past languages moved across continents. Other results question the influential idea that grammar everywhere reflects innate properties of the human mind. Both investigations have appeared in high-profile science journals, drawing unprecedented publicity for explorations of speech sounds and word orders.
“Linguists spin a bit of a story with case studies of individual languages,” says evolutionary biologist Russell Gray of the University of Auckland in New Zealand, a pioneer of the phylogenetic analysis of speech. “Statistical methods can now be used to examine languages rigorously and on a global scale.”
Traditional language studies are still vital, he says, because they provide the massive amounts of speech and grammatical information needed for statistical breakdowns.
Talk of ages
Auckland’s Quentin Atkinson, a psychologist and colleague of Gray’s, stands in the eye of the phylogenetic storm. In a controversial paper in the April 15 Science, Atkinson concluded that because African languages have greater numbers of speech sounds than others, language probably originated in Africa. A parallel argument from evolutionary biology holds that greater numbers of DNA alterations in African populations reflect humankind’s African roots.
Atkinson’s study grew out of observations by other researchers that the number of sounds, or phonemes, employed in words declines as populations shrink and increases as populations enlarge. A succession of smaller and smaller groups migrating away from a larger founding population should thus lose more and more phonemes with increasing distance from the point of origin, Atkinson reasoned. Settlers of new lands could come up with their own phonemic twists but would have less time to build up the big inventories of speech sounds found in larger, well-established populations.
Using vowel, consonant and tone inventories from 504 languages, obtained from an online database, Atkinson evaluated all possible geographic origins of language, from Africa to South America, for signs of progressively declining phoneme frequencies as languages got farther away from a given source. Southwestern Africa emerged as the strongest candidate for an area where language got its start.
This pattern held after accounting for additional factors that alter phoneme numbers, such as word and phoneme borrowing among neighbors. “Languages apparently expanded out of Africa, with a loss of phonemic diversity along the way,” Atkinson says.
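The origin search described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not Atkinson's actual procedure: it scores each candidate origin by a plain correlation between distance and phoneme count, and it leaves out the controls for borrowing and other factors mentioned above. The function names, the two candidate points and the three toy "languages" are all invented for the example.

```python
import math

def great_circle_km(a, b):
    """Haversine distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def best_origin(languages, candidates):
    """Return (name, r) for the candidate origin whose distance to each
    language correlates most negatively with that language's phoneme count."""
    counts = [lang["phonemes"] for lang in languages]
    scored = []
    for name, point in candidates.items():
        dists = [great_circle_km(point, lang["loc"]) for lang in languages]
        scored.append((pearson_r(dists, counts), name))
    r, name = min(scored)  # most negative correlation wins
    return name, r

# Invented toy data: phoneme counts fall with distance from candidate "A".
toy_candidates = {"A": (0.0, 0.0), "B": (0.0, 90.0)}
toy_languages = [
    {"loc": (0.0, 10.0), "phonemes": 40},
    {"loc": (0.0, 40.0), "phonemes": 30},
    {"loc": (0.0, 80.0), "phonemes": 20},
]

name, r = best_origin(toy_languages, toy_candidates)
print(name, round(r, 3))
```

With the toy data, candidate "A" wins because phoneme counts drop steadily as languages sit farther from it, while for "B" they rise. A real analysis would need many candidate points, realistic migration distances and statistical controls.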
Global phoneme patterns say nothing about when Africans started talking, but other evidence does, he says. Seashell necklaces, engraved pigment chunks and other signs of symbolic cultural behavior (SN: 8/13/11, p. 22) date to between 160,000 and 80,000 years ago in Africa, a period when languages like those today must have first been spoken. Language expansions to other continents coincided with estimated migration times of genetic lineages out of Africa, Atkinson suggests.
Many traditional linguists view Atkinson’s analysis as a strange, wayward statistical creature. Language, like people, may have flowered in Africa and spread from there, they say, but Atkinson’s proposed phonemic highway misleadingly races over language’s long and winding road. Too many factors affect how speech sounds get added to and subtracted from contemporary languages to enable reliable evolutionary reconstructions, these linguists assert.
“Nothing of what’s known about language acquisition or change suggests that either fewer or more phonemes will appear as people move around,” says linguist Lyle Campbell of the University of Hawaii at Manoa. Languages lose and acquire sounds for many reasons, including cultural adaptations to new habitats and conquest by foreigners.
Others suspect Atkinson’s analytical approach could be fruitful if informed by more sophisticated assumptions about how languages change. “I think many linguists would praise Atkinson’s contribution if it weren’t for the fact that his conclusions are so outlandish and contrary to linguistic intuition,” says linguist Michael Cysouw of Ludwig Maximilians University Munich in Germany.
One problem lies in Atkinson’s focus on frequencies of only one linguistic element, phonemes, to retrace language evolution. “That could be compared to tracking the history of vertebrates by counting the number of bones in their skeletons,” Cysouw says.
The database of phonemes consulted by Atkinson incorrectly gives greater weight to vowels and tones than to consonants, inflating the estimated number of speech sounds in western Africa where people speak languages that include clicks, Cysouw adds. In an analysis of a linguistic database corrected for such issues, he and his colleagues find the most phoneme-heavy tongues in North America. Languages of West Africa, New Guinea and Australia contain the fewest sounds.
Using this database, Cysouw’s team repeated Atkinson’s technique and found two separate geographic origins for language, one in East Africa and another in West Asia’s Caucasus region, with a large swath of the Middle East and South Africa also possible. Crucially, Cysouw’s analysis suggests that none of these regions contain phoneme-rich languages that stand out as having far more speech sounds than any of the others.
Linguist Florian Jaeger of the University of Rochester in New York agrees with Cysouw’s criticisms. Many languages that Atkinson folds into his analysis belong to families that don’t display declining phoneme numbers among speakers located at increasing distances from Africa, Jaeger says. Statistical tests conducted by Jaeger and colleagues find that only three of the nine largest language families that Atkinson examined behave according to his hypothesis. The other six language families consist of tongues that gain phonemes with increasing distance from West Africa or show no geographic patterns in phoneme numbers, Jaeger’s team reports in an upcoming Linguistic Typology.
All in the families
Similar disagreements swirl around another phylogenetic study, published in the May 5 Nature. That paper challenges a long-standing linguistic consensus that universal patterns exist in the ways that languages assemble words into sentences, reflecting innate grammatical rules or predispositions in the human mind. Grammatical standards instead develop in distinctive ways from one language family to another, indicating that cultural forces have orchestrated language evolution, say evolutionary linguist Michael Dunn of the Max Planck Institute for Psycholinguistics in Nijmegen, the Netherlands, and his colleagues.
“We don’t find any evidence for a universal structure in language,” says Gray, a coauthor of the paper.
Dunn and Gray’s findings run counter to MIT linguist Noam Chomsky’s position that a small set of innate rules for putting words together limits how languages can change. The new investigation also defies an influential idea, championed by the late Stanford University linguist Joseph Greenberg, that certain structural patterns appear in all languages.
Researchers inspired by Greenberg propose, for example, that languages in which verbs come before objects tend to use prepositions, as in “The man (subject) put (verb) the dog (object) in (preposition) a canoe.” Languages in which verbs follow objects tend to use postpositions, as in “The man (subject) the dog (object) put (verb) the canoe in (postposition).”
Using eight word-order features, Dunn and Gray’s team statistically reconstructed evolutionary trees of languages from four major families: Austronesian, Bantu, Indo-European and Uto-Aztecan. These families contain about one-third of the world’s roughly 7,000 languages.
The team found that languages within each family, but not languages across families, formed branching patterns with related sentence structures. Pairs of word-order features, such as a particular arrangement of numerals and nouns or of nouns and adjectives, almost always occurred together within single language families.
If universal properties of the human mind provide a framework for speech, then word-order patterns should have shown commonalities across language families. But this work suggests that speakers of, say, Indo-European and Austronesian languages — heirs of distinctive cultural traditions — take vastly different routes to ordering various types of words in sentences.
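The logic of that comparison can be illustrated with a deliberately simplified tally. The sketch below is not the phylogenetic method Dunn and Gray's team used, which works on reconstructed family trees. It merely counts, family by family, how often a language pairs verb-before-object order with prepositions or object-before-verb order with postpositions. The family labels, the records and the function name are all invented for the example.

```python
from collections import Counter

# Invented toy records: (family, verb-object order, adposition type).
toy_records = [
    ("FamilyX", "VO", "prep"),
    ("FamilyX", "VO", "prep"),
    ("FamilyX", "OV", "post"),
    ("FamilyY", "VO", "post"),
    ("FamilyY", "OV", "prep"),
    ("FamilyY", "VO", "prep"),
]

def harmonic_share(records):
    """For each family, return the fraction of languages that pair
    VO order with prepositions or OV order with postpositions."""
    totals = Counter()
    harmonic = Counter()
    for family, verb_object, adposition in records:
        totals[family] += 1
        if (verb_object, adposition) in {("VO", "prep"), ("OV", "post")}:
            harmonic[family] += 1
    return {family: harmonic[family] / totals[family] for family in totals}

shares = harmonic_share(toy_records)
for family, share in shares.items():
    print(family, round(share, 2))
```

If a pairing were truly universal, every family's share should sit near 1. In the toy data only FamilyX follows the pairing consistently, which is the kind of family-specific pattern the study reports. A raw tally like this ignores shared ancestry, which is exactly what the tree-based analysis is designed to handle.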
“This finding indicates that different cultures come up with their own, quite sensible word-order rules,” says evolutionary biologist Mark Pagel of the University of Reading in England, a pioneer of phylogenetic studies of language.
Borrowing of words and grammatical styles can cloud lines of descent, but Dunn’s analysis accounts for such non-evolutionary factors while ferreting out systematic word-order changes over time, Pagel says. “What makes phylogenetic findings so extraordinary is that, despite lots of uncertainty, we still see that language gets transmitted in evolutionary ways,” he says.
Many linguists think that what makes Dunn’s phylogenetic study so extraordinary is a cavalier, data-challenged dismissal of the bedrock notion that talk everywhere shares common properties. Other work indicates that languages around the world pick from a limited menu of possible word-order choices, says MIT linguist David Pesetsky. Language families can’t opt for off-menu, one-of-a-kind word sequences, he says.
Consider verb-second positioning, in which the second word or group of related words in a main clause is always a verb. A verb-second structure appears in the following Dutch sentences: “I read this book yesterday,” “Yesterday read I this book” and “This book read I yesterday.” Among Germanic languages, which include Dutch, only English lacks a structure that always puts the verb second. (English speakers, for example, could say, “Yesterday I read this book,” a verb-third positioning.) Researchers have now identified verb-second tongues in West Africa and Brazil.
Languages everywhere can easily pop into a verb-second framework, Pesetsky proposes. Other verb placements sometimes appear, as in English, but the menu of alternatives is limited; no languages that always put the verb third or second from the end have been found.
Sketchy and unreliable descriptions of many languages, including those in major families, also undermine Dunn’s work, in Pesetsky’s view. Word orders shift in some languages depending on the situation, another poorly understood phenomenon. “Languages are complex beasts,” Pesetsky says. “Even the best-studied ones have secrets.”
Furthermore, word-order changes occur so infrequently in documented languages that Dunn’s study could have lacked enough statistical power to identify word-order patterns shared by two or more language families, contend linguist William Croft of the University of New Mexico in Albuquerque and his colleagues, including Jaeger, also in an upcoming Linguistic Typology. Several other critiques of phylogenetic studies are set to appear in the journal.
Although many traditional linguists have so far greeted phylogenetic findings with the academic equivalent of a Bronx cheer — a derisive act with a grammar all its own — those with a statistical background suspect that the techniques those studies use have a future.
Many difficulties remain in constructing sophisticated scenarios of language change that can be tested with phylogenetic methods, Cysouw says, but it’s not an impossible goal.
Pagel is more hopeful: Statistical methods for testing models of DNA evolution with voluminous amounts of genetic data revolutionized molecular genetics more than 20 years ago and will do the same to linguistics. “Some linguists are very skeptical,” he says, “but others see that this approach has a ring of truth.”
Pagel knows, though, that it’s one thing to walk the statistical walk. It’s quite another to talk the linguistic talk, at least in a dialect that makes sense to most language researchers.