From: jimruttshow8596
Eric Smith, a researcher at the Earth Life Science Institute in Tokyo, the Biology Department of Georgia Tech, and external faculty at the Santa Fe Institute (SFI), has applied his background in statistical physics to various fields, including the nature and origins of language [00:01:08].
Interest in Language as a Phenomenon
Languages are seen as a source of joy, something people are embedded in with enormous tacit knowledge as speakers and participants [00:04:09]. Encountering foreign languages can provide insight into one’s own language that had previously gone unnoticed; for instance, recognizing the Latin and German logic embedded in English [01:04:27].
The opportunity to seriously work on reconstructing the history of the world’s languages as a platform for understanding language as a phenomenon was a key draw [01:05:02].
Challenges in Linguistics
Linguistics has historically been an excessively conservative field, resisting the use of modern probability methods much longer than it should have, although this is starting to change [01:05:17]. At SFI, linguists sought to engage biologists who had used probability methods for genomics for decades, but the exchange was often frustrating due to:
- Biologists who tended to be arrogant, believed they had already solved the problems, and were uninterested in learning the depth of linguistic structure [01:05:48].
- Linguists who resisted putting in the hard work of learning and adopting modern probability methods [01:06:04].
Even a simple version of this problem took seven years to complete and publish [01:06:31]. The area remains rich for new researchers: a great deal is already known about both probability methods and language structure, yet probabilistic models of the intrinsic dynamics of language use and change remain largely unbuilt [01:06:44].
How Languages Evolve and Change
From a physics perspective, while genome change can often be understood by looking at context-independent flips of individual bases, language functions as a system whose components are interconnected [01:07:30]. For example, if a sound like “la” changes, it must change in all instances and for all speakers simultaneously to maintain its meaning through contrast [01:08:02]. This makes modeling language change distinctly different from modeling genome change: one has to model the joint change of the system together with the tokens that carry its properties [01:08:26].
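To make that contrast concrete, here is a minimal sketch, not from the episode, in which all names (flip_site, shift_sound, the toy lexicon) are hypothetical. It contrasts a genome-style model, where individual sites mutate independently, with a language-style model, where a regular sound change must apply to every word at once to preserve the contrasts that carry meaning.

```python
# Illustrative sketch only: two toy mutation models.
import random

# Genome-like model: each site can flip independently of every other site.
def flip_site(genome: list[str], alphabet: str = "ACGT") -> list[str]:
    """Mutate one randomly chosen site; no other site is affected."""
    i = random.randrange(len(genome))
    genome = genome.copy()
    genome[i] = random.choice(alphabet.replace(genome[i], ""))
    return genome

# Language-like model: a sound change is a map over the whole phoneme system.
# To preserve contrasts, every word containing the old sound changes together.
def shift_sound(lexicon: dict[str, list[str]], old: str, new: str) -> dict[str, list[str]]:
    """Apply a regular sound change old -> new to every word simultaneously."""
    return {meaning: [new if p == old else p for p in word]
            for meaning, word in lexicon.items()}

if __name__ == "__main__":
    genome = list("ACGTACGT")
    print("genome after one local flip:", "".join(flip_site(genome)))

    lexicon = {"water": ["l", "a", "k"], "moon": ["l", "u", "n"], "fire": ["p", "i", "r"]}
    # The change "l" -> "r" hits every word at once; applying it to only one
    # word would destroy the contrasts that give the system its meaning.
    print("lexicon after system-wide shift:", shift_sound(lexicon, "l", "r"))
```

The point of the sketch is structural: in the first model each change is local and context-free, while in the second a single change is defined over the entire system of tokens at once.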
Deep Learning and Grammar Discovery
Context discovery remains a difficult problem, and the deep learning community could benefit from taking more interest in grammar and syntax discovery [01:08:40]. By asking deep learning systems to unpack what they know about language, potentially through zero-shot translation, new insights into grammar and syntax could be revealed [01:09:05]. This would be a collaboration between those in deep learning interested in interpreting their “black boxes” and linguists with rich understanding of grammar, typology, syntax, and morphology [01:09:34]. There is ongoing work in “grammar induction” to mechanically induce grammar from large corpora of language [01:10:26].
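As a rough illustration of what grammar induction aims at, the toy sketch below groups words into candidate categories by the contexts they share in a tiny corpus. The corpus, the naive grouping rule, and all names are assumptions made for illustration, not the method of any system discussed in the episode.

```python
# Minimal sketch of distributional category induction from a toy corpus.
from collections import defaultdict

corpus = [
    "the cat sees the dog",
    "the dog sees the cat",
    "a cat chases a bird",
    "the bird sees a dog",
]

# Collect each word's set of (left, right) neighbour contexts.
contexts = defaultdict(set)
for sentence in corpus:
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    for i in range(1, len(tokens) - 1):
        contexts[tokens[i]].add((tokens[i - 1], tokens[i + 1]))

# Naive grouping: words sharing at least one context land in the same class.
classes: list[set[str]] = []
for word, ctxs in contexts.items():
    for cls in classes:
        if any(contexts[w] & ctxs for w in cls):
            cls.add(word)
            break
    else:
        classes.append({word})

for i, cls in enumerate(classes):
    print(f"class {i}: {sorted(cls)}")
```

On this toy corpus the grouping recovers a determiner-like class ("the", "a") and a noun-like class ("cat", "dog", "bird"); real grammar-induction work replaces the naive sharing rule with probabilistic models estimated over far larger corpora.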
Origin of Full Language
Regarding the timeline for the emergence of full language, theories range from 300,000 years ago to as recently as 10,000 years ago [01:11:17]. Eric Smith leans towards the 100,000-year ballpark, which aligns with the appearance of modern Y-chromosome and mitochondrial forms, and he sees no reason to place changes in overall brain development much later than that [01:11:45]. The question of what caused cultural modernity around 40,000 years ago remains compelling [01:12:11].
A strong argument that language existed more than 65,000 years ago stems from the “Out of Africa” migration. Since nearly all non-African people descend from that migration (around 65,000 years ago), and Chomskyan language exists both inside and outside Africa despite almost no genetic backflow since then, the capability for full language must already have been in place before the migration [01:12:25].