From: jimruttshow8596

The study of language, its origins, and its evolution is a rich area of research; Eric Smith notes its inherent joy and the “enormous amount of tacit knowledge” held by the speakers and participants of the world’s languages [01:04:07]. Encountering foreign languages can illuminate the logical underpinnings of one’s own, such as recognizing how English blends the logic of Latin and German [01:04:45].

Academic Approaches

At the Santa Fe Institute (SFI), Eric Smith engaged with a group dedicated to reconstructing the history of all the world’s languages to better understand language as a phenomenon [01:05:02].

Challenges in Linguistics

The field of linguistics has been described as “excessively conservative,” having resisted the adoption of modern probability methods decades longer than it should have [01:05:17]. While linguists at SFI sought to collaborate with biologists who had been using probability methods for genomics, these exchanges were often “frustrating” [01:05:37]. The frustration stemmed from the biologists’ perceived arrogance and lack of interest in the depth of linguistic structure, and from the linguists’ reluctance to embrace modern probability methods [01:05:48]. Despite these difficulties, there remains significant potential to create new fields by developing probabilistic models of language use and change [01:07:04].

Understanding Language Change

Unlike genome change, which can often be studied by examining context-independent flips of individual bases, language functions as a system where components are interconnected [01:07:33]. For instance, if a sound like “la” changes, it must do so across all its occurrences in the language and for all speakers simultaneously, making the modeling of language change a distinct and complex problem [01:08:13]. This emphasizes the “joint change of the system with the tokens that carry the properties of the system” [01:08:31].
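The contrast can be made concrete with a minimal sketch (an illustration added here, not drawn from the conversation, with invented data and function names): a genome-style model mutates each site independently of the others, whereas a language-style sound change is a single rule that rewrites every occurrence of a sound across the whole lexicon at once.

```python
# Toy contrast between context-independent "genome-style" change and
# systemic "language-style" change. Purely illustrative.
import random

random.seed(0)

# Genome-style: each site can flip independently of every other site.
def mutate_genome(sequence, rate=0.1, alphabet="ACGT"):
    return "".join(
        random.choice(alphabet) if random.random() < rate else base
        for base in sequence
    )

# Language-style: a sound change is a rule applied to *every* occurrence of a
# sound across the whole lexicon at once -- the system and the tokens that
# carry its properties change jointly.
def apply_sound_change(lexicon, old_sound, new_sound):
    return {word: form.replace(old_sound, new_sound) for word, form in lexicon.items()}

genome = "ACGTACGTACGT"
lexicon = {"water": "la.ta", "stone": "ka.la", "fire": "la.mu"}

print(mutate_genome(genome))                    # independent, site-by-site flips
print(apply_sound_change(lexicon, "la", "ra"))  # one rule, all occurrences at once
```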

Intersection with Artificial Intelligence

The challenge of “context discovery” in language remains a difficult problem [01:08:40]. Eric Smith expresses interest in seeing the deep learning community, which has primarily been “data-centric,” take a greater interest in grammar and syntax discovery [01:08:47]. This approach could reveal more about the nature of grammar, especially with systems capable of “zero-shot translation” [01:09:09]. While some deep learning researchers are beginning to explore what is “inside the black box” and how to interpret it in more symbolic terms, the field largely remains resistant to this [01:09:18]. A collaboration between deep learning experts and linguists, leveraging existing knowledge of grammar, typology, syntax, and morphology, could lead to significant advancements [01:10:01]. The concept of “grammar induction,” where grammar is mechanically induced from a large corpus of language, is an area of ongoing progress [01:10:26].
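As a rough illustration of the counting that underlies the simplest grammar-estimation setups, the sketch below reads a tiny hand-bracketed toy corpus and estimates rule probabilities for a probabilistic context-free grammar (PCFG). The corpus, the bracketing format, and the function names are assumptions made for this example; genuine grammar induction of the kind mentioned above works from raw, unbracketed text and is a much harder problem.

```python
# Minimal sketch: estimate PCFG rule probabilities from a toy bracketed corpus.
from collections import Counter

def parse_tree(tokens):
    """Parse one bracketed tree like ( S ( NP the dog ) ( VP barks ) ) into nested lists."""
    token = tokens.pop(0)
    if token == "(":
        label = tokens.pop(0)
        children = []
        while tokens[0] != ")":
            children.append(parse_tree(tokens))
        tokens.pop(0)  # discard the closing ")"
        return [label, children]
    return token  # a terminal word

def count_productions(tree, counts):
    """Recursively count label -> children productions in a parsed tree."""
    if isinstance(tree, str):
        return
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    counts[(label, rhs)] += 1
    for child in children:
        count_productions(child, counts)

corpus = [
    "( S ( NP the dog ) ( VP barks ) )",
    "( S ( NP the cat ) ( VP sleeps ) )",
]

counts = Counter()
for sentence in corpus:
    count_productions(parse_tree(sentence.split()), counts)

# Normalize counts into conditional rule probabilities P(rhs | lhs).
totals = Counter()
for (lhs, _), n in counts.items():
    totals[lhs] += n
for (lhs, rhs), n in sorted(counts.items()):
    print(f"{lhs} -> {' '.join(rhs)}   p = {n / totals[lhs]:.2f}")
```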

Timing of Language Emergence

Regarding the timeline for the emergence of full human language, Smith describes himself as a “tourist of those opinions” [01:11:31], holding no unique stance among the differing theories (e.g., 300,000, 40,000, or 10,000 years ago). If pressed for a guess, however, he would favor the “hundred thousand year ballpark” [01:11:45], which aligns with the appearance of the modern forms of the Y chromosome and mitochondrion around that time [01:11:50] and with the absence of significant later changes in overall brain development.

A compelling argument that full language existed more than 65,000 years ago rests on the “Out of Africa” migration [01:12:25]. Almost all non-African people descend from that migration, which involved minimal genetic backflow into Africa; since Chomskyan language exists in populations both within and outside Africa, the capability for full language must already have been present before the dispersal [01:12:36].