New Machine Learning System Deciphers Lost Languages

Researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have developed a new system that can “automatically decipher a lost language, without needing advanced knowledge of its relation to other languages.” 

As reported by Adam Conner-Simons, a language is considered to be lost when too little is known about its grammar, vocabulary, or syntax to be able to understand its texts. Such languages often also lack a well-researched related language to which they can be compared.

The researchers developed a decipherment algorithm that “can handle the vast space of possible transformations and the scarcity of a guiding signal in the input.” The system relies on established linguistic principles, such as the patterns in which languages typically evolve.
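To give a feel for what matching words across related languages can involve, here is a deliberately simple sketch. It is not the CSAIL system: it swaps in plain character-level edit distance as a crude stand-in for the learned model, and pairs each word of a "lost" vocabulary with its closest word in a known one. The function names and the toy transliterated strings are assumptions made up for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string costs i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def best_matches(lost_words, known_words):
    """Pair each lost-language word with its nearest known-language word."""
    return {w: min(known_words, key=lambda k: edit_distance(w, k))
            for w in lost_words}


# Hypothetical toy data: vowel-less "lost" forms and fuller "known" forms.
lost = ["mlk", "klb"]
known = ["melek", "keleb", "bayit"]
print(best_matches(lost, known))
```

A real decipherment system has to cope with far more than this: unknown character mappings between different scripts, sound changes that are systematic rather than arbitrary, and many lost words with no surviving cognate at all.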

A 2019 paper describes the model and reports successful results in deciphering Ugaritic, an extinct Northwest Semitic language sometimes classified as a dialect of Amorite, and Linear B, a syllabic script used to write an early form of Greek.