‘Lost’ languages to be resurrected
Tim Hornyak, National Geographic News:
A new computer program has quickly
deciphered a written language last used in Biblical times, possibly
opening the door to ‘resurrecting’ ancient texts that are no longer
understood, scientists announced last week.
Created by a team at the Massachusetts Institute of Technology, the
program automatically translates written Ugaritic, which consists of
dots and wedge-shaped stylus marks on clay tablets. The script was last
used around 1200 B.C. in western Syria.
A sample of Ugaritic script on a gift-shop replica. Photograph
courtesy Regina Barzilay
Written examples of this ‘lost language’ were discovered by
archaeologists excavating the port city of Ugarit in the late 1920s. It
took until 1932 for language specialists to decode the writing.
Since then, the script has helped shed light on ancient Israelite
culture and Biblical texts.
Using no more computing power than that of a high-end laptop, the new
program compared symbol and word frequencies
and patterns in Ugaritic with those of a known language, in this case,
the closely related Hebrew. Through repeated analysis, the program
linked letters and words to map nearly all Ugaritic symbols to their
Hebrew equivalents in a matter of hours.
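To give a rough sense of what comparing symbol frequencies can mean, here is a minimal sketch of frequency-rank matching, a far simpler technique than the one the MIT team describes and not their method. It pairs the most common symbol in an unknown text with the most common symbol in a known text, the second with the second, and so on. The function names and the toy texts are invented for illustration, and the toy data is constructed so that the frequencies line up perfectly, which real texts rarely do.

```python
from collections import Counter


def rank_symbols(text):
    """Return the symbols of `text` ordered from most to least frequent."""
    counts = Counter(ch for ch in text if not ch.isspace())
    # Sort by descending count, then by symbol, so ties break deterministically.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [sym for sym, _ in ordered]


def guess_mapping(unknown_text, known_text):
    """Pair symbols that hold the same frequency rank in the two texts."""
    return dict(zip(rank_symbols(unknown_text), rank_symbols(known_text)))


# Toy data (an assumption for illustration): the "unknown" text is simply
# the known text under a made-up substitution a->x, b->y, c->z.
known = "abba cab abc aa"
unknown = "xyyx zxy xyz xx"

mapping = guess_mapping(unknown, known)
decoded = "".join(mapping.get(ch, ch) for ch in unknown)
```

On real inscriptions, frequencies alone would leave many symbols ambiguous, which is presumably why the program described here also draws on word-level patterns and repeated rounds of analysis.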
The program also correctly identified Ugaritic and Hebrew words with
shared roots 60 percent of the time. Words share a root when they
spring from the same source, such as the French homme and Spanish
hombre, which share the Latin root for ‘man.’
Led by computer science professor Regina Barzilay, the team may be
the first to show that a computer approach to dead scripts can be
effective, despite claims that machines lack the necessary intuition.
“Traditionally, decipherment has been viewed as a sort of scholarly
detective game, and computers weren’t thought to be of much use,”
Barzilay said.
“Our aim is to bring to bear the full power of modern machine
learning and statistics to this problem.”
Not always a ‘Rosetta stone’
The next step should be to see whether the program can help crack the
handful of ancient scripts that remain largely incomprehensible.
Etruscan, for example, is a script that was used in northern and
central Italy around 700 B.C. but was displaced by Latin by about A.D. 100.
Few written examples of Etruscan survive, and the language has no known
relatives, so it continues to baffle archaeologists.
“In the case (of Ugaritic), you’re dealing with a small and simple
writing system, and there are closely related languages,” noted Richard
Sproat, a computational linguist at Oregon Health and Science University
who was not involved in the new work.
“It’s not always going to be the case that there are closely related
languages that one can use” for Rosetta Stone-like comparisons.
But study leader Barzilay thinks the decoding program can overcome
this hurdle by scanning multiple languages at once and taking contextual
information into account, improvements that could uncover unexpected
similarities or links to known languages.