What is this?
This is not a translator from Latin or Sanskrit to other languages. Instead, this is a predictive model: it takes words in Latin or Sanskrit (what I'll call “classical” languages) and predicts their descendants, through the natural evolution of language, in modern languages. (Plus Pali, which we'll get to later.)
Over time, languages change. These changes are sometimes unpredictable, but more often than not, they actually follow well-defined rules. In theory, these rules can be programmed into a computer, which is what you're seeing here. For example, a stressed o in Latin tends to become ue in Spanish, such as iocum becoming juego. There are many more such rules, often much more complicated.
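To make the idea of "rules programmed into a computer" concrete, here is a minimal sketch in Python. It is not the converter's actual code. The function name `toy_latin_to_spanish`, the convention of writing the stressed vowel in uppercase, and the three helper rules besides o to ue are all assumptions made for illustration, heavily simplified so that the example from the text comes out.

```python
import re

VOWELS = "aeiouAEIOU"

# Hypothetical, heavily simplified rules, applied in order.
# Convention assumed here: the stressed vowel is written in uppercase.
RULES = [
    (r"um$", "o"),                            # final -um becomes -o
    (rf"(?<=[{VOWELS}])c(?=[aouAOU])", "g"),  # c between vowels (before a/o/u) becomes g
    (rf"^i(?=[{VOWELS}])", "j"),              # word-initial i before a vowel becomes j
    (r"O", "ue"),                             # stressed o becomes ue (the rule from the text)
]


def toy_latin_to_spanish(word: str) -> str:
    """Apply each toy rule in order, then drop the uppercase stress marking."""
    for pattern, replacement in RULES:
        word = re.sub(pattern, replacement, word)
    return word.lower()


for latin in ["iOcum", "fOcum", "nOvum"]:
    print(latin.lower(), "->", toy_latin_to_spanish(latin))
# iocum -> juego
# focum -> fuego
# novum -> nuevo
```

The order of the rules matters in a sketch like this, since each rule sees the output of the one before it. A real set of rules would also need to work out where the stress falls rather than having it marked by hand.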
There are at least five ways this tool could fail to predict a translation:
- The descendant language borrowed a word directly from the classical one. We might think óculo in Spanish evolved from Latin oculus (eye). In fact, the descendant of oculus in Spanish is ojo (eye). Óculo is a specific architectural term, and was not inherited but was consciously brought into Spanish by people who knew the Latin word.
- The descendant language borrowed a related word from another language. We might expect the descendant of Sanskrit म्रक्षण mrakṣaṇa (butter) in Hindi/Urdu to be माखन ماکھن mākhan. Instead, the word is मक्खन مکھن makkhan, which is a loan from Punjabi. mākhan is a mostly obsolete term used in only a few contexts.
- The meaning of a word has changed over time. The Portuguese word relha, descended from Latin rēgula (rule), actually means plowshare.
- The evolution is an exception to the rule. We would expect the descendant of युष्मान् yuṣmān (you, formal) in Marathi to be something like *जूम्हा *zūmhā. Instead, we see तुम्ही tumhī with initial t-, indicating that युष्मान् yuṣmān took on tv- in analogy to त्वम् tvam (you, informal) in the course of its evolution.
- The descendant language uses a completely unrelated word. Spanish aceite (oil) comes from Arabic and is unrelated to the Latin oleum.
Historical Background
Sanskrit dates back to around 1500 BCE, the time of the Rigveda. While it was not written at this time, it was preserved orally so that we know what this early stage of the language was like. Sanskrit was standardized much later, around 500 BCE (give or take a few hundred years), by Pāṇini in his Aṣṭādhyāyī. This standardized form is known as Classical Sanskrit.
While Sanskrit is a dead language, its spoken dialects evolved into various Prakrits, from which the modern Indo-Aryan languages descend. These include languages such as Hindi/Urdu, Punjabi, and Marathi, mostly found in North India and nearby countries like Pakistan and Bangladesh. There's also Pali, another dead language which continues to be used as the liturgical language of Theravada Buddhism. Pali dates to around 200 BCE, the time of the Prakrits.
The situation for Latin is similar. Old Latin dates to around 753 BCE (the traditional founding date of Rome), while Classical Latin dates to around 75 BCE. Classical Latin was not the creation of one person like Classical Sanskrit, but was developed by various writers who wrote the classics of Latin literature. In contrast to Classical Latin, the Latin that was spoken by common people is known as Vulgar Latin. Unfortunately, this form of Latin was rarely written, so we have limited knowledge of what it was like.
Like Sanskrit, Latin is now a dead language. However, over the centuries, the various dialects of Vulgar Latin evolved differently and formed modern languages such as Spanish, French, Portuguese, and Italian. The languages descended from Vulgar Latin are known as the Romance languages.
Thus we can make a rough analogy:
Latin | Sanskrit |
---|---|
Old Latin | Vedic Sanskrit |
Classical Latin | Classical Sanskrit |
Vulgar Latin | Prakrits |
N/A | Pali |
Romance languages | Modern Indo-Aryan languages |
This is a little complicated, and I'm glossing over a lot of important differences between the two, but the takeaway is this: in general, you can enter words in Classical Latin or Classical Sanskrit and get a rough idea of what they might look like in modern languages. In a few cases, you may have to adjust the input to match an unattested Vulgar Latin or Old Indo-Aryan dialect to get the best results.
An interesting fact: Latin and Sanskrit are related, as both descend from Proto-Indo-European. An example cognate is पाद pāda in Sanskrit and pedem in Latin, both meaning foot. This makes French pied and Urdu پاؤں pāõ distantly related, along with English foot, which underwent a p- to f- change.
Why Latin and Sanskrit?
So why choose Latin and Sanskrit as the source languages for a “classics converter”? They have several attractive qualities that make such a tool work well.
- Both are millennia old, and have extensive literature we can use to source words.
- Both evolved into not just one but many modern languages, and many of these are widely spoken and have their own extensive literatures.
- Both used some kind of phonetic writing, so we have a good idea of what they sounded like. (Pāṇini even described the sound system of Sanskrit in linguistic terms.)
As far as I can tell, these are among the few languages that fit all of these criteria.
- The only widely spoken descendant of Ancient Greek is Modern Greek. Likewise for Old English, Persian, and Malay.
- Classical Chinese used Chinese characters, making it difficult to tell how it was pronounced.
- Modern Arabic dialects are widely spoken, but not widely written, as people write in Fus'ha instead.
References
The rules applied in the converter code, while not necessarily referenced in the main article, mostly came from:
- English Wiktionary
- Sanskrit (Wikipedia)
- Indo-Aryan languages (Wikipedia)
- Pali (Wikipedia)
- Prakrit (Wikipedia)
- Latin (Wikipedia)
- The Linguistics of Spanish (Ian Mackenzie, Newcastle University, 1999–2022)
- The Indo-Aryan Languages (Colin P. Masica, University of Chicago, 1991)
- Phonological history of French (Wikipedia)
- Comparison of Portuguese and Spanish (Wikipedia)
- How Latin Became Italian (Damyan Lissitchkov, 2013)
- Phonological changes from Classical Latin to Proto-Romance (Wikipedia)
- History of the Spanish language (Wikipedia)
Footnotes
- This type of borrowing has a special name in Sanskrit: तत्सम tatsama. Inherited terms are called तद्भव tadbhava.
- Pali is a literary language related to Prakrit, corresponding to Vulgar Latin in our analogy. It is as if Vulgar Latin had a standardized written form separate from Classical Latin.
- The question of how “old” a language is has become politicized and is not very well-defined. Here I just mean that both Sanskrit and Latin have attestation from millennia ago.