Language genetic proximity calculator


Select two languages to determine their genetic proximity:
 
The genetic proximity is expressed as a value between 0 (same language) and 100 (greatest possible distance). In practice, the values correspond approximately to the following degrees of genetic proximity:
  • Between 1 and 20: Highly related languages. Protolanguage (common "ancestor") spoken between several centuries and 2000 years ago.
  • Between 21 and 40: Related languages. Protolanguage spoken between 2000 and 4000 years ago.
  • Between 40 and 70: Remotely related languages. Protolanguage spoken between 4000 and 8000 years ago.
  • Between 70 and 80: Very remotely related languages. Protolanguage spoken more than 8000 years ago.
  • Between 80 and 100: No recognizable relationship.
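The scale above can be read as a simple lookup. The following Python sketch is only an illustration, not the calculator's actual code: the function name proximity_band is hypothetical, and because the listed ranges share the boundary values 40, 70 and 80, the sketch assumes that a boundary value belongs to the lower band (in line with the 1 to 20 and 21 to 40 pattern).

```python
def proximity_band(score):
    """Return the descriptive band for a proximity score between 0 and 100.

    Hypothetical helper: the band labels come from the list above, but the
    handling of shared boundary values (40, 70, 80) is an assumption.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score == 0:
        return "Same language"
    # (upper bound, label); a value equal to an upper bound stays in that band.
    bands = [
        (20, "Highly related languages"),
        (40, "Related languages"),
        (70, "Remotely related languages"),
        (80, "Very remotely related languages"),
        (100, "No recognizable relationship"),
    ]
    for upper, label in bands:
        if score <= upper:
            return label


print(proximity_band(15))  # prints: Highly related languages
```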
The values are calculated according to a series of criteria and language features which remain constant in each comparison. The project contains a program which stores the values for all relationships between the listed languages in a distance matrix. The resulting matrix can be processed into an evolutionary tree using MEGA 5*. Here is a sample result. The language families and sub-families into which the languages are sorted in the tree are easily recognizable and will come as no surprise to linguists. The remarkable point about these results is that the tree is the computerised output of fully automatic data processing.
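To make the step from a distance matrix to a tree more concrete, here is a minimal Python sketch. It is not the method used by this project or by MEGA 5: it swaps in plain average-linkage clustering (UPGMA) as one simple way to turn pairwise distances into a tree. The function name upgma, the labels A to D and all distance values are made-up placeholders, not real language data.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster `names` by average linkage (UPGMA).

    `dist` maps frozenset({a, b}) to the distance between a and b, which
    keeps the matrix symmetric by construction. Returns a nested-parentheses
    (Newick-like) string without branch lengths.
    """
    clusters = {name: 1 for name in names}  # cluster label -> number of members
    d = dict(dist)                          # working copy; grows as clusters merge
    while len(clusters) > 1:
        # Pick the closest pair among the current clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        merged = f"({a},{b})"
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        # Distance from the merged cluster to every remaining cluster:
        # average of the two old distances, weighted by cluster size.
        for c in clusters:
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters)) + ";"


# Placeholder distances on the 0-100 scale (illustrative values only).
example = {
    frozenset(("A", "B")): 10,
    frozenset(("C", "D")): 20,
    frozenset(("A", "C")): 60,
    frozenset(("A", "D")): 60,
    frozenset(("B", "C")): 60,
    frozenset(("B", "D")): 60,
}
tree = upgma(["A", "B", "C", "D"], example)
print(tree)  # prints: ((A,B),(C,D));
```

In this toy input, A and B end up in one subgroup and C and D in another, which mirrors how closely related languages would be grouped into the same branch of the tree.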
This project is developed as a desktop application written in C# in Visual Studio 2008 using LINQ to XML. The online genetic proximity calculator is an ASP.NET implementation of the classes used in the desktop application.
 
*MEGA 5: www.megasoftware.net - Tamura K, Peterson D, Peterson N, Stecher G, Nei M, and Kumar S (2011) MEGA5: Molecular Evolutionary Genetics Analysis using Maximum Likelihood, Evolutionary Distance, and Maximum Parsimony Methods. Molecular Biology and Evolution (submitted).