Many languages use agglutination, where the form of a word depends on neighboring words, and we would like to be able to represent this as well. We use agglutination as a term covering many distinct phenomena such as assimilation, liaison, sandhi and vowel harmony. For example, in Maltese the definite article “il-” is a proclitic that assimilates with the first phoneme of the following word if it is a coronal consonant or vowel. For example:
il- + missier | il-missier | the father |
il- + omm | l-omm | the mother |
il- + tifel | it-tifel | the boy |
The rules for doing this are similar to inflection rules but include the symbol “+” to indicate the word with which the agglutination occurs. So we describe Maltese agglutination as follows:
:maltese_il_assimilation a lemon:MorphPattern ;
  lemon:transform [
    lemon:rule "~l+([ċdnrstxżz])/~$2-$2" ;
    lemon:rule "~l+([aeiou])/l-$2"
  ] .
Note that although this pattern applies to only one word, it is written in a way that could be reused should the same pattern occur with other words. Note also that $2 is used as the matcher (as $1 corresponds to ~).
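The effect of the two rules can be sketched in Python. This is an illustrative translation into ordinary regular expressions, not lemon's own rule interpreter: the function name `attach_article` is hypothetical, and reading ~ as "the rest of the word" (here the initial "i") is an assumption. As in the rules above, group 1 plays the role of ~ and group 2 is the matched phoneme.

```python
import re

# Character classes copied from the two lemon rules above.
CORONALS = "ċdnrstxżz"
VOWELS = "aeiou"


def attach_article(word: str) -> str:
    """Attach the Maltese article il- to `word` (hypothetical helper)."""
    # "+" marks the boundary between the two words, as in the rules.
    joined = "il+" + word

    # Rule 1: before a coronal consonant, the l assimilates to it.
    # Group 1 stands in for "~", group 2 is the matched consonant.
    result, count = re.subn(rf"^(i)l\+([{CORONALS}])", r"\1\2-\2", joined)
    if count:
        return result

    # Rule 2: before a vowel, the article reduces to l-.
    result, count = re.subn(rf"^(i)l\+([{VOWELS}])", r"l-\2", joined)
    if count:
        return result

    # Neither rule applies: plain concatenation with a hyphen.
    return joined.replace("+", "-", 1)


for noun in ["missier", "omm", "tifel"]:
    print(noun, "->", attach_article(noun))
```

The three nouns are those from the table above, so the printed forms can be checked directly against it.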
Hungarian (like many Uralic and Altaic languages) exhibits an interesting agglutinative property known as vowel harmony, in which the vowels in a suffix must agree with the vowels in the preceding word. Hungarian groups the vowels into three categories: back vowels (a, o, u), front vowels (ö, ü) and intermediate vowels (e, i). Case markers and the equivalents of prepositions in Hungarian are suffixes whose vowel must agree with that of the stem, for example:
lakás + hoz | lakáshoz | to the house |
szem + hoz | szemhez | to the eye |
kör + hoz | körhöz | to the circle |
This can be modeled using different stem forms as in example 67, so we would model this as follows:
:hungarian_vowel_harmony a lemon:MorphPattern ;
  lemon:transform [
    lemon:onStem [ dcr:vowelHarmony dcr:back ] ;
    lemon:rule "^([^öőüűeéií]*[aáoóuú].*)+~/$1+~" ;
  ] , [
    lemon:onStem [ dcr:vowelHarmony dcr:front ] ;
    lemon:rule "^([^aáoóuúeéií]*[öőüű].*)+~/$1+~" ;
  ] , [
    lemon:onStem [ dcr:vowelHarmony dcr:intermediate ] ;
    lemon:rule "^([^aáoóuúöőüű]*[eéií].*)+~/$1+~" ;
  ] .
Here the regular expression matches the whole of the agglutination, as indicated by the ^ marking the start of the string.
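As a rough illustration of how the three stem patterns pick a suffix form, the following Python sketch classifies a stem by its first vowel and then looks up the matching variant. It is a simplification under stated assumptions rather than a description of how a lemon processor works: the names `harmony_class`, `attach` and the `ALLATIVE` table are hypothetical, the class names follow the grouping given in the text (front = ö, ü; intermediate = e, i), and the accented letters are assumed to be precomposed Unicode characters.

```python
import re

# Stem patterns mirroring the three rules: each stem is classified
# by the first vowel it contains.
PATTERNS = {
    "back": re.compile(r"^[^öőüűeéií]*[aáoóuú]"),
    "front": re.compile(r"^[^aáoóuúeéií]*[öőüű]"),
    "intermediate": re.compile(r"^[^aáoóuúöőüű]*[eéií]"),
}

# Hypothetical lookup table: variants of the suffix from the table above.
ALLATIVE = {"back": "hoz", "front": "höz", "intermediate": "hez"}


def harmony_class(stem: str) -> str:
    """Return the vowel-harmony class of `stem` (hypothetical helper)."""
    for name, pattern in PATTERNS.items():
        if pattern.match(stem):
            return name
    raise ValueError(f"no vowel found in stem: {stem!r}")


def attach(stem: str, variants: dict) -> str:
    """Append the suffix variant that harmonizes with `stem`."""
    return stem + variants[harmony_class(stem)]


for stem in ["lakás", "szem", "kör"]:
    print(stem, "->", attach(stem, ALLATIVE))
```

Classifying by the first vowel only follows the regular expressions as written; stems with mixed vowels may need a finer treatment than this sketch provides.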
John McCrae 2012-07-31