Canonical forms and preferred lexicalizations

lemon allows syntactic variants to be differentiated from preferred forms by sub-properties of form.

:animal lemon:canonicalForm [ lemon:writtenRep "animal"@en ] ;
   lemon:otherForm [ lemon:writtenRep "animals"@en ] .

Image core-ex2

Example 4

This allows lemon to differentiate between different forms of a word and different words. For example, suppose we have two labels for an ontology entity, “animal” and “creature”. This is modeled in lemon as follows:

:animal lemon:canonicalForm [ lemon:writtenRep "animal"@en ] ;
   lemon:otherForm [ lemon:writtenRep "animals"@en ] ;
   lemon:sense [ lemon:reference ontology:animal ] .
:creature lemon:canonicalForm [ lemon:writtenRep "creature"@en ] ;
   lemon:otherForm [ lemon:writtenRep "creatures"@en ] ;
   lemon:sense [ lemon:reference ontology:animal ] .
Example 5

It is also possible to state the lexicon-ontology relationship in the reverse direction, because reference and sense have the inverse properties isReferenceOf and isSenseOf. This allows example 5 to be stated as follows:

ontology:animal
   lemon:isReferenceOf [
      lemon:isSenseOf [
         lemon:canonicalForm [ lemon:writtenRep "animal"@en ] ;
         lemon:otherForm [ lemon:writtenRep "animals"@en ]
      ]
   ] ;
   lemon:isReferenceOf [
      lemon:isSenseOf [
         lemon:canonicalForm [ lemon:writtenRep "creature"@en ] ;
         lemon:otherForm [ lemon:writtenRep "creatures"@en ]
      ]
   ] .
Example 6

It is also possible to state the lexicon without any senses or references and then introduce an ontology mapping layer by creating links such as:

:animal_sense lemon:isSenseOf :animal ;
   lemon:reference ontology:animal .
Example 7

In lemon we assume that a lexical entry is not semantically disambiguated and that the reference provides the semantics of the term. We introduce the sense to represent the use of a lexical entry with a given meaning. As such, “feline” and “cat” are assumed not to share a sense, even though they can be considered synonyms. Similarly, “cat” in “he is a cool cat” and in “cats are mammals” is assumed to be the same lexical entry, as both uses exhibit the same morphological and syntactic behavior. This is summarized in the following diagram:

Image venn2-crop

One of the most important aspects of lemon is that a sense should be unique to a given pair of lexical entry and ontology reference. This means that “creature” and “animal” should not refer to the same sense entity, but their senses can be related using the equivalent property. If two lexical entries do share a sense, they are assumed to be lexically equivalent, which may be appropriate, for example, for an initialism or acronym and its full form. Similarly, if a sense has two references, these references are understood to be equivalent (for example, by OWL's equivalentClass property). For more details, see sections [*], [*] and 4.1.

In lemon, each lexically different element gets its own lexical entry and its own sense. This means that both lexical entries, such as “animal” and “creature” above, are considered possible representations of the ontology entity.
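
The one-sense-per-entry/reference-pair rule can be sketched in Turtle as follows. This is a minimal illustration rather than one of the original examples: the sense names :animal_sense and :creature_sense, the example.org namespaces and the lemon namespace IRI are all assumptions, so substitute the ones your lexicon actually uses.

```turtle
# All namespace IRIs below are assumptions made for this sketch.
@prefix lemon:    <http://www.monnet-project.eu/lemon#> .
@prefix ontology: <http://example.org/ontology#> .
@prefix :         <http://example.org/lexicon#> .

# Each lexical entry gets its own sense, even though both senses
# point at the same ontology entity.
:animal   lemon:sense :animal_sense .
:creature lemon:sense :creature_sense .

:animal_sense   lemon:reference ontology:animal .
:creature_sense lemon:reference ontology:animal ;
   # The two senses stay distinct but are declared equivalent.
   lemon:equivalent :animal_sense .
```

Keeping the two senses separate leaves room to attach entry-specific information to each one, while the equivalent link still records that they pick out the same meaning.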

Like SKOS, we also allow the inclusion of partial terms, which we refer to as “abstract”. This is useful for representing stems, affixes and other morphological units. Together with canonical and other forms, these are implemented by the three sub-properties of form (canonicalForm, otherForm and abstractForm). Sub-properties of form can also be used to describe linguistically relevant differences.
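
As a hypothetical sketch of an abstract form, the fragment below attaches a stem to a lexical entry. It assumes the property is named lemon:abstractForm, by analogy with canonicalForm and otherForm; the entry :happy, the stem “happi” and the namespace IRIs are invented for illustration and do not come from the examples in this section.

```turtle
# All namespace IRIs below are assumptions made for this sketch.
@prefix lemon: <http://www.monnet-project.eu/lemon#> .
@prefix :      <http://example.org/lexicon#> .

:happy lemon:canonicalForm [ lemon:writtenRep "happy"@en ] ;
   # Partial term: a stem that never appears on its own as a word
   # (assumed property name lemon:abstractForm).
   lemon:abstractForm [ lemon:writtenRep "happi"@en ] .
```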

As lemon defines a form as being invariant across different orthographies, different spellings of a word are represented by deriving a sub-property of representation. Here it is important to include the xml:lang tag to indicate the particular usage of a given term. For example, the following represents “color” in US English spelling and “colour” in British English spelling:

:color lemon:canonicalForm [
    lemon:writtenRep "color"@en-us ;
    lemon:writtenRep "colour"@en-gb ] .
Example 8

It is also important to note that SKOS's prefLabel, altLabel and hiddenLabel do not distinguish between syntactic preference (as expressed by canonicalForm and the other sub-properties of form) and pragmatic preference, that is, whether the term is preferred for terminological reasons. To cover pragmatic preference, lemon has three sub-properties of isReferenceOf: prefRef, altRef and hiddenRef. prefRef represents the preferred term for an ontology reference (there should be only one such entry), hiddenRef represents a term that is not used for some reason (for example, because it is antiquated), and altRef represents any other term. For example, the following shows “tuberculosis” with an alternative, “TB”, and an antiquated term, “phthisis”.

# The subject IRI is assumed; the source does not name the ontology entity.
ontology:tuberculosis
   lemon:prefRef [
      lemon:isSenseOf :tuberculosis
   ] ;
   lemon:altRef [
      lemon:isSenseOf :tb
   ] ;
   lemon:hiddenRef [
      lemon:isSenseOf :phthisis
   ] .

Image core-ex7

Example 9

Between the sub-properties of form and isReferenceOf we can more precisely capture the same semantics as SKOS's prefLabel, altLabel and hiddenLabel. The conversion is as follows:

                          Canonical Form   Other Form    Abstract Form
Preferred Reference of    prefLabel        altLabel      hiddenLabel
Alternative Reference of  altLabel         altLabel      hiddenLabel
Hidden Reference of       hiddenLabel      hiddenLabel   hiddenLabel
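
As a worked illustration of the conversion, the tuberculosis example could be flattened into SKOS roughly as follows. This sketch assumes that each of the three lexical entries has only a canonical form, whose written representation is the string shown. The subject IRI ontology:tuberculosis and the example.org namespace are placeholders that do not appear in the source.

```turtle
@prefix skos:     <http://www.w3.org/2004/02/skos/core#> .
# Placeholder namespace, assumed for this sketch.
@prefix ontology: <http://example.org/ontology#> .

ontology:tuberculosis
   # Preferred reference + canonical form   -> prefLabel
   skos:prefLabel   "tuberculosis"@en ;
   # Alternative reference + canonical form -> altLabel
   skos:altLabel    "TB"@en ;
   # Hidden reference + canonical form      -> hiddenLabel
   skos:hiddenLabel "phthisis"@en .
```

The conversion loses information. For instance, an other form of the preferred entry would also become an altLabel, so the SKOS output can no longer tell a syntactic variant from a pragmatically alternative term.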

John McCrae 2012-07-31