!! Lexicon files
Lexicon files are a part of the ''langs/sms/src/morphology'' infrastructure.
!! Entry structure
! level
TODO: example sms entry
! level
Man skiller mellom synonymer og meningsgrupper. Synonymer har samme
(meaning group / meningsgruppe) og samme (translation group /
oversettelsesgruppe). Hvis en entry har flere betydninger, så skilles disse som
forskjellige .
! level
Elementet inneholder en eller flere (oversettelsesgruppe eller translation group) som igjen kan inneholde:
! - a word
{{{ TODO: example entry with }}}
! - a phrase
{{{ TODO: example entry with }}}
! - An explanation: a sentence which explains the meaning of a word, but can't be used in the translation.
{{{ TODO: example entry with }}}
! - Restriction
* gives a restriction for the translation, f.ex. norwegian ''vest'' has the restriction ''of clothes'', to separate it from the navigational direction.
{{{ TODO: example entry with }}}
! attribute documentation
TODO:
! attribute documentation
TODO:
! for references
'''' is used to display a reference in the dictionary to another
entry.
Typically these words also include an node in the so we can
provide ad-hoc analyses that don't come from the FST.
{{{
muʹnne
mon
Pron_Pers_Sg_Ill
}}}
Leads to ...
{{{
mon
}}}
These are found in ''Pron_references_sms2x.xml''.
! Example sentences
TODO:
In sms these can come in under either or , for good reasons.
{{{ TODO: example of reasons
}}}
! Files with static paradigms
Currently all sms files have a minimal miniparadigm, but in NDS we generate more.
In NDS we can tell the system to not use the static miniparadigm with the @exclude attribute:
{{{
muu
muʹnne
}}}
If this attribute is not present as in the above, then the static paradigm will
be displayed in NDS.
! Other files
TODO:
!! Generated miniparadigms
Miniparadigms are generated in lexicon entries in order to help users. They
vary from POS to POS and sometimes within POS.
!Use/NGminip og Allegro i lexc
TODO: are these the tags we use now in sms?
+Use/NGminip - remove inflectional forms that one does not want to present in
the miniparadigm. One example, North Saami adjectives.
NB: judicious use of +Use/NGminip from sme to clean up many possibilities into
one.
|| Inflection || Without +Use/NGminip || With +Use/NGminip
| A+Sg+Nom| heittot | heittot
| A+Attr | heittogis heittohis (bivttas) | heittogis (bivttas)
| A+Pl+Nom | heittogat heittohat | heittogat
| A+Comp+Attr | heittogit heittogut heittoget heittogat heittohit heittohut heittohet heittohat | heittoget heittogat
| A+Comp+Sg+Nom | heittogit heittogut heittoget heittogeabbo heittogat heittogabbo heittohit heittohut heittohet heittoheabbo heittohat heittohabbo | heittogeabbo heittogabbo
| A+Superl+Sg+Nom | heittogeamos heittogamos heittoheamos heittohamos | heittogeamos heittogamos
! Nouns
Display the whole paradigm in two columns for plural. In NDS, because there
are case inflections that do not have +Sg or +Pl, we use a special tagset to
separate these cases out to display them across one column.
TODO: Noun attributes that affect miniparadigms ?
|| Bøyning || Eksempel
| Sg+Nom | võrr
| Sg+Gen | võõr
| Sg+Acc | võõr
| Sg+Ill | võʹrre
| Sg+Loc | võõrâst
| Sg+Com | võõrin
| Sg+Abe | võõrtää
| Sg+Abe | võõrtaa TODO: does this need an attribute to control?
| Pl+Nom | võõr
| Pl+Gen | võõri
| Pl+Acc | võõrid
| Pl+Ill | võõrid
| Pl+Loc | võõrin
| Pl+Com | võõrivuiʹm
| Pl+Abe | võõritää
| Pl+Abe | võõritaa TODO: does this need an attribute to control?
| Ess | võrrân
| Par | võrrâd
! Proper nouns
For now, all proper nouns are not generated in Plural.
Sg+Nom Njuõttjokk
Sg+Gen Njuõttjook
Sg+Acc Njuõttjook
Sg+Ill Njuõttjoʹǩǩe
Sg+Loc Njuõttjookâst
Sg+Abe Njuõttjooktää
Sg+Par Njuõttjokkâd
EX: Äʹnnjääuʹraž
TODO: determine how to display these in sms
|| Form || Context || Example || Translation
| - | - | |
| Sg+Gen | X pääiʹǩ | |
| Sg+Ill | - | |
| Sg+Loc | - | |
TODO: Any plural-only proper nouns?
! Holidays?
use räjja in context for e.g. eeʹjjpeeiʹv räjja
! Adjectives
For adjectives we use context as an attribute on the lemma node, in order to
provide an attributive adjective example with a noun.
TODO: determine some good contexts for adjs
|| Inflection || Context || Example
| A+Pred+Sg | | oođâs
| A+Attr | context:"??" | ođđ (??)
| A+Comp | | ođđsab
| A+Superl | | ođđsumus
TODO: +A+Pred+Pl ?
! numerals
TODO:
! Pronouns
! Personal pronouns
Most personal pronouns can be generated live from FSTs, depending on the
analysis, but it may be easier to include the whole paradigm in a miniparadigm
because of complexities in tags matching up with lemmas.
This also requires the ''type="Pers"'' attribute on the '''' node, and the
+Sg, +Du, and +Pl marking in the pronoun tag. If these change, we will need to
adjust the paradigm rules.
|| Inflection || Example
| Sg+Nom | mon
| Sg+Gen | muu
| Sg+Acc | muu
| Sg+Ill | muʹnne
| Sg+Loc | muʹst
| Sg+Com | muin
| Sg+Abe | muutää
| Ess | muuʹnen
| Par | muuʹđed
TODO:
! Indef pron
måtam Måtmin
TODO:
!! Pregenerated paradigms
! pronouns
Because the analyzer uses tags that make generation difficult, the thought was
to include miniparadigms in the XML file that will contain the whole displayed
paradigm.
{{{
TODO:
}}}
! negative verb
TODO:
{{{
Sg1
Sg2
Sg3
Pl1
Pl2
Pl3
}}}
!! Homonymous entries
Homonymous entries (lemma + POS) may be tricky for a combination of the lexicon
and the analyzer. An additional way to deal with this is to mark these on an
additional attribute, POS type, or something else. This is also problematic
when generating the correct paradigm for the lexicon entry, or when lining up
analyses with the meanings.
TODO: jokk is homonymous in sms, find examples for documentation from there.
! Non-systematic homonymy
TODO:
element gets an attribute hid="1" or hid="2". The lemmas are marked
similarly in the norm FST. Thus,
|| Nom || Gen || norsk || norm-fst-analyse
| lohkki | lohki | lokk | lohkki+N+Sg+Nom
| lohkki | lohkki | lesar | lohkki+N+Actor+Sg+Nom
{{{
TODO: xml examples of homonymous entries (either actor type, or hid type, etc.)
1.
2.
}}}