!! Lexicon files
Lexicon files are part of the ''langs/sms/src/morphology'' infrastructure.

!! Entry structure

! Entry level
TODO: example sms entry

! Meaning group level
We distinguish between synonyms and meaning groups. Synonyms share the same meaning group and the same translation group. If an entry has several meanings, these are separated into different meaning groups.

! Translation group level
A meaning group contains one or more translation groups, which in turn can contain the following.

! Word
{{{
TODO: example entry with a word
}}}

! Phrase
{{{
TODO: example entry with a phrase
}}}

! Explanation
A sentence that explains the meaning of a word but cannot be used as a translation.
{{{
TODO: example entry with an explanation
}}}

! Restriction
A restriction limits a translation to a particular sense. For example, Norwegian ''vest'' has the restriction ''of clothes'' to distinguish it from the compass direction.
{{{
TODO: example entry with a restriction
}}}

! Attribute documentation
TODO:

! References
The reference element is used to display a reference in the dictionary to another entry. Typically these entries also include an additional node, so that we can provide ad-hoc analyses that do not come from the FST.
{{{
muʹnne
mon
Pron_Pers_Sg_Ill
}}}
leads to ...
{{{
mon
}}}
These are found in ''Pron_references_sms2x.xml''.

! Example sentences
TODO: In sms, example sentences can appear under either of two elements, for good reasons.
{{{
TODO: example of reasons
}}}

! Files with static paradigms
Currently all sms files have a minimal static miniparadigm, but NDS generates fuller ones. The @exclude attribute tells NDS not to use the static miniparadigm:
{{{
muu
muʹnne
}}}
If this attribute is absent, NDS displays the static paradigm.

! Other files
TODO:

!! Generated miniparadigms
Miniparadigms are generated for lexicon entries to help users. Their content varies by part of speech, and sometimes within a part of speech.

! Use/NGminip and Allegro in lexc
TODO: are these the tags we use now in sms?
''+Use/NGminip'' removes inflectional forms that we do not want to present in the miniparadigm. North Saami (sme) adjectives are one example. In sme, judicious use of +Use/NGminip cleans up many possible forms into one or a few preferred ones.

|| Inflection || Without +Use/NGminip || With +Use/NGminip
| A+Sg+Nom | heittot | heittot
| A+Attr | heittogis, heittohis (bivttas) | heittogis (bivttas)
| A+Pl+Nom | heittogat, heittohat | heittogat
| A+Comp+Attr | heittogit, heittogut, heittoget, heittogat, heittohit, heittohut, heittohet, heittohat | heittoget, heittogat
| A+Comp+Sg+Nom | heittogit, heittogut, heittoget, heittogeabbo, heittogat, heittogabbo, heittohit, heittohut, heittohet, heittoheabbo, heittohat, heittohabbo | heittogeabbo, heittogabbo
| A+Superl+Sg+Nom | heittogeamos, heittogamos, heittoheamos, heittohamos | heittogeamos, heittogamos

! Nouns
The whole paradigm is displayed, with singular and plural in two columns. Some cases in NDS have no +Sg or +Pl marking (here, Ess and Par). We use a special tagset to separate these cases out and display them in one column.

TODO: Noun attributes that affect miniparadigms?

|| Inflection || Example
| Sg+Nom | võrr
| Sg+Gen | võõr
| Sg+Acc | võõr
| Sg+Ill | võʹrre
| Sg+Loc | võõrâst
| Sg+Com | võõrin
| Sg+Abe | võõrtää
| Sg+Abe | võõrtaa
| Pl+Nom | võõr
| Pl+Gen | võõri
| Pl+Acc | võõrid
| Pl+Ill | võõrid
| Pl+Loc | võõrin
| Pl+Com | võõrivuiʹm
| Pl+Abe | võõritää
| Pl+Abe | võõritaa
| Ess | võrrân
| Par | võrrâd

TODO: Sg+Abe and Pl+Abe each have two variants (võõrtää/võõrtaa, võõritää/võõritaa). Does this need an attribute to control it?

! Proper nouns
For now, proper nouns are not generated in the plural.

|| Inflection || Example
| Sg+Nom | Njuõttjokk
| Sg+Gen | Njuõttjook
| Sg+Acc | Njuõttjook
| Sg+Ill | Njuõttjoʹǩǩe
| Sg+Loc | Njuõttjookâst
| Sg+Abe | Njuõttjooktää
| Sg+Par | Njuõttjokkâd

Example: Äʹnnjääuʹraž

TODO: determine how to display these in sms

|| Form || Context || Example || Translation
| - | - |   |  
| Sg+Gen | X pääiʹǩ |   |  
| Sg+Ill | - |   |  
| Sg+Loc | - |   |  

TODO: Any plural-only proper nouns?

! Holidays?
Use ''räjja'' in the context, e.g. ''eeʹjjpeeiʹv räjja''.

! Adjectives
For adjectives, we put a context attribute on the lemma node. This gives an example of the attributive form together with a noun.

TODO: determine some good contexts for adjectives

|| Inflection || Context || Example
| A+Pred+Sg |   | oođâs
| A+Attr | context:"??" | ođđ (??)
| A+Comp |   | ođđsab
| A+Superl |   | ođđsumus

TODO: +A+Pred+Pl ?

! Numerals
TODO:

! Pronouns

! Personal pronouns
Most personal pronouns can be generated live from the FSTs, depending on the analysis. However, tags do not always match up cleanly with lemmas, so it may be easier to include the whole paradigm as a miniparadigm. Live generation also requires the ''type="Pers"'' attribute on the lemma node and the +Sg, +Du, or +Pl marking in the pronoun tag. If either of these changes, the paradigm rules will need to be adjusted.

|| Inflection || Example
| Sg+Nom | mon
| Sg+Gen | muu
| Sg+Acc | muu
| Sg+Ill | muʹnne
| Sg+Loc | muʹst
| Sg+Com | muin
| Sg+Abe | muutää
| Ess | muuʹnen
| Par | muuʹđed

TODO:

! Indefinite pronouns
måtam, Måtmin

TODO:

!! Pregenerated paradigms

! Pronouns
The analyzer uses tags that make generation difficult. The idea is therefore to include miniparadigms in the XML file that contain the whole displayed paradigm.
{{{
TODO:
}}}

! Negative verb
TODO:
{{{
Sg1
Sg2
Sg3
Pl1
Pl2
Pl3
}}}

!! Homonymous entries
Homonymous entries (same lemma and POS) can be tricky when the lexicon and the analyzer are used together. They make it hard to generate the correct paradigm for a lexicon entry and to line up analyses with the right meanings. Another way to deal with this is to mark the entries with an additional attribute, such as a POS type, or something else.

TODO: ''jokk'' is homonymous in sms; find examples there for this documentation.

! Non-systematic homonymy
TODO: each homonymous entry gets an attribute hid="1" or hid="2". The lemmas are marked in the same way in the norm FST.
For example:

|| Nom || Gen || Norwegian || Norm FST analysis
| lohkki | lohki | lokk ('lid') | lohkki+N+Sg+Nom
| lohkki | lohkki | lesar ('reader') | lohkki+N+Actor+Sg+Nom

{{{
TODO: xml examples of homonymous entries (either actor type, or hid type, etc.)
1.
2.
}}}
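This Python sketch shows why homonymous entries need a distinguishing attribute. It lines up FST analyses with homonymous entries, using the ''lohkki'' rows from the table above. It is a hypothetical model, not the NDS implementation. Several parts are assumptions to check against the real lexicon XML and FST tag set:
* the Entry class and its subtype and hid fields
* the set of subtype tags
* the norm FST marking homonym ids with tags like +Hom1/+Hom2

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical, simplified model of a lexicon entry. The real XML element
# and attribute names may differ.
@dataclass(frozen=True)
class Entry:
    lemma: str
    pos: str
    translation: str
    subtype: Optional[str] = None  # assumed attribute, e.g. "Actor"
    hid: Optional[int] = None      # homonym id, as in hid="1"


# Assumed tag conventions; check against the actual FST tag set.
SUBTYPE_TAGS = {"Actor"}
HOM_PREFIX = "Hom"  # assumption: hid appears in the FST as +Hom1, +Hom2, ...


def parse_analysis(analysis: str):
    """Split e.g. 'lohkki+N+Actor+Sg+Nom' into (lemma, pos, subtype, hid)."""
    lemma, *tags = analysis.split("+")
    pos, subtype, hid = None, None, None
    for tag in tags:
        suffix = tag[len(HOM_PREFIX):]
        if tag.startswith(HOM_PREFIX) and suffix.isdigit():
            hid = int(suffix)
        elif tag in SUBTYPE_TAGS:
            subtype = tag
        elif pos is None:
            pos = tag  # first remaining tag is taken as the POS
    return lemma, pos, subtype, hid


def match_entry(analysis: str, entries: List[Entry]) -> Optional[Entry]:
    """Return the entry whose lemma, POS, subtype and hid all fit the analysis."""
    key = parse_analysis(analysis)
    for entry in entries:
        if (entry.lemma, entry.pos, entry.subtype, entry.hid) == key:
            return entry
    return None  # no entry fits; the analysis cannot be lined up


# The two lohkki entries from the table above.
lexicon = [
    Entry("lohkki", "N", "lokk"),
    Entry("lohkki", "N", "lesar", subtype="Actor"),
]

print(match_entry("lohkki+N+Sg+Nom", lexicon).translation)        # lokk
print(match_entry("lohkki+N+Actor+Sg+Nom", lexicon).translation)  # lesar
```

The sketch requires an exact match on all four fields. This mirrors the idea that the lexicon and the norm FST mark homonyms "in the same way". If the two sources disagree on how an entry is marked, the analysis simply finds no entry rather than being attached to the wrong meaning.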