! Divvun & Giellatekno - open source grammars for Sámi and other languages
! Copyright © 2000-2010 The University of Tromsø & the Norwegian Sámi Parliament
! http://giellatekno.uit.no & http://divvun.no
!
! This program is free software; you can redistribute and/or modify
! this file under the terms of the GNU General Public License as published by
! the Free Software Foundation, either version 3 of the License, or
! (at your option) any later version. The GNU General Public License
! is found at http://www.gnu.org/licenses/gpl.html. It is
! also available in the file $GTHOME/LICENSE.txt.
!
! Other licensing options are available upon request, please contact
! giellatekno@hum.uit.no or feedback@divvun.no

! ========================================================================== !
!! !!!Inari Sámi morphological analyser
! ========================================================================== !

Multichar_Symbols !!≈ !!!@CODE@ definitions

!! !!Parts of speech
+N +A +Adv +V !!= * @CODE@
+Pron +CS +CC !!= * @CODE@
+Adp +Po +Pr !!= * @CODE@
+Interj +Pcle !!= * @CODE@
+Num +ABBR +ACR +Coll +Arab +Rom !!= * @CODE@

!! !!Subclasses
+Pers +Dem +Interr +Indef !!= * @CODE@
+Refl +Recipr +Rel +Ord +NomAg !!= * @CODE@

!! !!Grammatical properties
+IV +TV

!! !Person - number
+Sg +Pl +Du !!= * @CODE@
+Sg1 +Sg2 +Sg3 !!= * @CODE@
+Du1 +Du2 +Du3 !!= * @CODE@
+Pl1 +Pl2 +Pl3 !!= * @CODE@
+PxSg1 +PxSg2 +PxSg3 !!= * @CODE@
+PxDu1 +PxDu2 +PxDu3 !!= * @CODE@
+PxPl1 +PxPl2 +PxPl3 !!= * @CODE@

!! !Case
+Nom +Gen +Acc !!= * @CODE@
+Ill +Ine +Ela !!= * @CODE@
+Com +Ess +Par +Abe !!= * @CODE@
+Loc !!= * @CODE@
+Known !!= * @CODE@ mon, until we find a better tag

!! !Adjectival forms
+Comp +Superl !!= * @CODE@
+Attr !!= * @CODE@

!! !Adverb types
+Spat !!= * @CODE@ Spatial adverbs
+Temp !!= * @CODE@ Temporal adverbs

!! !Tense - mood
+Ind +Pot +Cond +Imprt +ImprtII !!= * @CODE@
+Prs +Prt !!= * @CODE@
+Opt
!!
!! !Indefinite verb forms
+Pass +Sup !!= * @CODE@
+Inf +Ger +GerII !!= * @CODE@
+ConNeg +Neg !!= * @CODE@
+PrsPrc +PrfPrc !!= * @CODE@
+VGen +VAbess !!= * @CODE@
+Actio !!= * @CODE@

! Der#begin
!! {{{
! Derivation position in a derivation row:      Affix and
!   1       2       3       4                    POS type
+Der1 +Der2 +Der3 +Der4
! Der#1
+Der/t ! NA (XXX check and remove)
+Der/Dimin ! NN (was: Der/aš & Der/š)
+Der/lasj ! NA
+Der/d ! VV
+Der/tt ! VV - Causative čälittiđ
+Der/Caus ! VV - 3-syll causatives
+Der/l ! VV
+Der/st ! VV čälistiđ
+Der/Car ! NA * +Der1+Der2 - can only combine with Der3 caritive: peljittem
+Der/laakan ! AA * +Der1+Der2 - can only combine with Der3
+Der/Pass ! VV - short passive
! Der#2
+Der/NomAg
+Der/NomAct ! VN Der/NomAct has two realisations, with different restrictions
+Der/sasj ! NA
+Der/alla ! VV
+Der/AAdv ! adverb pyeremusávt pyeremusâht
+Der/taa ! adverb pyeremustáá !This is not the best tag?
! Der#3
+Der/Pass ! VV - long passive
+Der/vuota ! AN
! Der#4
+Der/InchL ! VV
! +Der/NomAct ! VN Der/NomAct has two realisations, with different restrictions;
!   this is the previous Der/n. This realisation is Der4.
!   Commented out so as not to define the tag twice, but kept
!   here for documentation purposes.
+Der/upmi ! VN
+Der/mas ! VN
!! }}}
! Der#end

!! All non-positional derivations should be preceded by this tag, to make it possible
!! to target regular expressions at all derivations in a language-independent way:
!! just specify +Der|+Der1 .. +Der5 and you are set.
+Der !!≈ * @CODE@

!! !Derivations

!! !Other/unclassified derivations, can appear in all positions:
+Der/ag !!= * @CODE@ neeljičievâg neeljijienâg kuulmâloonjâg
+Der/ahasas !!= * @CODE@ 85-ahasâš škovlâahasâš
+Der/ivvaas !!= * @CODE@
+Der/vualasas !!= * @CODE@ tutkâmvuálásâš
!!
!! !Clitics
+Qst +Foc !!= * @CODE@
+Foc/gan !!= * @CODE@
+Foc/gas !!= * @CODE@
+Foc/ges !!= * @CODE@
+Foc/gis !!= * @CODE@
+Foc/gin !!= * @CODE@
+Foc/han !!= * @CODE@
+Foc/kin !!= * @CODE@
+Foc/ba !!= * @CODE@
+Foc/pa !!= * @CODE@
+Foc/sun !!= * @CODE@
+Foc/kis !!= * @CODE@
+Foc/ban !!= * @CODE@
+Foc/baa !!= * @CODE@
+Foc/baan !!= * @CODE@
+Foc/ge !!= * @CODE@
+Foc/go !!= * @CODE@
+Foc/kas !!= * @CODE@
+Foc/nii !!= * @CODE@
+Foc/uv !!= * @CODE@

!! !Usage tags
+Err/Orth !!= * __@CODE@__ substandard, not in normative fst
+Err/Lex !!= * __@CODE@__ substandard, not in normative fst, no normative lemma
+MWE !!= * __@CODE@__ - MultiWord Expression, used for abbreviation extraction for preprocess.sh
+Use/-PLX !!= * __@CODE@__ - do not include in Polderland spellers (most likely irrelevant for smn)
+Use/-Spell !!= * __@CODE@__ - do not include in speller (even though the entry is formally correct)
+Use/SpellNoSugg !!= * __@CODE@__ - Recognized, but not suggested in speller

!! !!Semantic properties of names
+Prop
+Sem/Ani +Sem/Atr !!= * @CODE@
+Sem/Mal +Sem/Fem +Sem/Sur !!= * @CODE@
+Sem/Plc +Sem/Org +Sem/Obj +Sem/Obj-el !!= * @CODE@
+Sem/Measr +Sem/Money +Sem/Veh +Sem/Year !!= * @CODE@

!! !!Punctuation
+CLB +PUNCT +HYPH !!= * @CODE@
+PAR +LEFT +RIGHT !!= * @CODE@

!! !!Morphophonemes
^P ^K ^Č ^H ^T !!= * @CODE@ for pp:v etc. gradation
! m7 n7 ŋ7 v7 s7 š7 r7 đ7 j7 l7 h7 '7 these are the dotted ones
k4 l4 t4 p4 c4 t4 č4 !!= * @CODE@ = these are consonants that change in cg
'7 !!= * @CODE@
i4 i6 !!= * @CODE@ = this is the postvocalic i consonant, realised as i
i6 j6 !!= * @CODE@ = these are fake vowel and consonant, to get rules to function for exceptions
i5 !!= * @CODE@ = start of the comitative suffix in loanwords
a5 ä5 á5 u5 o5 !!= * @CODE@ these vowels do not change
h5 j5 m5 ŋ5 t5 c5 d5 l5 t5 r5 č5 k5 !!= * @CODE@ these consonants do not change in WG
y5 !!= * @CODE@ these vowels do not change, e.g. pyerá
i2 u2 i3 â2 !!= * @CODE@ stemvowel changing to e, e.g. kyeli:kyeˊle
⎈ !!= * @CODE@ used for dynamic compounds, U+2388

!! !Archiphonemes
^RC !!= * @CODE@ Root consonant dummy
^RV !!= * @CODE@ Root vowel dummy
^SC !!= * @CODE@ Suffix consonant dummy
^SV !!= * @CODE@ Suffix vowel dummy
^V !!= * @CODE@ = vowel copy

!! !Triggers
^CLEN !!= * @CODE@ Consonant lengthening in qual WG
^CSH !!= * @CODE@ Consonant shortening (not WG)
^FCD !!= * @CODE@ Final consonant deletion
^EA !!= * @CODE@ is á and root vowel change in Ill Sg of i-stems
^RLEN !!= * @CODE@ Root vowel lengthening (impl. WG)
^RVSH !!= * @CODE@ Root vowel shortening
^SLEN !!= * @CODE@ Suffix vowel lengthening
^SVLOW !!= * @CODE@ Suffix vowel lowering â > á and u > o
^SVSH !!= * @CODE@ Second syllable vowel shortening
^VLOW !!= * @CODE@ is Vowel lowering in 3rd sg of contract verbs tuhhid:tohhe
^WG !!= * @CODE@ Weak grade trigger
^ÁE !!= * @CODE@ á->e
^ÁI !!= * @CODE@ á->i
^VHIGH !!= * @CODE@ = raising of vowels for verbs, o to uu, a to oo
^VBACK !!= * @CODE@ = back vowels for verbs, ä to a (when needed; normally 2syll a|â is enough)
^BLOCK !!= * @CODE@ = This symbol is used only to block otherwise triggering contexts

!! !!Symbols that need to be escaped on the lower side (towards twolc):
»7 ! »
«7 ! «
%[%>%] ! >
%[%<%] ! <

+Use/NG ! not-generate, for ped generation isme-ped.fst
+Use/MT ! generate only for MT
+Use/Circ

!! !!Variants
+v1 +v2 +v3 +v4 +Hom1 +Hom2

!! !!Semantic tags
+Sem/Body !!= * @CODE@ denotes bodyparts
+Sem/Plc !!= * @CODE@ denotes places

!! !!Compound tags
+Cmp !!= * @CODE@ compounds
+Cmp/Hyph !!= * @CODE@ compounds
+Cmp/SgNom !!= * @CODE@ compounds
+Cmp/PlNom !!= * @CODE@ compounds
+Cmp/Attr !!= * @CODE@ compounds
+Cmp/SgGen !!= * @CODE@ compounds
+Cmp/PlGen !!= * @CODE@ compounds
+Cmp/SplitR !!= * @CODE@ compounds
+Cmp/Sh !!= * @CODE@ compounds
+CmpNP/All !!≈ * __@CODE@__ - ... in all positions, __default__, this tag does not have to be written
+CmpNP/First !!≈ * __@CODE@__ - ... only be first part in a compound or alone
+CmpNP/Pref !!≈ * __@CODE@__ - ... only __first__ part in a compound, NEVER alone
+CmpNP/Last !!≈ * __@CODE@__ - ... only be last part in a compound or alone
+CmpNP/Suff !!≈ * __@CODE@__ - ... only __last__ part in a compound, NEVER alone
+CmpNP/None !!≈ * __@CODE@__ - ... does not take part in compounds
+CmpNP/Only !!≈ * __@CODE@__ - ... only be part of a compound, i.e. can never
!! be used alone, but can appear in any position

!! The tagged part of the compound should make a compound using:
+CmpN/SgN !!≈ * __@CODE@__ Singular Nominative
+CmpN/SgG !!≈ * __@CODE@__ Singular Genitive
+CmpN/PlG !!≈ * __@CODE@__ Plural Genitive
!! Unmarked = Default, i.e. {{+CmpN/SgN}} for SMN.

!! The second part of the compound may require that the previous (left part) is:
+CmpN/SgNomLeft !!≈ * __@CODE@__ Singular Nominative
+CmpN/SgGenLeft !!≈ * __@CODE@__ Singular Genitive
+CmpN/PlGenLeft !!≈ * __@CODE@__ Plural Genitive

!! !!Language tagged names
+OLang/ENG !!= * @CODE@
+OLang/FIN !!= * @CODE@
+OLang/NNO !!= * @CODE@
+OLang/NOB !!= * @CODE@
+OLang/SME !!= * @CODE@
+OLang/SMA !!= * @CODE@
+OLang/SWE !!= * @CODE@
+OLang/UND !!= * @CODE@
+OLang/RUS !!= * @CODE@

!! !!Flag diacritics
!! We have manually optimised the structure of our lexicon using the following
!! flag diacritics to restrict morphological combinatorics - only allow compounds
!! with verbs if the verb is further derived into a noun again:
@P.NeedNoun.ON@ !!≈ | @CODE@ | (Dis)allow compounds with verbs unless nominalised
@D.NeedNoun.ON@ !!≈ | @CODE@ | (Dis)allow compounds with verbs unless nominalised
@C.NeedNoun@ !!≈ | @CODE@ | (Dis)allow compounds with verbs unless nominalised
@R.NeedNoun.ON@ !!≈ | @CODE@ | (Dis)allow compounds with verbs unless nominalised
!!
!! For languages that allow compounding, the following flag diacritics are needed
!! to control position-based compounding restrictions for nominals. Their use is
!! handled automatically if combined with +CmpN/xxx tags. If not used, they will
!! do no harm.
@P.CmpFrst.FALSE@ !!≈ | @CODE@ | Require that words tagged as such only appear first
@D.CmpPref.TRUE@ !!≈ | @CODE@ | Block such words from entering ENDLEX
@P.CmpPref.FALSE@ !!≈ | @CODE@ | Block these words from making further compounds
@D.CmpLast.TRUE@ !!≈ | @CODE@ | Block such words from entering R
@D.CmpNone.TRUE@ !!≈ | @CODE@ | Combines with the next tag to prohibit compounding
@U.CmpNone.FALSE@ !!≈ | @CODE@ | Combines with the prev tag to prohibit compounding
@U.CmpNone.TRUE@ !!≈ | @CODE@ | Combines with the two previous ones to block compounding
@P.CmpOnly.TRUE@ !!≈ | @CODE@ | Sets a flag to indicate that the word has passed R
@D.CmpOnly.FALSE@ !!≈ | @CODE@ | Disallow words coming directly from root.
@D.CmpHyph.TRUE@ !!≈ | @CODE@ | Flag to control hyphenated compounds like proper nouns
@U.CmpHyph.FALSE@ !!≈ | @CODE@ | Flag to control hyphenated compounds like proper nouns
@U.CmpHyph.TRUE@ !!≈ | @CODE@ | Flag to control hyphenated compounds like proper nouns
@C.CmpHyph@ !!≈ | @CODE@ | Flag to control hyphenated compounds like proper nouns
@P.CmpHyph.TRUE@ !!≈ | @CODE@ | Flag to control hyphenated compounds like proper nouns
@N.CmpHyph.TRUE@ !!≈ | @CODE@ | Flag to control hyphenated compounds like proper nouns
!!
!! Use the following flag diacritics to control downcasing of derived proper
!! nouns (e.g. Finnish Pariisi -> pariisilainen). See e.g. North Sámi for how to use
!! these flags. There exists a ready-made regex that will do the actual down-casing
!! given the proper use of these flags.
@U.Cap.Obl@ !!≈ | @CODE@ | Allowing downcasing of derived names: deatnulasj.
@U.Cap.Opt@ !!≈ | @CODE@ | Allowing downcasing of derived names: deatnulasj.

! @P.Need3Part.ON@ @D.Need3Part.ON@ @C.Need3Part@ !3Part

@U.NeedsVowRed.OFF@ !!≈ * @CODE@ is used to force hyphenation/non-reduction: samediggi-
@U.NeedsVowRed.ON@ !!≈ * @CODE@ is used to force reduction w/o hyphen: samedigge#xxx
@C.NeedsVowRed@ !!≈ * @CODE@ Clearing this feature, so that it doesn't interfere with further compounding

@P.Px.add@ !!≈ * @CODE@
@R.Px.add@ !!≈ * @CODE@
@P.Px.block@ !!≈ * @CODE@
@D.Px.block@ !!≈ * @CODE@
@P.Nom3Px.add@
@R.Nom3Px.add@

@R.SpellRlx.ON@ !!≈ * @CODE@ Flag used to tag spell-relax-analysed strings (and only those).
@D.SpellRlx.ON@ !!≈ * @CODE@ Flag used to tag spell-relax-analysed strings (and only those).
@C.SpellRlx@ !!≈ * @CODE@ Flag used to tag spell-relax-analysed strings (and only those).

@R.SpaceCmp.ON@ !!≈ * @CODE@ Flag to tag compounds written with a space
@D.SpaceCmp.ON@ !!≈ * @CODE@ Flag to tag compounds written with a space
@C.SpaceCmp@ !!≈ * @CODE@ Flag to tag compounds written with a space

! =================================================
!! !!!Basic lexica, pointing to the other lexicon files
! =================================================

LEXICON Root !!= @CODE@
@U.Cap.Obl@ ProperNoun ; !
!@U.Cap.Opt@ ProperNoun ;
ProperNoun-smi-nocomp ;
NounRoot ;
AdjectiveRoot ;
VerbRoot ;
VGen_verbs ;
Adverb ;
Particle ;
Subjunction ;
Conjunction ;
Adposition ;
Interjection ;
Pronoun ;
Numeral ;
Acronym ;
Punctuation ;
Abbreviation ;

LEXICON ProperNoun !!= * __@CODE@__
Prefix-Proper ;
ProperNoun-smn ;
@N.CmpHyph.TRUE@ ProperNoun-smi-nocomp ; ! Lexicon for short names - always require hyphen
ProperNoun-smi ;
! ProperNoun-smi-nocomp ;

LEXICON ENDLEX
!! !!!Lexicon @LEXNAME@
!! And this is the @LEXNAME@ of everything:
!! {{{
@D.CmpOnly.FALSE@@D.CmpPref.TRUE@@D.NeedNoun.ON@ # ; !!≈ @CODE@
! @D.Need3Part.ON@ # ; !3part
!! }}}
!! The {{@D.CmpOnly.FALSE@}} flag diacritic is used to prevent words tagged
!! with +CmpNP/Only from ending here.
!! The {{@D.NeedNoun.ON@}} flag diacritic is used to block illegal compounds.
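!!
!! As an illustration of how the NeedNoun flags can block a compound, here is a
!! minimal sketch, kept entirely inside comments so that it does not change the
!! compiled analyser. The lexicon names (DemoVerb, DemoDer, DemoEnd) and the
!! strings (verbstem, nom) are hypothetical and are not part of this analyser;
!! the real lexica may place the flags differently.
!! {{{
!! LEXICON DemoVerb
!!  @P.NeedNoun.ON@verbstem DemoDer ;   ! set NeedNoun to ON when the verb stem is used
!! LEXICON DemoDer
!!  @C.NeedNoun@nom DemoEnd ;           ! a nominalising suffix clears the flag
!!  DemoEnd ;                           ! no derivation: the flag stays ON
!! LEXICON DemoEnd
!!  @D.NeedNoun.ON@ # ;                 ! disallow paths where NeedNoun is still ON
!! }}}
!! Under these assumptions, a path such as verbstem + nom reaches the end of the
!! word because the flag has been cleared, whereas bare verbstem is stopped by
!! {{@D.NeedNoun.ON@}}.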