!!!Inari Sámi morphological analyser

!!!Multichar_Symbols definitions

!!Parts of speech
* +N +A +Adv +V
* +Pron +CS +CC
* +Adp +Po +Pr
* +Interj +Pcle
* +Num +ABBR +ACR +Coll +Arab +Rom

!!Subclasses
* +Pers +Dem +Interr +Indef
* +Refl +Recipr +Rel +Ord +NomAg

!!Grammatical properties

!Person - number
* +Sg +Pl +Du
* +Sg1 +Sg2 +Sg3
* +Du1 +Du2 +Du3
* +Pl1 +Pl2 +Pl3
* +PxSg1 +PxSg2 +PxSg3
* +PxDu1 +PxDu2 +PxDu3
* +PxPl1 +PxPl2 +PxPl3

!Case
* +Nom +Gen +Acc
* +Ill +Ine +Ela
* +Com +Ess +Par +Abe
* +Loc
* +Known mon, until we find a better tag

!Adjectival forms
* +Comp +Superl
* +Attr

!Adverb types
* +Spat Spatial adverbs
* +Temp Temporal adverbs

!Tense - mood
* +Ind +Pot +Cond +Imprt +ImprtII
* +Prs +Prt

!Indefinite verb forms
* +Pass +Sup
* +Inf +Ger +GerII
* +ConNeg +Neg
* +PrsPrc +PrfPrc
* +VGen +VAbess
* +Actio

{{{ }}}

All non-positional derivations should be preceded by this tag, to make it possible to target regular expressions at all derivations in a language-independent way: just specify +Der|+Der1 .. +Der5 and you are set.
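The +Der|+Der1 .. +Der5 pattern can be tried out on analysis strings outside the FST toolchain. The following is a minimal Python sketch, not part of the analyser: the helper name {{derivation_markers}} and the analysis strings are hypothetical and chosen only for illustration.

```python
import re
from typing import List

# Matches the generic derivation markers +Der and +Der1 .. +Der5.
# The lookahead requires the next character to be "+" or end of string,
# so specific tags such as +Der/ag are not counted as the bare marker.
DER_MARKER = re.compile(r"\+Der[1-5]?(?=\+|$)")


def derivation_markers(analysis: str) -> List[str]:
    """Return all generic derivation markers found in an analysis string."""
    return DER_MARKER.findall(analysis)


# Hypothetical analysis strings, for illustration only.
print(derivation_markers("lemma+N+Der+Der/ag+A+Sg+Nom"))
print(derivation_markers("lemma+V+Der1+Der/ag+A+Sg+Nom"))
print(derivation_markers("lemma+N+Sg+Nom"))
```

A non-empty result means the analysis contains at least one derivation, regardless of which specific derivation tag follows the marker.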
* +Der

!Derivations

!Other/unclassified derivations, can appear in all positions:
* +Der/ag neeljičievâg neeljijienâg kuulmâloonjâg
* +Der/ahasas 85-ahasâš škovlâahasâš
* +Der/ivvaas
* +Der/vualasas tutkâmvuálásâš

!Clitics
* +Foc
* +Foc/gan
* +Foc/gas
* +Foc/ges
* +Foc/gis
* +Foc/gin
* +Foc/han
* +Foc/kin
* +Foc/ba
* +Foc/pa
* +Foc/sun
* +Foc/kis
* +Foc/ban
* +Foc/baa
* +Foc/baan
* +Foc/ge
* +Foc/go
* +Foc/kas
* +Foc/nii
* +Foc/uv

!Usage tags
* __+Err/Orth__ substandard, not in normative fst
* __+Err/Lex__ substandard, not in normative fst, no normative lemma
* __+MWE__ - MultiWord Expression, used for abbreviation extraction for preprocess.sh
* __+Use/-PLX__ - do not include in Polderland spellers (most likely irrelevant for smn)
* __+Use/-Spell__ - do not include in speller (even though the entry is formally correct)
* __+Use/SpellNoSugg__ - Recognized, but not suggested in speller

!!Semantic properties of names
* +Prop +Sem/Ani +Sem/Atr
* +Sem/Mal +Sem/Fem +Sem/Sur
* +Sem/Plc +Sem/Org +Sem/Obj +Sem/Obj-el
* +Sem/Measr +Sem/Money +Sem/Veh +Sem/Year

!!Punctuation
* +CLB +PUNCT +HYPH
* +PAR +LEFT +RIGHT

!!Morphophonemes
* ^P ^K ^Č ^H ^T for pp:v etc. gradation
* k4 l4 t4 p4 c4 t4 č4 = these are consonants that change in cg
* '7
* i4 i6 = this is the postvocalic i consonant, realised as i
* i6 j6 = these are fake vowel and consonant, to get rules to function for exceptions
* i5 = comitative suffix-begin in loanwords
* a5 ä5 á5 u5 o5 these vowels do not change
* h5 j5 m5 ŋ5 t5 c5 d5 l5 t5 r5 č5 k5 these consonants do not change in WG
* y5 these vowels do not change, e.g. pyerá
* i2 u2 i3 â2 stemvowel changing to e, e.g.
kyeli:kyeˊle
* ⎈ used for dynamic compounds, U+2388

!Archiphonemes
* ^RC Root consonant dummy
* ^RV Root vowel dummy
* ^SC Suffix consonant dummy
* ^SV Suffix vowel dummy
* ^V = vowel copy

!Triggers
* ^CLEN Consonant lengthening in qual WG
* ^CSH Consonant shortening (not WG)
* ^FCD Final consonant deletion
* ^EA is á and root vowel change in Ill Sg of i-stems
* ^RLEN Root vowel lengthening (impl. WG)
* ^RVSH Root vowel shortening
* ^SLEN Suffix vowel lengthening
* ^SVLOW Suffix vowel lowering â > á and u > o
* ^SVSH Second syllable vowel shortening
* ^VLOW is Vowel lowering in 3rd sg of contract verbs tuhhid:tohhe
* ^WG Weak grade trigger
* ^ÁE á->e
* ^ÁI á->i
* ^VHIGH = heightening of vowels for verbs, o to uu, a to oo
* ^VBACK = back vowels for verbs, ä to a (when needed, normally 2syll a|â is enough)
* ^BLOCK = This symbol is used only to block otherwise triggering contexts

!!Symbols that need to be escaped on the lower side (towards twolc):

!!Variants

!!Semantic tags
* +Sem/Body denotes bodyparts
* +Sem/Plc denotes places

!!Compound tags
* +Cmp compounds
* +Cmp/Hyph compounds
* +Cmp/SgNom compounds
* +Cmp/PlNom compounds
* +Cmp/Attr compounds
* +Cmp/SgGen compounds
* +Cmp/PlGen compounds
* +Cmp/SplitR compounds
* +Cmp/Sh compounds

* __+CmpNP/All__ - ... in all positions, __default__, this tag does not have to be written
* __+CmpNP/First__ - ... only be first part in a compound or alone
* __+CmpNP/Pref__ - ... only __first__ part in a compound, NEVER alone
* __+CmpNP/Last__ - ... only be last part in a compound or alone
* __+CmpNP/Suff__ - ... only __last__ part in a compound, NEVER alone
* __+CmpNP/None__ - ... does not take part in compounds
* __+CmpNP/Only__ - ... only be part of a compound, i.e. can never be used alone, but can appear in any position

The tagged part of the compound should make a compound using:
* __+CmpN/SgN__ Singular Nominative
* __+CmpN/SgG__ Singular Genitive
* __+CmpN/PlG__ Plural Genitive

Unmarked = Default, i.e. {{+CmpN/SgN}} for SMN.
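The position tags (+CmpNP/...) listed above can be read as a small lookup table. The sketch below is only an illustration of that reading in Python. It is not how the analyser enforces the restrictions (that is done with flag diacritics in the lexicon), and the position labels {{alone}}, {{first}}, {{middle}} and {{last}} as well as the function name {{may_occur}} are assumptions made for this example.

```python
from typing import Optional

# Allowed positions per +CmpNP tag, as read from the tag descriptions.
# "alone" means the word is used outside any compound; "middle" is an
# assumed label covering "all positions" / "any position".
ALLOWED = {
    "+CmpNP/All": {"alone", "first", "middle", "last"},
    "+CmpNP/First": {"alone", "first"},
    "+CmpNP/Pref": {"first"},
    "+CmpNP/Last": {"alone", "last"},
    "+CmpNP/Suff": {"last"},
    "+CmpNP/None": {"alone"},
    "+CmpNP/Only": {"first", "middle", "last"},
}


def may_occur(position: str, tag: Optional[str] = None) -> bool:
    """Check whether a word with the given +CmpNP tag may fill a position.

    An untagged word (tag=None) is treated as +CmpNP/All, the default.
    """
    if position not in ALLOWED["+CmpNP/All"]:
        raise ValueError(f"unknown position: {position}")
    return position in ALLOWED[tag or "+CmpNP/All"]


print(may_occur("alone", "+CmpNP/Pref"))  # Pref: never alone
print(may_occur("first", "+CmpNP/Pref"))
print(may_occur("last"))                  # untagged word, default All
```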
The second part of the compound may require that the previous (left) part is:
* __+CmpN/SgNomLeft__ Singular Nominative
* __+CmpN/SgGenLeft__ Singular Genitive
* __+CmpN/PlGenLeft__ Plural Genitive

!!Language tagged names
* +OLang/ENG
* +OLang/FIN
* +OLang/NNO
* +OLang/NOB
* +OLang/SME
* +OLang/SMA
* +OLang/SWE
* +OLang/UND
* +OLang/RUS

!!Flag diacritics

We have manually optimised the structure of our lexicon using the following flag diacritics to restrict morphological combinatorics - only allow compounds with verbs if the verb is further derived into a noun again:

| @P.NeedNoun.ON@ | (Dis)allow compounds with verbs unless nominalised
| @D.NeedNoun.ON@ | (Dis)allow compounds with verbs unless nominalised
| @C.NeedNoun@ | (Dis)allow compounds with verbs unless nominalised
| @R.NeedNoun.ON@ | (Dis)allow compounds with verbs unless nominalised

For languages that allow compounding, the following flag diacritics are needed to control position-based compounding restrictions for nominals. Their use is handled automatically if combined with +CmpN/xxx tags. If not used, they will do no harm.

| @P.CmpFrst.FALSE@ | Require that words tagged as such only appear first
| @D.CmpPref.TRUE@ | Block such words from entering ENDLEX
| @P.CmpPref.FALSE@ | Block these words from making further compounds
| @D.CmpLast.TRUE@ | Block such words from entering R
| @D.CmpNone.TRUE@ | Combines with the next tag to prohibit compounding
| @U.CmpNone.FALSE@ | Combines with the prev tag to prohibit compounding
| @U.CmpNone.TRUE@ | Combines with the two previous ones to block compounding
| @P.CmpOnly.TRUE@ | Sets a flag to indicate that the word has passed R
| @D.CmpOnly.FALSE@ | Disallow words coming directly from root.
| @D.CmpHyph.TRUE@ | Flag to control hyphenated compounds like proper nouns
| @U.CmpHyph.FALSE@ | Flag to control hyphenated compounds like proper nouns
| @U.CmpHyph.TRUE@ | Flag to control hyphenated compounds like proper nouns
| @C.CmpHyph@ | Flag to control hyphenated compounds like proper nouns
| @P.CmpHyph.TRUE@ | Flag to control hyphenated compounds like proper nouns
| @N.CmpHyph.TRUE@ | Flag to control hyphenated compounds like proper nouns

Use the following flag diacritics to control downcasing of derived proper nouns (e.g. Finnish Pariisi -> pariisilainen). See e.g. North Sámi for how to use these flags. A ready-made regex does the actual downcasing, provided these flags are used properly.

| @U.Cap.Obl@ | Allowing downcasing of derived names: deatnulasj.
| @U.Cap.Opt@ | Allowing downcasing of derived names: deatnulasj.

* @U.NeedsVowRed.OFF@ is used to force hyphenation/non-reduction: samediggi-
* @U.NeedsVowRed.ON@ is used to force reduction w/o hyphen: samedigge#xxx
* @C.NeedsVowRed@ Clearing this feature, so that it doesn't interfere with further compounding
* @P.Px.add@
* @R.Px.add@
* @P.Px.block@
* @D.Px.block@
* @R.SpellRlx.ON@ Flag used to tag spell-relax-analysed strings (and only those).
* @D.SpellRlx.ON@ Flag used to tag spell-relax-analysed strings (and only those).
* @C.SpellRlx@ Flag used to tag spell-relax-analysed strings (and only those).
* @R.SpaceCmp.ON@ Flag to tag compounds written with a space
* @D.SpaceCmp.ON@ Flag to tag compounds written with a space
* @C.SpaceCmp@ Flag to tag compounds written with a space

!!!Basic lexica, pointing to the other lexicon files

LEXICON Root
* __LEXICON ProperNoun__

!!!Lexicon ENDLEX

And this is the ENDLEX of everything:

{{{
 @D.CmpOnly.FALSE@@D.CmpPref.TRUE@@D.NeedNoun.ON@ # ;
}}}

The {{@D.CmpOnly.FALSE@}} flag diacritic is used to prevent words tagged with +CmpNP/Only from ending here. The {{@D.NeedNoun.ON@}} flag diacritic is used to block illegal compounds.
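To see why the ENDLEX flags block a path, it can help to step through the flag operations by hand. The following Python sketch simulates the usual semantics of the P (set), C (clear), R (require), D (disallow) and U (unify) operations. It is an illustration only, not the HFST or foma implementation, and it deliberately leaves out N (negative set). The function name {{path_survives}} and the placement of {{@P.NeedNoun.ON@}} and {{@C.NeedNoun@}} on the example paths are assumptions.

```python
import re
from typing import Dict, Iterable, Optional

# @OP.FEATURE.VALUE@ or @OP.FEATURE@; N is intentionally not handled.
FLAG = re.compile(r"@([PDRUC])\.([^.@]+)(?:\.([^.@]+))?@")


def path_survives(symbols: Iterable[str]) -> bool:
    """Return True if every flag diacritic along the path is satisfied."""
    features: Dict[str, Optional[str]] = {}
    for symbol in symbols:
        match = FLAG.fullmatch(symbol)
        if match is None:
            continue  # ordinary symbol, carries no flag semantics
        op, feature, value = match.groups()
        current = features.get(feature)
        if op == "P":    # set the feature to the value
            features[feature] = value
        elif op == "C":  # clear the feature
            features.pop(feature, None)
        elif op == "R":  # require: feature must be set (to this value)
            if current is None or (value is not None and current != value):
                return False
        elif op == "D":  # disallow: fail if feature is set (to this value)
            if current is not None and (value is None or current == value):
                return False
        elif op == "U":  # unify: set if unset, otherwise values must agree
            if current is None:
                features[feature] = value
            elif current != value:
                return False
    return True


ENDLEX = ["@D.CmpOnly.FALSE@", "@D.CmpPref.TRUE@", "@D.NeedNoun.ON@"]

# Assumed placements: a verb entering a compound sets NeedNoun,
# and a nominalising derivation clears it again.
print(path_survives(ENDLEX))
print(path_survives(["@P.NeedNoun.ON@"] + ENDLEX))
print(path_survives(["@P.NeedNoun.ON@", "@C.NeedNoun@"] + ENDLEX))
```

Under these assumptions, the path on which NeedNoun is still ON is rejected at ENDLEX, while the path that clears the feature first passes.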