!!!South Sámi morphological analyser


 !!!Multichar_Symbols definitions

 !!Tags for POS (Part-Of-Speech, Word class)
 * __+N__ = Noun
 * __+A__ = Adjective
 * __+Adv__ = Adverb
 * __+V__ = Verb
 * __+Pron__ = Pronoun
 * __+CS__ = Subjunction
 * __+CC__ = Conjunction
 * __+Po__ = Postposition
 * __+Pr__ = Preposition
 * __+Interj__ = Interjection
 * __+Pcle__ = Particle
 * __+Num__ = Numerals

 * +Logo    
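
As a minimal sketch of how these tags are read off an analysis, the following Python snippet splits an analysis string into lemma and tags and picks out the POS tag. The analysis string {{gaamege+N+Sg+Nom}} and the helper names {{split_analysis}} and {{pos_of}} are illustrative assumptions, not output or code from the actual analyser.

```python
from typing import List, Optional, Tuple

# POS tags as listed above.
POS_TAGS = {"+N", "+A", "+Adv", "+V", "+Pron", "+CS", "+CC",
            "+Po", "+Pr", "+Interj", "+Pcle", "+Num"}


def split_analysis(analysis: str) -> Tuple[str, List[str]]:
    """Split 'lemma+Tag1+Tag2' into ('lemma', ['+Tag1', '+Tag2']).

    Assumes the lemma itself contains no '+' character.
    """
    lemma, *tags = analysis.split("+")
    return lemma, ["+" + tag for tag in tags]


def pos_of(analysis: str) -> Optional[str]:
    """Return the first POS tag found in the analysis, or None."""
    _, tags = split_analysis(analysis)
    return next((tag for tag in tags if tag in POS_TAGS), None)


# Hypothetical analysis string, for illustration only.
lemma, tags = split_analysis("gaamege+N+Sg+Nom")
print(lemma, tags, pos_of("gaamege+N+Sg+Nom"))
```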


 !Tags for sub-POS
!!Proper nouns
 * __+Prop     __ = Proper noun

!!Pronoun subtypes
 * __+Pers    __ = Personal
 * __+Dem     __ = Demonstrative
 * __+Interr  __ = Interrogative
 * __+Refl    __ = Reflexive
 * __+Recipr  __ = Reciprocal
 * __+Rel     __ = Relative
 * __+Indef   __ = Indefinite
 * __+Coll    __ = Collective numeral
 * __+Arab    __ = Arabic numeral
 * __+Rom     __ = Roman numeral

 || Usage tags || Explanation
 | __+Err/Orth__ | Substandard, non-normative form of a word
 | __+Use/Marg__ | Marginal: correct, existing forms that are
                 rare. These words can be removed from e.g. the
                 speller, because they are so rare and so little
                 used that they may be confused with nearby
                 lemmas.
 | __+Use/-Spell__ | Excluded from speller
 | __+Use/-PLX__ | Excluded in PLX speller
 | __+Use/SpellNoSugg__ | Recognized but not suggested in speller
 | __+Use/Circ__ | Circular path
 | __+Use/CircN__ | Circular number path?
 | __+Use/Ped__ | Remove from pedagogical speller
 | __+Use/NG__ | Do not generate \\ for isme-ped.fst and apertium
 | __+Use/MT__ | Generate for apertium only
 | __+Err/Lex__ | The lemma and its word forms are outside the norm:
                 no normative lemma, although it is grammatically correct.
 | __+Err/Der__ | Errors in derivations
 | __+Use/NotDNorm__ | For words without formal normalization.
                 Divvun suggests that this shouldn't be normative.
 | __+Use/DNorm__ | For words without formal normalization.
                 Divvun suggests that this should be normative. Included in speller.
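
The speller-related usage tags can be read as a three-way decision per entry. The sketch below assumes analyses are plain {{lemma+Tag+Tag}} strings. The function name {{speller_status}}, the status labels, and the example strings are hypothetical and do not reflect the actual build scripts.

```python
EXCLUDED = "+Use/-Spell"          # excluded from speller
NO_SUGGESTION = "+Use/SpellNoSugg"  # recognized but not suggested


def speller_status(analysis: str) -> str:
    """Classify an analysis string by its speller-related usage tags."""
    tags = {"+" + tag for tag in analysis.split("+")[1:]}
    if EXCLUDED in tags:
        return "excluded"
    if NO_SUGGESTION in tags:
        return "recognized-only"
    return "recognized-and-suggested"


# Hypothetical analysis strings, for illustration only.
for entry in ("lemma+N+Use/-Spell+Sg+Nom",
              "lemma+N+Use/SpellNoSugg+Sg+Nom",
              "lemma+N+Sg+Nom"):
    print(entry, "->", speller_status(entry))
```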

 !!Dialect tags:

 | +Dial/-S | Not in the South. To be compared with smj, which
            has +Dial/N for this.
 | +Dial/-N | Not in the North. To be compared with smj, which
            has +Dial/s for this.
 | +Dial/-NOR | Words not in Norway
 | +Dial/-SW | Words not in Sweden
 | +Dial/SH | Short forms
 | +Dial/L | Long forms


!!Normative/prescriptive compounding tags
(to govern compound behaviour for the speller, i.e. what a compound SHOULD BE)

!The first part of the compound may be ...
 | +CmpN/Sg | Sg
 | +CmpN/SgN | SgNominative
 | +CmpN/SgG | SgGenitive
 | +CmpN/PlG | PlGenitive


!This part of the compound can ...
 * +CmpNP/All - ... be in all positions, __default__, this tag does not have to be written
 * +CmpNP/First - ... only be first part in a compound or alone
 * +CmpNP/Pref - ... only __first__ part in a compound, NEVER alone
 * +CmpNP/Last - ... only be last part in a compound or alone
 * +CmpNP/Suff - ... only __last__ part in a compound, NEVER alone
 * +CmpNP/None - ... not take part in compounds
 * +CmpNP/Only - ... only be part of a compound, i.e. can never
                be used alone, but can appear in any position
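
The position restrictions above can be summarised as a lookup table. This is a minimal sketch: the position labels ({{alone}}, {{first}}, {{middle}}, {{last}}) and the function {{may_occur}} are assumptions made for illustration; in the analyser itself the restrictions are enforced with flag diacritics.

```python
from typing import Optional

# Positions each tag permits. "middle" is a label assumed here for any
# compound part that is neither first nor last.
ALLOWED_POSITIONS = {
    "+CmpNP/All":   {"alone", "first", "middle", "last"},
    "+CmpNP/First": {"alone", "first"},
    "+CmpNP/Pref":  {"first"},
    "+CmpNP/Last":  {"alone", "last"},
    "+CmpNP/Suff":  {"last"},
    "+CmpNP/None":  {"alone"},
    "+CmpNP/Only":  {"first", "middle", "last"},
}


def may_occur(position: str, tag: Optional[str] = None) -> bool:
    """Check whether a word carrying `tag` may appear in `position`.

    A missing tag is treated as the default, +CmpNP/All.
    """
    return position in ALLOWED_POSITIONS[tag or "+CmpNP/All"]


print(may_occur("alone", "+CmpNP/Pref"))   # a prefix-only word used alone
print(may_occur("middle", "+CmpNP/Only"))  # a compound-only word mid-compound
```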

!The second part of the compound may take ...
 | +CmpN/SgLeft | Sg to the left
 | +CmpN/SgNomLeft | SgNominative to the left
 | +CmpN/SgGenLeft | SgGenitive to the left
 | +CmpN/PlGenLeft | PlGenitive to the left

!This part of the compound may ...
 | +CmpN/Def | works along with Left compound-tagging
 | +CmpN/DefSgGen | works along with Left compound-tagging
 | +CmpN/DefPlGen | works along with Left compound-tagging

 | +Cmp/XForm | All compounds without a clear classification
 | +Cmp/AttrH | All compounds that have an attr-h


!!Descriptive compounding tags
Tags for compound analysis - this is what a compound actually is. We use this
to research compounding patterns in the corpus.

 ; +Cmp/Sg       : Compounding using an unspecified singular stem
 ; +Cmp/SgNom    : Compounding using nominative singular
 ; +Cmp/SgGen    : Compounding using genitive singular
 ; +Cmp/PlGen    : Compounding using genitive plural
 ; +Cmp/Attr     : Compounding using attribute form
 ; +Cmp/eh : Compound stem in __–eh__, as in ''gaameh-gåaroje'',
            from ''gaamege''
 ; +Cmp/ege : Compound stem in __–ege__, as in ''gaamege-gåaroje''
 ; +Cmp/FinEDel : Deletion of final __e__, as in ''voelem-gaaroeh'',
            from ''voeleme''
 ; +Cmp/ShH : Compounding using a short stem + __h__: ''–biejjh–''
            (from ''biejjie''), cf ''reakedsbiejjhvadtese''
 ; +Cmp/Sh : Compounding using a short stem: ''–biejj–''
            (from ''biejjie'')
 ; +Cmp/SplitR : This is a split compound with the other part to the
            right: \\ "Arbeids- og inkluderingsdepartementet" =>
            ''Arbeids–'' = __+Cmp/SplitR__
 ; +Cmp/SplitL : This is a split compound with the other part to the
            left; this is the opposite of the previous case
 ; +Cmp : Dynamic compound - this tag should ''always'' be
            part of a dynamic compound. It is important for
            Apertium and the speller (to give extra weights to
            compounds), and useful in other cases as well.
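
As a rough sketch of the corpus research mentioned above, the descriptive compound tags can be counted across a set of analyses. The analysis strings below are constructed by hand for illustration (assuming {{#}} marks the compound boundary) and are not actual analyser output. The regular expression deliberately skips the normative {{+CmpN/...}} and {{+CmpNP/...}} tags.

```python
import re
from collections import Counter
from typing import Iterable

# Matches "+Cmp/<type>" or a bare "+Cmp", but not "+CmpN/..." or "+CmpNP/...".
CMP_TAG = re.compile(r"\+Cmp(?:/[^+#\s]+|(?=[+#\s]|$))")


def count_compound_tags(analyses: Iterable[str]) -> Counter:
    """Count every descriptive compound tag in the given analyses."""
    counts: Counter = Counter()
    for analysis in analyses:
        counts.update(CMP_TAG.findall(analysis))
    return counts


# Hand-made analysis strings, for illustration only.
sample = [
    "gaamege+N+Cmp/eh+Cmp#gåaroje+N+Sg+Nom",
    "gaamege+N+Cmp/ege+Cmp#gåaroje+N+Sg+Nom",
    "gaamege+N+CmpN/Sg+Sg+Nom",
]
print(count_compound_tags(sample))
```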

!!Tags for Inflection

!Tags for Case and Number Inflection
!!Case and number
 * __+Sg    __ = Singular
 * __+Pl    __ = Plural
 * __+Du    __ = Dual

 * __+Nom   __ = Nominative
 * __+Acc   __ = Accusative
 * __+Gen   __ = Genitive
 * __+Ine   __ = Inessive
 * __+Ela   __ = Elative
 * __+Ill   __ = Illative
 * __+Com   __ = Comitative
 * __+Ess   __ = Essive


!!Px
 * __+PxSg1__ = Possessive suffix, singular, 1st person
 * __+PxSg2__ = Possessive suffix, singular, 2nd person
 * __+PxSg3__ = Possessive suffix, singular, 3rd person
 * __+PxDu1__ = Possessive suffix, dual, 1st person
 * __+PxDu2__ = Possessive suffix, dual, 2nd person
 * __+PxDu3__ = Possessive suffix, dual, 3rd person
 * __+PxPl1__ = Possessive suffix, plural, 1st person
 * __+PxPl2__ = Possessive suffix, plural, 2nd person
 * __+PxPl3__ = Possessive suffix, plural, 3rd person

 | __+Prs__ | Present
 | __+Prt__ | Preterite

 | __+Sg1__ | Singular, 1st person
 | __+Sg2__ | Singular, 2nd person
 | __+Sg3__ | Singular, 3rd person
 | __+Du1__ | Dual, 1st person
 | __+Du2__ | Dual, 2nd person
 | __+Du3__ | Dual, 3rd person
 | __+Pl1__ | Plural, 1st person
 | __+Pl2__ | Plural, 2nd person
 | __+Pl3__ | Plural, 3rd person

 | +Neg | negation verb ij
 | +ConNeg | main verb complement to Neg, form identical to Imp
 | +VAbess | Verb Abessive

 ; +Inf : Infinitive
 ; +PrfPrc : Perfect participle
 ; +PrsPrc : Present participle
 ; +Ger : Gerund
 ; +VGen : Verb genitive

 ; +Ind : Indicative
 ; +Imprt : Imperative
 ; +Cond : Conditional, so far for one form: lidtjie (plus lidtjim, lidtjih). To be looked at.
 ; +Act : -eme, could be changed to +Actio

!Tags for adjectives

 ; +Comp +Superl : Comparison (comparative and superlative)
 ; +Attr : Attribute form
 ; +Ord : Ordinal number

!!Other tags
 ; +ABBR +ACR : Abbreviation, acronym
 ; +TV +IV : Transitivity tags
 ; +Multi : Multiword phrase tag ?
 ; +Guess : for the name guesser ?
 ; +LOAN : ad hoc tag for development purposes ?
 ; +Cmp/Hyph : A tag to indicate that a hyphen was used when compounding?

!Tags for testing the frequency of certain phenomena in our corpora

 ; +Test/LysI : form uses i
 ; +Test/MørkI : form uses ï
 ; +Test/HK : form uses consonant cluster hk
 ; +Test/GK : form uses consonant cluster gk
 ; +Test/JK : form uses consonant cluster jk
 ; +Uml : A tag to indicate realised or potential Umlaut
 ; +NoUml : A tag to indicate the lack of realised or potential Umlaut

!Tags for punctuation

 ; +CLB : XXX These should be documented better
 ; +PUNCT : XXX These should be documented better
 ; +LEFT : XXX These should be documented better
 ; +RIGHT : XXX These should be documented better


!Different focus particles

 ; +Foc : XXX Document better = Emphatic particle?
 ; +Foc/ge : XXX Document better = Emphatic particle
 ; +Foc/gan : XXX Document better = Emphatic particle
 ; +Foc/gih : XXX Document better = Emphatic particle
 ; +Foc/gænnah : XXX Document better = Emphatic particle


!!Semantic tags to help disambiguation & synt. analysis:

 *  +Sem/Act          = Activity
 *  +Sem/Adr          = Webadr
 *  +Sem/Amount       = Amount
 *  +Sem/Ani          = Animate
 *  +Sem/Aniprod      = Animal Product
 *  +Sem/Body         = Bodypart
 *  +Sem/Body-abstr   = siellu, vuoigŋa, jierbmi
 *  +Sem/Build        = Building
 *  +Sem/Build-part   = Part of a building, like the closet
 *  +Sem/Cat          = Category
 *  +Sem/Clth         = Clothes
 *  +Sem/Clth-jewl    = Jewellery
 *  +Sem/Clth-part    = part of clothes, boallu, sávdnji...
 *  +Sem/Ctain        = Container
 *  +Sem/Ctain-abstr  = Abstract container like bank account
 *  +Sem/Ctain-clth   =
 *  +Sem/Curr         = Currency like dollár, Not Money
 *  +Sem/Dance        = Dance
 *  +Sem/Dir          = Direction like GPS-kursa
 *  +Sem/Domain       = Domain like politics, reindeerherding (a system of actions)
 *  +Sem/Drink        = Drink
 *  +Sem/Dummytag     = Dummytag
 *  +Sem/Edu          = Educational event
 *  +Sem/Event        = Event
 *  +Sem/Feat         = Feature, like Árvu
 *  +Sem/Feat-phys    = Physiological feature, ivdni, fárda
 *  +Sem/Feat-psych   = Psychological feature
 *  +Sem/Feat-measr   = Psychological feature
 *  +Sem/Fem          = Female name
 *  +Sem/Food         = Food
 *  +Sem/Food-med     = Medicine
 *  +Sem/Furn         = Furniture
 *  +Sem/Game         = Game
 *  +Sem/Geom         = Geometrical object
 *  +Sem/Group        = Animal or Human Group
 *  +Sem/Hum          = Human
 *  +Sem/Hum-abstr    = Human abstract
 *  +Sem/Ideol        = Ideology
 *  +Sem/Lang         = Language
 *  +Sem/Mal          = Male name
 *  +Sem/Mat          = Material for producing things
 *  +Sem/Measr        = Measure
 *  +Sem/Money        = Has to do with money, like wages, not Curr(ency)
 *  +Sem/Obj          = Object
 *  +Sem/Obj-clo      = Cloth
 *  +Sem/Obj-cogn     = Cloth
 *  +Sem/Obj-el       = (Electrical) machine or apparatus
 *  +Sem/Obj-ling     = Object with something written on it
 *  +Sem/Obj-rope     = flexible ropelike object
 *  +Sem/Obj-surfc    = Surface object
 *  +Sem/Org          = Organisation
 *  +Sem/Part         = Feature, oassi, bealli
 *  +Sem/Perc-cogn    = Cloth
 *  +Sem/Perc-emo     = Emotional perception
 *  +Sem/Perc-phys    = Physical perception
 *  +Sem/Perc-psych   = Physical perception
 *  +Sem/Plant        = Plant
 *  +Sem/Plant-part   = Plant part
 *  +Sem/Plc          = Place
 *  +Sem/Plc-abstr    = Abstract place
 *  +Sem/Plc-elevate  = Place
 *  +Sem/Plc-line     = Place
 *  +Sem/Plc-water    = Place
 *  +Sem/Pos          = Position (as in social position job)
 *  +Sem/Process      = Process
 *  +Sem/Prod         = Product
 *  +Sem/Prod-audio   = Audio product
 *  +Sem/Prod-cogn    = Cognition product
 *  +Sem/Prod-ling    = Linguistic product
 *  +Sem/Prod-vis     = Visual product
 *  +Sem/Rel          = Relation
 *  +Sem/Route        = Route
 *  +Sem/Rule         = Rule or convention
 *  +Sem/Semcon       = Semantic concept
 *  +Sem/Sign         = Sign (e.g. numbers, punctuation)
 *  +Sem/Sport        = Sport
 *  +Sem/State        =
 *  +Sem/State-sick   = Illness
 *  +Sem/Substnc      = Substance, like Air and Water
 *  +Sem/Sur          = Surname
 *  +Sem/Symbol       = Symbol
 *  +Sem/Time         = Time
 *  +Sem/Tool         = Prototypical tool for repairing things
 *  +Sem/Tool-catch   = Tool used for catching (e.g. fish)
 *  +Sem/Tool-clean   = Tool used for cleaning
 *  +Sem/Tool-it      = Tool used in IT
 *  +Sem/Tool-measr   = Tool used for measuring
 *  +Sem/Tool-music   = Music instrument
 *  +Sem/Tool-write   = Writing tool
 *  +Sem/Txt          = Text (girji, lávlla...)
 *  +Sem/Veh          = Vehicle
 *  +Sem/Wpn          = Weapon
 *  +Sem/Wthr         = The Weather or the state of ground


 | +MWE | multi word expressions, goes to abbr

Use the following flag diacritics to control where possessive (Px) suffixes
may be attached.

 | @P.Px.add@ | Gives the possibility of Px suffixes (all except Nom 3rd person)
 | @R.Px.add@ | Requires the P.Px.add flag for Px suffixes (all except Nom 3rd person)
 | @P.Nom3Px.add@ | Gives the possibility of Px suffixes in Nom 3rd person
 | @R.Nom3Px.add@ | Requires the P.Nom3Px.add flag for Px suffixes in Nom 3rd person


!!Derivation position in a derivation row
Affix and tag, frompos - topos


 | +Der1 | Position
 | +Der2 | Position
 | +Der3 | Position
| Der#1
 | +Der/htalle | VV - Passive, frequentative
 | +Der/lg | VV - Passive
 | +Der/ijes | NA - Nomen agentis
 | +Der/ihks | VA - (agent nominal: inclined to perform the action denoted by the base word)
 | +Der/les | VA - Intensive
 | +Der/ldihkie | VA -
 | +Der/ldahke | VA - Result nominal (?)
 | +Der/ldh | VA - Attribute
 | +Der/ht | VV - Causative
 | +Der/l | VV - Subitive
 | +Der/st | VV - Diminutive, Subitive
 | +Der/d | VV - Continuative, Conative, Frequentative, Reflexive, Momentaneous
 | +Der/Car | -hts, Caritive, was Der/heapmi in sme
+Der/dMOM  ?
+Der/dKON  ?
+Der/dREFL ?
+Der/dCONT ?
+Der/dCONTFREC ?
+Der/dFREC ?

 | +Der/htj | Dim-cont, Frequentative
 | +Der/Dimin | NN - Diminutive
 | +Der/Rec | NN - Relational forms
 | +Der/laakan | AAdv - adverb


| Der#2
 | +Der/vuota | AN - Noun
 | +Der/adte | VV - Frequentative, Continuative
 | +Der/alla | VV - Frequentative
 | +Der/eds | NA - Attribute
| Der#3
 | +Der/PassL | VV - long only
 | +Der/NomAg | VN - Nomen Agentis
 | +Der/NomAct | VN - Nomen Actionis
 | +Der/ahtje | VV - Inchoative
 | +Der/InchL | VV - Inchoative
| Der#4   | ''So far +Der4 is not motivated for SMA.''

!Other, non-positional derivations

All non-positional derivations should be preceded by the following tag,
to make it possible to target regular expressions in all derivations in a
language-independent way:
just specify {{{[+Der|+Der1 .. +Der5]}}} and you are set.
 ; +Der : Tag to precede any non-positional derivation
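
The same idea carries over to ordinary regular expressions outside the FST toolchain. The following Python sketch finds every derivation tag that is preceded by {{+Der}} or one of {{+Der1}} .. {{+Der5}}. The analysis strings are made up for illustration, and the pattern is an approximation of the xfst expression above, not a translation of any existing script.

```python
import re
from typing import List

# "+Der" or "+Der1".."+Der5", immediately followed by a "+Der/<name>" tag.
DERIVATION = re.compile(r"\+Der[1-5]?\+Der/([^+]+)")


def derivations(analysis: str) -> List[str]:
    """Return the names of all derivations marked in an analysis string."""
    return DERIVATION.findall(analysis)


# Hypothetical analysis strings, for illustration only.
print(derivations("lemma+V+Der1+Der/ht+V+Inf"))       # positional
print(derivations("lemma+V+Der+Der/PassS+V+Inf"))     # non-positional
```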

 | +Der/PassS | VV - short passive only
 | +Der/A | NA - comparison of nouns

!!Tags for originating language
The following tags are used to guide conversion to IPA: loan words
and foreign names are usually pronounced (approximately) as in the
originating (majority) language. Instead of trying to identify the
correct pronunciation based on phonotactics (orthotactics actually),
we tag all words that can't be correctly transcribed using the SME
transcriber with source language codes. Once tagged, it is possible
to split the lexical transducer into smaller ones according to
language, and apply a different IPA conversion to each of them.
The principle of tagging is that we only tag to the extent needed,
and following a priority:
# any untagged word is pronounced with SME orthographic conventions
# NNO and NOB have identical pronunciation, NNO is only used if
  different in spelling from NOB
# SWE has mostly the same pronunciation as NOB, and is only used
  if different in spelling from NOB
# Occasionally even SME (the default) may be tagged, to block other
  languages from being specified, mainly during semi-automatic
  language tagging sessions
All in all, we want to get as much correctly transcribed to IPA
with as little work as possible. On the other hand, if more words
are tagged than strictly needed, this should pose no problem as
long as the IPA conversion is correct - at least some words will
get the same pronunciation whether read as SME or NOB/NNO/SWE.
 * +OLang/SME - North Sámi
 * +OLang/SMA - South Sámi
 * +OLang/FIN - Finnish
 * +OLang/SWE - Swedish
 * +OLang/NOB - Norw. bokmål
 * +OLang/NNO - Norw. nynorsk
 * +OLang/ENG - English
 * +OLang/RUS - Russian
 * +OLang/UND - Undefined
 * +Area/SE - In Sweden
 * +Area/NO - In Norway
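
The splitting step described above can be pictured as grouping entries by their {{+OLang/}} tag, with untagged entries falling back to the default. This is only a sketch over plain strings: the real split is done on the lexical transducer, the entries below are invented examples, and the default of {{SME}} simply follows the priority list above.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

OLANG = re.compile(r"\+OLang/([A-Z]+)")


def source_language(entry: str, default: str = "SME") -> str:
    """Return the language code of the +OLang/ tag, or the default."""
    match = OLANG.search(entry)
    return match.group(1) if match else default


def split_by_language(entries: Iterable[str]) -> Dict[str, List[str]]:
    """Group entries by source language so each group can get its own
    IPA conversion."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for entry in entries:
        groups[source_language(entry)].append(entry)
    return dict(groups)


# Invented entries, for illustration only.
entries = ["Oslo+N+Prop+OLang/NOB", "Stockholm+N+Prop+OLang/SWE", "biejjie+N"]
print(split_by_language(entries))
```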


!!Triggers for morphophonological rules
 ; X2 : Trigger for ???
 ; X3 : Trigger for ???
 ; X4 : Trigger for ???
 ; E2 : insert e (to be removed)
!!Symbols that need to be escaped on the lower side (towards twolc):
 ; »7 : Literal »
 ; «7 : Literal «
{{{
  %[%>%]  - Literal >
  %[%<%]  - Literal <
}}}
!!Lexeme disambiguation tags
 ; +Hom1 : Homonymy
 ; +Hom2 : Homonymy

!!Stem variant tags
 ; +v1 : variant 1
 ; +v2 : variant 2
 ; +v3 : variant 3
 ; +v4 : variant 4
 ; +v5 : variant 5


!!Umlaut and diphthong simplification triggers

 | %^DISIMP | diphthong simplification
 | %^COMPDISIMP | diphthong simplification in comparatives
 | %^COMPDISIMP2 | diphthong simplification in comparatives, type 2
 | %^COMPDISIMP3 | diphthong simplification
 | %^PLCDISIMP | diphthong simplification in ACCRA-names
 | %^NOMAGieDISIMP | diphthong simplification for NomAg ie stems
 | %^1UML | a-uml, like 1sg prs, perf.part of båetedh/V-I, and ill sg of -ie nouns
 | %^2UML | dark e, as 3sg prs & perf.part of tjearodh/V-II, and ill sg of -oe nouns
 | %^3UML | adj Umlaut oeh:an
 | %^3sUML | a-uml in 3sg prs of V-IV (roehtedh - ruahta)
 | %^3dUML | ie-uml in 1du & 3pl prs of V-IV (roehtedh - ruehtien)
 | %^iæUML | not used
 | %^iUML | i-uml in pret of V-I (båetedh - böötim)
 | %^PASSUML | Short passive Umlaut Rx->R5
 | %^didhUML | Der/d Umlaut for GUARKEDH-words
 | %^htjidhUML | Umlaut for der/htjidh derivations
 | %^adteUML | Umlaut for Der/adte and Der/alla derivations
 | %^aLATUS | Latus-Umlaut for -ie stems
 | %^uLATUS | Latus-Umlaut for -oe stems
 | %^ConsDel | Stem consonant deletion in front of Der/PassL
 | %^ILLELA | Stem vowel changes in Illative and Elative
 | %^PLGENPLCOM | Final stem vowel changes from e -> i, and without -j-
 | %^COMESS | Stem vowel changes in ACCRA-names

 | 😱 | Symbol used before # in dynamic compounds, and only there


!!Flag diacritics
We have manually optimised the structure of our lexicon using the following
flag diacritics to restrict morphological combinatorics - only allow compounds
with verbs if the verb is further derived into a noun again:
 | @P.NeedNoun.ON@ | (Dis)allow compounds with verbs unless nominalised
 | @D.NeedNoun.ON@ | (Dis)allow compounds with verbs unless nominalised
 | @C.NeedNoun@ | (Dis)allow compounds with verbs unless nominalised
For languages that allow compounding, the following flag diacritics are needed
to control position-based compounding restrictions for nominals. Their use is
handled automatically if combined with +CmpN/xxx tags. If not used, they will
do no harm.
 | @P.CmpFrst.FALSE@ | Require that words tagged as such only appear first
 | @D.CmpPref.TRUE@ | Block such words from entering ENDLEX
 | @P.CmpPref.FALSE@ | Block these words from making further compounds
 | @D.CmpLast.TRUE@ | Block such words from entering R
 | @D.CmpNone.TRUE@ | Combines with the next tag to prohibit compounding
 | @U.CmpNone.FALSE@ | Combines with the prev tag to prohibit compounding
 | @U.CmpNone.TRUE@ | Combines with the two previous ones to block compounding
 | @P.CmpOnly.TRUE@ | Sets a flag to indicate that the word has passed R
 | @D.CmpOnly.FALSE@ | Disallow words coming directly from root.
 | @U.CmpHyph.FALSE@ | Flag to control hyphenated compounds like proper nouns
 | @U.CmpHyph.TRUE@ | Flag to control hyphenated compounds like proper nouns
 | @C.CmpHyph@ | Flag to control hyphenated compounds like proper nouns

Use the following flag diacritics to control downcasing of derived proper
nouns (e.g. Finnish Pariisi -> pariisilainen). See e.g. North Sámi for how to use
these flags. There exists a ready-made regex that will do the actual down-casing
given the proper use of these flags.
 | @U.Cap.Obl@ | Allowing downcasing of derived names: deatnulasj.
 | @U.Cap.Opt@ | Allowing downcasing of derived names: deatnulasj.


!!!Lexicon Root
This is the beginning of everything. The __Root__ lexicon is reserved in the
LexC language, and must be the first lexicon defined.

Here is the list of lexica in the sma analyser


 *              NounRoot     ; 
 *              Verb         ; 
 *              Adjective    ; 
 *              Pronoun      ; 
 *              Adverb       ; 
 *              Subjunction  ; 
 *              Conjunction  ; 
 *              Particle     ; 
 *              Adposition   ; 
 *              Punctuation  ; 
 *              Interjection ; 
 *  +Use/CircN: Numeral      ; 
 *              Abbreviation ; 
 *              Acronym      ; 


 * __LEXICON ProperNoun   __


!!!Lexicon ENDLEX
And this is the ENDLEX of everything:
{{{
 @D.CmpOnly.FALSE@@D.CmpPref.TRUE@@D.NeedNoun.ON@ # ;
}}}
The {{@D.CmpOnly.FALSE@}} flag diacritic is used to prevent words tagged
with +CmpNP/Only from ending here.
The {{@D.NeedNoun.ON@}} flag diacritic is used to block illegal compounds.
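
To make the effect of the ENDLEX flags concrete, here is a small Python sketch that evaluates a sequence of flag diacritics from left to right using the standard meaning of the P (set), R (require), D (disallow), C (clear) and U (unify) operators. It is a simplified model for illustration only: the N operator is left out, the function {{path_is_valid}} is hypothetical, and which flags a given word actually carries when it reaches ENDLEX is an assumption here.

```python
import re
from typing import Dict

FLAG = re.compile(r"@([PRDCU])\.([^.@]+)(?:\.([^.@]+))?@")

# The flags attached to the ENDLEX entry above.
ENDLEX = "@D.CmpOnly.FALSE@@D.CmpPref.TRUE@@D.NeedNoun.ON@"


def path_is_valid(path: str) -> bool:
    """Return True if the flag diacritics along `path` are consistent."""
    features: Dict[str, str] = {}
    for op, feature, value in FLAG.findall(path):
        current = features.get(feature)
        if op == "P":                       # set feature to value
            features[feature] = value
        elif op == "C":                     # clear feature
            features.pop(feature, None)
        elif op == "R":                     # require value (or any value)
            if (value and current != value) or (not value and current is None):
                return False
        elif op == "D":                     # disallow value (or any value)
            if (value and current == value) or (not value and current is not None):
                return False
        elif op == "U":                     # unify: set if unset, else must match
            if current is None:
                features[feature] = value
            elif current != value:
                return False
    return True


# A verb in a compound that was never nominalised is blocked at ENDLEX ...
print(path_is_valid("@P.NeedNoun.ON@" + ENDLEX))
# ... but passes once the flag has been cleared again.
print(path_is_valid("@P.NeedNoun.ON@@C.NeedNoun@" + ENDLEX))
```

The same function can be used to check the other flag groups, for example that {{@U.CmpNone.FALSE@}} followed by {{@U.CmpNone.TRUE@}} fails to unify, or that {{@R.Px.add@}} only succeeds after {{@P.Px.add@}}.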