Divvun & Giellatekno - open source grammars for North Sámi.

!!!North Sámi morphological analyser


!!Multicharacter symbols


!Tags for POS


 * __ +N 	  __ - Noun
 * __ +A 	  __ - Adjective
 * __ +Adv 	  __ - Adverb
 * __ +V		  __ - Verb
 * __ +Pron 	  __ - Pronoun
 * __ +CS 	  __ - Subjunction
 * __ +CC 	  __ - Conjunction
 * __ +Adp 	  __ - Adposition, i.e. Post- and Preposition
 * __ +Po 	  __ - Postposition
 * __ +Pr 	  __ - Preposition
 * __ +Interj  __ - Interjection
 * __ +Pcle 	  __ - Particle
 * __ +Num 	  __ - Numeral

!Tags for sub-POS
 * __ +Prop 	 __ - Proper noun
 * __ +Pers 	 __ - Personal Pronoun
 * __ +Dem 	 __ - Demonstrative Pronoun
 * __ +Interr __ - Interrogative Pronoun
 * __ +Refl 	 __ - Reflexive Pronoun
 * __ +Recipr __ - Reciprocal Pronoun
 * __ +Rel 	 __ - Relative Pronoun
 * __ +Indef	 __ - Indefinite Pronoun
 * __ +Coll	 __ - Collective numerals
 * __ +Arab	 __ - Arabic numeral
 * __ +Rom    __ - Roman numeral
 * __ +Pass   __ - hallat/haddat not in use
 * __ +Known  __ - man (different from maid)


!!Tags for Inflection

!Tags for Case and Number Inflection
 * __+Sg__ - Singular
 * __+Du__ - Dual
 * __+Pl__ - Plural

 * __+Ess	__ - Essive
 * __+Nom	__ - Nominative
 * __+Gen	__ - Genitive
 * __+Acc	__ - Accusative
 * __+Ill	__ - Illative
 * __+Loc	__ - Locative = Inessive and Elative
 * __+Com	__ - Comitative
 * __+Com/Sh__ -  Comitative Plural hyphenated short form (without -guin), e.g. Beatnagii-, Biillai-, Bohccui- etc.
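
As a rough illustration of how the tags above combine, the following Python sketch splits an analysis string into lemma and tags and looks up the number and case labels. It assumes the analyser output has the shape {{lemma+Tag+Tag...}} with no literal {{+}} in the lemma. The example string and the helper names {{split_analysis}} and {{describe}} are illustrative and not taken from the analyser source.

```python
# Label tables copied from the number and case lists above.
NUMBER = {"+Sg": "Singular", "+Du": "Dual", "+Pl": "Plural"}
CASE = {
    "+Ess": "Essive", "+Nom": "Nominative", "+Gen": "Genitive",
    "+Acc": "Accusative", "+Ill": "Illative", "+Loc": "Locative",
    "+Com": "Comitative",
}

def split_analysis(analysis):
    """Split 'lemma+Tag+Tag' into (lemma, tags); assumes no '+' in the lemma."""
    lemma, *rest = analysis.split("+")
    return lemma, ["+" + tag for tag in rest]

def describe(analysis):
    """Return (lemma, number label, case label); a label is None if absent."""
    lemma, tags = split_analysis(analysis)
    number = next((NUMBER[t] for t in tags if t in NUMBER), None)
    case = next((CASE[t] for t in tags if t in CASE), None)
    return lemma, number, case

# Hypothetical analysis string, used only to exercise the helpers.
print(describe("guolli+N+Sg+Loc"))
```

Tags keep their leading {{+}} here so that they can be compared directly with the forms listed in this document.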

!Possessive tags

 * __+PxSg1__    Singular First Person
 * __+PxSg2__    Singular Second Person
 * __+PxSg3__    Singular Third Person
 * __+PxDu1__    Dual First Person
 * __+PxDu2__    Dual Second Person
 * __+PxDu3__    Dual Third Person
 * __+PxPl1__    Plural First Person
 * __+PxPl2__    Plural Second Person
 * __+PxPl3__    Plural Third Person

!Adjectival tags
 * __+Comp	__   Comparative
 * __+Superl	__   Superlative
 * __+Attr	__   Attributive
 * __+Card	__   Cardinal Number Not in use
 * __+Ord	__   Ordinal Number

!Moods
 * __+Ind	__ Indicative
 * __+Pot	__ Potential
 * __+Cond	__ Conditional
 * __+Imprt	__ Imperative

!Tenses
 * __+Prs	__ Present Tense
 * __+Prt	__ Past Tense, Preterite

!Verb person-number

 * __+Sg1	__ Singular First Person
 * __+Sg2	__ Singular Second Person
 * __+Sg3	__ Singular Third Person
 * __+Du1	__ Dual First Person
 * __+Du2	__ Dual Second Person
 * __+Du3	__ Dual Third Person
 * __+Pl1	__ Plural First Person
 * __+Pl2	__ Plural Second Person
 * __+Pl3	__ Plural Third Person

!Infinite verb forms

 * __+Inf	__ Infinitive
 * __+Ger	__ Gerund
 * __+ConNeg	__ Negation Form, ie Mana, Doalvvo, Juoge etc
 * __+ConNegII__ Alternative, Rather Declamatory Negation Form - Infrequent
 * __+Neg	__ Negation Verb, Ii and its forms, ie Ale, Alli, Allot, Ehpet, Eat etc.
 * __+ImprtII__ Alternative, Rather Declamatory Imperative Form - Infrequent not in use
 * __+PrsPrc	__ Present Participle
 * __+PrfPrc	__ Perfect Participle
 * __+Sup	__ Supine
 * __+VGen	__ Verb Genitive
 * __+VAbess	__ Verb Abessive
 * __+Actio	__ Action Verb Form

!Other tags

 * __+ABBR		__ Abbreviation
 * __+ACR		__  Acronym
 * __+CLB		__  Clause border (full stop, comma..)
 * __+PUNCT		__  punctuation
 * __+LEFT		__  left parenthesis
 * __+RIGHT		__  right parenthesis
 * __+Dyn		__  Dynamically generated (acronyms)

 * __+TV 	__    Transitive Verb
 * __+IV__ Intransitive Verb
 * __+G3__ Grade 2-3 for homonymies with grade 1-2
 * __+G7__ Grade 3, no consonant gradation
 * __+NomAg__ Actor Noun From Verb - Nomen Agentis

!Question and Focus particles:
 * __+Qst		__   Question Particle
 * __+Subqst		__   Embedded Question Particle
 * __+Foc/naj	__
 * __+Foc/Neg-ge	__
 * __+Foc/Pos-ge	__
 * __+Foc/gen	__
 * __+Foc/ges	__
 * __+Foc/gis	__
 * __+Foc/ba	__
 * __+Foc/be	__
 * __+Foc/hal	__
 * __+Foc/han	__
 * __+Foc/bai	__
 * __+Foc/bas	__
 * __+Foc/bat	__
 * __+Foc/ban	__
 * __+Foc/son	__
 * __+Foc/bahal__
 * __+Foc/behal__
 * __+Foc/bahan__
 * __+Foc/behan__
 * __+Foc/bason__
 * __+Foc/beson__
 * __+Foc/mat__
 * __+Foc/mis__
 * __+Foc/s__


!Tags distinguishing different versions of the same lemma (before POS)
 * +v1
 * +v2
 * +v3
 * +v4
 * +v5
 * +v6
 * +v7
 * +v8
 * +v9
 * +v10
 * +v11
 * +v12
 * +v13
 * +v14
 * +v15
 * +v16
 * +v17
 * +v18
 * +v19
 * +v20
 * +v21
 * +v22
 * +v23
 * +v24

Note: These high +v... numbers are in use for one word only:
doavttergrádakursa

!Escaped chars

 * __ %    __
 * +Guess for the name guesser
 * __ +MWE	  __ - Multi-word expressions treated as such in the preprocessor

 * +PxCPlComRecipr used in pronoun-sme-morph.txt


!Usage tags

 * __+Err/Orth__ substandard, not in normative fst
 * __+Err/Orth-a-á__ substandard, not in normative fst
 * __+Err/Orth-nom-gen__ substandard, not in normative fst
 * __+Err/Orth-nom-acc__ substandard, not in normative fst
 * __+Err/Lex__ substandard, not in normative fst, no normative lemma
 * __+Err/DerSub__ substandard for derivation, not in normative fst, no normative lemma
 * __+Err/CmpSub__ substandard for compounding, not in normative fst (wrong form or POS in first part)
 * __+Err/MissingSpace__ indicates that there is a missing space, causing an orthographic error
 * __+Err/MissingHyph__ when there is no hyphen where it should have been
 * __+Err/Hyph__ when there is a hyphen where none should have been
 * __+Err/SpaceCmp__ used for compounds written apart - only retained in the HFST Grammar Checker disambiguation analyser
 * __+Err/Spellrelax__ used to tag spellrelaxed typos (tag is inserted via flag diacritics)
 * __+Use/Marg__ marginal
 * __+Use/-Spell__ Excluded in speller
 * __+Use/-PLX__ Excluded in PLX-speller
 * __+Use/SpellNoSugg__ recognized but not suggested in speller
 * __+Use/Circ__ circular paths (old ^C^)
 * __+Use/CircN__ circular paths for the numerals (old ^N^)
 * __+Use/NG__ not-generate, for ped generation isme-ped.fst
 * __+Use/MT__ Generate for MT only, for restricting analyses needed
              for MT generation not to pop up elsewhere
 * __+Use/NGminip__ Not for miniparadigm in VD dicts
 * __+Use/Disamb__ means that the following is only used in the analyser feeding the disambiguator
 * __+Use/GC__ only retained in the HFST Grammar Checker disambiguation analyser
 * __+Use/-PMatch__ Do not include in fst's made for hfst-pmatch
 * __+MWESplit__ Split point for MWE

!Dialect tags:

 * __ +Dial/-KJ  __ forms not in use in KJ (Kárásjohka)
 * __ +Dial/-GG  __ forms not in use in GG (Guovdageaidnu)
 * __ +Dial/-GS  __ forms not in use in GS (Gárasavvon) not in use
 * __ +South		__ provisionally added to Sg Loc -n, which is a sub-form

!Tags for indicating the orthography used
 * +Orth/Strd - Standard orthography
 * +Orth/IPA - IPA transcription

The above should either be used in pairs, or not at all. That is, if a word
doesn't need an IPA stem (because the word in all its inflection can be
converted to IPA by the standard IPA conversion rules), then none of these
tags should be used.
On the other hand, if the word has a spelling that doesn't follow the
orthographic rules, and thus needs an exceptional IPA stem to get it right,
then the exceptional stem must be marked with the {{+Orth/IPA}}, and the
regular orthography stem must be marked with the tag {{+Orth/Strd}}. This is
so that we can exclude the one or the other from different fst's, but only
when the opposite stem variant is present.
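
The pairing rule can be checked mechanically. The sketch below is a hypothetical lint in Python, not part of the build. It assumes each lexicon entry has already been reduced to a lemma and the list of tags found on one stem variant, and it reports lemmas that carry only one of the two tags. The lemma names are placeholders.

```python
STRD, IPA = "+Orth/Strd", "+Orth/IPA"

def unpaired_orth_tags(entries):
    """Return lemmas whose stem variants use exactly one of the two Orth tags.

    `entries` is an iterable of (lemma, tags) pairs, one per stem variant.
    """
    seen = {}
    for lemma, tags in entries:
        seen.setdefault(lemma, set()).update(t for t in tags if t in (STRD, IPA))
    # A lemma is fine with both tags or with neither; one alone breaks the rule.
    return sorted(lemma for lemma, orth in seen.items() if len(orth) == 1)

entries = [
    ("lemmaA", ["+N", STRD]),  # regular orthography stem
    ("lemmaA", ["+N", IPA]),   # exceptional IPA stem, so the pair is complete
    ("lemmaB", ["+N"]),        # no IPA stem needed, hence no Orth tags
    ("lemmaC", ["+N", IPA]),   # IPA stem without its +Orth/Strd partner
]
print(unpaired_orth_tags(entries))
```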

!Multichars for marking start and end of IPA sequences
 * %{%<ipa#%} - ipa text to the left
 * %{#ipa%>%} - ipa text to the right
 * %<sent%> 		 apertium

!Compounding tags

The tags are of the following form:
* __+CmpNP/xxx__ - Normative (N), Position (P), ie the tag describes what
                   position the tagged word can be in in a compound
* __+CmpN/xxx__  - Normative (N) __form__ ie the tag describes what
                   form the tagged word should use when making compounds
* __+Cmp/xxx__   - Descriptive compounding tags, ie tags that ''describe''
                   what form a word is actually using in a compound

This entry / word should be in the following position(s):

 * __+CmpNP/All__ - ... in all positions, __default__, this tag does not have to be written
 * __+CmpNP/First__ - ... only be first part in a compound or alone
 * __+CmpNP/Pref__ - ... only __first__ part in a compound, NEVER alone
 * __+CmpNP/Last__ - ... only be last part in a compound or alone
 * __+CmpNP/Suff__ - ... only __last__ part in a compound, NEVER alone
 * __+CmpNP/None__ - ... does not take part in compounds
 * __+CmpNP/Only__ - ... only be part of a compound, i.e. can never
                    be used alone, but can appear in any position

If unmarked, any position goes.
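
The position tags can be read as a small table of permissions. The Python sketch below encodes the list above as (may be first part, may be last part, may stand alone) and checks a two-part compound. It is an illustration only: middle positions in longer compounds are not modelled, and the function names are made up for this example.

```python
# (may be first part, may be last part, may stand alone), from the list above.
POSITIONS = {
    "+CmpNP/All":   (True,  True,  True),
    "+CmpNP/First": (True,  False, True),
    "+CmpNP/Pref":  (True,  False, False),
    "+CmpNP/Last":  (False, True,  True),
    "+CmpNP/Suff":  (False, True,  False),
    "+CmpNP/None":  (False, False, True),
    "+CmpNP/Only":  (True,  True,  False),
}
DEFAULT = "+CmpNP/All"  # unmarked entries may appear in any position

def position_rights(tags):
    """Return the permission triple for the first +CmpNP/ tag, or the default."""
    tag = next((t for t in tags if t in POSITIONS), DEFAULT)
    return POSITIONS[tag]

def compound_allowed(first_tags, last_tags):
    """True if a two-part compound of these entries respects both position tags."""
    return position_rights(first_tags)[0] and position_rights(last_tags)[1]

print(compound_allowed(["+N", "+CmpNP/Pref"], ["+N"]))   # prefix-only word first
print(compound_allowed(["+N"], ["+N", "+CmpNP/First"]))  # first-only word last
```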

The tagged part of the compound should make a compound using:

 * __+CmpN/SgN__ Singular Nominative
 * __+CmpN/SgG__ Singular Genitive
 * __+CmpN/PlG__ Plural Genitive
 * __+CmpN/PlN__ Plural Nominative, propers!

Unmarked = Default, ie {{+CmpN/SgN}} for SME.

The second part of the compound may require that the previous (left part) is:

 * __+CmpN/SgNomLeft__ Singular Nominative
 * __+CmpN/SgGenLeft__ Singular Genitive
 * __+CmpN/PlGenLeft__ Plural Genitive


Tags for descriptive compound analysis - this is what a compound actually is:

 * __+Cmp__ - Dynamic compound. This tag should always be part
                of a dynamic compound. It is important for
                Apertium, and useful in other cases as well.
 * __+Cmp/Attr__ - Attributive
 * __+Cmp/SgNom__ - Singular Nominative
 * __+Cmp/SgGen__ - Singular Genitive
 * __+Cmp/PlGen__ - Plural Genitive
 * __+Cmp/SplitR__ - This is a split compound with the other part to
                the right: "Arbeids- og inkluderingsdepartementet"
                => Arbeids- = +Cmp/SplitR
 * __+Cmp/SplitL__ - This is a split compound with the other part to the left
 * __+Cmp/Sh__ - testing +Cmp/Sh
 * __+Cmp/Hyph__ - on dynamic compounds that have a hyphen
 * __+Cmp/NoHyph__ - On compounds that COULD have had a hyphen (and usually have), but don't
 * __+Cmp/SoftHyph__ - Tags compounds containing SOFT HYPHENS (U+00AD)
 * __+Cmp/Cit__ - Tags citation compounds, which can in principle
                cover any word. Requires a hyphen.

!Compounding tag ordering
To ease writing and maintaining regexes etc for manipulating and enforcing
compounding, it is important to keep the tags in a certain order.
The order is:
# __+CmpN/__ tags
# __+CmpNP/__ tags
# __+Cmp/__ tags - this is always true since the descriptive tags are always
  part of the continuation lexicons, and will be located after the POS tag.
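
A check for this ordering could look roughly like the following Python sketch. It assumes the tags of one analysis are available as a list in their original order, and it treats the bare {{+Cmp}} tag as belonging to the descriptive group. Both the helper names and the example tag sequences are hypothetical.

```python
def compound_tag_rank(tag):
    """Rank a compounding tag by the required order; None for all other tags."""
    if tag.startswith("+CmpN/"):
        return 0
    if tag.startswith("+CmpNP/"):
        return 1
    if tag == "+Cmp" or tag.startswith("+Cmp/"):
        return 2
    return None

def in_required_order(tags):
    """True if the compounding tags appear as +CmpN/, then +CmpNP/, then +Cmp/."""
    ranks = [r for r in map(compound_tag_rank, tags) if r is not None]
    return ranks == sorted(ranks)

print(in_required_order(["+CmpN/SgG", "+CmpNP/First", "+N", "+Cmp/SgGen"]))
print(in_required_order(["+CmpNP/First", "+CmpN/SgG", "+N"]))
```

Non-compounding tags such as {{+N}} are ignored, so the check only constrains the relative order of the three tag groups.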


!L2 errortags
 * +CGErr	
 * +IllErr
 * +IllVErr
 * +ComVErr
 * +DiphErr
 * +AErr	
 * +AiErr	


!Semantic tags to help disambiguation & synt. analysis: (before POS)

 *  +Sem/Act          = Activity
 *  +Sem/Adr          = Webadr
 *  +Sem/Amount       = Amount
 *  +Sem/Ani          = Animate
 *  +Sem/Aniprod      = Animal Product
 *  +Sem/Body         = Bodypart
 *  +Sem/Body-abstr   = siellu, vuoig?a, jierbmi
 *  +Sem/Build        = Building
 *  +Sem/Build-part   = Part of Building, like the closet
 *  +Sem/Cat	       = Category
 *  +Sem/Clth         = Clothes
 *  +Sem/Clth-jewl    = Jewellery
 *  +Sem/Clth-part    = part of clothes, boallu, sávdnji...
 *  +Sem/Ctain        = Container
 *  +Sem/Ctain-abstr  = Abstract container like bank account
 *  +Sem/Ctain-clth   = Soft container, like a rucksack
 *  +Sem/Ctain-Obj   = Soft container, like a rucksack
 *  +Sem/Curr         = Currency like dollár, Not Money
 *  +Sem/Date         = Date
 *  +Sem/Dance        = Dance
 *  +Sem/Dir	       = Direction like GPS-kursa
 *  +Sem/Domain       = Domain like politics, reindeerherding (a system of actions)
 *  +Sem/Drink        = Drink
 *  +Sem/Dummytag     = Dummytag
 *  +Sem/Edu          = Educational event
 *  +Sem/Event        = Event
 *  +Sem/Feat		   = Feature, like Árvu
 *  +Sem/Feat-phys    = Physiological feature, ivdni, fárda
 *  +Sem/Feat-psych   = Psychological feature
 *  +Sem/Feat-measr   = Psychological feature
 *  +Sem/Fem          = Female name
 *  +Sem/Food         = Food
 *  +Sem/Food-med     = Medicine
 *  +Sem/Fruit        = Fruits, vegetables, seeds, nuts
 *  +Sem/Furn         = Furniture
 *  +Sem/Game         = Game
 *  +Sem/Geom         = Geometrical object
 *  +Sem/Group        = Animal or Human Group
 *  +Sem/Hum          = Human
 *  +Sem/Hum-abstr    = Human abstract
 *  +Sem/Hum-prof     = Human professional
 *  +Sem/Ideol        = Ideology
 *  +Sem/Lang         = Language
 *  +Sem/Mal          = Male name
 *  +Sem/Mat          = Material for producing things
 *  +Sem/Measr        = Measure
 *  +Sem/Money        = Has to do with money, like wages, not Curr(ency)
 *  +Sem/Obj          = Object
 *  +Sem/Obj-clo      = Cloth
 *  +Sem/Obj-cogn     = Cloth
 *  +Sem/Obj-el       = (Electrical) machine or apparatus
 *  +Sem/Obj-ling     = Object with something written on it
 *  +Sem/Obj-rope     = flexible ropelike object
 *  +Sem/Obj-surfc    = Surface object
 *  +Sem/Org          = Organisation
 *  +Sem/Part         = Feature, oassi, bealli
 *  +Sem/Perc-cogn    = Cloth
 *  +Sem/Perc-emo     = Emotional perception
 *  +Sem/Perc-phys	   = Physical perception
 *  +Sem/Perc-psych   = Psychological perception
 * +Sem/Phonenr = Telephone number
 *  +Sem/Plant        = Plant
 *  +Sem/Plant-part   = Plant part
 *  +Sem/Plc          = Place
 *  +Sem/Plc-abstr    = Abstract place
 *  +Sem/Plc-elevate  = Place
 *  +Sem/Plc-line     = Place
 *  +Sem/Plc-water    = Place
 *  +Sem/Pos          = Position (as in social position job)
 *  +Sem/Process      = Process
 *  +Sem/Prod         = Product
 *  +Sem/Prod-audio   = Audio product
 *  +Sem/Prod-cogn	   = Cognition product
 *  +Sem/Prod-ling	   = Linguistic product
 *  +Sem/Prod-vis	   = Visual product
 *  +Sem/Rel          = Relation
 *  +Sem/Route        = Route
 *  +Sem/Rule         = Rule or convention
 *  +Sem/Semcon       = Semantic concept
 *  +Sem/Sign         = Sign (e.g. numbers, punctuation)
 *  +Sem/Sport        = Sport
 *  +Sem/State        =
 *  +Sem/State-sick   = Illness
 *  +Sem/Substnc      = Substance, like Air and Water
 *  +Sem/Sur          = Surname
 *  +Sem/Symbol       = Symbol
 *  +Sem/Time         = Time
 *  +Sem/Time-clock  = Time clock
 *  +Sem/Tool         = Prototypical tool for repairing things
 *  +Sem/Tool-catch   = Tool used for catching (e.g. fish)
 *  +Sem/Tool-clean   = Tool used for cleaning
 *  +Sem/Tool-it	   = Tool used in IT
 *  +Sem/Tool-measr   = Tool used for measuring
 *  +Sem/Tool-music   = Music instrument
 *  +Sem/Tool-write   = Writing tool
 *  +Sem/Txt          = Text (girji, lávlla...)
 *  +Sem/Veh          = Vehicle
 *  +Sem/Wpn          = Weapon
 *  +Sem/Wthr         = The Weather or the state of ground
 *  +Sem/Year         = year (i.e. 1000 - 2999), used only for numerals


!Multiple Semantic tags:
 * +Sem/Act_Fruit                      
 * +Sem/Act_Group					 Activity and Group
 * +Sem/Act_Plc					   A persons job is an activity, and a place as well
 * +Sem/Act_Route					 Activity and Route, ie johtolat
 * +Sem/Act_Tool-it
 * +Sem/Amount_Build				   Amount and Building
 * +Sem/Amount_Semcon				
 * +Sem/Ani_Body-abstr_Hum			
 * +Sem/Ani_Build					
 * +Sem/Ani_Build-part				
 * +Sem/Ani_Build_Hum_Txt			
 * +Sem/Ani-fish					
 * +Sem/Ani_Group					
 * +Sem/Ani_Group_Hum				
 * +Sem/Ani_Group_Prod-vis
 * +Sem/Ani_Hum					
 * +Sem/Ani_Hum_Plc				
 * +Sem/Ani_Hum_Time				
 * +Sem/Ani_Plc					
 * +Sem/Ani_Plc_Txt				
 * +Sem/Ani_Time					
 * +Sem/Ani_Veh					
 * +Sem/Aniprod_Hum				
 * +Sem/Aniprod_Obj-clo			
 * +Sem/Aniprod_Perc-phys			
 * +Sem/Aniprod_Plc				
 * +Sem/Aniprod_Plc_Route				
 * +Sem/Body-abstr_Feat-cogn
 * +Sem/Body-abstr_Prod-audio_Semcon
 * +Sem/Body_Body-abstr			
 * +Sem/Body_Clth					
 * +Sem/Body_Food					
 * +Sem/Body_Group_Hum				
 * +Sem/Body_Group_Hum_Time
 * +Sem/Body_Hum					
 * +Sem/Body_Mat					
 * +Sem/Body_Measr					
 * +Sem/Body_Obj_Tool-catch		
 * +Sem/Body_Plc					
 * +Sem/Body_Plc-elevate			
 * +Sem/Body_Time					
 * +Sem/Build-part_Plc				
 * +Sem/Build_Build-part			
 * +Sem/Build_Clth-part			
 * +Sem/Build_Edu_Org				
 * +Sem/Build_Event_Org
 * +Sem/Build_Obj 			
 * +Sem/Build_Org					
 * +Sem/Build_Route				
 * +Sem/Build-part_Cat                 
 * +Sem/Build-part_Cat_Ctain           
 * +Sem/Build-part_Cat_Ctain_Mat       
 * +Sem/Build-part_Ctain               
 * +Sem/Build-part_Ctain_Mat           
 * +Sem/Build-part_Ctain_Obj           
 * +Sem/Cat_Group_Hum	                
 * +Sem/Cat_Group_Hum_Plc	            
 * +Sem/Clth-jewl_Curr				
 * +Sem/Clth-jewl_Curr_Obj
 * +Sem/Clth-jewl_Curr_Obj_Org
 * +Sem/Clth-jewl_Fruit
 * +Sem/Clth-jewl_Money			
 * +Sem/Clth-jewl_Plant			
 * +Sem/Clth_Hum					
 * +Sem/Clth_Obj-clo
 * +Sem/Ctain-abstr_Org			
 * +Sem/Ctain-clth_Plant			
 * +Sem/Ctain-clth_Veh				
 * +Sem/Ctain_Feat-phys			
 * +Sem/Ctain_Furn					
 * +Sem/Ctain_Plc					
 * +Sem/Ctain_Tool					
 * +Sem/Ctain_Tool-measr			
 * +Sem/Curr_Org					
 * +Sem/Dance_Org					
 * +Sem/Dance_Prod-audio			
 * +Sem/Domain_Food-med			
 * +Sem/Domain_Hum                    
 * +Sem/Domain_Prod-audio			
 * +Sem/Drink_Plant                    
 * +Sem/Edu_Event					
 * +Sem/Edu_Geom                      
 * +Sem/Edu_Group_Hum				
 * +Sem/Edu_Hum                       
 * +Sem/Edu_Mat					
 * +Sem/Edu_Org					
 * +Sem/Event_Food					
 * +Sem/Event_Hum					
 * +Sem/Event_Plc					
 * +Sem/Event_Plc-elevate
 * +Sem/Event_Time					
 * +Sem/Feat-measr_Plc 			
 * +Sem/Feat-phys_Tool-write		
 * +Sem/Feat-phys_Veh				
 * +Sem/Feat-phys_Wthr				
 * +Sem/Feat-psych_Hum				
 * +Sem/Feat-psych_Plc
 * +Sem/Food_Obj-surfc
 * +Sem/Feat_Plant					
 * +Sem/Food_Perc-phys				
 * +Sem/Food_Plant					
 * +Sem/Food_Sign					
 * +Sem/Fruit_Hum                      
 * +Sem/Game_Obj-play				
 * +Sem/Geom_Hum_Plc
 * +Sem/Geom_Obj					
 * +Sem/Group_Hum					
 * +Sem/Group_Hum_Org				
 * +Sem/Group_Hum_Plc				
 * +Sem/Group_Hum_Prod-vis			
 * +Sem/Group_Org					
 * +Sem/Group_Prod-vis                 
 * +Sem/Group_Sign					
 * +Sem/Group_Txt					
 * +Sem/Hum_Lang					
 * +Sem/Hum_Lang_Plc				
 * +Sem/Hum_Lang_Time				
 * +Sem/Hum_Mat_Tool
 * +Sem/Hum_Obj					
 * +Sem/Hum_Org					
 * +Sem/Hum_Sign
 * +Sem/Hum_Plant					
 * +Sem/Hum_Plc					
 * +Sem/Hum_Tool					
 * +Sem/Hum_Tool-it                     = Human
 * +Sem/Hum_Veh					
 * +Sem/Hum_Wthr					
 * +Sem/Lang_Tool					
 * +Sem/Mat_Plant					
 * +Sem/Mat_Txt					
 * +Sem/Measr_Obj_Time	                    
 * +Sem/Measr_Sign	                     = Sign (e.g. numbers, punctuation)
 * +Sem/Measr_Time					
 * +Sem/Money_Obj					
 * +Sem/Money_Org					
 * +Sem/Money_Part
 * +Sem/Money_Txt					
 * +Sem/Obj-play					
 * +Sem/Obj-play_Sport				
 * +Sem/Obj_Semcon					
 * +Sem/Obj_Sign
 * +Sem/Obj_Veh					
 * +Sem/Clth-jewl_Org				
 * +Sem/Obj_Symbol
 * +Sem/Org_Rule					
 * +Sem/Org_Txt					
 * +Sem/Org_Veh					
 * +Sem/Part_Prod-cogn				
 * +Sem/Part_Substnc				
 * +Sem/Perc-emo_Wthr				
 * +Sem/Plant_Plant-part			
 * +Sem/Plant_Tool					
 * +Sem/Plant_Tool-measr			
 * +Sem/Plc-abstr_Rel_State		
 * +Sem/Plc-abstr_Route			
 * +Sem/Plc_Pos					
 * +Sem/Plc_Route					
 * +Sem/Plc_Semcon					
 * +Sem/Plc_State					
 * +Sem/Plc_Substnc				
 * +Sem/Plc_Substnc_Wthr			
 * +Sem/Plc_Time					
 * +Sem/Plc_Tool-catch				
 * +Sem/Plc_Txt				
 * +Sem/Plc_Wthr					
 * +Sem/Prod-audio_Txt				
 * +Sem/Prod-cogn_Txt				
 * +Sem/Semcon_Txt					
 * +Sem/Obj_State					
 * +Sem/Substnc_Wthr				
 * +Sem/Plc_Time_Wthr
 * +Sem/Time_Wthr					
 * +Sem/State-sick_Substnc
 * +Sem/Obj-ling_Obj-surfc             
 * +Sem/Org_Prod-audio
 * +Sem/Org_Prod-cogn
 * +Sem/Org_Prod-vis
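
Judging from the annotated entries above (e.g. "Activity and Group"), a multiple semantic tag joins single class names with an underscore. Under that assumption, the Python sketch below splits such a tag into its classes. The helper names are illustrative only.

```python
PREFIX = "+Sem/"

def sem_classes(tag):
    """Split '+Sem/A_B_C' into ['A', 'B', 'C']; hyphens stay inside a class name."""
    if not tag.startswith(PREFIX):
        raise ValueError(f"not a semantic tag: {tag!r}")
    return tag[len(PREFIX):].split("_")

def has_class(tag, wanted):
    """True if the (possibly multiple) semantic tag includes the class `wanted`."""
    return wanted in sem_classes(tag)

print(sem_classes("+Sem/Ani_Body-abstr_Hum"))
print(has_class("+Sem/Act_Plc", "Plc"))
```

Splitting only on the underscore keeps hyphenated class names such as {{Body-abstr}} intact.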


 * +Allegro			 from LEXICON GOADE-IU-




All non-positional derivations should be preceded by this tag, to make it possible
to target regular expressions at all derivations in a language-independent way:
just specify +Der|+Der1 .. +Der5 and you are set.

 * +Der
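
As a sketch of the kind of expression meant here, the following Python regex matches the bare marker tags {{+Der}} and {{+Der1}} to {{+Der5}}, but not the specific derivation tags such as {{+Der/A}}. It is written with Python's {{re}} module rather than xfst syntax, and the analysis string is invented for illustration.

```python
import re

# +Der optionally followed by 1-5, and not continued by a letter, digit or slash.
DER_MARKER = re.compile(r"\+Der[1-5]?(?![\w/])")

def derivation_markers(analysis):
    """Return all bare derivation marker tags in the analysis, in order."""
    return DER_MARKER.findall(analysis)

# Hypothetical analysis string with one positional and one non-positional marker.
print(derivation_markers("lemma+V+Der1+Der/A+A+Der+Der/Adv+Adv"))
```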

!Other/unclassified derivations, can appear in all positions:

 * +Der/veara  NA#
 * +Der/viđá  NA#
 * +Der/viđi  NA#
 * +Der/has  only one in the code


!Miscellaneous list

 * +Der/A		 Adjective derived from Noun or Verb
 * +Der/Adv	 Adverb derived from Adjective

!!Tags for originating language

The following tags are used to guide conversion to IPA: loan words
and foreign names are usually pronounced (approximately) as in the
originating (majority) language. Instead of trying to identify the
correct pronunciation based on phonotactics (orthotactics actually),
we tag all words that can't be correctly transcribed using the SME
transcriber with source language codes. Once tagged, it is possible
to split the lexical transducer into smaller ones according to
language, and apply a different IPA conversion to each of them.
The principle of tagging is that we only tag to the extent needed,
and following a priority:
# any untagged word is pronounced with SME orthographic conventions
# NNO and NOB have identical pronunciation, NNO is only used if
  different in spelling from NOB
# SWE has mostly the same pronunciation as NOB, and is only used
  if different in spelling from NOB
# Occasionally even SME (the default) may be tagged, to block other
  languages from being specified, mainly during semi-automatic
  language tagging sessions
All in all, we want to get as much correctly transcribed to IPA
with as little work as possible. On the other hand, if more words
are tagged than strictly needed, this should pose no problem as
long as the IPA conversion is correct - at least some words will
get the same pronunciation whether read as SME or NOB/NNO/SWE.

 * +OLang/SME - North Sámi
 * +OLang/SMJ - Lule Sámi
 * +OLang/SMA - South Sámi
 * +OLang/FIN - Finnish
 * +OLang/SWE - Swedish
 * +OLang/NOB - Norw. bokmål
 * +OLang/NNO - Norw. nynorsk
 * +OLang/ENG - English
 * +OLang/RUS - Russian
 * +OLang/UND - Undefined
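
The first rule in the priority list (untagged words count as SME) and the idea of splitting the lexicon by language can be sketched as follows. This Python example works on analysis strings rather than on a transducer, and the strings themselves are invented. Where the {{+OLang/}} tag sits within an analysis is not specified here, so the sketch searches for it anywhere.

```python
import re

OLANG = re.compile(r"\+OLang/([A-Z]{3})")
DEFAULT_LANG = "SME"  # any untagged word follows SME orthographic conventions

def source_language(analysis):
    """Return the three-letter language code of an analysis, defaulting to SME."""
    match = OLANG.search(analysis)
    return match.group(1) if match else DEFAULT_LANG

def split_by_language(analyses):
    """Group analyses by language so each group can get its own IPA conversion."""
    groups = {}
    for analysis in analyses:
        groups.setdefault(source_language(analysis), []).append(analysis)
    return groups

# Hypothetical analyses, used only to show the grouping.
sample = ["guolli+N+Sg+Nom", "London+OLang/ENG+N+Prop+Sg+Nom", "viessu+N+Sg+Nom"]
print(split_by_language(sample))
```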


!Triggers for morphophonological rules

 * X1  Diphthong Simplification, Metaphony
 * X2  Diphthong Simplification, Metaphony, Word Final Neutralization of g8, h8, m8
 * X3  Diphthong Simplification, Metaphony
 * X4  WeG, Vowel Shortening, Stem vowel alternations, Word Final Deletion of n8 m8 g8 h8
 * X5  WeG, Diphthong Simplification, Stem vowel alternations
 * X6  WeG, Diphthong Simplification, Metaphony, Word Final Deletion of n8 m8 g8 h8
 * X7  Vowel Shortening, Stem vowel alternations, Word Final Neutralization of g8, h8, m8
 * X8  WeG, Vowel Shortening, Metaphony, Stem Vowel alternations, Word Final Deletion of n8 m8 g8 h8
 * X9  WeG, Diphthong Simplification, Word Final Deletion of n8 m8 g8 h8
 * Y1  Lengthening of Central Consonants, Stem Vowel alternations,
 * Y2  Lengthening of Central Consonants, Stem Vowel alternations,
 * Y3  Lengthening of Central Consonants, Stem Vowel alternations,
 * Y4  Lengthening of Central Consonants, Stem Vowel alternations,
 * Y5  Lengthening of Central Consonants, Word Final Consonant Deletion, Diphthong Simplification, Stem vowel alternations
 * Y6  Lengthening of Central Consonants, Word Final Consonant Deletion, Diphthong Simplification, Stem vowel alternations
 * Y7  Lengthening of Central Consonants, Diphthong Simplification, Stem vowel alternations
 * Y8  Not in use
 * Y9  Lengthening of Central Consonants, Diphthong Simplification
 * Q1  Stem vowel alternations,
 * Q2  Diphthong Simplification, Stem vowel alternations,
 * Q3  Diphthong Simplification, Stem vowel alternations,
 * Q4  WeG, Stem vowel alternations,
 * Q5  WeG, Diphthong Simplification, Stem vowel alternations,
 * Q6  WeG, Vowel shortening,
 * Q7  WeG, Diphthong Simplification, Metaphony,
 * Q8  WeG, Diphthong Simplification, Stem vowel alternations,
 * Q9  Not in use
 * W1  WeG, Vowel Shortening
 * W2  Vowel Shortening,
 * W3  Stem vowel deletion in compounding,
 * W4  WeG, Word Final Cluster Simplification, Optional vowel-shortening, Word Final Deletion of n8 m8 g8 h8
 * W5  WeG, Diphthong Simplification, Stem vowel alternations
 * W6  Stem vowel alternations, WeG,
 * W7  Stem vowel alternations, WeG
 * W8  Stem vowel alternations,
 * W9  Not in use
 * %^DISIMP  diphthong simplification

!Morphophonemes and Sámi letters

 * b9  twol rule override, so that b doesn't turn into t in front of hash
 * e7  shortened i = "e with dot below" from the dictionary
 * e9  twol rule override, so that e doesn't turn into i in front of j
 * d9  twol rule override, so that d doesn't turn into t in front of hash
 * g8  Word Final Neutralization and Deletion
 * g9  twol rule override, so that g doesn't turn into t in front of hash
 * h7
 * h8  Word Final Neutralization and Deletion
 * h9  twol rule override, so that h doesn't turn into t in front of hash
 * i7  twol rule override, so that i doesn't turn into e in certain contexts
 * j9  twol rule override, so that j doesn't turn into i after i
 * k9  twol rule override, so that k doesn't turn into t in front of hash
 * m8  Word Final Neutralization and Deletion
 * m9  twol rule override, so that m doesn't turn into n in front of hash
 * n8  Word Final Neutralization and Deletion
 * n9  twol rule override
 * o7  shortened u = "o with dot below" from the dictionary
 * o9  twol rule override, so that o doesn't turn into u in front of j
 * p9  twol rule override, so that p doesn't turn into t in front of hash
 * s9  twol rule override, so that we can have two ss in front of hash
 * t9  twol rule override, so that we can have st in front of hash
 * u7
 * z9  twol rule override, to avoid Word Final Consonant Neutralization
 * ž9  twol rule override, to avoid Word Final Consonant Neutralization
 * '7  is the real apostrophe, as opposed to the consonant gradation mark
 * š9  twol rule override, so that we can have two šš in front of hash
 * r9
 * æ7  in smi, for Lule Sámi
 * u6  twol rule override, so that u doesn't turn into o in certain contexts
 * æ9  in smi, for Lule Sámi


 😱 - a symbol used in front of {{#}} to block backtracking and
          mwe reanalysis in hfst-tokenise (e.g. in dynamic compounds).
          Makes it possible to distinguish lexical and dynamic compounds
          in rules. It is converted to zero together with {{#}}.

!Symbols that need to be escaped on the lower side (towards twolc):

* »
* «
* > (escaped with square brackets, to avoid collision with > as morpheme boundary)
* < (escaped with square brackets, to avoid collision with < as morpheme boundary)


!!Flag diacritics
We have manually optimised the structure of our lexicon using the following
flag diacritics to restrict morphological combinatorics - only allow compounds
with verbs if the verb is further derived into a noun again:
 | @P.NeedNoun.ON@ | (Dis)allow compounds with verbs unless nominalised
 | @D.NeedNoun.ON@ | (Dis)allow compounds with verbs unless nominalised
 | @C.NeedNoun@ | (Dis)allow compounds with verbs unless nominalised

 | @P.Pmatch.Loc@ | Used on multi-token analyses; tell hfst-tokenise/pmatch where in the form/analysis the token should be split.
 | @P.Pmatch.Backtrack@ | Used on single-token analyses; tell hfst-tokenise/pmatch to backtrack by reanalysing the substrings before and after this point in the form (to find combinations of shorter analyses that would otherwise be missed)

For languages that allow compounding, the following flag diacritics are needed
to control position-based compounding restrictions for nominals. Their use is
handled automatically if combined with +CmpN/xxx tags. If not used, they will
do no harm.
 | @P.CmpFrst.FALSE@ | Require that words tagged as such only appear first
 | @D.CmpPref.TRUE@ | Block such words from entering ENDLEX
 | @P.CmpPref.FALSE@ | Block these words from making further compounds
 | @D.CmpLast.TRUE@ | Block such words from entering R
 | @D.CmpNone.TRUE@ | Combines with the next tag to prohibit compounding
 | @U.CmpNone.FALSE@ | Combines with the prev tag to prohibit compounding
 | @U.CmpNone.TRUE@ | Combines with the two previous ones to block compounding
 | @P.CmpOnly.TRUE@ | Sets a flag to indicate that the word has passed R
 | @D.CmpOnly.FALSE@ | Disallow words coming directly from root.
 | @D.CmpHyph.TRUE@ | Flag to control hyphenated compounds like proper nouns
 | @U.CmpHyph.FALSE@ | Flag to control hyphenated compounds like proper nouns
 | @U.CmpHyph.TRUE@ | Flag to control hyphenated compounds like proper nouns
 | @C.CmpHyph@ | Flag to control hyphenated compounds like proper nouns
Use the following flag diacritics to control downcasing of derived proper
nouns (e.g. Finnish Pariisi -> pariisilainen). See e.g. North Sámi for how to use
these flags. There exists a ready-made regex that will do the actual down-casing
given the proper use of these flags.
 | @U.Cap.Obl@ | Allowing downcasing of derived names: deatnulasj.
 | @U.Cap.Opt@ | Allowing downcasing of derived names: deatnulasj.


 * @U.NeedsVowRed.OFF@ is used to force hyphenation/non-reduction: samediggi-
 * @U.NeedsVowRed.ON@ is used to force reduction w/o hyphen: samedigge#xxx
 * @C.NeedsVowRed@ Clearing this feature, so that it doesn't interfere with further compounding

 * @P.Px.add@	
 * @R.Px.add@	
 * @P.Px.block@
 * @D.Px.block@

 * @R.SpellRlx.ON@ Flag used to tag spell-relax-analysed strings (and only those).
 * @D.SpellRlx.ON@ Flag used to tag spell-relax-analysed strings (and only those).
 * @C.SpellRlx@ Flag used to tag spell-relax-analysed strings (and only those).

 * @R.SpaceCmp.ON@ Flag to tag compounds written with a space
 * @D.SpaceCmp.ON@ Flag to tag compounds written with a space
 * @C.SpaceCmp@ Flag to tag compounds written with a space

!!Basic lexica, pointing to the other lexicon files


 * __LEXICON Root__ is the basic lexicon starting everything

 * __LEXICON ProperNoun   __

!!!Lexicon ENDLEX
And this is the ENDLEX of everything:
{{{
 @D.CmpOnly.FALSE@@D.CmpPref.TRUE@@D.NeedNoun.ON@ ENDLEX2 ;
}}}
The {{@D.CmpOnly.FALSE@}} flag diacritic is used to prevent words tagged
with +CmpNP/Only from ending here.
The {{@D.NeedNoun.ON@}} flag diacritic is used to block illegal compounds.
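
To make the blocking effect of the ENDLEX flags concrete, here is a minimal Python simulation of flag diacritic evaluation along one path. The operator semantics follow the usual xfst/HFST conventions: P sets a feature, C clears it, D blocks the path if the feature has the given value, R requires that value, and U unifies. The N operator is not modelled. Which lexicon entries actually carry {{@P.NeedNoun.ON@}}, {{@C.NeedNoun@}} or {{@P.CmpOnly.FALSE@}} is an assumption made for this example and is not stated in this document.

```python
import re

# Matches @OP.Feature@ and @OP.Feature.Value@ for the operators modelled here.
FLAG = re.compile(r"@([PRDCU])\.([^.@]+)(?:\.([^.@]+))?@")

def path_survives(symbols):
    """Apply the flag diacritics in `symbols` left to right; False if blocked."""
    state = {}
    for op, feat, val in FLAG.findall(symbols):
        cur = state.get(feat)
        if op == "P":                      # positive set
            state[feat] = val
        elif op == "C":                    # clear
            state.pop(feat, None)
        elif op == "D":                    # disallow this value (or any value)
            if cur is not None and (val == "" or cur == val):
                return False
        elif op == "R":                    # require this value (or any value)
            if cur is None or (val != "" and cur != val):
                return False
        elif op == "U":                    # unify: set if unset, else must match
            if cur is None:
                state[feat] = val
            elif cur != val:
                return False
    return True

ENDLEX = "@D.CmpOnly.FALSE@@D.CmpPref.TRUE@@D.NeedNoun.ON@"

print(path_survives("@P.NeedNoun.ON@" + ENDLEX))               # verb part, not nominalised
print(path_survives("@P.NeedNoun.ON@@C.NeedNoun@" + ENDLEX))   # cleared by a nominaliser
print(path_survives("@P.CmpOnly.FALSE@" + ENDLEX))             # compound-only word used alone
```

Under these assumptions the first and third paths are blocked at ENDLEX and the second one passes, which is the behaviour the two sentences above describe.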