!! Divvun & Giellatekno - open source grammars for Sámi and other languages
! Copyright © 2000-2010 The University of Tromsø & the Norwegian Sámi Parliament
! http://giellatekno.uit.no & http://divvun.no
!
! This program is free software; you can redistribute and/or modify
! this file under the terms of the GNU General Public License as published by
! the Free Software Foundation, either version 3 of the License, or
! (at your option) any later version. The GNU General Public License
! is found at http://www.gnu.org/licenses/gpl.html. It is
! also available in the file $GTHOME/LICENSE.txt.
!
! Other licensing options are available upon request, please contact
! giellatekno@hum.uit.no or divvun@samediggi.no

! ==========================================================================
!                North Sámi morphological analyser
! ==========================================================================
!
!                sme-lex.txt

! The file contains morphological affixes for the nouns, verbs and adjectives

Multichar_Symbols

! Escaped chars
%

! Tags for POS
+N +A +Adv +V
+Pron +CS +CC +Adp +Po +Pr +Interj +Pcle +Num

! Tags for sub-POS
+Prop +Pers +Dem +Interr +Refl +Recipr +Rel +Indef
+Coll            ! Collective numerals
+MWE             ! Multi-word expressions treated as such in the preprocessor

! Usage tags
+Err/Sub         ! substandard, not in normative fst
+Use/LexSub      ! substandard, not in normative fst, no normative lemma
+Use/Marg        ! marginal (?)
+Use/-Spell      ! Excluded in speller
+Use/SpellNoSugg ! recognized but not suggested in speller
+Use/Circ        ! circular paths (old ^C^)
+Use/CircN       ! circular paths for the numerals (old ^N^)
+Use/NG          ! not-generate, for ped generation isme-ped.fst
+Use/NA          ! not-analyse, for restricting analyses needed for
                 ! MT generation not to pop up elsewhere
+Use/NGminip     ! Not for miniparadigm in VD dicts

! Dialect tags:
+Dial/-KJ        ! forms not in use in KJ (Kárásjohka)
+Dial/-GG        ! forms not in use in GG (Guovdageaidnu)
+Dial/-GS        ! forms not in use in GS (Gárasavvon)
+South           !
                 ! provisionally added Sg Loc -n, which is a sub-form

!!
!!Tags for indicating the orthography used
+Orth/Strd       !!≈ @CODE@ - Standard orthography
+Orth/IPA        !!≈ @CODE@ - IPA transcription
!!
!! The above should either be used in pairs, or not at all. That is, if a word
!! doesn't need an IPA stem (because the word in all its inflected forms can be
!! converted to IPA by the standard IPA conversion rules), then none of these
!! tags should be used.
!!
!! On the other hand, if the word has a spelling that doesn't follow the
!! orthographic rules, and thus needs an exceptional IPA stem to get it right,
!! then the exceptional stem must be marked with the {{+Orth/IPA}}, and the
!! regular orthography stem must be marked with the tag {{+Orth/Strd}}. This is
!! so that we can exclude the one or the other from different fst's, but only
!! when the opposite stem variant is present.
!!

!Multichars for marking start and end of IPA sequences
%{%%}            !!≈ * @CODE@ - ipa text to the right

! Normative/prescriptive compounding tags
! (to govern compound behaviour for the speller, ie what a compound SHOULD BE):

! The first part of the component may be ..
+CmpN/Sg         ! Sg
+CmpN/SgN        ! SgNominative
+CmpN/SgG        ! SgGenitive
+CmpN/PlG        ! PlGenitive

! This part of the component might be ..
+CmpN/First      ! first
+CmpN/Last       ! last
+CmpN/None       ! Can not take part in compounds
+CmpN/Only       ! Can only be part of a compound
+CmpN/Pref       ! prefix only

! The second part of the compound may take ..
+CmpN/SgLeft     ! Sg to the left
+CmpN/SgNomLeft  ! etc.
+CmpN/SgGenLeft  !
+CmpN/PlGenLeft  !

! This part of the compound may ...
+CmpN/All        ! all positions, _default_, this tag does not have to be written
+CmpN/Def        ! works along with Left compound-tagging
+CmpN/DefSgGen   ! works along with Left compound-tagging
+CmpN/DefPlGen   ! works along with Left compound-tagging

! Descriptive compounding tags:
! Tags for compound analysis - this is what a compound is:
+AttrCmp +SgCmp +SgNomCmp +SgGenCmp +PlGenCmp !
                 ! Different case + number combinations
+Cmp             ! Dynamic compound - this tag should always be part of a dynamic compound.
                 ! It is important for Apertium, and useful in other cases as well.
+RCmpnd          ! This is a split compound with the other part to the right:
                 ! "Arbeids- og inkluderingsdepartementet" => Arbeids- = +RCmpnd
+LCmpnd+         ! This is a split compound with the other part to the left
+ShCmp           ! testing ShCmp
+Hyph            ! On compounds that have a hyphen
+Use/NoHyph      ! On compounds that SHOULD have had a hyphen, but don't
+SHyph           ! Tags compounds containing SOFT HYPHENS (U+00AD)

!!!! === Tags for Inflection ===

! Tags for Case and Number Inflection
+Sg +Du +Pl                                  ! Number
+Ess +Nom +Gen +Acc +Ill +Loc +Com +Com/Sh   ! Case

! Possessive tags
+PxSg1 +PxSg2 +PxSg3
+PxDu1 +PxDu2 +PxDu3
+PxPl1 +PxPl2 +PxPl3

+Comp +Superl +Attr +Card +Ord
+Ind +Prs +Prt +Pot +Cond +Imprt
+Sg1 +Sg2 +Sg3 +Du1 +Du2 +Du3 +Pl1 +Pl2 +Pl3
+Inf +Ger +ConNeg +ConNegII +Neg
+ImprtII +PrsPrc +PrfPrc +Sup +VGen +VAbess
+Actio

! Other tags
+ABBR +ACR
+CLB +PUNCT +LEFT +RIGHT
^GUESSNOUNROOT
+TV +IV          ! Transitivity tags
+Multi           ! Multiword phrase tag
+G3              ! Grade 3 for homonymies
+G7              ! Grade 3, no CG
+NomAg           ! formerly Actor
+Guess           ! for the name guesser
+ComPxCPlCom +PxCPlComRecipr ! used in pronoun-sme-morph.txt

! Question and Focus particles:
+Qst +Foc/naj +Foc/ge +Foc/gen +Foc/ges +Foc/gis
+Foc/ba +Foc/be +Foc/hal +Foc/han +Foc/bat +Foc/son
+Foc/bahal +Foc/behal +Foc/bahan +Foc/behan +Foc/bason +Foc/beson

! Tags distinguishing different versions of the same lemma (before POS)
+v1 +v2 +v3 +v4 +v5 +v6 +v7 +v8
+v9 +v10 +v11 +v12 +v13 +v14 +v15 +v16
+v17 +v18 +v19 +v20 +v21 +v22 +v23 +v24
! Note: These high +v... numbers are in use for one word only:
! doavttergrádakursa

! Semantic tags to help disambiguation & synt.
! analysis: (before POS)
+Sem/Act         !!= * @CODE@ = Activity
+Sem/Amount      !!= * @CODE@ = Amount
+Sem/Ani         !!= * @CODE@ = Animate
+Sem/AniProd     !!= * @CODE@ = Animal Product
+Sem/Body        !!= * @CODE@ = Body part
+Sem/Body-abstr  !!= * @CODE@ = siellu, vuoigŋa, jierbmi
+Sem/Build       !!= * @CODE@ = Building
+Sem/Build-part  !!= * @CODE@ = Part of Building, like the closet
+Sem/Cat         !!= * @CODE@ = Category
+Sem/Clth        !!= * @CODE@ = Clothes
+Sem/Clth-jewl   !!= * @CODE@ = Jewellery
+Sem/Clth-part   !!= * @CODE@ = part of clothes, boallu, sávdnji...
+Sem/Ctain       !!= * @CODE@ = Container
+Sem/Ctain-abstr !!= * @CODE@ = Abstract container like bank account
+Sem/Ctain-clth  !!= * @CODE@ =
+Sem/Curr        !!= * @CODE@ = Currency like dollár, Not Money
+Sem/Dance       !!= * @CODE@ = Dance
+Sem/Dir         !!= * @CODE@ = Direction like GPS-kursa
+Sem/Domain      !!= * @CODE@ = Domain like politics, reindeer herding (a system of actions)
+Sem/Drink       !!= * @CODE@ = Drink
+Sem/Dummytag    !!= * @CODE@ = Dummytag
+Sem/Edu         !!= * @CODE@ = Educational event
+Sem/Event       !!= * @CODE@ = Event
+Sem/Feat        !!= * @CODE@ = Feature, like Árvu
+Sem/Feat-phys   !!= * @CODE@ = Physiological feature, ivdni, fárda
+Sem/Feat-psych  !!= * @CODE@ = Psychological feature
+Sem/Fem         !!= * @CODE@ = Female name
+Sem/Food        !!= * @CODE@ = Food
+Sem/Food-med    !!= * @CODE@ = Medicine
+Sem/Furn        !!= * @CODE@ = Furniture
+Sem/Game        !!= * @CODE@ = Game
+Sem/Geom        !!= * @CODE@ = Geometrical object
+Sem/Group       !!= * @CODE@ = Animal or Human Group
+Sem/Hum         !!= * @CODE@ = Human
+Sem/Hum-abstr   !!= * @CODE@ = Human abstract
+Sem/Ideol       !!= * @CODE@ = Ideology
+Sem/Lang        !!= * @CODE@ = Language
+Sem/Mal         !!= * @CODE@ = Male name
+Sem/Mat         !!= * @CODE@ = Material for producing things
+Sem/Measr       !!= * @CODE@ = Measure
+Sem/Money       !!= * @CODE@ = Has to do with money, like wages, not Curr(ency)
+Sem/Obj         !!= * @CODE@ = Object
+Sem/Obj-clo     !!= * @CODE@ = Cloth
+Sem/Obj-el      !!= * @CODE@ = (Electrical) machine or apparatus
+Sem/Obj-ling    !!= * @CODE@ = Object with something written on it
+Sem/Obj-rope    !!= * @CODE@ = flexible ropelike object
+Sem/Obj-surfc   !!= * @CODE@ = Surface object
+Sem/Org         !!= * @CODE@ = Organisation
+Sem/Part        !!= * @CODE@ = Feature, oassi, bealli
+Sem/Perc-emo    !!= * @CODE@ = Emotional perception
+Sem/Perc-phys   !!= * @CODE@ = Physical perception
+Sem/Plant       !!= * @CODE@ = Plant
+Sem/Plant-part  !!= * @CODE@ = Plant part
+Sem/Plc         !!= * @CODE@ = Place
+Sem/Plc-abstr   !!= * @CODE@ = Abstract place
+Sem/Plc-elevate !!= * @CODE@ = Place
+Sem/Plc-line    !!= * @CODE@ = Place
+Sem/Plc-water   !!= * @CODE@ = Place
+Sem/Pos         !!= * @CODE@ = Position (as in social position job)
+Sem/Process     !!= * @CODE@ = Process
+Sem/Prod        !!= * @CODE@ = Product
+Sem/Prod-audio  !!= * @CODE@ = Audio product
+Sem/Prod-cogn   !!= * @CODE@ = Cognition product
+Sem/Prod-ling   !!= * @CODE@ = Linguistic product
+Sem/Prod-vis    !!= * @CODE@ = Visual product
+Sem/Rel         !!= * @CODE@ = Relation
+Sem/Route       !!= * @CODE@ = Route
+Sem/Rule        !!= * @CODE@ = Rule or convention
+Sem/Semcon      !!= * @CODE@ = Semantic concept
+Sem/Sign        !!= * @CODE@ = Sign (e.g. numbers, punctuation)
+Sem/Sport       !!= * @CODE@ = Sport
+Sem/State       !!= * @CODE@ =
+Sem/State-sick  !!= * @CODE@ = Illness
+Sem/Substnc     !!= * @CODE@ = Substance, like Air and Water
+Sem/Sur         !!= * @CODE@ = Surname
+Sem/Symbol      !!= * @CODE@ = Symbol
+Sem/Time        !!= * @CODE@ = Time
+Sem/Tool        !!= * @CODE@ = Prototypical tool for repairing things
+Sem/Tool-catch  !!= * @CODE@ = Tool used for catching (e.g. fish)
+Sem/Tool-clean  !!= * @CODE@ = Tool used for cleaning
+Sem/Tool-it     !!= * @CODE@ = Tool used in IT
+Sem/Tool-measr  !!= * @CODE@ = Tool used for measuring
+Sem/Tool-music  !!= * @CODE@ = Music instrument
+Sem/Tool-write  !!= * @CODE@ = Writing tool
+Sem/Txt         !!= * @CODE@ = Text (girji, lávlla...)
+Sem/Veh         !!= * @CODE@ = Vehicle
+Sem/Wpn         !!= * @CODE@ = Weapon
+Sem/Wthr        !!= * @CODE@ = The Weather or the state of ground

! Not sure which section this goes in: (before POS)
+Allegro         ! from LEXICON GOADE-IU-

! Tags for derivation:

! Old tags:
!
! +Der/adda +Der/ahtti +Der/alla +Der/asti +Der/easti +Der/aš +Der/d +Der/eamoš
! +Der/amoš +Der/eapmi +Der/geahtes +Der/gielat !better: +Der/NuA
! +Der/h +Der/heapmi +Der/hudda +Der/huhtti +Der/huvva +Der/halla +Der/j +Der/l
! +Der/laš +Der/las +Der/hat +Der/meahttun +Der/muš +Der/NomAct +Der/š +Der/st
! +Der/stuvva +Der/upmi +Der/supmi +Der/vuohta +Der/goahti
! +Der/lágan +Der/lágán +Der/lágaš +Der/jagáš
! +Dimin +Der/viđá +Der/viđi +Der/veara

! Old tags remaining to be checked for the new +Der123 tags:
!+Der/eaddji ! XN = +Der/NomAg

! Explanation:
! Combinations 1, 2, 3, 12, 23, 13, 123 are ok, all other ones are blocked.
! The suffixes marked as +Der1+Der2 to the right cannot combine with Der2, they
! have already "saturated" their Der2-part.
! Phonotactically, Der1 are initial consonants C, Der2 are VCV, and
! Der3 are of a different kind, more like compounding.
! This whole Der123 business is to prevent back-derivation of
! boahtigoahtijuvvohallat and the like.
! Computationally, this is done as a filter composed on top of sme.save.

! Der#begin
! Derivation position in a derivation row:   Affix and
!   1     2     3     4                      POS type
+Der1 +Der2 +Der3 +Der4

! Der#1
+Der/t           ! NN
+Der/ár          ! ACRO>N
+Der/Dimin       ! NN (was: Der/aš & Der/š)
+Der/laš         ! NA
+Der/meahttun    ! VA
+Der/d           ! VV
+Der/h           ! VV - -hit/Causative
+Der/huhtti      ! VV
+Der/j           ! VV
+Der/l           ! VV
+Der/st          ! VV
+Der/las         ! VA * +Der1+Der2 - can only combine with Der3
+Der/heapmi      ! NA * +Der1+Der2 - can only combine with Der3
+Der/lágan       ! AA * +Der1+Der2 - can only combine with Der3
+Der/halla       ! VV * +Der1+Der2 - can only combine with Der3
+Der/huvva       ! VV * +Der1+Der2 - can only combine with Der3
+Der/stuvva      ! VV * +Der1+Der2 - can only combine with Der3

! Der#2
+Der/NomAg
+Der/NomAct      ! VN Der/NomAct has two realisations, with different restrictions,
                 ! this is previous Der/eapmi
+Der/adda        ! VV
+Der/ahtti       ! VV
+Der/alla        ! VV
+Der/asti        ! VV
+Der/at          ! QA ! check this!
+Der/easti       ! VV
+Der/gielat      ! QA
+Der/jagáš       ! QA
+Der/lágaš       ! QA

!
! Der#3
+Der/PassL       ! VV - long passive
+Der/vuohta      ! AN

! Der#4
+Der/goahti      ! VV
+Der/amoš        ! VN
+Der/eamoš       ! VN
+Der/geahtes     ! VA
+Der/muš         ! VN
+Der/supmi       ! VN
! +Der/NomAct    ! VN Der/NomAct has two realisations, with different restrictions,
                 ! this is previous Der/n. This realisation is Der4.
                 ! Commented out to not define the tag twice, but kept
                 ! here for documentation purposes.
+Der/upmi        ! VN

! Der#other
! All non-positional derivations should be preceded by this tag, to make it possible
! to target regular expressions at all derivations in a language-independent way:
! just specify [+Der|+Der1 .. +Der5] and you are set.
+Der

! Other/unclassified derivations, can appear in all positions:
+Der/veara       ! NA#
+Der/viđá        ! NA#
!+Der/viđi       ! NA# commented out, since we have the noun vih0ti, gen:viđi
+Der/PassS       ! VV - short passive

! See lexicons NAMAT and SAS for these:
+Der/agat +Der/ahkásaš +Der/asat +Der/beaivásaš +Der/bealat +Der/bealjat
+Der/borat +Der/buddásaš +Der/bánat +Der/diibmosaš +Der/dábat +Der/dáfot
+Der/dáhtot +Der/dásat +Der/dássásaš +Der/dávddat +Der/dávttat +Der/fárddat
+Der/gaccat +Der/garat +Der/gearddat +Der/geardásaš +Der/geažat +Der/gieđat
+Der/gieškkat +Der/gilggat +Der/girjjat +Der/guvllot +Der/heakkat +Der/hájat
+Der/hámat +Der/ivnnat +Der/jagat +Der/jahkásaš +Der/jearggat +Der/jienat
+Der/jierpmat +Der/joccat +Der/juolggat +Der/juvllat +Der/kilosaš +Der/kultuvrrat
+Der/lahkat +Der/lahtot +Der/lanjat +Der/leakkat +Der/liikkat +Der/linjjat
+Der/lunddot +Der/luohkkálaš +Der/luohkálaš +Der/luottat +Der/láiddat +Der/mannosaš
+Der/mearkkat +Der/mielat +Der/mohkat +Der/muđot +Der/máhtat +Der/mállet
+Der/mánnosaš +Der/namat +Der/nađat +Der/nierat +Der/njunat +Der/njálmmat
+Der/nuolus +Der/náittot +Der/nálat +Der/oaivvat +Der/oasat +Der/olat
+Der/orddat +Der/pláhtat +Der/rattat +Der/ravddat +Der/rávnnjat +Der/seagat
+Der/seaibbat +Der/seainnat +Der/siessat +Der/siiddot +Der/soajat +Der/soarttat
+Der/sogat +Der/sorttat +Der/stábat +Der/stávval +Der/suorat
+Der/suorggat +Der/suorpmat +Der/suttat +Der/sánat +Der/sávnnjat +Der/uvssat
+Der/uvssot +Der/vahkkosaš +Der/vahkosaš +Der/varat +Der/vigat +Der/viidosaš
+Der/vuovttat +Der/vuđot +Der/váillat +Der/váimmot +Der/válddat +Der/váttot
+Der/áigásaš +Der/áissat +Der/ávjjot +Der/čalmmat +Der/čeavžžat +Der/čiegahas
+Der/čiegat +Der/čielggat +Der/čoalat +Der/čoarvvat +Der/čuolmmat +Der/čuvddat
+Der/šlájat

+Der/A +Der/Adv

!!
!!Tags for originating language
!!
!! The following tags are used to guide conversion to IPA: loan words
!! and foreign names are usually pronounced (approximately) as in the
!! originating (majority) language. Instead of trying to identify the
!! correct pronunciation based on phonotactics (orthotactics actually),
!! we tag all words that can't be correctly transcribed using the SME
!! transcriber with source language codes. Once tagged, it is possible
!! to split the lexical transducer into smaller ones according to
!! language, and apply a different IPA conversion to each of them.
!!
!! The principle of tagging is that we only tag to the extent needed,
!! and following a priority:
!! # any untagged word is pronounced with SME orthographic conventions
!! # NNO and NOB have identical pronunciation, NNO is only used if
!!   different in spelling from NOB
!! # SWE has mostly the same pronunciation as NOB, and is only used
!!   if different in spelling from NOB
!! # Occasionally even SME (the default) may be tagged, to block other
!!   languages from being specified, mainly during semi-automatic
!!   language tagging sessions
!!
!! All in all, we want to get as much correctly transcribed to IPA
!! with as little work as possible. On the other hand, if more words
!! are tagged than strictly needed, this should pose no problem as
!! long as the IPA conversion is correct - at least some words will
!! get the same pronunciation whether read as SME or NOB/NNO/SWE.
!!
+OLang/SME       !!≈ * @CODE@ - North Sámi
+OLang/FIN       !!≈ * @CODE@ - Finnish
+OLang/SWE       !!≈ * @CODE@ - Swedish
+OLang/NOB       !!≈ * @CODE@ - Norw. bokmål
+OLang/NNO       !!≈ * @CODE@ - Norw. nynorsk
+OLang/ENG       !!≈ * @CODE@ - English
+OLang/UND       !!≈ * @CODE@ - Undefined

! Valency tags
+% +% +%         ! case
+% +%            ! infinitive
+% +% +% +%      ! adposition
+%               ! clause
+% +%            ! cases
+% +%            ! actio

! Triggers for morphophonological rules
X1 X2 X3 X4 X5 X6 X7 X8 X9
Y1 Y2 Y3 Y4 Y5 Y6 Y7 Y8 Y9
Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9
W1 W2 W3 W4 W5 W6 W7 W8 W9
%^DISIMP

! Morphophonemes and Sámi letters
b9 e7 e9 d9 g8 g9 h7 h8 h9 i7 j9 k9 m8 m9 n8 n9 o7 o9 p9
s9 t9 u7 z9 ž9 '7 š9 r9 æ7 u6 æ9
! B9 E7 E9 D9 G8 G9 H7 H8 H9 I7 J9 K9 M8 M9 N8 N9 O7 O9 P9
! S9 T9 U7 Z9 Ž9 Š9 R9

! Symbols that need to be escaped on the lower side (towards twolc):
»7               ! »
«7               ! «
%[%>%]           ! >
%[%<%]           ! <

! æ7 for Valkeapæ7æ7, a sub form not to be triggered by Valkeapää

! Flag diacritics
@P.NeedNoun.ON@ @D.NeedNoun.ON@ @C.NeedNoun@ ! (Dis)allowing compounds
! @P.Need3Part.ON@ @D.Need3Part.ON@ @C.Need3Part@ !3Part
@U.Cap.Obl@ @U.Cap.Opt@ ! Allowing downcasing of
                        ! derived names: deatnulaš.
                        ! Actually, downcasing should be obligatory
@U.NeedsVowRed.OFF@ ! This is used to force hyphenation/non-reduction: samediggi-
@U.NeedsVowRed.ON@  ! This is used to force reduction w/o hyphen: samedigge#xxx
@C.NeedsVowRed@     ! Clearing this feature, so that it doesn't interfere with further compounding

! Basic lexica, pointing to the other lexicon files
! =================================================

LEXICON Root
 +Use/Circ: Prefixes ;
 @U.Cap.Obl@ ProperNoun ;     ! These flags are for
 @U.Cap.Opt@ ProperNoun ;     ! downcasing the propernouns
 @U.Cap.Obl@ Prefix-Proper ;
 @U.Cap.Opt@ Prefix-Proper ;
 NounRoot ;
 Eahpe_Noun ;
! ProperNounFirstPart ;       ! Merged with ProperNoun, but kept as group
 Adjective ;
 Eahpe_Adjective ;
 Verb ;
 Eahpe_Verb ;
 Copula ;
 Negativeverb ;
 Adverb ;
 Particles ;
 Subjunction ;
 Conjunction ;
 Adposition ;
 Interjection ;
 Pronoun ;
!
! +Use/CircN: Numeral ;       ! Circular tag for numerals
 Numeral ;                    ! we need numerals for Apertium.
 Acronym ;
 Abbreviation ;
 Punctuation ;
! MiddleNouns ;               ! gaskbeaivi, this is the "new" R, not Rreal
 NomActVEARA ;

LEXICON ENDLEX
 @D.NeedNoun.ON@ # ;
! @D.Need3Part.ON@ # ;        !3part

! This ENDLEX business is here in order to remove illegal compounds.
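
! Illustration only: a minimal sketch of how the NeedNoun flags can interact
! with ENDLEX. The lexicon names ExamplePrefix and ExampleNoun and the stems
! xxx and yyy are hypothetical and not part of this grammar. The sketch is
! kept commented out so that it does not change the compiled fst.
!
! LEXICON ExamplePrefix
!  xxx@P.NeedNoun.ON@ ExampleNoun ;  ! sets NeedNoun to ON, then continues to a noun
!  xxx@P.NeedNoun.ON@ ENDLEX ;       ! blocked: @D.NeedNoun.ON@ fails while NeedNoun is ON
!
! LEXICON ExampleNoun
!  yyy@C.NeedNoun@ ENDLEX ;          ! clears NeedNoun, so @D.NeedNoun.ON@ lets the path end
!
! Under these assumptions, xxxyyy is accepted while xxx on its own is not,
! since @D.NeedNoun.ON@ in ENDLEX only rejects paths on which NeedNoun is
! still set to ON when the word ends.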