The Multichar_Symbols section contains all grammatical tags, and all multicharacter members of the alphabet (the latter set is taken from the grammar file).
The Root lexicon points to the lexica of the parts of speech.
NounRoot has the lexica BOAZU, FALIS, GADDI, GAHPIR, GISTTA, GOAHTI, JOHTOLAT, MALIS, SEAMU, STAHTA, VIVVA. The lexica represent the following inflectional types:
The sublexica, alphabetically ordered
BEANA = trisyllabic animate gradating 0-nouns
BOAZU = animate contracted 0-nouns
FALIS = contracted animate C-nouns.
GADDI = bisyllabic V-nouns with comparative forms
GAHPIR = trisyllabic, non-gradating C-nouns.
GISTTA =
GOAHTI = inanimate bisyllabic V-nouns.
IIJA = bisyllabic, non-gradating a-nouns, with an a-illative.
JOHTOLAT =
MALIS = trisyllabic inanimate gradating C-nouns.
MATTAR = trisyllabic animate gradating C-nouns.
OLLUVUOHTA = exceptional vuohta-nouns.
SEAMU = trisyllabic inanimate gradating 0-nouns.
SUOLU = inanimate contracted 0-nouns.
STAHTA = bisyllabic, non-gradating a-nouns, with an a/i-illative.
VIVVA = animate bisyllabic V-nouns.
The sublexica, ordered by inflectional type
VIVVA = bisyllabic animate V-nouns
GOAHTI = bisyllabic inanimate V-nouns
GADDI = bisyllabic V-nouns with comparative forms
IIJA = bisyllabic, non-gradating a-nouns, with an a-illative
STAHTA = bisyllabic, non-gradating a-nouns, with an a/i-illative
BOAZU = contracted animate 0-nouns
SUOLU = contracted inanimate 0-nouns
FALIS = contracted animate C-nouns
BEANA = trisyllabic animate gradating 0-nouns
SEAMU = trisyllabic inanimate gradating 0-nouns
MATTAR = trisyllabic animate gradating C-nouns
MALIS = trisyllabic inanimate gradating C-nouns
GAHPIR = trisyllabic, non-gradating C-nouns
OLLUVUOHTA = exceptional vuohta-nouns
GISTTA = The Noun gistta, gist -
JOHTOLAT =
In the noun lexicon, the declension types are distributed as follows (21.10.01):
Bisyllabic
  V-nouns
    1568  VIVVA      animate
    9048  GOAHTI     inanimate
       2  GADDI      w/ comparative forms
  a-nouns
       0  IIJA       non-gradating w/ a-ill
     113  STAHTA     non-gradating w/ a/i-ill
Contracted
  0-nouns
      35  BOAZU      animate
      38  SUOLU      inanimate
  C-nouns
     188  FALIS      animate
Trisyllabic
  0-nouns
      49  BEANA      animate gradating
     423  SEAMU      inanimate gradating
  C-nouns
      99  MATTAR     animate gradating
    1065  MALIS      inanimate gradating
    2208  GAHPIR     non-gradating
Miscellanea
    1749  JOHTOLAT
     239  DIMINC     diminutives
      94  LASIS
      75  MUSH
      40  MAGASH     (all marked as "(pl.r.)")
      27  EGEZHAGAT  4-syllabic hk:g
      11  GARGIA     loanwords, video, etc.
       3  SATTU      (inconsistently marked)
       3  OANADUS    (abbreviations, look into this group)
       1  EANU       eatnu
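A distribution like the one above could be recomputed with a short script. The following is only a sketch: it assumes that each entry has the shape "stem CONTLEX ;" (optionally with a quoted gloss before the semicolon) and that "!" starts a comment. The sample entries are illustrative, and the entry marked as hypothetical is not taken from the real lexicon file.

```python
import re
from collections import Counter

# Assumed entry shape (not confirmed by the notes): "stem CONTLEX ;",
# optionally with a quoted gloss before the semicolon.
ENTRY = re.compile(r'^\s*\S+\s+([A-Za-z0-9_#-]+)\s*(?:"[^"]*"\s*)?;')


def count_continuations(lexc_text):
    """Count how many entries point to each continuation lexicon."""
    counts = Counter()
    for line in lexc_text.splitlines():
        line = line.split("!", 1)[0]  # drop "!" comments
        if line.lstrip().startswith("LEXICON"):
            continue  # lexicon headers are not entries
        match = ENTRY.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Illustrative sample: the model words are paired with the lexica named after them.
SAMPLE = """\
LEXICON NounRoot
goahti GOAHTI ;   ! model word of its class
vivva VIVVA ;
beana BEANA ;
xxx GOAHTI ;      ! hypothetical second entry
"""

if __name__ == "__main__":
    for name, total in count_continuations(SAMPLE).most_common():
        print(total, name)
```

Entries that consist of a bare continuation lexicon without a stem are not counted by this pattern.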
NIILLAS = trisyllabic, non-gradating C-proper names.
PIERA = bisyllabic a-proper names without gradation; a/i-illative.
HEANDARAT
In the lexicon file adj-sme-lex.txt, the sublexica are distributed in the following way (23.10.01):
359  BOAKKAS
353  JEAGOHEAPMI
269  BEAKKAN
146  GARAS
124  GAPPUS
114  LAIKI
110  AKTIIVA
106  NUORRA
 31  JUHKKIS
 26  EATTAS
 22  GUOHCA
 18  LODJI
 13  GEARGGUS
 13  DILDDAS
 12  SEARRA
  6  BIEKKUS
  5  NUOLUS
  5  NJUORAS
  4  HEAHKAS
  3  LIEKKUS
  3  ASEHIS
  1  GUOROS
VerbRoot contains 12 sublexica; each of the three stem types is represented by 4 verb lexica:
The last type is not expanded in the lexicon.
Bisyllabic verbs:
ARVI arvit 'to rain' !Bisyllabic Impersonal Verbs
DIEHTI diehtit 'to know' !Bisyllabic Verbs with Personal Passive
BOAHTI boahtit 'to come' !Bisyllabic Verbs without Personal Passive
CUOHCIT c1uohcit 'to hit'
Contracted verbs:
BORGE borget 'to make a snowstorm' !Contracted Impersonal Verbs
DOHPPE dohppet 'to grip' !Contracted Verbs with Personal Passive
GILLE gillet 'to be bothered' !Contracted Verbs without Personal Passive
GEARRAA gearra1t
Trisyllabic verbs:
CUORPMAST c1uorpmastit 'to hail' !Trisyllabic Impersonal Verbs
MUITAL muitalit !Trisyllabic Verbs with Personal Passive
ALIST alistit !Trisyllabic Verbs without Personal Passive
BORGGIST borggistit
The following table gives an overview:
                 even      odd         contracted
-------------------------------------------------
impers           ARVI      CUORPMAST   BORGE
pers +ppass      DIEHTI    MUITAL      DOHPPE
pers -ppass      BOAHTI    ALIST       GILLE

                 even      odd         contracted
-------------------------------------------------
impers           RAIN      HAIL        MAKE-STORM
pers +ppass      KNOW      TELL        GRIP
pers -ppass      COME      -           BE-BOTHERED
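The overview can also be read as a lookup from stem type and verb class to a model lexicon. A minimal sketch, in which the key names ("even", "impers", "pers+ppass", and so on) are labels chosen here for illustration and are not identifiers from the lexicon files:

```python
# Model lexica from the overview table, keyed by (stem type, verb class).
VERB_LEXICA = {
    ("even", "impers"): "ARVI",
    ("odd", "impers"): "CUORPMAST",
    ("contracted", "impers"): "BORGE",
    ("even", "pers+ppass"): "DIEHTI",
    ("odd", "pers+ppass"): "MUITAL",
    ("contracted", "pers+ppass"): "DOHPPE",
    ("even", "pers-ppass"): "BOAHTI",
    ("odd", "pers-ppass"): "ALIST",
    ("contracted", "pers-ppass"): "GILLE",
}


def verb_lexicon(stem_type, verb_class):
    """Return the model lexicon for a stem type and verb class."""
    return VERB_LEXICA[(stem_type, verb_class)]


if __name__ == "__main__":
    print(verb_lexicon("contracted", "pers+ppass"))
```

The fourth lexicon of each stem type (CUOHCIT, GEARRAA, BORGGIST) is left out, since the table does not place it in any row.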
The stems are distributed numerically as follows (the -it class includes both even-syllable and odd-syllable verbs):
even-syll    -at     2964
             -it      924
             -ut      826
             total   4714

3-syllabic   -it     5426

             -a1t     301
             -et     1091
             -ot      209
             total   1601
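The two subtotals can be checked mechanically. The sketch below assumes that the figures fall into three groups (the even-syllable endings, the 3-syllabic -it class, and the endings -a1t, -et, -ot); that grouping is inferred from the sums and is not stated explicitly in the figures.

```python
# Stem counts as given in the table; the grouping is an assumption.
EVEN_SYLL = {"-at": 2964, "-it": 924, "-ut": 826}
THREE_SYLL = {"-it": 5426}
LAST_GROUP = {"-a1t": 301, "-et": 1091, "-ot": 209}

even_total = sum(EVEN_SYLL.values())
last_total = sum(LAST_GROUP.values())
grand_total = even_total + sum(THREE_SYLL.values()) + last_total

# Both stated subtotals agree with the individual counts.
assert even_total == 4714
assert last_total == 1601

if __name__ == "__main__":
    print(even_total, last_total, grand_total)
```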
The with/without Personal Passive distinction shows up in one sublexicon: DOHPPE has PASSIVE where GILLE has SG3PASS. This is (probably) a transitivity difference, cf. also diehtit vs. boahtit. The three classes would thus differ in valence: 0, 1 and 2.
At present, the file verb-sme-lex.txt contains all the verbs. At the beginning of the file, all sublexica are exemplified. Then follows the bulk of the verbs: bisyllabic even, polysyllabic even, odd and contracted verbs. These verbs are all assigned to the sublexica DIEHTI, MUITAL and DOHPPE, i.e., the transitive sublexica with the maximal paradigm.
TODO: Assign correct transitivity/sublexicon marking to the bulk of the verbs. Also, the undefined sublexica should be investigated.
Pekka comments on the Dec. 01 files as follows:
The file contains the headwords of the verb entries. In some cases there are also alternatives (x ~ y), and a variant carries a reference to the main headword (x gc1. y). Vowel characters with a length mark have been replaced by x2 combinations, dotted ones by x3 combinations. The third quantity grade is marked with ', whose code in DOS is 173. For some headwords, information on government is given in parentheses, e.g. liikot (+ loc.); this information is not yet systematic.
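The x2 convention for length-marked vowels can be undone mechanically. This sketch assumes that "x" stands for a plain vowel letter immediately followed by the digit 2, and that the length mark is a macron. The x3 (dotted) combinations are not handled, since the comment does not say where the dot sits.

```python
import re
import unicodedata

COMBINING_MACRON = "\u0304"


def decode_length_marks(text):
    """Replace vowel + "2" by the vowel with a macron (assumed convention)."""
    with_marks = re.sub(
        r"([aeiouAEIOU])2",
        lambda m: m.group(1) + COMBINING_MACRON,
        text,
    )
    # Compose base letter and combining macron into a single character.
    return unicodedata.normalize("NFC", with_marks)


if __name__ == "__main__":
    print(decode_length_marks("a2 o2"))
```

Digits after consonants, and the digit 1 used elsewhere in these notes (c1uohcit, gearra1t), are left untouched.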
borggistit
c1uohcit
c1uorpmastit
borgistit
muitalit
gearra1t
The problem is that the respective sublexica are not defined. They should thus be defined.
011611
Go through the file "feilmelsingar..." in notatar. Several of the sublexicon names are simply misspelled (SeAMU, etc.).
apply down> diehtit+V+Pass+Act
dihttoiuvvon
dihttojuvvon
apply down> diehtit+V+Pass+Ger
dihttoiuvvodettiin
dihttojuvvodettiin

Here is another, of the same kind:

apply down> boahtit+V+Pass+Inf
bohttoit
apply down> c1uorvut+V+Pass+Inf
c1urvoit
It is not identical, though, since here the i should be deleted, not turned into j.
A further one of the same kind:
apply down> bidjat+V+Ind+Prs+Sg1
bidian
bijan
AGADJECT contains deadjectival adjectives with derivational suffix -ag/-og.
The weak grade is not recognised for ma1hli, duihmi, c1a1ihmi (-hl-, -hm-, -hn- also in the weak grade).
The noun olmmos1 erroneously has strong grade in the nominative singular.
Unexpectedly, a ' symbol turns up in the underlying form:
apply up> gistta
gistta+N+Sg+Nom
gis'ta+N+Sg+Nom
This error is now fixed!
JOHTOLAT
has two Gen forms, one erroneous, and has -ai instead of -ii in the Illative. The other case endings work fine.
LASIS
is not found in the lexicon list at all. TODO: Write a lexicon for LASIS
TODO: Check with the original lexicon, to ensure that nothing crucial has been lost in the conversion process.
The MUITAL problem is known. Here is another one: The program analyses aviissa, but not aviisa. Instead, we get a correct analysis of avijsa. Note also the Gen/Nom distinction here, that probably contains the solution to the riddle.
apply up> a1jggi
apply up> a1iggi
a1igi+N+Sg+Acc
a1igi+N+Sg+Gen
apply up> a1jgi
a1igi+N+Sg+Nom

Thus, Gen/Acc are ok, but not Nom.
Another one. This one is curious, as it treats the word as a compound, with "de" as the second component:

apply up> guvsside
guksi+N+Sg+Gen#de+N+Sg+Acc
guksi+N+Sg+Gen#de+N+Sg+Gen
guksi+N+Sg+Gen#de+N+Sg+Nom
apply up>

Words missing?
kursa
eiseva1lddit
Strange word:
apply up> olbmos1
olbmos1+Sg+Acc
olbmos1+Sg+Gen
olbmos1+Sg+Nom
olmmos1+N+Sg+Nom
olmmos1+N+s1+Sg+Acc
olmmos1+N+s1+Sg+Gen
olmmos1+N+s1+Sg+Nom

There is a word "de" in the lexicon (a noun). There is thus an error somewhere.
Missing from olbmos1:
olbmos1 - olmmoz1in. All from olmma1i work.
Many adjectives are missing, approximately half of the ones listed by Nickel.
duhtavas1, lihkolas1, asehas1, oanehas1, vuollegas1, boaris, ra1hkis, ja1llu, uhcci, unna, bastil, ruoksat, alit, allat, gassat, govdat, lossat.
Correct:

apply down> giella+N+Pl+Com+PxSg3
gielaidisguin
apply down> giella+N+Pl+Com+PxPl3
gielaideasetguin

Erroneous:

apply down> beana+N+Pl+Com+PxPl3
beatnagiiddiset
apply down> beana+N+Pl+Com+PxSg3
beatnagiiddis

The contracted words luomi and gahpir also behaved the same way as beana. It thus seems that this is an error for all contracted nouns.
TODO: Go through the Px paradigm, and see if beana shows errors in other parts of the paradigm, and if there are other words that have problems in the Comitative Plural paradigm.
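One way to go through the Px paradigm systematically is to generate every analysis string and feed the list to the generator. A sketch under stated assumptions: the tags +N, +Sg, +Pl, +Nom, +Gen, +Acc, +Com, +PxSg3 and +PxPl3 occur in these notes, whereas +Ill, +Loc and the remaining Px tags are assumed to follow the same naming pattern and may differ in the actual tag set.

```python
from itertools import product

NUMBERS = ["Sg", "Pl"]
# "Ill" and "Loc" are assumed tag names; the others occur in the notes.
CASES = ["Nom", "Gen", "Acc", "Ill", "Loc", "Com"]
# Only PxSg3 and PxPl3 are attested in the notes; the rest are assumed.
POSSESSIVES = [
    f"Px{number}{person}"
    for number in ("Sg", "Du", "Pl")
    for person in (1, 2, 3)
]


def px_paradigm(lemma):
    """Return all lemma+N+Number+Case+Px analysis strings for a noun."""
    return [
        f"{lemma}+N+{number}+{case}+{px}"
        for number, case, px in product(NUMBERS, CASES, POSSESSIVES)
    ]


if __name__ == "__main__":
    # One analysis per line, ready to be pasted at the "apply down>" prompt.
    for lemma in ("giella", "beana"):
        print("\n".join(px_paradigm(lemma)))
```

Running the same list for giella and beana side by side would show which cells of the paradigm lose the comitative ending.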
PrfPtc + N is not recognised as a compound: "mujtalanvejolas2vuod1at" (check for "va1ikkuhanvejolas1vuod1at").