!!!Background

The file itself is located in {{langs/smj/src/phonology/smj-phon.twolc}}. It is modeled on the corresponding file for North Sámi, but has been revised and differs from it on several points. The grammatical sources are Spiik 1989: Lulesamisk grammatik and Nystø and Johnsen 2001: Sámásta 2.

!!!File structure

The rule file has the sections Alphabet, Sets, Definitions and Rules. The rules are ordered thematically, with 3 main sections: consonant alternations (except consonant gradation), vowel alternations, and consonant gradation.

!!!The Alphabet section

!!The real Lule Sámi Alphabet

All Lule Sámi letters are listed. The Lule Sámi ENG sound is represented as ñ. The Lule Sámi letter repertoire is not fully standardised. In the source code we write (and you shall write!) æ; ø; ŋ, but the parser tolerates input written with the letters ä; ö; ń, ñ (cf. the 4 rules in the file smj/src/orthography/spellrelax.regex).

The 3rd degree mark ' is never realized, hence declared as ':0.

* ':0 = Gradation mark
* %/ = Literal /, not the TWOLC reserved symbol
* '7:' = From sme.

h2, g2 etc. are consonants deleted in the nominative. m3, d3 etc. (?) are consonants that undergo certain processes word-finally. This issue should be looked into. Perhaps the two sets can be unified. The reason why there are more distinctions than for sme is that the consonant deletion process is more phonological in sme.

!!The Dummy symbols

The Dummy symbols are taken from the sme file for convenience. Only a small part of them are actually used; these are defined in the Sets section along the way, included there as soon as they are used. The set of actually used Dummy symbols is thus the set declared in "Dummy".

The Dummy symbols trigger morphophonological rules. X is used for nouns and adjectives, Y for verbs and Q for processes common to all.

The symbols themselves are used in the following way:

; __X1:0__ : Deletes final consonants in short essive of odd syllables
; __X2:0__ : WeG and neutralization of g8, etc.
(hivsik-hivsiga)
; __X3:0__ : WeG and deletion of g8, etc. (bena-bednaga)
; __X4:0__ : e:á and e:å in illatives and Px. a:á and o:u in Px and illative of a-stem actors and o-stems
; __X5:0__ : e:á, e:å and o:u in odd-syllable nouns, but also for some even-syllable nouns (e.g. o:u)
; __X6:0__ : Deviant III-I consonant gradation (in contracted stems, guobbmu:guomoj)
; __X7:0__ : WeG and e:á, e:å, o:á, o:u in front of diminutives, e:å in -lasj derivations
; __X8:0__ : Stem vowel alternations in Px
; __X9:0__ : Stem-vowel and central consonant shortening in first part(s) of compounds
; __Q1:0__ : The general weak grade trigger. Stem vowel change e:i and o:u in front of j. Diphthong simplification. Any environment demanding only WeG shall use Q1.
; __Q2:0__ : Vowel harmony: 2nd syllable e realized as å whenever the 1st syllable vowel is å.
; __Q3:0__ : WeG in contracted stems; does not trigger diphthong simplification.
; __Q4:0__ : Stem vowel change e:i and o:u in front of j. Diphthong simplification. Like Q1 but strong grade.
; __Q5:0__ : e:á stem vowel change for word diehtet. Weak grade.
; __Q6:0__ : e:á stem vowel change for word diehtet. Strong grade.
; __Q7:0__ : e:á stem vowel change for word diehtte. Extra strong grade.
; __Q8:0__ : Stem vowel deletion, impII of verbs.
; __Q9:0__ : TBW
; __Y1:0__ : Stem vowel deletion, imp 3sg, 3du, 2pl, 3pl of verbs
; __Y2:0__ : "Indicative Present Singular 3rd Final Vowel in verbs"
; __Y3:0__ : PrsPrc
; __Y4:0__ : e > u in front of derivational suffixes, o > u and e > á in front of the derivational suffix -alla
; __Y5:0__ : e > a, i > á, o > u, e > å in verb derivation
; __Y6:0__ : "Consonant insertion as II-III strengthening gradation", verbs +PrsPrt and +Imprt+Du2
; __Y7:0__ : "Consonant insertion as II-III strengthening gradation", nouns and proper nouns
; __Y8:0__ : "Stem vowel deletion in even-syllable verbs, imp 1du, 1pl"
; __Y9:0__ : "Stem vowel deletion in short passives of even-syllable verbs"
; __Z1:0__ : TBW

!!Morpheme boundaries:

; __ « __ : Derivational prefix
; __ » __ : Derivational suffix
; __ %< __ : Inflectional prefix
; __ %> __ : Inflectional suffix
; __ # __ : Word boundary for both lexicalised and dynamic compounds
; __ %^ __ : (exceptional) soft hyphenation point
; __ % __ : a space

!!!The Sets section

These are the sets:

;Vow : the vowels
;Cns : the consonants
;StemCns : consonants that may occur in stem-final position
;DelCns : the consonants that are deleted in the nominative
;Dummy : the set of dummy symbols, which are there to trigger certain morphophonological rules
;WeG : the dummy symbols that trigger weak grade

!!!The Definitions section

In this section, the consonants are defined. This includes consonant clusters in the various grades and consonant alternations.

!!G3 vs G2

The alternation patterns according to Spiik's alternation series.
| S7 | kkn:k0n | series 1
| S8 | f'f:f0f | series 2
| S9 | jgg:j0g | series 3
| S4 | hkk:h0k | series 4
| S5 | xy:zy (no zeros) | series 5
| S6 | xx:yy (no zeros) | series 6
| S7 | xy:zy (no zeros) | series 7
| S8 | ----- (no cg) | series 8

; LowerG2 : A definition of Grade 2 consonant sequences referring mostly to the surface level
; LowerG1 : A definition of Grade 1 consonant sequences
; LowerG12 : A definition of Grade 1 or 2 consonant sequences
; G32 : A definition of Grade 3 or 2 consonant sequences
; G3 : A definition of Grade 3 consonant sequences

!!!The Rules section

!!Overview

The rules section has the following chapters: consonant alternations in certain positions, vowel lengthening, diphthong simplification, stem vowel alternations, consonant gradation rules.

!!Consonant alternations in certain positions

All rules deal with word-final position.

* __*a __ (is not standard language)
* __*b __ (is not standard language)

__Word Final Devoicing of Certain Single Consonants d9 etc. __
* __iemed9# __
* __iemet# __

__Word final weakening -tj and -ttj to -sj part 1__

__Word final weakening -tj and -ttj to -sj part 2__
* __jågåtj# __
* __jågåsj# __
* __gål'leX7tj# __
* __gål0lå0sj# __

__Word Final Deletion of n8 m8 g8 h8__
* __loavddag8X3# __
* __l0åv0da00# __

__Word Final Neutralization of g8, h8, m8__

__Deleting Final h9 in Short Essive of Uneven Syllables__

__Deleting Final l9 in Short Essive of Uneven Syllables__

__Deleting Final m9 in Short Essive of Uneven Syllables__

__Deleting Final n9 in Short Essive of Uneven Syllables__

__Deleting Final r9 in Short Essive of Uneven Syllables__
* __málest# __
* __máles0# __

!!Vowel lengthening

The second syllable vowel a is lengthened to á whenever the stem consonants are in grade 1 and the first syllable vowel is short. Short vowels cannot both precede and follow a single intervocalic consonant.
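As a rough illustration of the lengthening condition, the following Python sketch works on surface strings that are already in grade 1. It lengthens a second-syllable a to á when the first-syllable vowel is short and exactly one consonant stands between the two vowels. The function name, the vowel and consonant inventories and the use of a regular expression are all assumptions made for this example; the rule in the twolc file is triggered by Dummy symbols and contexts that are not reproduced here.

```python
import re

# Assumed inventories, for illustration only. The twolc file declares its
# own Vow and Cns sets, and other vowels may also count as short.
SHORT_VOWELS = "aiu"
CONSONANTS = "bdfghjklmnprstvŋ"

# Optional onset, a short first-syllable vowel, exactly one consonant, then a.
_PATTERN = re.compile(
    rf"^([{CONSONANTS}]*[{SHORT_VOWELS}][{CONSONANTS}])a"
)


def lengthen_second_syllable(surface_grade1: str) -> str:
    """Lengthen second-syllable a to á if the sketch's condition holds."""
    return _PATTERN.sub(lambda m: m.group(1) + "á", surface_grade1, count=1)


print(lengthen_second_syllable("gusa"))    # gusá
print(lengthen_second_syllable("skibas"))  # skibás
print(lengthen_second_syllable("gussa"))   # gussa (geminate, so unchanged)
```

The first two inputs correspond to the surface sides of the gussaQ1# and skihpaQ1s# examples with the 0 placeholders removed and before lengthening.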
__Compulsatory lengthening in grade I even-syllables__
* __gussaQ1# __
* __gu0sá0# __
* __skihpaQ1s# __
* __ski0bá0s# __

!!Diphthong simplification

The diphthong simplification handles oa:å and æ:e. Phonologically, these are identical processes, but since the diphthong is written with two letters in the former case and with one letter in the latter, the alternations must be handled separately. This section also handles ie:æ, which is in principle the same as oa:å, but the alternation occurs in fewer contexts.

__oa:å Diphtong Simplification Part I __

__oa:å Diphtong Simplification Part II__
* __toahkkeY6X5jn __
* __toahkki00jn __
* __*toahkkeY6X5jn __ (is not standard language)
* __*t0åhkki00jn __ (is not standard language)
* __boalloX4j __
* __b0ållu0j __
* __roavggoX4j __
* __roavggu0j __
* __*roavggoX4j __ (is not standard language)
* __*r0åvggu0j __ (is not standard language)
* __toas'soQ1X5jn __
* __t0ås0su00jn __
* __*toas'soQ1X5jn __ (is not standard language)
* __*toas0su00jn __ (is not standard language)
* __*moas'soX5jn __ (is not standard language)
* __*m0ås0su0jn __ (is not standard language)
* __moas'soX5jn __
* __moas0su0jn __
* __goar'roY6X5jn __
* __goar0ru00jn __
* __goarroY6X5jn __
* __goarru00jn __
* __*goar'roY6X5jn __ (is not standard language)
* __*g0år0ru00jn __ (is not standard language)
* __*goar'roY2 __ (is not standard language)
* __*g0år0ru0 __ (is not standard language)
* __goarroY2 __
* __g0årru0 __
* __doad0jeY6 __
* __doaddje0 __
* __*doad0jeY6 __ (is not standard language)
* __*d0åddje0 __ (is not standard language)
* __goar'roY5d9it __
* __g0år0ru0dit __
* __*goar'roY5d9it __ (is not standard language)
* __*goar0ru0dit __ (is not standard language)
* __toab0moY6X4j __
* __toabbmu00j __
* __toabmoX4j __
* __t0åbmu0j __
* __*toa0mboY6X4j __ (is not standard language)
* __*t0åbbmu00j __ (is not standard language)
* __toabmoX7dallat __
* __t0å0mu0dallat __
* __*toabmoX7dallat __ (is not standard language)
* __*toa0mu0dallat __ (is
not standard language)
* __oaddoY6X4j __
* __oaddu00j __
* __boassjkoQ1X5jn __
* __b0å0sjku00jn __
* __*boassjkoQ1X5jn __ (is not standard language)
* __*boas0jku00jn __ (is not standard language)
* __boajsstoQ1X5jn __
* __b0åj0stu00jn __
* __*boajsstoQ1X5jn __ (is not standard language)
* __*boaj0stu00jn __ (is not standard language)
* __boaggoQ1X5jn __
* __b0åkku00jn __
* __*boaggoQ1X5jn __ (is not standard language)
* __*boakku00jn __ (is not standard language)

__ examples:__

__æ:e Diphthong Simplification 1 __

__æ:e Diphthong Simplification 2__
* __hærránis __
* __hærránis __
* __hærránis#gæhttjalibme> __
* __hærránis#gæhttjalibme> __
* __pasiænnta>Q1 __
* __pasien0ta>0 __
* __patænnta>Q1 __
* __paten0ta>0 __
* __kvotiænnta>Q1 __
* __kvotien0ta>0 __
* __kliænnta>Q1 __
* __klien0ta>0 __
* __Lævnnja>Q1 __
* __Lev0nja>0 __

__ie:æ Diphthong Simplification Part I __
* __ielvveY9ut __
* __0ælvv00ut __
* __iehttseY1up __
* __0æhtts00up __
* __giesseQ8us __
* __g0ess00us __

__ie:æ Diphthong Simplification Part IIa__
Norwegian æ is an option...

__ie:ä Diphthong Simplification Part IIb__
Swedish ä is an option...

__ie:ä Diphthong Simplification Part IIc__
And there are no other options. This rule blocks the e and allows for the æ and ä.
* __jeht0sa>Y6 __
* __jæhttse>0 __
* __jeht0sa>Y6 __
* __jähttse>0 __
* __gierre»X7dalla>t __
* __g0æ0rá»0dalla>t __
* __boarkkaQ1 __
* __b0år0ka0 __
* __loavddag8X3# __
* __l0åv0da00# __

__Vowel-change oa:å for verbs part I__

__Vowel-change oa:å for verbs part II__
* __hå0llaY2 __
* __hoallá0 __
* __gå0d0naY6 __
* __goaddne0 __
* __*hållaY2 __ (is not standard language)
* __*hållá0 __ (is not standard language)
* __gå0ht0saY6 __
* __goahttse0 __

!!Stem vowel alternations

This section is divided according to stem vowels: a-, e-, i-, o-, å-stems.

!a-stem alternations

For a-stems, there are a:e and a:i.
Each alternation is triggered by a combination of phonological content and dummy symbols.

__a:e in Present Participle of even-syllable verbs__
* __bassa>Y6 __
* __basse>0 __

__a:i in Prs Prc of even-syllable verbs__
* __bas'saY6jt# __
* __bas0si0jt# __

__a-stem vowel deletion__

!e-stem alternations

For e-stems, there are e:i, e:á, e:å, e:u and e:a. Each alternation is triggered by a combination of phonological content and dummy symbols.

__e:i in e-stems__
* __manasseQ4j __
* __manassi0j __
* __biesseQ1j __
* __bie0si0j __
* __boaht0eY6j __
* __boahtti0j __
* __gálleQ1tj# __
* __gá0li0sj# __
* __gálleQ1tjav# __
* __gá0li0tjav# __
* __gálleQ1tjin# __
* __gá0li0tjin# __
* __gálleQ1tjihpit# __
* __gá0li0tjihpit# __
* __gálleQ1tjibá# __
* __gá0li0tjibá# __
* __gálleQ1tjip# __
* __gá0li0tjip# __
* __gálleQ1tja# __
* __gá0li0tja# __

The following two rules constitute a <= / => rule pair.

__e:á in certain stem types 1__
* __bálggeX4v __
* __bálggá0v __
* __gálleY3m# __
* __gállá0m# __
* __gálleQ2v# __
* __gá0lá0v# __
* __báhkoX7tj# __
* __bá0gu0sj# __
* __goahteX7tj# __
* __goa0dá0sj# __
* __*goahteX7tj# __ (is not standard language)
* __*go00dá0sj# __ (is not standard language)

__e:á in certain stem types 2__
* __bárnneX4m __
* __bárnná0m __
* __*bárnneX4m __ (is not standard language)
* __*bárnne0m __ (is not standard language)

__e:å in certain stem types with å as root vowel__
* __gådeQ2v __
* __gådå0v __
* __jåhteQ2v __
* __jå0då0v __
* __gådeY2 __
* __gådå0 __
* __jåhteY2 __
* __jåhtå0 __
* __jåhteY3m __
* __jåhtå0m __
* __låhkkeY7tj# __
* __låhkkå0sj# __

__e-stem vowel deletion__
* __ielvveY9ut __
* __0ælvv00ut __

!i-stem alternations

For i-stems, there is i:á. The alternation is triggered by a combination of phonological content and dummy symbols.

__i:á in Verb Derivation__

!o-stem alternations

The duplicates of the three lines of the two following rules are there to resolve the => conflict between the two rules.
__o:u in certain stem types 1__

__o:u in certain stem types 2__

__u:o in contracted nouns__

__o-stem vowel deletion__

!å-stem alternations

For å-stems, there are å:e and å:i and vowel deletion. Each alternation is triggered by a combination of phonological content and dummy symbols.

__å:e in Present Participle of even-syllable verbs__

__å:i in Actor nouns of even-syllable verbs__

__å-stem vowel deletion__

!Alternations valid for several stem types

__Stem vowel deletion in even-syllable verbs, imp 3sg, 3du, 2pl, 3pl__
* __ielvveY1up __
* __0ælvv00up __
* __giessaY1up __
* __giess00up __
* __bårråY1up __
* __bårr00up __

!!Consonant gradation rules

The consonant gradation rules differ considerably from the corresponding rules for North Sámi. Instead of generalizing over sets of consonants (Cx:Cy <=> ...), each rule contains the alternation for one consonant only, and to the right of the <=> arrow are listed all the contexts where the relevant alternation appears. The disadvantage of this method is that the same context must be written several times: if e.g. p, t and k are all deleted in the same contexts, each of these contexts must be written once for each consonant. The advantage is that there are no conflicts during compilation, and compilation takes 10 seconds rather than 3 minutes.

The earlier North-Sámi-style rule set was ordered according to CG pattern. This pattern is still visible in the new rules, via the reference S1-3 etc. (Spiik's Series 1, 3-letter pattern, etc.) behind each subrule.

This actually opens the way for a migration to an xfst rule file instead of the current twolc format, since what xfst really cannot do is generalize over sets (Cx:Cy etc.). This is an issue for future revisions to decide.

The rules are divided into two subsections, deletion rules and change (alternation) rules.

!Deletion rules

The b, d, g deletion rules are similar: via the optional ( b ) etc. in front of the "_" symbol, both bm:m and bbm:bm alternations are covered.
The contexts differ to a certain extent. For b and d, the III-I special gradation bbm:m is covered by two separate rules and a special Dummy (X6) that is not part of the ordinary WeG set.

Note that one of the rules for t:0 refers to #: as part of its context. As soon as clitics are added to the word form, this rule will thus not be triggered. Look into this when the clitics are added.

__Consonant gradation b:0__

__Consonant gradation d:0__
* __bednag8>X3 __
* __be0na0>0 __

__Consonant gradation g:0__

__Consonant gradation k:0__

__Consonant gradation l:0__

__Consonant gradation m:0__

__Consonant gradation n:0__

__Consonant gradation p:0__

__Consonant gradation s:0__
* __russjpeQ1 __
* __ru0sjpe0 __
* __*russjpeQ1 __ (is not standard language)
* __*russjpe0 __ (is not standard language)

__Consonant gradation ŋ:0__

__Consonant gradation f:0__

__Consonant gradation r:0__

__Consonant gradation v:0__

__Consonant gradation j:0__

__Consonant gradation t:0__
* __oajváladtj# __
* __oajvála0sj# __

__Gradation Series 4, II-I, tj and ts__

!Change rules

The Cx:Cy format was kept for hk:g, hp:b, ht:d, since the left context h:0 was unique, and no compilation conflict thus arose.

The bb:pp, gg:kk, dd:tt alternations were split into three rules, since keeping them in one Cx:Cy rule created compilation conflicts. Also, d:t contains a rule not found for the other two...

__Gradation Series 4, II-I__

__bb:pp__
* __oabbáQ1 __
* __oappá0 __

__gg:kk__
* __vággeQ1 __
* __vákke0 __
* __*vággeQ1 __ (is not standard language)
* __*vágge0 __ (is not standard language)

__g:k change for clitic -ge__

__dd:tt and dtj, dts__

__Gradation Series 7, III-II, ks(t), kt, ktj, kts__

!Exceptional II-III inverse gradation in present participles

This gradation is only for II-I syllable verbs that get III as present participles.
* bbm - bm - m
* ddn - dn - n
* ddnj - dnj - nj
* ggŋ - gŋ - ŋ
* ddj - dj - dj
* hkk - hk - g
* hpp - hp - b
* htt - ht - d
* httj - htj - tj
* htts - hts - ts

Strategy: Do insertion rule for the initial element.

__Consonant insertion as II-III strengthening gradation with bm, gŋ__

__Consonant insertion as II-III strengthening gradation with dn/j + as I-III strengthening gradation with d__

__Consonant insertion as II-III strengthening gradation with hk, hp,__

__Consonant insertion as II-III strengthening gradation with htt(j/s)__

!!Debugging of twol-rules

All rule conflicts have been successfully resolved. The rule file should be kept that way. Look out for conflicts in the compilation process, and resolve them as they appear!
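When tracking down a conflict, it can help to confirm that an example pair is aligned symbol by symbol before looking at the rules themselves. The example pairs quoted in this documentation appear to give a lexical string followed by its surface realisation, with 0 standing for a position that has no counterpart on the other side. The Python sketch below rests on that reading, and on the further assumption that a multicharacter symbol on the lexical side is a letter followed by a digit from 1 to 9 (X5, Q1, g8, d9 and so on). The function names are hypothetical and are not part of any twolc tool.

```python
import re

# Assumption: a multicharacter lexical symbol is a letter followed by a
# digit 1-9 (X5, Q1, g8, d9, ...); any other character is one symbol.
# The digit 0 is excluded so that it stays a separate placeholder symbol.
_SYMBOL = re.compile(r"[^\W\d_][1-9]|.", re.DOTALL)


def lexical_symbols(lexical: str) -> list[str]:
    """Split a lexical string into symbols under the assumption above."""
    return _SYMBOL.findall(lexical)


def is_aligned(lexical: str, surface: str) -> bool:
    """True if the lexical and surface sides have the same number of symbols."""
    return len(lexical_symbols(lexical)) == len(surface)


def surface_form(surface: str) -> str:
    """Drop the 0 placeholders to get the plain surface string."""
    return surface.replace("0", "")


print(lexical_symbols("gussaQ1#"))             # ['g', 'u', 's', 's', 'a', 'Q1', '#']
print(is_aligned("gussaQ1#", "gu0sá0#"))       # True
print(is_aligned("loavddag8X3#", "l0åv0da00#"))  # True
print(surface_form("gu0sá0#"))                 # gusá#
```

A pair that fails this check is more likely to contain a typing error than to expose a rule conflict, so it is a cheap first test. The sketch assumes precomposed characters (for example á as a single code point).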