This document is for discussion notes
=====================================

This command gives the transcription plus POS:

    cat corp/pedkorpus.txt | preprocess | lookup -flags mbTT -utf8 bin/pos-sme.fst | tr ' ' '+' | cut -d"+" -f1,3 | uniq | lookup -flags mbTT -utf8 bin/phon-sme.fst | cut -f2 | l

Discussion 22nd Aug
===================

text -> /syntactic analysis/ -> intonation marking -> phonetic representation -> sound generation

Lexical lookup: find the base form, pick out the enriched graphemes, and use them in the conversion to the phonetic representation.

e with dot below (ẹ) = e7 in our internal representation.

    LU  čájehit+V+Ind+Prs+Sg1
           sme.fst + sme-dis.rle
           ---------------------
    LL  čáje7hán   => tʃæjəhæn
    PU  čáje7hán
           --------
    PL  čájehán

compiles into

    LU  čájehit+V+Ind+Prs+Sg1
           ---------------------
    PL  čájehán    => tʃæjehæn

Given the transducer phon-sme.fst, e7 disappears: it is no longer present on PL, so ẹ comes out as e instead of ə. We must therefore rewrite our rules in order to put this information into them, as in:

    "Diphthong Simplification in i-Stems before Suffixes Beginning with j:"
    Vx:0 <=> Vow: _ Cns:+ i ( %>: ) ( »: ) X5: ;
             where Vx in (e o a) ;
    ! goah'tiX5jd:go0điid

    "Vowel Lengthening in Diphthong Simplification in i-Stems before Suffixes Beginning with j:"
    Vx:Vy <=> _ Vz: Cns:+ i ( %>: ) ( »: ) X5: ;
              where Vx in (o i u)
                    Vy in (o9 i9 u9)
                    Vz in (a e o)
              matched ;
    ! goah'tiX5jd:go0điid

    oa => o9
    ie => i9
    uo => u9

    johka   jo0ga
    gođii   go:ðij

Lexical transducer: remove LU; then we are left with a one-level model that contains o9. Then we put the two-level model on top:

    LU  goahti+N+Sg+Gen
    LL  goahtiX4

    PU  goahtiX4
          twol
    PL  goađi

LU + LL = lexical transducer
PU, twol, PL = twol

1 - remove the upper side of the lexical transducer (LU:LL => LL)
2 - compile twol (PU:PL)
3 - compose the one-level lexicon with twol, .o. (LL/PU -> LL:PL)
4 - compose the result of 3 with a generating IPA twol: IPA:PL

    IPA
    twol
    LL

    IPA:LL
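To illustrate why the enriched graphemes matter in the conversion to the phonetic representation, here is a toy grapheme-to-IPA sketch for the čájehán example from the 22nd Aug notes. The table `G2P` and the function `to_ipa` are hypothetical: they cover only the symbols in that example and are not the contents of phon-sme.fst. With e7 present (the LL form) the output has ə; without it (the PL form) it has e.

```python
# Toy grapheme-to-IPA table covering only the čájehán example in these notes.
# Hypothetical: this is NOT the content of phon-sme.fst.
G2P = {
    "e7": "ə",  # enriched grapheme for e with dot below
    "č": "tʃ",
    "á": "æ",
    "e": "e",
    "j": "j",
    "h": "h",
    "n": "n",
}

# Try longer graphemes first so that "e7" wins over "e".
KEYS = sorted(G2P, key=len, reverse=True)


def to_ipa(form: str) -> str:
    """Convert an (enriched) grapheme string to IPA by longest match."""
    out = []
    i = 0
    while i < len(form):
        for key in KEYS:
            if form.startswith(key, i):
                out.append(G2P[key])
                i += len(key)
                break
        else:
            # No grapheme in the toy table matches at this position.
            raise ValueError(f"no mapping for {form[i]!r} in {form!r}")
    return "".join(out)


print(to_ipa("čáje7hán"))  # LL form, e7 still present
print(to_ipa("čájehán"))   # PL form, e7 gone
```

Longest match is one simple way to let a multi-character enriched grapheme such as e7 take precedence over its plain counterpart.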
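The two twol rules for diphthong simplification in i-stems (oa => o9, ie => i9, uo => u9) can be approximated, for intuition only, as a string rewrite on the stem. This is a minimal sketch under several assumptions: the `j_suffix` flag stands in for the X5 trigger and the following j-suffix, every non-vowel (including the gradation mark ') counts as Cns, the optional %> and » boundary symbols are ignored, and consonant gradation is not modelled. The names `LONG`, `PATTERN`, and `simplify_diphthong` are hypothetical.

```python
import re

# Long-vowel symbols from the notes: oa => o9, ie => i9, uo => u9
# (the "matched" pairing of Vx, Vy and Vz in the twol rule).
LONG = {"oa": "o9", "ie": "i9", "uo": "u9"}

# Assumption: a á e i o u are the vowels; anything else counts as Cns.
# The diphthong must be followed by one or more consonants and a stem-final i.
PATTERN = re.compile(r"(oa|ie|uo)(?=[^aáeiou]+i$)")


def simplify_diphthong(stem: str, j_suffix: bool) -> str:
    """Replace the diphthong with its long vowel when a j-suffix follows."""
    if not j_suffix:
        return stem
    return PATTERN.sub(lambda m: LONG[m.group(1)], stem)


print(simplify_diphthong("goah'ti", True))   # diphthong oa simplified to o9
print(simplify_diphthong("goahti", False))   # no j-suffix: unchanged
```

In the twol version the second vowel is deleted (Vx:0) and the first is lengthened (Vx:Vy) by two separate rules; the regex collapses both into one substitution.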
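The four-step plan (remove the upper side, compile twol, compose, compose with a generating IPA twol) can be sketched with finite relations standing in for transducers. In this sketch each relation is a Python set of (upper, lower) string pairs, and `compose` joins the lower side of the first relation with the upper side of the second, as .o. does. The helper names are hypothetical and are not the Xerox or HFST API; the toy data reuses only the čájehán forms from these notes (LU čájehit+V+Ind+Prs+Sg1, LL/PU čáje7hán, PL čájehán, and tʃæjehæn as the IPA given for PL).

```python
# Toy model of the four-step plan: finite relations stand in for transducers.
# Each relation is a set of (upper, lower) string pairs.
Relation = set[tuple[str, str]]


def lower_identity(rel: Relation) -> Relation:
    """Step 1: drop the upper side, keeping the lower side as a one-level relation."""
    return {(low, low) for (_, low) in rel}


def compose(a: Relation, b: Relation) -> Relation:
    """a .o. b: match the lower side of a against the upper side of b."""
    return {(up, low2) for (up, low) in a for (up2, low2) in b if low == up2}


def invert(rel: Relation) -> Relation:
    """Swap upper and lower sides."""
    return {(low, up) for (up, low) in rel}


# Toy data taken from the čájehán example in these notes.
lexicon: Relation = {("čájehit+V+Ind+Prs+Sg1", "čáje7hán")}  # LU:LL
twol: Relation = {("čáje7hán", "čájehán")}                     # PU:PL (step 2, already "compiled")
ipa_twol: Relation = {("tʃæjehæn", "čájehán")}                 # IPA:PL, generating

one_level = lower_identity(lexicon)        # step 1: LL
ll_pl = compose(one_level, twol)           # step 3: LL:PL
ll_ipa = compose(ll_pl, invert(ipa_twol))  # step 4: LL:IPA
ipa_ll = invert(ll_ipa)                    # written IPA:LL in the notes

print(sorted(ipa_ll))  # [('tʃæjehæn', 'čáje7hán')]
```

In this toy setup the final IPA:LL relation pairs čáje7hán with tʃæjehæn rather than tʃæjəhæn: an IPA twol that only sees PL cannot recover the e7 information unless the twol rules are rewritten to keep it.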