Meeting on smenob transition

April 8th, 2014

Fran, Lene, Sjur, Trond

!!!LEXICAL SELECTION

* New file: apertium-sme-nob.sme-nob.lex
* Old file: dev/archive/apertium-sme-nob.sme-nob.lex

We now use SELECT/REMOVE rules instead of SUBSTITUTE rules. The input looks like:

{{{
$ echo "biebmu" | apertium -d . sme-nob-biltrans
^biebmu<@HNOUN>/føde<@HNOUN>/mat<@HNOUN>$^./.$

And the output:

$ echo "biebmu" | apertium -d . sme-nob-lex
^biebmu<@HNOUN>/mat<@HNOUN>$^./.$
}}}

This is much easier to debug, although it means rewriting the old rules and, in particular, writing default rules.

{{SELECT ("mat"i) IF (0 ("<biebmu>"i)) ; }}

The source-language lemma ("biebmu") is the WORDFORM of the biltrans output.

word form = biebmu<@HNOUN> (tags are invisible)

readings:
# "føde" n m unc sg nom @HNOUN
# "mat" n m unc sg nom @HNOUN

{{{
$ echo "biebmu" | apertium -d . sme-nob-biltrans
^biebmu<@HNOUN>/føde<@HNOUN>/mat<@HNOUN>$^./.$

Biltrans output in CG style:

"<biebmu>" n sg nom @HNOUN
	"føde" n m unc sg nom @HNOUN
	"mat" n m unc sg nom @HNOUN
"<.>" clb
	"." sent clb
}}}

Lene will update the lex file from dev/archive to the current file.

!!!REGRESSIONS

{{{
sme-nob boradišgohten.
- jeg begynte å spise.
+ #spise.

$ echo "boradišgohten" | apertium -d . sme-nob-biltrans
^boradit<@+FMAINV>/spise<@+FMAINV>$^./.$

Original:
^boradit+goahti<@+FMAINV>$
}}}

So, how do we change der_goahti to +goahti?

{{{
boradišgohten boradišgohten boradit+V+TV+Der/goahti+V+Ind+Prt+Sg1
borakeahttá borakeahttá borrat+V+TV+VAbess

$ echo "boradišgohten" | apertium -d . sme-nob-morph
^boradišgohten/boradit/borrat$^./.$

$ echo "^boradit+goahti<@+FMAINV>$" | apertium-pretransfer | lt-proc -b sme-nob.autobil.bin
^boradit/spise$ ^goahti<@+FMAINV>/begynne<@+FMAINV>$

$ echo "^boradit+goahti<@+FMAINV>$" | apertium-pretransfer | lt-proc -b sme-nob.autobil.bin | apertium-transfer -b apertium-sme-nob.sme-nob.t1x sme-nob.t1x.bin
^verb<@+FMAINV>{^begynne$}$ ^part{^å$}$ ^verb{^spise$}$

-> begynte å spise
}}}

!!!Focus words

{{{
+Foc/ge        +ge
+Foc/gen       +gen
+Foc/ges       +ges
+Foc/gis       +gis
+Foc/naj       +naj
+Foc/ba        +ba
+Foc/be        +be
+Foc/hal       +hal
+Foc/han       +han
+Foc/bat       +bat
+Foc/son       +son
+Foc/naj+Qst   +naj
+Qst+Foc/son   +son
}}}

The + turns them into separate words (for the lexical transfer?).

# translate tags to plus notation
# disambiguate
# split words
# send to biltrans and get nob words for the + "words"

{{{
$ echo "^boađátge/boahtit+ge$" | cg-conv -a -l
"<boađátge>"
	"boahtit" v iv ind prs sg2
	"ge" pcle

Unhammer, did you make any changes to the CG in sme-nob in order to deal with subreadings?
can't remember …
}}}

From SMA:

{{{
LEXICON FINAL1
 ENDLEX ;
 +Foc/ge+Use/Circ:#ge ENDLEX ; !
 +Foc/gan+Use/Circ:#gan ENDLEX ; !
 +Foc/gih+Use/Circ:#gih ENDLEX ; !
 +Foc/gænnah+Use/Circ:#gænnah ENDLEX ; !
}}}

!!!Todo

Sjur@Apertium:
* remove derivation strings, with language-pair-specific adaptations
* change certain tags for certain language pairs
* pkgconfig script for the GTD languages

Add words from nobsme/inc/False* to smenob:
# Split the smenob directory into
# Put in a separate directory, loansrc
# Add everything in smenob/src to the bidix
# Add everything in smenob/loansrc to the bidix, but marked r="LR" in sme-nob.dix

Propernouns:
# Add sme-nob (and sme-fin and sme-swe) from geo/xml_src
# smi-propernouns.lexc
# commons smi
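
To illustrate what the lexical selection step in the LEXICAL SELECTION section does to the stream, here is a minimal Python sketch. It is not how the sme-nob-lex mode is implemented; it only mimics the effect of a SELECT rule on the biltrans output shown in these notes. The function names (parse_biltrans, select, render) are hypothetical, and the parser assumes a simplified stream: no escaped characters, no "/" inside tags, and no blanks between lexical units that need preserving.

```python
import re

# One lexical unit in the Apertium stream format: ^source/target1/target2$
# Assumption: no escaped ^, $ or / inside the unit.
LU_RE = re.compile(r"\^(.*?)\$")


def parse_biltrans(stream):
    """Split biltrans output into (source, [targets]) pairs."""
    units = []
    for body in LU_RE.findall(stream):
        source, *targets = body.split("/")
        units.append((source, targets))
    return units


def select(units, source_lemma, target_lemma):
    """Mimic a SELECT rule: keep only target_lemma for source_lemma."""
    selected = []
    for source, targets in units:
        # The lemma is everything before the first tag.
        if source.split("<", 1)[0] == source_lemma:
            kept = [t for t in targets if t.split("<", 1)[0] == target_lemma]
            # Like SELECT, never remove every translation.
            targets = kept or targets
        selected.append((source, targets))
    return selected


def render(units):
    """Serialise the pairs back to the stream format (blanks are dropped)."""
    return "".join("^" + "/".join([s, *t]) + "$" for s, t in units)


biltrans = "^biebmu<@HNOUN>/føde<@HNOUN>/mat<@HNOUN>$^./.$"
print(render(select(parse_biltrans(biltrans), "biebmu", "mat")))
# ^biebmu<@HNOUN>/mat<@HNOUN>$^./.$
```

The printed line matches the sme-nob-lex output quoted in the notes. If the requested translation is not among the candidates, the unit is left unchanged, which is the behaviour a default rule has to cover.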