14.12.2020 15:05-16:30 Mika, Linda, Tommi python3 -m onmt.bin.preprocess -train_src src-train.txt -train_tgt tgt-train.txt -valid_src src-val.txt -valid_tgt tgt-val.txt -save_data data/vocabulary [2020-12-14 15:32:12,475 INFO] Extracting features... [2020-12-14 15:32:12,477 INFO] * number of source features: 0. [2020-12-14 15:32:12,477 INFO] * number of target features: 0. [2020-12-14 15:32:12,477 INFO] Building `Fields` object... [2020-12-14 15:32:12,478 INFO] Building & saving training data... [2020-12-14 15:32:14,500 INFO] * tgt vocab size: 35820. [2020-12-14 15:32:14,557 INFO] * src vocab size: 24997. [2020-12-14 15:32:14,741 INFO] Building & saving validation data... mkdir data mkdir brnn #bi-directional recurrent neural network python3 -m onmt.bin.preprocess -train_src src-train.txt -train_tgt tgt-train.txt -valid_src src-val.txt -valid_tgt tgt-val.txt -save_data data/vocabulary python3 -m onmt.bin.train -data data/vocabulary -save_model brnn/nmt-model -encoder_type brnn -seed 3435 lastmodel=$( ls -t brnn/*.pt | head -1 ) python3 -m onmt.bin.translate -model $lastmodel -src source_test.txt -output brnn_pred.txt -replace_unk * every 10.000 steps there will be results in the brnn-folder * and it will be there even if I kill it * src-test.txt --- add sentences here to make sure they won't mess with us what we experimented with: 160 compound errors/sentences: - "Oslos bohtet {gáhtta bargit}¥{gáhttabargit} muitalit midjiide narkotihkkageavaheddjiid beaivválaš dilis." - "Dán rádjái eai leat beare stuora {ruhta supmit}¥{ruhtasupmit} juolluduvvon guollefoanddas Finnmárkui." - "Oktiibuot 13 Norgga doaktára leat leamaš mielde dáid {iskkadan bargguin}¥{iskkadanbargguin}." - "Mot sáhttá diet unna olbmáš duon {skohter reagas}¥{skohterreagas} nagodit oažžut olles Skandinavia álbmoga lihkadusaide." - "In leat gal jurddašan nu olu dan birra ahte mun lean nuoramus {stivra lahttu}¥{stivrralahttu}, muhto várra lea nu ahte mun sáhtán buktit stivrii juoidá." script to extract the correct format for the neural network (aligned) $ make-corpora.py -i syn-compound.yaml -o syn-compound.text $ head *syn-compound.text ==> src-syn-compound.text <== Oslos bohtet gáhtta bargit muitalit midjiide narkotihkkageavaheddjiid beaivválaš dilis. Dán rádjái eai leat beare stuora ruhta supmit juolluduvvon guollefoanddas Finnmárkui. Oktiibuot 13 Norgga doaktára leat leamaš mielde dáid iskkadan bargguin. Mot sáhttá diet unna olbmáš duon skohter reagas nagodit oažžut olles Skandinavia álbmoga lihkadusaide. In leat gal jurddašan nu olu dan birra ahte mun lean nuoramus stivra lahttu, muhto várra lea nu ahte mun sáhtán buktit stivrii juoidá. Sis leat stuora váttisvuođat go ii leat boastaboksa nummar reivves, muhto dušše boastabáikki nummar. Vaikko dat šaddá ge divraseappot go kr. 3,70, mii lea dábálaš reive porto Norggas. Leansmánni Knut Gullberg ii leat vuos sihkar ovddiduvvo go eaiggáda vuostá ráŋggáštus ášši. Guovdageainnu/Máze nominašuvdna čoahkkin lea duvdilan Ole Henrik Magga nuppi sadjái sámediggeválgalisttus. Bargiidbellodaga joavkojođiheaddji Magnhild Mathisen dovddaha ahte sii meannudedje preassadoarjja ášši beare oktageardánit. ==> tgt-syn-compound.text <== Oslos bohtet gáhttabargit muitalit midjiide narkotihkkageavaheddjiid beaivválaš dilis. Dán rádjái eai leat beare stuora ruhtasupmit juolluduvvon guollefoanddas Finnmárkui. Oktiibuot 13 Norgga doaktára leat leamaš mielde dáid iskkadanbargguin. Mot sáhttá diet unna olbmáš duon skohterreagas nagodit oažžut olles Skandinavia álbmoga lihkadusaide. In leat gal jurddašan nu olu dan birra ahte mun lean nuoramus stivrralahttu, muhto várra lea nu ahte mun sáhtán buktit stivrii juoidá. Sis leat stuora váttisvuođat go ii leat boastaboksanummar reivves, muhto dušše boastabáikki nummar. Vaikko dat šaddá ge divraseappot go kr. 3,70, mii lea dábálaš reive porto Norggas. Leansmánni Knut Gullberg ii leat vuos sihkar ovddiduvvo go eaiggáda vuostá ráŋggáštusášši. Guovdageainnu/Máze nominašuvdnačoahkkin lea duvdilan Ole Henrik Magga nuppi sadjái sámediggeválgalisttus. Bargiidbellodaga joavkojođiheaddji Magnhild Mathisen dovddaha ahte sii meannudedje preassadoarjjaášši beare oktageardánit. [ [ {"error":"gáhtta bargit", "correct":"gáhttabargit"}, ], ] Error mark-up: https://giellalt.uit.no/proof/spelling/testdoc/error-markup.html earáláhkái --- Err/Unspace > eará láhkái lang-sme/tools/grammarcheckers/tests/syn-compound.yaml https://github.com/giellalt/lang-sme/tree/develop/tools/grammarcheckers/tests https://gtsvn.uit.no/freecorpus