Neural Network 14:00-15:15

Linda, Tommi, Chiara

* Tommi is preprocessing the corpus
* Mika wants to use the JSON format instead of having two aligned corpora (error - non-error)
* Mika is using a CSC cloud computer for experimenting/testing
* maybe we can get an account in Oslo
* Mika suggested the following set-up: https://opennmt.net/OpenNMT-py/quickstart.html
* https://github.com/mikahama/compound-errors
* Chiara remembers that Måns used another free setup
* both the pdf and the code are here: main/art/2018/maskinlaringskurs
* As part of the course in Oslo, we used the infrastructure https://www.sigma2.no/about-sigma2
* Tommi did word-based and character-based experiments, but the results are mostly nonsense
* Tommi thinks we should get some sensible results though
** fix the script (maybe the alignment is off): gtfree.bash or text2chars.py
** get more data (Tommi is using Mika's script)
* Tommi will use Mika's compound splitting script on 50 sentences from boundcorpus and send them to me together with the original text, and I will go through them to see if there are problems
* Mika added Chiara to the project on GitHub

Examples from yaml tests:

- "Oslos bohtet {gáhtta bargit}¥{gáhttabargit} muitalit midjiide narkotihkkageavaheddjiid beaivválaš dilis."
- "Dán rádjái eai leat beare stuora {ruhta supmit}¥{ruhtasupmit} juolluduvvon guollefoanddas Finnmárkui."
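
The `{error}¥{correction}` markup in the yaml examples above could be turned into one JSON record per sentence, as an alternative to keeping two aligned corpora. The following is only a minimal sketch of that idea: the function names and the field names `error` and `correct` are assumptions made for illustration, not the format used by Mika's scripts, and it assumes the markup is not nested.

```python
import json
import re

# Matches the {error}¥{correction} markup seen in the yaml test examples.
# Assumption: markup is never nested and braces occur only as markup.
MARKUP = re.compile(r"\{([^{}]*)\}¥\{([^{}]*)\}")


def split_marked_sentence(marked):
    """Return (error_sentence, corrected_sentence) for one marked-up sentence."""
    # Keep the first brace group to rebuild the sentence containing the error.
    error = MARKUP.sub(lambda m: m.group(1), marked)
    # Keep the second brace group to rebuild the corrected sentence.
    correct = MARKUP.sub(lambda m: m.group(2), marked)
    return error, correct


def to_json_line(marked):
    """Serialize one sentence pair as a JSON object (hypothetical field names)."""
    error, correct = split_marked_sentence(marked)
    # ensure_ascii=False keeps North Sámi characters readable in the output.
    return json.dumps({"error": error, "correct": correct}, ensure_ascii=False)


if __name__ == "__main__":
    example = (
        "Oslos bohtet {gáhtta bargit}¥{gáhttabargit} muitalit midjiide "
        "narkotihkkageavaheddjiid beaivválaš dilis."
    )
    print(to_json_line(example))
```

Writing one such object per line keeps each error sentence and its correction together, so the pair cannot drift out of alignment the way two separate files can.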