Neural Network

14:00-15:15

Linda, Tommi, Chiara

* Tommi is preprocessing the corpus
* Mika wants to use the jason format instead having two corpora that are aligned (error - non-error)
* Mika is using CSC cloud computer for experimenting/testing
* maybe we can get an account in Oslo
* Mika suggested the following set-up: https://opennmt.net/OpenNMT-py/quickstart.html
  * https://github.com/mikahama/compound-errors
* Chiara remembers that Måns used another free setup
* both pdf and code are here: main/art/2018/maskinlaringskurs
* As part of the course in Oslo, we used the infrastructure https://www.sigma2.no/about-sigma2
* Tommi did word-based and character-based experiments - we get mostly nonsense results
* Tommi thinks we should get some sensible results though
** fix the script (maybe the alignment is off) gtfree.bash or text2chars.py
** getting more data (Tommi is using Mikas script)
* Tommi will use Mikas compound splitting script on 50 sentences from boundcorpus and send them to me together with the original text, and I will go through it to see if there are problems
* Mika added Chiara to the project on github


Examples from yaml tests:

  - "Oslos bohtet {gáhtta bargit}¥{gáhttabargit} muitalit midjiide narkotihkkageavaheddjiid beaivválaš dilis."
  - "Dán rádjái eai leat beare stuora {ruhta supmit}¥{ruhtasupmit} juolluduvvon guollefoanddas Finnmárkui."