Neural Network 11.01.2021
Linda, Tommi, Chiara, Mika

* Tommi's splitting script: it already splits better than before.
* Mika has been waiting for more data.
* Mika: how easy is it to build a part-of-speech tagger for sme? It may improve the results.
* Add POS tags to the JSON files, e.g.:

    {
      "correct": "vuosttašriegádanvuoigatvuođa",
      "error": [ "vuosttaža", "riegádeapmi", "vuoigatvuođa" ],
      "pos": [ "A", "N", "N" ]
    }

Plan:
* Tommi runs the script on the whole corpus to convert it into the JSON format and pushes it to GitHub (possibly split into several files).
* Linda will search for bugs once more.
* Mika will produce the data for the NMT models from the JSON.
* Expect (several) crashes on CSC when training the models.
* If the results are not good enough, we would have to hand-produce some data.
* What we want to do is filter out potential compound errors by means of the rule-based, handwritten grammar checker: filter out the tag &msyn-compound.
* How much of the corpus should we keep for evaluation: 30% or 15%? We should get a count of the corpus first.
* Select the folders to split out on Tuesday.

Plan for the article:
* Is there a sample article? Outline:
** Introduction
** Related work
** Data section
** Method section: two methods, the rule-based approach vs. neural networks (explain the neural approach)
** Results and conclusion

* Bidirectional LSTM Tagger for Latvian Grammatical Error Detection - https://doi.org/10.1007/978-3-030-27947-9_5
* Grammar Error Correction in Morphologically Rich Languages: The Case of Russian - https://www.aclweb.org/anthology/Q19-1001.pdf
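The data-preparation steps in the plan above could be sketched roughly as below. They cover POS tags in the JSON records, the &msyn-compound tag from the rule-based grammar checker, a 15% or 30% evaluation hold-out, and source/target data for the NMT models. This is a minimal Python sketch, not Tommi's or Mika's actual scripts. The key "checker_tags", all function names and the random record-level split are assumptions. The notes suggest selecting whole folders for the evaluation set, which this sketch does not model.

```python
from __future__ import annotations

import json
import random

# Tag named in the meeting notes. ASSUMPTION: the rule-based checker's output
# has been merged into each record as a list under the hypothetical key
# "checker_tags"; the real output format may differ.
COMPOUND_TAG = "&msyn-compound"


def load_records(path: str) -> list[dict]:
    """Load a JSON file holding a list of {"correct", "error", "pos"} records."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def pos_aligned(record: dict) -> bool:
    """Check that there is exactly one POS tag per erroneous token."""
    return len(record.get("pos", [])) == len(record["error"])


def partition_by_tag(records: list[dict], tag: str = COMPOUND_TAG) -> tuple[list[dict], list[dict]]:
    """Split records into (flagged by the checker with `tag`, not flagged)."""
    flagged = [r for r in records if tag in r.get("checker_tags", [])]
    rest = [r for r in records if tag not in r.get("checker_tags", [])]
    return flagged, rest


def to_nmt_pair(record: dict) -> tuple[str, str]:
    """Source side: the erroneous split tokens; target side: the correct compound."""
    return " ".join(record["error"]), record["correct"]


def split_eval(records: list[dict], eval_fraction: float = 0.15, seed: int = 0) -> tuple[list[dict], list[dict]]:
    """Shuffle a copy of the records and hold out `eval_fraction` for evaluation."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_eval = round(len(shuffled) * eval_fraction)
    return shuffled[n_eval:], shuffled[:n_eval]


def write_parallel(records: list[dict], src_path: str, tgt_path: str) -> None:
    """Write one source/target pair per line (line-aligned parallel text files)."""
    with open(src_path, "w", encoding="utf-8") as src, open(tgt_path, "w", encoding="utf-8") as tgt:
        for record in records:
            source, target = to_nmt_pair(record)
            src.write(source + "\n")
            tgt.write(target + "\n")
```

For example: `records = load_records("corpus.json")` (hypothetical file name), then `train, ev = split_eval(records, 0.30)` and `write_parallel(train, "train.src", "train.tgt")`. The notes leave open whether the &msyn-compound records are kept as candidates or dropped, so `partition_by_tag` returns both groups rather than deciding.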