!!!General tasks 2015

In the autumn we will make a plan for the MT work. This document covers only the dictionary and FST work.

* Project leader (__Trond__)
* Linguistic work coordination (__ML__)
* Dictionary coordination (__Trond__)
* FST coordination (__Lene__)
* MT implementation (__Francis__)
* Dictionary transfer and other pipelines (__Ciprian__)
* Data for NDS dictionaries in both directions (__Ciprian__)
* SMN Korp (__Ciprian__)

!! August

Workers in August:
* Trond
* Lene
* ML
* Erika
* Ciprian (partly)

!!Work to be done:

!Time allocation
* Bidix
* Twolc
* Verbs
* Nouns
* Oulu presentation

!!Tasks

! verbs
* Linguistics:
** [Finish the verbal inflection overview|VerbalInflection.html]
** (yamls as needed)
* Principles for yamls:
** Follow the grouping in the grammar
*** Bisyllabics: all stem vowels + root vowels in order to cope with lowering
*** Trisyllabics: types are there already
*** Contracted: in principle as for the bi/tri above
* lexc + twolc <== thereafter do analysis

! nouns
* Finish the fst to let yamls through
* Lexicon:
** Approx. 200 __nounstems__ left

! adjectives <== get the landscape clear
* Linguistics:
** attr forms
** map the stems to the nominal classes, prefix A_contlex
** Allocate 1122 __adjstems__

! other, closed POS <== get the POS right
* check fst up against grammar

!Lexicon
* Work with missing lists
** situation 7.8.15: coverage = 68% (including names, corpus = 1.1 million words)

{{{
For reference:
Command:
# tokens containing at least one letter a-z
cat misc/boundsmn.txt |preprocess|grep '[a-z]'|wc -l
# of those, the tokens that usmn marks with '?' (no analysis)
cat misc/boundsmn.txt |preprocess|grep '[a-z]'|usmn|grep '?'|wc -l
# coverage = 1 - (second count / first count)
}}}

!Dictionaries:
* Finish bidix (__Ciprian, Miina__)
* Redo finsmn NDS
* Oulu presentation

!! smn-fin-smn dictionary launching

* Present the dictionary to the users (__ML, Miina__)
** Present the dictionary at the schools and universities
*** Ivalo, Inari, Oulun Yliopisto, Helsingin Yliopisto
* Present the dictionary to the public (__ML, Miina__) in September
** Saami parliament
** Press: YLE, Ávvir, Inarilainen, Lapin Kansa, Helsingin Sanomat (make a happening!)

!! Dictionary: smn-fin-smn - deadline August 25?

A test version of the dictionary is online.

TODO:
* improve the dictionary interface
** localisation in Inari Saami (__ML__)
** improve information in the interface (__Erika__)
** improve presentation of paradigms and morphological information (__Erika__)
** add context to paradigms (__Erika__)
** look at homonyms (get the correct translation and the correct paradigm) (__Erika__)
* improve the columns (__Ciprian, Miina, ML, Trond__)

!! Dictionary: sme-smn transfer - deadline for 1-2 is August 5?, for the CIFU presentation

The tool itself could be launched much later, as a separate tool from smn-fin-smn.

TODO:
# correct columns in input Excel dict files (__Ciprian; ML, Miina, Trond__)
# make transfer sme-fin + fin-smn (__Ciprian__)
# improve coverage
## find holes (lacunas) in the dictionary (__Trond__)
## add missing translations (__ML, Miina__)
# put the dictionary online (__Ryan__)
# improve the interface (__ML__)
# launch the dictionary, when? how?
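
Steps 2 and 3 of the sme-smn TODO above (transfer via sme-fin + fin-smn, then finding holes) can be pictured as a join over the shared Finnish column. The following is only a rough sketch, not the project's actual pipeline: it assumes the Excel input has been exported to two-column tab-separated files (sme, fin and fin, smn), and the function names {{pivot_transfer}}, {{pivot_holes}} and {{sort_on_fin}} are hypothetical. It uses only the standard {{sort}} and {{join}} utilities.

```shell
#!/bin/sh
# Sketch only. Assumed input layout (not stated in the plan):
#   $1 = sme-fin file, lines of the form  sme<TAB>fin
#   $2 = fin-smn file, lines of the form  fin<TAB>smn

TAB=$(printf '\t')

# Helper: write both inputs, sorted on their Finnish column, into directory $3.
# join(1) needs both files sorted on the join field; LC_ALL=C keeps the
# byte order used by sort and join identical.
sort_on_fin() {
    LC_ALL=C sort -t "$TAB" -k2,2 "$1" > "$3/sme_fin.sorted"
    LC_ALL=C sort -t "$TAB" -k1,1 "$2" > "$3/fin_smn.sorted"
}

# Print sme<TAB>smn candidate pairs that share a Finnish translation.
pivot_transfer() {
    tmp=$(mktemp -d) || return 1
    sort_on_fin "$1" "$2" "$tmp"
    LC_ALL=C join -t "$TAB" -1 2 -2 1 -o 1.1,2.2 \
        "$tmp/sme_fin.sorted" "$tmp/fin_smn.sorted" | LC_ALL=C sort -u
    rm -r "$tmp"
}

# Print sme<TAB>fin entries whose Finnish side has no entry in fin-smn,
# i.e. the holes that need a manual translation.
pivot_holes() {
    tmp=$(mktemp -d) || return 1
    sort_on_fin "$1" "$2" "$tmp"
    LC_ALL=C join -t "$TAB" -1 2 -2 1 -v 1 -o 1.1,1.2 \
        "$tmp/sme_fin.sorted" "$tmp/fin_smn.sorted"
    rm -r "$tmp"
}
```

Called as {{pivot_transfer sme_fin.tsv fin_smn.tsv}} (file names are placeholders), this prints candidate pairs only. A pivot join overgenerates when the Finnish word is ambiguous, so the output would still need the manual checking the TODO assigns to ML and Miina.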
!!!Testing FST

# Automatic testing (make check)
## yaml-files
## generation of lemmas
## generation of miniparadigms
# Analysis
## Analysis of texts (__Erika__)
## Coverage: creating missing lists, adding words to analyser
# Testing of analyser and dictionary (__ML, Miina__)

!!!Morphology

!!nouns.lexc - first priority

TODO:
* test setup
** make yaml-files (__ML, Miina__)
** add mini-paradigms in the stem file (__ML, Miina__)
* stems file (__ML, Miina, Erika__)
** correct stems and give correct contlex
** classify ''nounstems'' (over 1750 unclassified entries)
* affix-file (__Lene, ML__)
* twolc work (__Lene__)

!!verbs.lexc - second priority

TODO:
* test setup
** make yaml-files (__ML, Miina, Trond__)
** add mini-paradigms in the stem file (__ML, Miina__)
* affix-file (__Trond, ML__)

!!adjectives.lexc - third priority

TODO:
* test setup
** make yaml-files (__ML, Miina?, Trond__)
** add mini-paradigms in the stem file (__ML, Miina?__)
** ensure nom sg test routines
* affix-file (__Lene, ML, Trond__)
* twolc work (__Lene__)
* correct stems and give correct contlex (__ML, Miina?__)

!!smn-propernouns.lexc:

TODO:
* Classify the place names in the file, and make new lexicons (copy from nouns) (__Erika__)
** jävri AIGI > JAVRI, N+Prop+Sem/Plc
* Add smn person names (ask Mattus) (__Lene__)
* Classify person names and make lexicons (__Erika__)
* Redirect smi-propernouns.lexc to smn (__Lene__)
* Adjust affixes/propernouns.lexc (__Lene, Erika__)

!! abbreviations, acronyms - copy from sme
* Adjust to smn (__Erika__)

!! numerals, pronouns
* Test and correct (__Erika__)
* add more pronouns (take from textbooks, analysis) (__Erika__)

!! adverbs, adpositions, conjunctions, subjunctions, particles, interjections
* Check PoS (__Erika__)
** move words to other files
** add more lemmas (take from textbooks, analysis)

!! punctuation.lexc - should be ok

!!!Dependencies

!!POS internal dependencies

For all FST work the following dependencies hold (for words without morphology several steps may be skipped):

# Linguistic ground work
# Yaml files and other test setup
# Plan of attack
# lexc and twolc work for the words in the yaml files
# yaml testing and refinement until the yaml files pass 100%
# go through lexicon file for all members of the contlex

!!Dependencies between POS within the FST

* N before A
* N before Prop
* N, V, A before derivation
* N before Px

Otherwise there are no dependencies between the POS.

!!Dependencies between FST and dict and MT

# FST good enough to generate a substantial part of N, V, A paradigms
# a useful Neahttadigisánit with click-in-text
# FST with all POS done (but errors and holes here and there)
# good NDS with paradigm generation

!!Dependencies between FST and MT

# FST good enough to generate a substantial part of N, V, A paradigms
# alpha version of MT
# FST with all POS done (but errors and holes here and there)
# start working on MT transfer rules

Bidix and FST are not dependent upon each other, but it is easier to collect data for the bidix with a good FST for text analysis.
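
The ordering constraints listed under "Dependencies between POS within the FST" form a small dependency graph, so one admissible work order can be derived mechanically. The sketch below feeds them to the standard {{tsort}} utility as "earlier later" pairs. The function name {{pos_work_order}} and the label {{Der}} (for derivation) are introduced here for illustration only, and the result is just one of several valid orders, not a schedule from the plan.

```shell
#!/bin/sh
# Each input line reads "EARLIER LATER" and mirrors one constraint from the
# list of POS dependencies. "Der" stands for derivation (label assumed here).
pos_work_order() {
    tsort <<'EOF'
N A
N Prop
N Der
V Der
A Der
N Px
EOF
}

# Print one order that respects all constraints, one POS per line.
pos_work_order
```

Since the plan states that there are no other dependencies between the POS, any order printed this way should be acceptable; the priorities given under Morphology (nouns, then verbs, then adjectives) are one such order.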