!!!General tasks 2015
In the autumn we will make a plan for the MT work. This document covers only the dictionary and FST work.

* Project leader (__Trond__)
* Linguistic work coordination (__ML__)
* Dictionary coordination (__Trond__)
* FST coordination (__Lene__)
* MT implementation (__Francis__)
* Dictionary transfer and other pipelines (__Ciprian__)
* Data for NDS dictionaries in both directions (__Ciprian__)
* SMN Korp (__Ciprian__)

!! August
Workers in August
* Trond
* Lene 
* ML
* Erika
* Ciprian partly

!!Work to be done

!Time allocation
* Bidix
* Twolc
* Verbs
* Nouns
* Oulu presentation

!!Tasks
! verbs
* linguistics: 
** [Finish the verbal inflection overview|VerbalInflection.html]
** (yamls as needed)
* Principles for yamls: follow the grouping in the grammar
** Bisyllabics: all stem vowels + root vowels in order to cope with lowering
** Trisyllabics:  types are there already
** Contracted: in principle as for the bi/tri above
* lexc + twolc <== thereafter do analysis

! nouns
* Finish the fst to let yamls through
* Lexicon:
** Approx. 200 __nounstems__ left

! adjectives <== get the landscape clear
* Linguistics:
** attr forms
** map the stems to the nominal classes, prefix A_contlex
** Allocate 1122 __adjstems__

! other, closed POS <== get the POS right
* check fst up against grammar

!Lexicon
* Work with missing lists 
** situation 7.8.15: coverage = 68% (including names; corpus = 1.1 million)

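For reference, the commands used:
{{{
# Count the word tokens in the corpus (tokens containing a lower-case ASCII letter)
cat misc/boundsmn.txt | preprocess | grep '[a-z]' | wc -l
# Count the tokens whose analysis (usmn) contains '?', i.e. the unknown words
cat misc/boundsmn.txt | preprocess | grep '[a-z]' | usmn | grep '?' | wc -l
}}}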
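
The coverage figure follows from the two counts above: the share of tokens that are not flagged as unknown. A minimal sketch of that arithmetic, assuming the two {{wc -l}} results are available as plain integers. The function name and the counts used below are hypothetical, chosen only so that they agree with the 68% and 1.1 million figures quoted above; they are not the actual output of the commands.

```python
def coverage_percent(total_tokens: int, unknown_tokens: int) -> float:
    """Return the share of tokens recognised by the analyser, in percent."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    if not 0 <= unknown_tokens <= total_tokens:
        raise ValueError("unknown_tokens must be between 0 and total_tokens")
    return 100.0 * (total_tokens - unknown_tokens) / total_tokens


# Hypothetical counts (assumption, for illustration only):
# first command  -> total word tokens
# second command -> tokens the analyser marks with '?'
total = 1_100_000
unknown = 352_000

print(f"coverage = {coverage_percent(total, unknown):.0f}%")  # coverage = 68%
```

Rerunning the two commands after each round of lexicon work and feeding the new counts into the same formula shows how the missing lists shrink over time.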

!Dictionaries: 
* Finish bidix (Cip, Miina)
* Redo finsmn NDS
* Oulu presentation


!! smn-fin-smn dictionary launching
* Present the dictionary to the users (__ML, Miina__)
** Present the dictionary at the schools and universities
*** Ivalo, Inari, Oulun Yliopisto, Helsingin Yliopisto 
* Present the dictionary to the public (__ML, Miina__) in September
** Saami parliament
** Press: YLE, Ávvir, Inarilainen, Lapin Kansa, Helsingin Sanomat (make it an event!)

!! Dictionary: smn-fin-smn - deadline August 25?
A test version of the dictionary is online

TODO:
* improve the dictionary interface
** localisation in Inari Saami (__ML__)
** improve information in the interface (__Erika__)
** improve presentation of paradigms and morphological information (__Erika__)
** add context to paradigms  (__Erika__)
** look at homonyms (get the correct translation and the correct paradigm) (__Erika__)
* Improve the columns (__Ciprian, Miina, ML, Trond__)

!! Dictionary: sme-smn transfer - deadline for items 1-2 is August 5?, for the CIFU presentation

The tool itself could be launched much later, as a separate tool from smn-fin-smn

TODO:
# correct columns in the input Excel dict files (__Ciprian; ML, Miina, Trond__)
# make transfer sme-fin + fin-smn (__Ciprian__)
# improve coverage
## Find holes (lacunas) in the dictionary (__Trond__)
## add missing translations  (__ML, Miina__)
# put the dictionary online (__Ryan__)
# improve the interface  (__ML__)
# launch the dictionary, when? how?

!!!Testing FST
# Automatic testing (make check)
## yaml-files
## generation of lemmas
## generation of miniparadigms
# Analysis
## Analysis of texts (__Erika__)
## Coverage: creating missing lists, adding words to analyser 
# Testing of the analyser and dictionary (__ML, Miina__)


!!!Morphology
!!nouns.lexc - first priority

TODO:
* test setup
** make yaml-files (__ML, Miina__)
** add mini-paradigms in the stem file (__ML, Miina__)
* stems file  (__ML, Miina, Erika__)
** correct stems and give correct contlex
** classify ''nounstems''  (over 1750 unclassified entries)
* affix-file (__Lene, ML__)
* twolc work (__Lene__)

!!verbs.lexc - second priority
TODO:
* test setup
** make yaml-files (__ML, Miina, Trond__)
** add mini-paradigms in the stem file (__ML, Miina__)
* affix-file (__Trond, ML__)

!!adjectives.lexc - third priority
TODO:
* test setup
** make yaml-files (__ML, Miina?, Trond__)
** add mini-paradigms in the stem file (__ML, Miina?__)
** ensure nom sg test routines
* affix-file (__Lene, ML, Trond__)
* twolc work (__Lene__)
* correct stems and give correct contlex (__ML, Miina?__)


!!smn-propernouns.lexc:
TODO:
* Classify the place names in the file, and make new lexicons (copy from nouns) (__Erika__)
** jävri AIGI > JAVRI, N+Prop+Sem/Plc
* Add smn person names (ask Mattus) (__Lene__)
* Classify person names and make lexicons (__Erika__)
* Redirect smi-propernouns.lexc to smn  (__Lene__)
* Adjust affixes/propernouns.lexc (__Lene, Erika__)

!! abbreviations, acronyms - copy from sme
* Adjust to smn (__Erika__)

!! numerals, pronouns
* Test and correct (__Erika__)
* add more pronouns (take from textbooks, analysis) (__Erika__)

!! adverbs, adpositions, conjunctions, subjunctions, particles, interjections
* Check PoS (__Erika__)
** move words to other files
** add more lemmas (take from textbooks, analysis)

!! punctuation.lexc - should be ok

!!!Dependencies

!!POS internal dependencies

For all FST work the following dependencies hold
(for words without morphology several steps may be skipped):

# Linguistic ground work
# Yaml files and other test setup
# Plan of attack
# lexc and twolc work for the words in the yamlfiles
# yaml testing and refinement until the yaml files pass 100%
# go through lexicon file for all members of the contlex

!!Dependencies between POS within the FST

* N before A
* N before Prop
* N, V, A before derivation
* N before Px

Otherwise there are no dependencies between the POS.
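
The ordering constraints above form a small dependency graph, and any topological order of that graph is a valid work order. A minimal sketch, assuming Python 3.9+ and its standard-library {{graphlib}} module; the dictionary and function names are hypothetical and only restate the four bullet points above.

```python
from graphlib import TopologicalSorter

# Each key maps to the set of POS that must be done before it
# (an illustrative restatement of the list above, not project tooling).
PREREQUISITES = {
    "A": {"N"},
    "Prop": {"N"},
    "derivation": {"N", "V", "A"},
    "Px": {"N"},
}


def work_order(prerequisites):
    """Return one valid order in which every prerequisite comes first."""
    return list(TopologicalSorter(prerequisites).static_order())


order = work_order(PREREQUISITES)
print(order)
```

Several orders satisfy the constraints, so the printed list is only one of them. V has no prerequisites and can be worked on in parallel with N.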

!!Dependencies between FST and dict and MT

# FST good enough to generate a substantial part of N, V, A paradigms
# a useful Neahttadigisánit with click-in-text
# FST with all POS done (but errors and holes here and there)
# good NDS with paradigm generation

!!Dependencies between FST and MT


# FST good enough to generate a substantial part of N, V, A paradigms
# alpha version of MT
# FST with all POS done (but errors and holes here and there)
# start working on MT transfer rules

Bidix and FST do not depend on each other, but it is easier to
collect data for the bidix with a good FST for text analysis.
