Workplan
People, spring 04
The staff is roughly Trond 100 %, Saara 50 %, Tomi 20 %,
Lena 10 % (?), and Pekka.
- Trond
- Running the project, working mainly on disambiguation, also on
the morphological parser and on planning the corpus
- Saara
- Maintaining the project's infrastructure, designing the pre- and
postprocessors, the cgi-bin scripts, the corpus setup, and issues such
as localisation and a bug database
- Tomi
- Evaluating the project and writing an evaluation
report; maintaining the morphological parser
- Lena
- Testing, working on the lexicon
- Pekka
- Strategic planning, consultant on tricky linguistic questions
Milestones
The official milestones
In our application we stated the following milestone list for the
project:
No.  Milestone                                  Start    Finish
 1   Language-independent preprocessing         2004 1   2004 1
 2   Infrastructure for disambiguation          2004 1   2004 2
 3   Corpus interface - prototype               2004 1   2004 4
 4   Groundwork for North Sámi                  2004 1   2004 4
 5   North Sámi disambiguation - prototype      2004 1   2005 2
 6   Revise the morphological analysers         2004 1   2006 4
 7   Groundwork for Lule Sámi                   2004 3   2005 4
 8   Lule Sámi disambiguation - prototype       2004 4   2005 4
 9   Parallel text corpora - prototype          2005 1   2005 2
10   Corpus interface - beta                    2005 1   2005 4
11   North Sámi disambiguation - beta           2005 3   2005 4
12   Parallel text corpora - beta               2005 3   2006 1
13   Lule Sámi disambiguation - finished        2005 4   2006 2
14   North Sámi disambiguation - finished       2006 1   2006 4
15   Corpus interface - finished                2006 1   2006 4
16   Parallel text corpora - finished           2006 2   2006 4
Comments on the issues that start in the spring of 2004
- The language-independent preprocessor (Saara)
- This goal is fulfilled, as we have a revised
language-independent preprocessor (preprocess) and a
morphology-to-disambiguation processor (lookup2cg). There is still
work to do on language-specific preprocessing (not mentioned in the
list). This work will in practice run in parallel with other work.
- Infrastructure for disambiguation (Trond)
- This was in place already in 2003
- Corpus interface (Saara, Trond)
- We are beginning this work now (scheduled to be finished in 2004 4)
- Disambiguation prototype for sme (Trond)
- The work is under way.
Other issues
- Derivation in the lookup2cg preprocessor (Saara, Trond)
- Problem: The sme.fst output for derivation is not optimal for
disambiguation (words get assigned POS twice). Goal: Make an optimal
version, either by reversing the morphophonological processes and
building a new baseform, or by introducing a special set of embedded POS
tags. If we decide on the latter, it should be done before May 04. The
former may take more time.
- Systematic testing of the parser's morphology (Tomi, Lena, Trond)
- This should be done before autumn 2004, when we speed up the work on the disambiguator.
- Gather corpus texts (Trond)
- We should have gathered a large number of texts by 2005 1, when
the corpus interface is finished. The work on gathering texts will
continue after that.
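To illustrate the derivation issue listed under "Other issues", here is a minimal sketch of the second option, embedded POS tags. Everything in it is an assumption made for illustration: the "+"-separated analysis format, the POS_TAGS inventory, the "*" marker and the function name embed_inner_pos are hypothetical and are not taken from sme.fst or lookup2cg.

```python
# Hypothetical POS inventory and analysis format, for illustration only.
POS_TAGS = {"N", "V", "A", "Adv"}


def embed_inner_pos(analysis: str, marker: str = "*") -> str:
    """Mark every POS tag except the last one as embedded.

    A derived word gets one POS from its base and one from the
    derivation. Only the last POS describes the word as it is used in
    the sentence, so the earlier ones are flagged with `marker`.
    """
    lemma, *tags = analysis.split("+")
    pos_positions = [i for i, tag in enumerate(tags) if tag in POS_TAGS]
    # All POS tags before the final one belong to the derivational base.
    for i in pos_positions[:-1]:
        tags[i] = tags[i] + marker
    return "+".join([lemma, *tags])


if __name__ == "__main__":
    # A verb base derived into a noun: the base POS is marked as embedded.
    print(embed_inner_pos("lemma+V+Der+N+Sg+Nom"))
```

With tags rewritten like this, a disambiguation rule that matches on POS would see only one plain POS tag per reading. The first option, building a new baseform, would avoid the extra tag set but requires reversing the morphophonological processes.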
Trond Trosterud
Last modified: Wed Mar 31 11:01:05 2004