List of things to do

Intro

This is neither a complete list of the work still to be done nor a strategic plan. Rather, it is a list of things to do next. It has been made because more people are getting involved in this project, and we must therefore be even more explicit. The point of this document is thus not the issues themselves (they are described elsewhere) but the lists. This list covers linguistic issues only; a to-do list for technical issues, including our web environment, is kept in a separate document.

Disambiguation

  1. The preprocessor must fix the derivation issue, so that the embedded POS tags do not interfere with the disambiguation rules
  2. Make correct corpora
  3. Go through the rules systematically

Corpora

  1. Set up the IMS Corpus Workbench interface on cochise
  2. Make an XML specification, based on TEI etc., for the metadata of the corpus texts
  3. Include a corpus text as a test case
  4. Collect texts

The morphological parsers

sme

  1. Linguistic problems
  2. Place names
  3. Testing on running texts
  4. New test texts should be added to the test diary (from Davvi Girji, NSI, Samediggi)
  5. The rule file should be read through
  6. The verbal sublexica should be reassigned (Biret?)

smj

General work:

  1. The smj project must be coordinated with work at Árran.
  2. We need access to dictionaries in order to complete the parser
  3. Place names must be added to the propernoun file
  4. Then we need test corpora, starting with the NT and novels
  5. The parser should be made part of pedagogical software projects

The parser:

  1. Vocabulary testing on texts
  2. Grammar testing, the paradigms


The other Sami languages

We will not work on languages other than North and Lule Sami during this project period. When we resume work on the other Sami languages, the following may serve as a starting point.

sma

  1. Lauri Karttunen's comments must be addressed
  2. We need more corpora
  3. We need to cooperate with the dictionary projects
  4. Place names must be added to the propernoun file
  5. The next step should then be to introduce more parts of speech into the system, perhaps in the following order:

    1. the closed classes (almost complete)
    2. verbs (there is a draft version in place)
    3. adjectives (their predicative declension should be pointed to the nouns; the attributive forms need linguistic ground work)
    4. derivation and compounding
    5. South Sami place names
    6. loan words

smn

  1. Write documentation
  2. Get a grammatical description
  3. Write the basic twol file + the grammar file
  4. Get dictionary & complete the lexica
  5. Get texts to test against
  6. Find cooperation partners

sms

  1. Write documentation
  2. Get a grammatical description
  3. Write the basic twol file + the grammar file
  4. Get dictionary & complete the lexica
  5. Get texts to test against
  6. Find cooperation partners

Last modified: Thu Apr 22 15:44:17 2004