Goal:

Make a multipurpose smenob.xml dictionary:
- For use on the net
- As a basis for machine translation
- For glossing of analysis

Work at the moment:

The dictionary, dtd and css are in
smenob.xml
smenob.dtd
smenob.css

The incoming words are in the following files:
    2778 inc-missing-adj
    3379 inc-missing-adv
   10289 inc-missing-nouns
      80 inc-missing-pron
   10900 inc-missing-verbs
   27426 total

These should be translated and thereafter added, in the following way:

Preamble: There are about 27,000 untranslated words, so we have to prioritise what to translate first and what to leave for later. These are the principles for prioritising:

a. Translate whatever can be done semi-automatically (all words in -logiija should be copied and translated to -logi, and likewise for several other classes of loan words; compounds with -láhka, -giella, etc. could get the last part of the compound translated automatically, with the first part then done manually)
b. Translate all the closed classes (all except noun, verb, adj) manually
c. Go relatively quickly through the lists and translate the easy ones
d. Check against frequency lists and translate the common ones
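
Principle a. can be sketched roughly as below. This is only an illustration in Python, not Børre's script: the function names are made up, the POS code "N" in the usage note is a placeholder, and the suffix table contains only the -logiija/-logi pair mentioned above. The output is a candidate translation that still has to be checked by hand.

```python
# Hypothetical suffix table. Only the -logiija -> -logi pair comes from
# these notes; other loan-word classes would have to be added by hand.
SUFFIX_RULES = {"logiija": "logi"}


def suggest_translation(smeword):
    """Return a Norwegian candidate for a loan word, or None if no rule applies."""
    for sme_suffix, nob_suffix in SUFFIX_RULES.items():
        if smeword.endswith(sme_suffix):
            return smeword[: -len(sme_suffix)] + nob_suffix
    return None


def fill_translations(lines):
    """Fill the third column of smeword<tab>poscode<tab> lines where a rule applies."""
    out = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            out.append(line.rstrip("\n"))  # not an entry line, pass it through
            continue
        smeword, poscode = fields[0], fields[1]
        existing = fields[2] if len(fields) > 2 else ""
        # Never overwrite a translation that is already there.
        candidate = existing or suggest_translation(smeword) or ""
        out.append("\t".join([smeword, poscode, candidate]))
    return out
```

With this sketch, a line such as biologiija<tab>N<tab> gets biologi in the third column, while words that match no rule are left untranslated for manual work.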

Conversion principles

Sámi words should be POS tagged, and Norwegian words should be both POS and gender tagged. The POS tagging of the inc files currently looks like this:
smeword<tab>poscode<tab>

The task is then to add a nob translation
smeword<tab>poscode<tab>nobtranslation
and thereafter add the entry to smenob.xml with Børre's script
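
Before handing lines to the script it may help to check that they really have all three fields. A minimal sketch in Python, assuming the tab-separated format shown above; the Entry type and parse_entry function are hypothetical and not part of Børre's script:

```python
from typing import NamedTuple


class Entry(NamedTuple):
    smeword: str
    poscode: str
    nobtranslation: str


def parse_entry(line):
    """Parse one smeword<tab>poscode<tab>nobtranslation line.

    Raises ValueError if a field is missing or empty, e.g. when the
    line has not been translated yet.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 3 or not all(field.strip() for field in fields):
        raise ValueError(
            f"expected smeword<tab>poscode<tab>nobtranslation, got {line!r}"
        )
    return Entry(*fields)
```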

In order to do that:

1. identify a part of some of the inc-missing files which can be translated (semi)automatically
2. cut it out of the inc-missing file, and paste it into both the inc-today-a and the inc-today-b file (SubEthaEdit is a nice editor for this).
3. Leave inc-today-a as is
4. Translate the Sámi of inc-today-b into Norwegian
5. Change the pos mark, if necessary
6. At the end of the day, run Børre's script for today-to-xml-conversion (note that there must be exactly the same number of lines in the a- and b-documents!)
7. Empty the inc-today-files
8. Call it a day, and go home.
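
The line-count requirement in step 6 is easy to check before running the conversion. A minimal sketch in Python, assuming the two files are paired line by line; pair_today_lines is a hypothetical helper, not part of Børre's script:

```python
def pair_today_lines(a_lines, b_lines):
    """Pair each line of inc-today-a with the same-numbered line of inc-today-b.

    Raises ValueError if the two files do not have the same number of lines.
    """
    a = [line.rstrip("\n") for line in a_lines]
    b = [line.rstrip("\n") for line in b_lines]
    if len(a) != len(b):
        raise ValueError(
            f"line count mismatch: {len(a)} lines in a, {len(b)} lines in b"
        )
    return list(zip(a, b))
```

Usage would be along the lines of opening both files with encoding="utf-8" and passing the results of readlines() to the function.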


risten.no
=========
The words from risten.no were added, according to the following procedure:
1. extract sme-pos-nob-pos pairs
2. add them to smenob.xml
3. make a transducer smedic.fst of smenob.xml (extract lemma, xfst < read words
4. run noun-sme-lex.txt etc. against this transducer, and make new, leaner, inc-missing-POS files
5. carry on the manual work (1-8 above)
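
The idea behind step 4 can be illustrated without xfst. The sketch below, in Python, uses a plain set of lemmas in place of the smedic.fst transducer, so it is a stand-in for the procedure actually used, not a description of it; still_missing is a hypothetical name.

```python
def still_missing(candidate_lines, known_lemmas):
    """Keep only the lines whose first tab-separated field is not yet in the dictionary."""
    kept = []
    for line in candidate_lines:
        stripped = line.rstrip("\n")
        lemma = stripped.split("\t")[0]
        if lemma and lemma not in known_lemmas:
            kept.append(stripped)
    return kept
```

Here known_lemmas would be the set of lemmas extracted from smenob.xml, and the returned lines would form the new, leaner inc-missing file.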