The Bokmål parser

Intro, to be deleted when the lexicon is exchanged

NOTE! The Bokmål parser is made for two purposes:

  1. In order to function as the Norwegian side of an intelligent bilingual Norwegian - Finnish or Norwegian - Sámi dictionary. The morphological part of the nob parser was made with this goal in mind.
  2. In order to function as a stoplist for the Sámi parser (the parser should not care about genuinely Norwegian words). The lexical part of the parser was made with this goal in mind, by taking a list of nob wordforms and just pouring it into the lexica according to their endings (words in -e = verbs, words in -bar = adjectives, etc.).
Now, the second goal has been superseded by a different stoplist of foreign words, with no grammatical analysis whatsoever. Thus, when a Bokmål dictionary with the inflectional codes found in the morphological part of the analyser becomes available, the present noun, verb, adjective and adverb files should simply be deleted (as should this intro).
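
To make clear why the present lexicon files are unreliable, here is a minimal sketch of the kind of ending-based sorting described under purpose 2. It is not the script that was actually used. The function name guess_pos and the rule table are hypothetical, and only the two endings named above (-e and -bar) are included, since the remaining rules are not listed here.

```python
# Hypothetical reconstruction of ending-based sorting of wordforms.
# Only the two rules mentioned in the text are included; the real
# procedure used further rules that are not documented here.
SUFFIX_RULES = [
    ("bar", "adjective"),  # words in -bar = adjectives
    ("e", "verb"),         # words in -e = verbs
]


def guess_pos(wordform):
    """Return a part-of-speech guess based on the ending, or None."""
    for suffix, pos in SUFFIX_RULES:
        if wordform.endswith(suffix):
            return pos
    return None  # no rule matched; the wordform stays unclassified


for word in ["lese", "lesbar", "hus", "pære"]:
    print(word, guess_pos(word))
```

The last wordform shows the weakness of the approach: pære ("pear") is a noun, but the -e rule files it as a verb.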

The grammatical part of the nob parser

POS lexica and sublexica

The parser is based upon the grammatical appendix of Turid Farbregd's Finsk-norsk ordbok (FNO), which in turn is an adaptation of the system in Bokmålsordboka.

Status quo

The parser is not complete. Here is the TODO list:

  1. Complete the declension classes from the FNO, and test the result
  2. Get the FNO
  3. Transform the FNO into an xml database, from which the open-POS declension classes are exported into lexc files via a Makefile, and remove the current lexicon files at the same time
  4. Cover the irregular forms in FNO
  5. Write entries for the closed POS
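
Item 3 above could be sketched roughly as follows. This is an illustration under stated assumptions, not a description of an existing script: the xml element names (lexicon, entry, lemma, pos, infl), the inflection codes and the function name xml_to_lexc are all hypothetical, since the FNO database does not exist yet. The only fixed point is the lexc entry format, a LEXICON header followed by lines of the form "stem ContinuationClass ;".

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample database; element names and codes are assumptions.
SAMPLE_XML = """
<lexicon>
  <entry><lemma>hus</lemma><pos>Noun</pos><infl>n1</infl></entry>
  <entry><lemma>lese</lemma><pos>Verb</pos><infl>v2</infl></entry>
  <entry><lemma>bok</lemma><pos>Noun</pos><infl>f1</infl></entry>
</lexicon>
"""


def xml_to_lexc(xml_text):
    """Group entries by POS and return {pos: lexc source text}."""
    root = ET.fromstring(xml_text.strip())
    by_pos = defaultdict(list)
    for entry in root.iter("entry"):
        lemma = entry.findtext("lemma")
        pos = entry.findtext("pos")
        infl = entry.findtext("infl")
        # One lexc line per lemma: the stem, then its continuation class.
        by_pos[pos].append(f"{lemma} {infl} ;")
    return {
        pos: f"LEXICON {pos}\n" + "\n".join(lines) + "\n"
        for pos, lines in by_pos.items()
    }


for pos, text in xml_to_lexc(SAMPLE_XML).items():
    print(text)
```

A Makefile target could then call such a script once per open POS and write each returned text to its own lexc file.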
Trond Trosterud trond.trosterud@hum.uit.no
Last modified: Sun Oct 31 08:09:22 2004