The Bokmål parser
Intro, to be deleted when the lexicon is replaced
NOTE! The Bokmål parser was made for two purposes:
- To serve as the Norwegian side of an intelligent bilingual
Norwegian - Finnish or Norwegian - Sámi dictionary. The morphological
part of the nob parser was made with this goal in mind.
- To serve as a stoplist for the Sámi parser (the Sámi parser
should not care about genuinely Norwegian words). The lexical
part of the parser was made with this goal in mind, by taking a list
of nob wordforms and simply pouring it into the lexica (words in -e
were treated as verbs, words in -bar as adjectives, etc.)
The second purpose is now served by a separate stoplist of
foreign words, which carries no grammatical analysis whatsoever. Thus,
once a Bokmål dictionary with the inflectional codes found in the
morphological part of the analyser becomes available, the present noun,
verb, adjective and adverb files should simply be deleted (as should
this intro).
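The suffix-based filling of the lexica described above can be sketched
roughly as follows. This is only an illustration, not the script that
was actually used: the function names, the rule table and the fallback
to "noun" are assumptions. Only the two rules (-e = verb,
-bar = adjective) come from the text.

```python
# Illustrative sketch only. The two rules below are the ones named in the
# text; the rule order, the function names and the "noun" fallback are
# assumptions made for this example.
SUFFIX_RULES = [
    ("bar", "adjective"),  # words in -bar are treated as adjectives
    ("e", "verb"),         # words in -e are treated as verbs
]


def guess_pos(wordform, default="noun"):
    """Return the POS lexicon a wordform is poured into, judged by suffix."""
    for suffix, pos in SUFFIX_RULES:
        if wordform.endswith(suffix):
            return pos
    return default  # assumed fallback, not stated in the source


def sort_into_lexica(wordforms):
    """Group a flat list of wordforms into per-POS lists."""
    lexica = {}
    for wordform in wordforms:
        lexica.setdefault(guess_pos(wordform), []).append(wordform)
    return lexica


if __name__ == "__main__":
    print(sort_into_lexica(["kjøre", "brukbar", "hus"]))
```

A heuristic of this kind is enough for a stoplist, since it only has to
recognise wordforms, but it will misclassify many words (nouns in -e,
for instance), which is why the resulting files are meant to be
replaced.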
The grammatical part of the nob parser
POS lexica and sublexica
The parser is based upon the grammatical appendix of Turid Farbregd's
Finsk-norsk ordbok (FNO), which in turn is an adaptation of the system
in Bokmålsordboka.
Status quo
The parser is not complete. Here is the TODO list:
- Complete the declension classes from the FNO, and test the result
- Get the FNO
- Transform the FNO into an XML database, from which the open-POS
declension classes are exported into lexc files via a Makefile, and
remove the current lexicon files at the same time
- Cover the irregular forms in FNO
- Write entries for the closed POS
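
The planned export from an XML database to lexc files could look
something like the sketch below. Everything specific here is an
assumption made for illustration: the XML element and attribute names
(entry, lemma, pos, cls), the declension class codes (m1, n1, v1) and
the lexicon names are hypothetical, since the actual database does not
exist yet. The only thing taken as given is the basic lexc entry shape,
a form followed by a continuation class and a semicolon, under a
LEXICON header.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical input: the element names, attribute names and class codes
# are invented for this example and do not reflect the real FNO data.
SAMPLE_XML = """<lexicon>
  <entry lemma="bil" pos="N" cls="m1"/>
  <entry lemma="hus" pos="N" cls="n1"/>
  <entry lemma="kaste" pos="V" cls="v1"/>
</lexicon>"""

# Hypothetical mapping from POS tag to lexc sublexicon name.
LEXICON_NAMES = {"N": "Nouns", "V": "Verbs", "A": "Adjectives", "Adv": "Adverbs"}


def xml_to_lexc(xml_text):
    """Return a dict mapping each POS tag to the text of one lexc sublexicon."""
    root = ET.fromstring(xml_text)
    by_pos = defaultdict(list)
    for entry in root.iter("entry"):
        by_pos[entry.get("pos")].append((entry.get("lemma"), entry.get("cls")))

    result = {}
    for pos, entries in by_pos.items():
        lines = ["LEXICON " + LEXICON_NAMES[pos]]
        # One lexc entry per lemma: "<lemma> <continuation class> ;"
        lines += [f"{lemma} {cls} ;" for lemma, cls in entries]
        result[pos] = "\n".join(lines) + "\n"
    return result


if __name__ == "__main__":
    for text in xml_to_lexc(SAMPLE_XML).values():
        print(text)
```

A Makefile rule could then call such a script once per open POS and
write each result to its own lexc file, so that the generated files
replace the current hand-filled lexicon files.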
Trond Trosterud trond.trosterud@hum.uit.no
Last modified: Sun Oct 31 08:09:22 2004