General information for new users

The document index.html (the one that contained the link to this page) is the index to the documentation of this Sámi language technology project, conducted at the University of Tromsø. Read the documentation; it is useful. At present, the goal is to build a morphological parser for Northern Sámi and basic parsers for Southern and Lule Sámi. The other Sámi written languages that use the Latin script will also be looked into. We have also started creating a morphological disambiguator for Northern Sámi, using Constraint Grammar technology. The project application contains some general background information on the project (note that present resources enable us to do approximately 1/4 of what is sketched in that document).

Directory structure

The project is located in the directory gt/ (short for giellateknologiija, "language technology"). These are the subdirectories (the abbreviations for the different languages follow the ISO standard for language codes):

Each language directory has the following subdirectories:

The gt/ directory is copied to the home directory of each user by the cvs program.

Project history

The linguistic groundwork for the Northern Sámi project was done by Pekka Sammallahti in 1993. His original 1993 files were twolrules-saame.txt (the twol rules), lexicon-saame.txt (a preliminary lexicon file), LEXITWOL.doc (a slightly different version of the same file, with more lexicon explanations; the two were unified into the present files), and ADJ-TWOL.doc and NOMENAT.doc, covering the nouns and adjectives. Pekka's input can be found in the directory 93-originals/ (these files are not included in the cvs repository; ask Trond for reference).

In December 2001 Pekka handed over raw dictionary files for nouns and adjectives, and for verbs, adverbs, and closed parts of speech. These files were converted to the lexc format.


Trond Trosterud
Last modified: Fri Feb 21 10:20:32 GMT 2003