Flag diacritics for Sami compounds

Introduction

Flag diacritics are used in the Sami morphological parser to remove illegal compounds. Their use is documented in chapter 8 of the Xerox book. This page documents the flag diacritic format and how the diacritics are used in the parser.

Flag diacritics format

There are six types of flag diacritics, all of them with the format @operator.feature.value@ or @operator.feature@:

U or Unification Test, @U.feature.value@: If feature is currently neutral, this diacritic causes feature to be set to value. If feature is already set, the test succeeds iff value is compatible with the current value of feature; otherwise the path is blocked. In effect, two U flags for the same feature in one derivation string must have the same value.
P or Positive (Re)Setting, @P.feature.value@: Sets or resets the feature to the given value.
N or Negative (Re)Setting, @N.feature.value@: Sets or resets the feature to the negation of the given value.
R or Require Test, @R.feature.value@: A test is performed that succeeds iff feature is currently set to value; otherwise the path is blocked.
D or Disallow Test, @D.feature.value@: A test is performed that succeeds if feature is neutral or set to a value that is incompatible with value; otherwise the path is blocked.
C or Clear Feature, @C.feature@: The value of feature is reset to neutral.
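
The operator definitions above can be tried out with a small simulator. The following is only a sketch in Python, not part of the parser: the function names accept and compatible are made up for illustration, and the treatment of negatively set features (after an N flag) is a simplifying assumption that real finite-state toolkits may handle differently.

```python
import re

# Matches @OP.feature@ and @OP.feature.value@
FLAG = re.compile(r"@([PNRDCU])\.([^.@]+)(?:\.([^.@]+))?@")


def compatible(current, value):
    """current is (value, is_negated); check whether `value` fits it."""
    cur_value, negated = current
    return (cur_value != value) if negated else (cur_value == value)


def accept(path):
    """Return True if no flag in `path` blocks, reading left to right."""
    state = {}  # feature -> (value, is_negated); a missing key means neutral
    for op, feat, val in FLAG.findall(path):
        cur = state.get(feat)
        if op == "P":            # positive (re)setting
            state[feat] = (val, False)
        elif op == "N":          # negative (re)setting
            state[feat] = (val, True)
        elif op == "C":          # clear: back to neutral
            state.pop(feat, None)
        elif op == "R":          # require: feature must be set (to val, if given)
            if cur is None:
                return False
            if val and (cur[1] or cur[0] != val):
                return False
        elif op == "D":          # disallow: block if set (to val, if given)
            if cur is not None and (val == "" or compatible(cur, val)):
                return False
        elif op == "U":          # unify: set if neutral, else test compatibility
            if cur is None:
                state[feat] = (val, False)
            elif not compatible(cur, val):
                return False
    return True
```

For example, accept("x@P.cmpnd.N@y@R.cmpnd.N@") holds because the R test finds cmpnd set to N, while accept("y@R.cmpnd.N@") fails because cmpnd is still neutral when the test is reached. The strings x and y are placeholders, not real lexicon entries.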

The problem

Without flag diacritics, the compound lexicon is either too strict, so that compounds with derived nouns are erroneously blocked, or too sloppy, so that compounds with verbs are accepted along with them:

Too strict: only N + N is accepted; N + [V-suf]N is not.
Too sloppy: N + V is accepted as well, not only N + [V-suf]N.
Correct: a compound is accepted only if its second part is an N at the end of the derivation.

A solution

The Nominative and Genitive sublexica of all NounRoot entries are led to the R lexicon (as earlier).
From there, they go to NounRoot again without flag diacritics. They are also led to VerbRoot and AdjectiveRoot, but equipped with a compound flag diacritic. Then, all affixes that turn adjectives and verbs into nouns are equipped with a corresponding diacritic.
Compounds with only an unsaturated diacritic are removed, whereas compounds with a saturated diacritic are accepted.
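
The filtering rule in the last sentence can be illustrated with a few lines of Python. This is a sketch under one possible reading of "saturated", namely that every compound flag set on the way into VerbRoot or AdjectiveRoot is followed later in the string by the matching flag on a noun-forming affix. The function name is_saturated, the concrete flag strings and the example strings are assumptions for illustration; the example strings are placeholders, not real Sami forms.

```python
import re

SET = "@P.cmpnd.N@"    # assumed flag on the compound path into VerbRoot/AdjectiveRoot
CHECK = "@R.cmpnd.N@"  # assumed flag on affixes that derive nouns


def is_saturated(path):
    """True if no compound flag is left without a later matching affix flag."""
    pending = False
    for flag in re.findall(r"@[^@]+@", path):
        if flag == SET:
            pending = True      # compound with a verb or adjective stem begins
        elif flag == CHECK:
            pending = False     # a noun-forming affix closes it
    return not pending


# Placeholder derivation strings for the three cases
examples = [
    "noun#noun",                               # N + N, no flags involved
    "noun#@P.cmpnd.N@verb+suffix@R.cmpnd.N@",  # N + [V-suf]N
    "noun#@P.cmpnd.N@verb",                    # N + V
]
for s in examples:
    print(s, "accepted" if is_saturated(s) else "removed")
```

Under this reading, the first two strings are kept and the third is removed, which corresponds to the behaviour the solution aims for.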

A sketch

Here, the P and R diacritics are used, as shown with the R lexicon and two lexica for deverbal nouns, which take verbal stems as input. The P diacritic sets the value of cmpnd to N, and the R diacritic tests whether cmpnd is currently set to N.

LEXICON R
 # NounRoot ;
 #@P.cmpnd.N@ 	VerbRoot ;
 #@P.cmpnd.N@ 	AdjectiveRoot ;

LEXICON DeverbalNounsBOAHTI
 +N+Actor:¤@R.cmpnd.N@  DEVNVCASE ;

LEXICON DeverbalNounsDOHPPE-
 +n+N+Actio:m@R.cmpnd.N@ BOAHTIN ;
 +mus1+N:mus1s1@R.cmpnd.N@ MUSH ;
 +meahttun+A:X7#meahttum MEAHTTUN ;

Trond Trosterud

Last modified: Mon Jul 29 12:12:39 GMT 2002