How to write disambiguation files

Constraint grammar

The main introduction to CG-2 is Tapanainen 1996. Karlsson et al. 1992 gives a good introduction to CG-1 and also the most thorough presentation of the philosophy behind the constraint grammar framework.

The project uses the CG-2 formalism, which is presented below. The implementation we use is vislcg.

The structure of the disambiguation file

The disambiguation file has the suffix .rle; in our case the files are called sme-dis.rle, smj-dis.rle, etc. The file consists of the sections described below. Two additional sections, MAPPINGS and CORRECTIONS, may also be used; they follow the CONSTRAINTS section.
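As an overview, the sketch below shows how the sections might fit together in one .rle file. It assumes the CG-2 syntax accepted by vislcg, where # starts a comment. The two LISTs and the single rule are illustrations only, not an excerpt from sme-dis.rle. The optional MAPPINGS and CORRECTIONS sections are left out.

```
# Skeleton of a disambiguation file (illustrative sketch only)

# Sentence delimiters
DELIMITERS = "<.>" "<!>" "<?>" ;

SETS

# Tags used in the rules, declared as one-membered LISTs
LIST N = N ;
LIST V = V ;

CONSTRAINTS

# Keep the N reading(s) of a word if the preceding word has a V reading
SELECT N IF (-1 V) ;
```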

The delimiters

There are three delimiters: ".", "?" and "!". The section thus consists of this single line:

DELIMITERS = "<.>" "<!>" "<?>" ;

The lists and sets

This section is introduced with the heading SETS. Note that this heading must be removed in order to run the file with Connexor's parser, mdis.

The tags are declared to the parser as lists and sets. Tags or tag combinations that are not declared here must be referred to within parentheses. Lists and sets are defined according to the following principles:

  1. Every grammatical tag used in the rules as such should be declared as a one-membered LIST carrying its own name. This makes it possible to refer to the tag without parentheses. Thus, we now write SELECT N IF (-1 V) ; instead of SELECT (N) IF (-1 (V)) ;
  2. Tag combinations, such as (N Sg) or (Pron Pers Sg1 Nom), are not defined as single-membered LISTs but are referred to as tags, i.e. in parentheses. Note: Should frequent tag combinations get a list, and thereby a shorter name, e.g. LIST MUN = (Pron Pers Sg1 Nom) ; ?
  3. Sets of different tags and/or tag combinations are declared as sets, e.g. SET ADVLCASE = Ill | Loc | Com | Ess ; Note: Should they instead be declared as lists, e.g. LIST ADVLCASE = Ill Loc Com Ess ; ? In that case they would be a list of tags rather than a set of (atomic) sets. Which method is preferable?
  4. Finally, sets built with operators other than OR (such as + and -) are declared as SETs.
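The four principles above might translate into a SETS section like the following sketch, which assumes vislcg's CG-2 syntax. ADVLCASE and MUN are the names used in the questions above. ADVLNOCOM is a hypothetical name introduced only to illustrate principle 4. The exact semantics of the - operator in vislcg should be checked against its documentation before relying on it.

```
SETS

# Principle 1: each tag used on its own becomes a one-membered LIST
# carrying its own name
LIST N = N ;
LIST V = V ;
LIST Ill = Ill ;
LIST Loc = Loc ;
LIST Com = Com ;
LIST Ess = Ess ;

# Principle 2: tag combinations are not declared here; rules write them
# in parentheses, e.g. (N Sg) or (Pron Pers Sg1 Nom).
# The open question is whether frequent ones should get a name:
# LIST MUN = (Pron Pers Sg1 Nom) ;

# Principle 3: an OR-combination of the LISTs above, declared as a SET
SET ADVLCASE = Ill | Loc | Com | Ess ;

# Principle 4: a set built with an operator other than OR
# (ADVLNOCOM is a hypothetical example name: ADVLCASE without Com)
SET ADVLNOCOM = ADVLCASE - Com ;
```

Declaring Ill, Loc, Com and Ess as LISTs first means that the SET definitions can refer to them by name, which is the point of principle 1.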

The constraints

The format of the constraint rules


Trond Trosterud
Last modified: Wed Sep 24 02:27:02 2003