How to write disambiguation files
Constraint grammar
The main introduction to CG-2 is Tapanainen 1996. Karlsson & al 1992
gives a good introduction to CG-1, and also the most thorough
presentation of the philosophy behind the constraint grammar
framework.
The project uses the CG-2 formalism, which is presented below. The
concrete implementation is vislcg.
The structure of the disambiguation file
The disambiguation file has the suffix .rle; in our case the files are
called sme-dis.rle, smj-dis.rle, etc. The file consists of the following
sections (two additional sections, MAPPINGS and CORRECTIONS, may also be
used; they follow the CONSTRAINTS sections):
- DELIMITERS
- SETS
- CONSTRAINTS (there are several CONSTRAINTS sections)
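As a rough sketch of this layout, a minimal file could look as follows.
The declarations and the rule are borrowed from the examples used in this
document; the # comments and the reduction to a single CONSTRAINTS
section are illustrative assumptions, not a copy of sme-dis.rle.

```
# Sentence delimiters
DELIMITERS = "<.>" "<!>" "<?>" ;

# Tag lists and sets used by the rules
SETS
LIST N = N ;
LIST V = V ;

# Disambiguation rules; further CONSTRAINTS sections may follow
CONSTRAINTS
SELECT N IF (-1 V) ;
```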
The delimiters
There are three delimiters, ".", "?" and "!". The section thus
contains the following line only:
DELIMITERS = "<.>" "<!>" "<?>" ;
The lists and sets
This section is introduced with the heading SETS. Note that this
heading must be removed in order to run the file with Connexor's
parser mdis.
The tags are introduced to the parser as lists and sets. Tags or tag
combinations that are not introduced here must be referred to within
parentheses. Lists and sets are defined according to the following
principles:
- Every grammatical tag used on its own in the rules should be declared
as a one-membered LIST carrying its own name, so that it can be referred
to without parentheses. Thus, we now have
SELECT N IF (-1 V); instead of SELECT (N) IF (-1 (V));.
- Tag combinations, such as (N Sg) or (Pron Pers Sg1 Nom) are not defined
as single-membered LISTs, but referred to as tags (hence in parentheses).
Note: Should frequent tag combinations get a list, and thereby a shorter
name, e.g. LIST MUN = (Pron Pers Sg1 Nom); ?
- Sets of different tags and/or combinations are declared as sets, e.g.
SET ADVLCASE = Ill | Loc | Com | Ess ;
Note: Should they be declared as lists instead, e.g.
LIST ADVLCASE = Ill Loc Com Ess ; ? In that case they are of course a
list of tags, and not a set of (atomic) sets. Which method is preferable?
- Finally, sets built with operations other than OR (such as + and -)
are declared as SETs.
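The principles above can be sketched as a small SETS section followed by
one rule. The tag names come from the examples in this document; the set
names NADVL and NOTHER and the rule itself are hypothetical and only
illustrate the notation. The operators + and - are glossed here as
concatenation and difference, which is how they are usually read in CG-2.

```
SETS

# One-membered lists, each carrying the name of its tag
LIST N = N ;
LIST V = V ;
LIST Ill = Ill ;
LIST Loc = Loc ;
LIST Com = Com ;
LIST Ess = Ess ;

# A set formed with OR from lists declared above
SET ADVLCASE = Ill | Loc | Com | Ess ;

# Sets formed with operations other than OR (hypothetical names)
SET NADVL = N + ADVLCASE ;     # nouns in one of the adverbial cases
SET NOTHER = N - ADVLCASE ;    # nouns in none of the adverbial cases

CONSTRAINTS

# (N Sg) has no list of its own, so it is written in parentheses
SELECT (N Sg) IF (-1 V) ;
```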
The constraints
The format of the constraint rules
Trond Trosterud
Last modified: Wed Sep 24 02:27:02 2003