xfst[]: define initcap a (->) A, b (->) B, c (->) C, d (->) D, e (->) E, f (->) F, g (->) G, h (->) H, i (->) I, j (->) J, k (->) K, l (->) L, m (->) M, n (->) N, o (->) O, p (->) P, q (->) Q, r (->) R, s (->) S, t (->) T, u (->) U, v (->) V, w (->) W, x (->) X, y (->) Y, z (->) Z || .#. _ ;
This string has been put in the file case.regex and compiled to caseconv.fst in xfst. As a result, all initial caps are downcased, but upon generation all words are given an alternative reading with an initial capital letter. This is not what we want.
First, 'upper' is defined as the set of all capital letters, including the Northern Sámi digraphs C1, D1 etc. Then, allcaps is defined as the set of relations 'a (->) A' etc. for all small/capital pairs that occur in the context '.#. upper* _ upper* .#.', i.e. in strings consisting of upper case letters only.
The resulting binary file allcaps.fst is compiled by the Makefile. In principle, the parser sme.fst could have been composed with allcaps.fst into a single transducer (sme.fst .o. allcaps.fst), but this is not done, since the resulting transducer would have been very large indeed (cf. the discussion of this issue in the book). Instead, the issue is handled in a lookup script file. At present, this file looks as follows (cf. the discussion on lookup script files in the book):
analyzer /home/trond/gt/sme/bin/sme.fst
allcaps /home/trond/gt/sme/bin/allcaps.fst
allcaps analyzer
The lookup script should be used as follows (when standing in sme/):
... | lookup -flags mbTT -f src/cap-sme | ...
Note that the files are referenced by absolute rather than relative paths (relative references would here have been ../bin/sme.fst etc.). For a user other than trond to get this to work, the user name trond in the paths must be replaced, giving e.g. /home/lena/gt/sme/bin/sme.fst etc. For this reason, the cap-sme file is not yet included in the cvs repository (this is why the link to it does not work). Xerox has been notified, and has answered, cf.
(quote)
Date: Wednesday, 12 February 2003 18:39:06 +0100
paths in lookup scripts.
Reply-To: tamas.gaal@xrce.xerox.com
Trond,
There is some possibility of using Unix environment variables in lookup, see
http://www.xrce.xerox.com/competencies/content-analysis/fssoft/docs/lookup-97/lookup97.html
It may not solve your problem - but please read it first: towards the end, there is reference to environment variables like
setenv LOOKUP_SCRIPT_BASE ...
If it is not of enough then the interface should be improved. While
it is not a complicated matter, we are short of able people now so you
may have to use the full pathnames in your scripts until it gets
improved.
(end of quote)
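As long as full pathnames are needed, one possible workaround is to generate the lookup script for each user instead of storing it in the repository. The following is only a sketch under stated assumptions: gen_cap_script is a hypothetical helper (not part of the Xerox tools), and writing one entry per line is an assumption about the script format, mirroring the entries of the cap-sme script shown above.

```shell
#!/bin/sh
# Sketch: print a lookup script whose paths point into the given bin directory.
# gen_cap_script is a hypothetical helper, not part of the Xerox tools.
gen_cap_script() {
    bindir=$1
    # Name/file pairs for the two transducers (same entries as in cap-sme).
    printf 'analyzer %s/sme.fst\n' "$bindir"
    printf 'allcaps %s/allcaps.fst\n' "$bindir"
    # Strategy entry, in the same order as in the existing script.
    printf 'allcaps analyzer\n'
}

# Example use when standing in sme/ (assumes the files live under $HOME/gt):
#   gen_cap_script "$HOME/gt/sme/bin" > src/cap-sme
```

With such a helper, each user would get a cap-sme file with his or her own absolute paths, and only the generating script would have to be kept under version control.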
Trond Trosterud trond.trosterud@hum.uit.no Last modified: Mon Nov 1 21:34:10 2004