Documentation of the sme Makefile

The Makefile itself

The Makefile is organized as follows: It consists of a number of blocks, and each block builds one binary file. Each binary file depends on a number of other files. Each block opens with two dependency lines. The first gives the short target name, a colon, and the path of the binary file to be built. The second gives that binary file with its path, again a colon, and after the colon the files it depends on. A dependency line can be continued over several lines by ending each line with the backslash character (\). In the first block, the dependency lines look as follows:

isme.fst: ../bin/isme.fst
../bin/isme.fst: ../bin/sme.fst ../bin/sme.save ../bin/tok.fst \
        ../bin/allcaps.fst

After the dependency lines come the actions themselves. In the Makefile, each action line starts with a TAB character. Commands are prefixed with the "@" character, which keeps make from echoing the command line itself, and their arguments are delimited with quotes. Quotes within the quotes are preceded by \.

The actions begin with an informative banner printed to the screen. Then come the commands proper. They are written to a temporary script file (in the first block, the file is "isme-fst-script") by the @printf command. Then the relevant program (e.g. xfst in the first block of the sme Makefile) reads the script file and executes it. Finally, the temporary script file is removed by the @rm command. The Makefiles for the other languages are built in the same way.
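
As an illustration of the layout described above, a block might look roughly like the following sketch. It is not copied from the sme Makefile: the target, the script name "sme-fst-script" and the banner text are made up for the example, the xfst commands are borrowed from the manual build steps listed at the end of this document, and passing the script to xfst with the -f option is an assumption about how the program is invoked. The action lines must begin with a TAB character.

```makefile
# Hypothetical block, for illustration only (not taken from the real Makefile).
# Dependency lines: short target name first, then the binary with its path.
sme.fst: ../bin/sme.fst
../bin/sme.fst: ../bin/caseconv.fst \
        ../bin/sme.save
	@echo "*** Building \"sme.fst\" ***"
	@printf "load stack ../bin/caseconv.fst\n" > sme-fst-script
	@printf "load stack ../bin/sme.save\n" >> sme-fst-script
	@printf "compose net\n" >> sme-fst-script
	@printf "save stack ../bin/sme.fst\n" >> sme-fst-script
	@printf "quit\n" >> sme-fst-script
	@# Assumption: xfst runs the script file given after -f.
	@xfst -f sme-fst-script
	@rm -f sme-fst-script
```

With such a block in place, typing "make sme.fst" in the src directory would run the actions whenever ../bin/sme.fst is missing or older than one of the files it depends on.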

Note that the source files are taken from the src directory (and referred to by filenames only, since the Makefile itself is in the src directory), whereas the binary files are taken from the bin directory, and hence prefixed with '../bin/'.

lookup: the lookup scripts

No lookup scripts have been written yet. They should be, to make it easier to analyse files.

The preprocessor

The sme Makefile contains a target for generating the file abbr.txt, which contains a list of abbreviations used in the preprocessing phase. The file is generated by the script abbr-extract, which is located in the gt/script directory. It takes as command-line parameters the main abbreviation file and a list of files in which multiword expressions should be searched for. Basically:

abbr.txt: ../bin/abbr.txt
../bin/abbr.txt: ../../script/abbr-extract abbr-sme-lex2.txt \
		propernoun-sme-lex.txt closed-sme-lex.txt adv-sme-lex.txt

		abbr-extract --abbr_lex=abbr-sme-lex2.txt \
		--lex=propernoun-sme-lex.txt,closed-sme-lex.txt,adv-sme-lex.txt \
		--output=../bin/abbr.txt

If one ever should need to manage without make...

In case the actual commands themselves are ever needed, this is a list of the commands that were used to build a morphological parser before the Makefile existed.

Replace "sme" with the code of the other language (smj, sms) as needed.

Compiling the parser
====================

in twolc (open by typing "twolc")
---------------------------------
read-grammar twol-sme.txt
compile
save-binary twol-sme.bin

in lexc (open by typing "lexc")
-------------------------------
compile-source *-sme.txt
or: run-script skript1 (smj has script file "lskr", sms has no script file)
read-rules twol-sme.bin
compose-result
save-result sme.save

in xfst (open by typing "xfst")
-------------------------------
load stack caseconv.fst
load stack sme.save
compose net
save stack sme.fst

The caseconv.fst transducer loaded above is also built in xfst:
---------------------------------------------------------------
read regex < case.regex
save stack caseconv.fst

Last modified: Thu Sep 19 20:58:22 CEST 2002