This page documents the scripts and the Makefile used as test tools.
There are five Perl scripts, all located in $CVSROOT/gt/script/testing/, and a Makefile, one copy for each language, located in $CVSROOT/gt/smX/testing/ (where smX is the ISO code of your favorite Sami language). The South Sami (sma) Makefile is used as the development version, and serves as the original from which the others are copied.
Only the calling conventions and the return values of the different scripts are described below. For details, see the scripts themselves; they are pretty simple and fairly well commented (and if not, complain to me).
merge-codesNforms.pl
merge-codesNword.pl
make-gen-test.pl
make-gen-test-facit.pl
make-ana-test.pl
Makefile
merge-codesNforms.pl
To create a base file for making test cases by combining a tag list and a word form list. This way we only have to write the tag list once for each POS.
ARG1: input file with inflectional tags, one tag on each line; normally one of the files listed below (the filenames are not hardcoded, but given by the Makefile):
noun-codes.txt
verb-codes.txt
adj-codes.txt
ARG2: input file with inflected word forms, in the same order as the tags; two or more alternative word forms on the same line are separated by a comma ONLY (no space).
The output is a tab-separated list of three fields, with one such triple on each line:
Field 1: the baseform of the word
Field 2: a morphological tag
Field 3: the word form(s) corresponding to the tag; two or more alternative word forms are separated by a comma ONLY (no space).
Used as the first step, in front of the scripts that create the actual test cases and the corresponding facit files.
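The merge step described above can be pictured with a minimal Python sketch. It is not the Perl script itself. This page does not say where the baseform comes from, so the sketch assumes it is passed in separately; the function name `merge_codes_and_forms`, the tag notation and the sample word forms are hypothetical placeholders.

```python
def merge_codes_and_forms(baseform, tags, forms):
    """Pair each tag with the word form(s) found on the same line number.

    Returns tab-separated triples: baseform, tag, word form(s).
    Alternative word forms stay on one line, separated by a comma only.
    """
    if len(tags) != len(forms):
        raise ValueError("tag list and word form list differ in length")
    return [f"{baseform}\t{tag}\t{form}" for tag, form in zip(tags, forms)]


# Placeholder data (assumed tag notation, not real Sami word forms).
tags = ["N+Sg+Nom", "N+Sg+Gen"]
forms = ["lemma", "lemman,lemmen"]

for line in merge_codes_and_forms("lemma", tags, forms):
    print(line)
```

Because the two input lists are paired by position, the sketch refuses lists of different length rather than silently dropping lines.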
merge-codesNword.pl
To create the input file for generating a paradigm by combining a tag list and a base form of a given word.
ARG1: input file with inflectional tags, one tag on each line; normally one of the files listed below (the filenames are not hardcoded, but given by the Makefile):
noun-codes.txt
verb-codes.txt
adj-codes.txt
ARG2: a word in its base form. The word has to belong to one of the major POSes N, A or V.
The output is a list of baseform plus codes corresponding to the whole paradigm, with one such combination on each line. It can be used directly as input for xfst, to generate the word forms that make up the paradigm.
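As a rough illustration of the paradigm input, here is a minimal Python sketch (again, not the Perl script). It assumes that the baseform and the codes are joined with a plus sign, which this page does not state; `paradigm_input` and the sample data are hypothetical.

```python
def paradigm_input(baseform, tags):
    """Return one 'baseform plus codes' line per inflectional tag."""
    # Assumption: baseform and codes are joined with '+'.
    return [f"{baseform}+{tag}" for tag in tags]


# Placeholder tag list, standing in for a file such as noun-codes.txt.
tags = ["N+Sg+Nom", "N+Sg+Gen", "N+Pl+Nom"]

for line in paradigm_input("lemma", tags):
    print(line)
```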
make-gen-test.pl
To extract from a created testbase file the separate parts needed as input data for testing word form generation.
The input is a testbase file created with merge-codesNforms.pl, with the three fields baseform, inflectional codes, and word form(s) corresponding to the inflectional codes.
The output is a test file for word form generation testing: one line for each inflection, consisting of the baseform with the inflectional codes appended. This is the input format required by the Xerox xfst tool.
Use as input to the Xerox xfst tool (done in the Makefile).
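The extraction described above can be sketched in a few lines of Python. This is an illustration, not the Perl script: the name `gen_test_lines` is hypothetical, and joining baseform and codes with a plus sign is an assumption about the format.

```python
def gen_test_lines(testbase_lines):
    """Keep fields 1 and 2 of each testbase line, drop the word forms."""
    out = []
    for line in testbase_lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip empty lines
        baseform, codes, _forms = line.split("\t")
        # Assumption: the codes are appended to the baseform with '+'.
        out.append(f"{baseform}+{codes}")
    return out


# Placeholder testbase content (three tab-separated fields per line).
testbase = ["lemma\tN+Sg+Nom\tlemma\n", "lemma\tN+Sg+Gen\tlemman,lemmen\n"]

for line in gen_test_lines(testbase):
    print(line)
```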
make-gen-test-facit.pl
To create the expected output from a generation test run, such that the actual test results can be compared with it. Based on the comparison, one can make further reports on the success of the test run.
Testbase file as created above.
A list of word forms in the same format as produced by the Xerox tools, extracted from the testbase file. One word form on each line.
Use the output of this script to diff against the actual test result (done in the Makefile). Any differences indicate possible errors in the morphological description.
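The facit extraction and the comparison can be sketched together in Python. This is a sketch under assumptions, not the actual script or Makefile rule: it assumes that comma-separated alternatives end up on separate lines in the facit, and it uses Python's `difflib` in place of the `diff` program. The names `gen_facit_lines` and `report` are hypothetical.

```python
import difflib


def gen_facit_lines(testbase_lines):
    """Extract the expected word forms (field 3), one form per line."""
    facit = []
    for line in testbase_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        _baseform, _codes, forms = line.split("\t")
        # Assumption: alternatives are listed on separate lines.
        facit.extend(forms.split(","))
    return facit


def report(facit, result):
    """Return unified-diff lines; an empty list means no differences."""
    return list(difflib.unified_diff(facit, result, "facit", "result", lineterm=""))


testbase = ["lemma\tN+Sg+Nom\tlemma\n", "lemma\tN+Sg+Gen\tlemman,lemmen\n"]
facit = gen_facit_lines(testbase)
actual = ["lemma", "lemman"]  # pretend generator output, one form missing

for line in report(facit, actual):
    print(line)
```

An empty report corresponds to a clean test run; any diff line points at a word form worth checking in the morphological description.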
make-ana-test.pl
To create a test file (or a facit file) for morphological analysis by printing all the possible word forms with the corresponding analysis at the end, formatted almost like the output from the Xerox xfst tool. Some further postprocessing is needed both for making the test case and for creating the facit file. This is done in the Makefile.
A testbase file as created above.
A two-field, tab-separated list:
In cases where there is more than one alternative word form, the alternatives have been split onto separate lines.
Use to create the basis for word form analysis testing. Further sorting and cutting (field 1 as test data, field 2 as facit data) is needed, and is done in the Makefile.
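A minimal Python sketch of this step follows. It is not the Perl script; it assumes that field 1 is the word form and field 2 is the analysis written as baseform plus codes joined with a plus sign. The name `ana_test_lines` and the sample data are hypothetical, and the final "cut" only hints at the postprocessing done in the Makefile.

```python
def ana_test_lines(testbase_lines):
    """Return 'word form<TAB>analysis' lines, one per alternative form."""
    out = []
    for line in testbase_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        baseform, codes, forms = line.split("\t")
        for form in forms.split(","):
            # Assumption: the analysis is written as baseform+codes.
            out.append(f"{form}\t{baseform}+{codes}")
    return out


testbase = ["lemma\tN+Sg+Nom\tlemma\n", "lemma\tN+Sg+Gen\tlemman,lemmen\n"]
lines = ana_test_lines(testbase)

# Rough equivalent of cutting the two fields apart.
test_data = [line.split("\t")[0] for line in lines]   # field 1
facit_data = [line.split("\t")[1] for line in lines]  # field 2
print(test_data)
print(facit_data)
```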
Makefile
Whereas the perl scripts above are pretty short and simple, the Makefile used to automatise testing is pretty long and complex. Thus, the documentation is split into the following sections:
The flow of action for the test bed is outlined below. The example file is from South Sami, but the flow itself is language independent. The flow diagram illustrates word form generation.
           -----------     ===========                  The corresponding
"Files",   | Scripts |  &  || Tools ||                  make target
=========================================================================

  "noun-codes.txt"   "n-even-col6-ie-full.txt"
                  \           |
                   ------------------------
                   | merge-codesNforms.pl |
                   ------------------------
                              ||
                              \/
   "n-even-col6-ie-full.testbase"                       n-%.testbase
    ||                        ||
    ||                        \/
    ||             ------------------------
    ||             |   make-gen-test.pl   |
    ||             ------------------------
    ||                        ||
    ||                        \/
    ||           "n-even-col6-ie-full.gtest"            %.gtest
    ||                        ||
    ||                        \/
    ||       ------------------------------------
    ||       | n-even-col6-ie-full-gtest-script |
    ||       ------------------------------------
    ||                        ||
    ||                        \/
    ||                  ==============
    ||                  ||   xfst   ||
    ||                  ==============
    ||                        ||
    ||                        \/
    ||          "n-even-col6-ie-full.gresult"           %.gresult
    ||                        ||
    \/                        ||
--------------------------    ||
| make-gen-test-facit.pl |    ||
--------------------------    ||
    ||                        ||
    \/                        ||
"n-even-col6-ie-full.gfacit"  ||                        %.gfacit
    ||                        ||
    \/                        \/
  ================================
  ||            diff            ||
  ================================
                              ||
                              \/
                "n-even-col6-ie-full.greport"           %.greport
                              ||
 all *.greport files -  \ \ \ \/ / / /
                      =================
                      ||     cat     ||
                      =================
                              ||
                              \/
                        "n-g.summary"                   n-g.summary
The above scheme is repeated in more or less identical form for word form analysis, with the exception that there is no separate -facit.pl script: the same script is used for producing both test input and test facit, with the help of some postprocessing in the Makefile.
The scheme for paradigm generation is much simpler, and it should be possible to read the Makefile directly. If not, complain to me!
The following built-in variables are used:
make program used. This is useful, for example, when starting a make command in another directory from within a Makefile, to ensure that both are using the same make.
The following variables defined by me are used:
make
documentation. That's why.
make
documentation.
When defined, the variable names are written as such; when referenced, they are enclosed in parentheses and prefixed with a dollar sign. Example: $(TEMP) is how the variable TEMP is referenced.
The main sections of the Makefile are the following: