A flowchart of the parsing process

        Action taken..              ..by means of the command:
        **************            **************************

    |--------------------|
    | take incoming text |        sme$ cat corp/filename.txt | 
    |--------------------|
             \/
|-----------------------------|
| preprocessing the text:     |
| putting one word per line,  |   preprocess --abbr=bin/abbr.txt |
| finding sentence boundaries |
|-----------------------------|
             \/
|-----------------------------|
| morphological analysis:     |
| give each word all possible |   lookup -flags mbTT bin/sme.fst |
| analyses                    |
|-----------------------------|
             \/
|-----------------------------|
| converting the output into  |
| the input format of the     |   ../script/lookup2cg |
| disambiguator (Perl script) |
|-----------------------------|
             \/
|-----------------------------|
| disambiguating the analysis:|
| picking only the relevant   |   mdis --grammar src/sme-dis.rle
| morphological analyses.     |
|-----------------------------|
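The data flow between the stages can be illustrated with a small, self-contained Python toy. It is not the project's code: preprocess, lookup, lookup2cg and mdis are real tools whose internals this document does not describe, and the toy skips the transducer lookup entirely. Assumptions, all hypothetical: the abbreviation list is a plain set of lowercase forms (the two entries are illustrative stand-ins for bin/abbr.txt); lookup prints one `wordform<TAB>lemma+Tag+Tag` line per analysis with a blank line between words; the disambiguator expects Constraint Grammar style cohorts (`"<wordform>"` followed by tab-indented `"lemma" Tag Tag` readings); and disambiguation is reduced to one hard-coded context rule instead of a grammar file. The function names are invented for this sketch.

```python
import re

# Illustrative stand-in for bin/abbr.txt (assumed: lowercase forms incl. the dot).
ABBREVIATIONS = {"nr.", "ovd."}

SENTENCE_END = ".!?"


def preprocess(text, abbreviations=ABBREVIATIONS):
    """Split text into sentences of tokens (one token per output line).

    A trailing . ! or ? ends the sentence unless the whole token is a
    listed abbreviation, which is kept intact as one token.
    """
    sentences, current = [], []
    for word in text.split():
        if word.lower() in abbreviations:
            current.append(word)
            continue
        # Separate trailing punctuation from the word itself.
        core, punct = re.fullmatch(r"(.*?)([.!?,;:]*)", word).groups()
        if core:
            current.append(core)
        current.extend(punct)  # each punctuation mark becomes its own token
        if any(p in SENTENCE_END for p in punct):
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    return sentences


def lookup_to_cg(lookup_output):
    """Group assumed `wordform<TAB>lemma+Tag+...` lines into cohorts.

    Returns a list of (wordform, [(lemma, [tags]), ...]) tuples.
    """
    cohorts = []
    for block in lookup_output.strip().split("\n\n"):
        lines = [line for line in block.splitlines() if line.strip()]
        if not lines:
            continue
        wordform = lines[0].split("\t")[0]
        readings = []
        for line in lines:
            lemma, *tags = line.split("\t")[1].split("+")
            readings.append((lemma, tags))
        cohorts.append((wordform, readings))
    return cohorts


def format_cg(cohorts):
    """Render cohorts as "<wordform>" lines followed by tab-indented readings."""
    out = []
    for wordform, readings in cohorts:
        out.append(f'"<{wordform}>"')
        for lemma, tags in readings:
            out.append("\t" + " ".join([f'"{lemma}"', *tags]))
    return "\n".join(out)


def remove_after(cohorts, target, prev_tag):
    """Toy context rule: drop readings carrying `target` when the previous
    word is unambiguously `prev_tag`. A word's last reading is never removed.
    """
    result = []
    for wordform, readings in cohorts:
        prev = result[-1][1] if result else []
        if len(prev) == 1 and prev_tag in prev[0][1]:
            kept = [r for r in readings if target not in r[1]]
            readings = kept or readings
        result.append((wordform, readings))
    return result


if __name__ == "__main__":
    # Made-up input, only to exercise the functions.
    print(preprocess("Mun lean nr. 5. Don?"))
    cohorts = lookup_to_cg("the\tthe+Det\n\nrun\trun+N+Sg\nrun\trun+V+Inf\n")
    print(format_cg(remove_after(cohorts, "V", "Det")))
```

With the made-up input above, the rule removes the verb reading of "run" because the preceding word has only a determiner reading, which is the kind of choice the real disambiguator makes with a full grammar.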

Later versions will hopefully also assign syntactic tags and disambiguate them, as described for other languages in the literature.

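In order for the command to work, one must run it from within the sme directory (or the corresponding directory for another language), since all the paths in it are relative to that directory. The files it uses lie in different directories: the input text in corp/, the abbreviation list and the transducer in bin/, the disambiguation grammar in src/, and the conversion script in ../script/, i.e. outside the sme directory.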

Admittedly, one could claim that this arrangement is somewhat confusing.
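Because everything depends on the working directory, a quick check before running the pipeline can save some confusion. The following is a minimal Python sketch, assuming only the file paths exactly as they appear in the command; the helper names missing_files and pipeline_command are hypothetical, and the script merely assembles the pipeline string rather than running the tools.

```python
import shlex
from pathlib import Path

# Files named in the command, relative to the sme directory.
PIPELINE_FILES = {
    "abbreviation list": "bin/abbr.txt",
    "transducer": "bin/sme.fst",
    "conversion script": "../script/lookup2cg",
    "disambiguation grammar": "src/sme-dis.rle",
}


def missing_files(workdir="."):
    """Return the names of pipeline files not found relative to workdir."""
    base = Path(workdir)
    return [name for name, rel in PIPELINE_FILES.items()
            if not (base / rel).exists()]


def pipeline_command(textfile):
    """Build the shell pipeline from the flowchart for one input file."""
    return " | ".join([
        f"cat {shlex.quote(textfile)}",  # quote the file name for the shell
        "preprocess --abbr=bin/abbr.txt",
        "lookup -flags mbTT bin/sme.fst",
        "../script/lookup2cg",
        "mdis --grammar src/sme-dis.rle",
    ])


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Not in the sme directory? Missing:", ", ".join(missing))
    else:
        print(pipeline_command("corp/filename.txt"))
```

The resulting string could then be handed to a shell from within the sme directory, for instance with `subprocess.run(cmd, shell=True)`; `shell=True` is needed because the pipe characters are interpreted by the shell, not by Python.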


Trond Trosterud
Last modified: Thu Apr 29 09:09:00 2004