This file documents the installation of the Sami analysers.

Prerequisites:
==============

You must have a Unix system (Linux or Mac) and a terminal supporting
UTF-8. Mac users must make sure the standard developer tools are
installed, from the developer tools on the Mac OS system CD.

In order to compile the analysers you need the compilers from __one__ of
these two sites (you do not need both; they compile the same analysers):

* xerox tools: http://fsmbook.com
* hfst tools: http://sourceforge.net/projects/hfst/

The xerox tools are available as binary compilers, for non-commercial
use. You will need all of lexc, xfst, lookup and twolc, and all the
programs must be installed in a folder in your path.

The hfst tools are open source. Follow the instructions in the
downloaded folders.

At present (2010), the xerox tools are better tested and supported, and
easier to install (no compilation needed). Hfst is open source and
without restrictions and, unlike xerox, supports weighted transducers.

For syntactic analysis you will need the Constraint Grammar compiler
vislcg3. It can be obtained from http://visl.sdu.dk/vislcg3/. Note that
Mac users must install ICU (http://...). If you do not need syntax, you
may ignore this compiler, and ignore the corresponding error message
during compilation.

Installation:
=============

Open a terminal window and stand in gt/ (this folder).

With the xerox compilers, write the command:

    make GTLANG=LANG

With the hfst compilers, write the command:

    make -f Makefile.hfst

After compilation, the analysers can be found in gt/LANG/bin/.

Compiled files:
===============

Here are the compiled files.

Files for use:
--------------

Analysers from xerox compilation:

* LANG.fst = analyser for LANG
* iLANG.fst = generator for LANG

Analysers from hfst compilation:

* LANG-gen.hfst
* LANG.hfst
* LANG.hfst.ol (used with hfst-optimized-lookup, see Usage notes)

Syntax files:

* LANG-dep.bin = dependency
* LANG-dis.bin = syntax

For a list of auxiliary files, see *) below.
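Before running make, it can help to confirm that the programs listed under Prerequisites are reachable on your path. The following is a minimal POSIX sh sketch; the function name check_tools is hypothetical and not part of the gt/ build system. It only reports what it finds using the shell builtin "command -v"; it does not install anything.

```shell
#!/bin/sh
# Hypothetical helper, not part of the gt/ makefiles.
# Reports which of the given programs are found on PATH,
# one line per program; returns non-zero if any is missing.
check_tools() {
    rc=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            rc=1
        fi
    done
    return "$rc"
}

# Xerox toolchain, needed for: make GTLANG=LANG
check_tools lexc xfst lookup twolc || echo "xerox tools incomplete: see Prerequisites"
# Constraint Grammar compiler, only needed for syntactic analysis
check_tools vislcg3 || echo "vislcg3 not found (only needed for syntactic analysis)"
```

If you use the hfst toolchain instead, you can call check_tools with the hfst program names you rely on, for example hfst-optimized-lookup from the Usage notes.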
Usage notes:
============

(Standing in LANG/, one level up from bin/.)

Morphological analysis:
-----------------------

xerox:

    cat textfile | preprocess --abbr=bin/abbr.txt \
                 | lookup bin/LANG.fst

hfst:

    cat textfile | preprocess --abbr=bin/abbr.txt \
                 | hfst-optimized-lookup bin/LANG.hfst.ol

Syntactic analysis:
-------------------

Pipe the output from the morphological analysis through lookup2cg and
vislcg3:

    ... | lookup2cg | vislcg3 -g bin/LANG-dis.bin \
                    | vislcg3 -g bin/LANG-dep.bin

LANG-dis.bin gives syntax and LANG-dep.bin gives dependency. A better
dependency analysis is given by using the common dependency file:

    ... | lookup2cg | vislcg3 -g bin/LANG-dis.bin \
                    | vislcg3 -g ../../gt/smi/src/smi-dep.rle

*) List of auxiliary files in the bin/ directory:
=================================================

Auxiliary files from xerox compilation:

* LANG.save = analyser for LANG, without initial capital letters
* twol-LANG.bin = compiled two-level (twolc) rules

Auxiliary files from hfst compilation:

* lexc-LANG.hfst
* twol-LANG.hfst

General files:

* abbr.txt = list of abbreviations for use in the preprocessor
* allcaps.fst
* inituppercase.fst
* tagfix.fst
* tok.fst
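The analysis pipelines in the Usage notes can be wrapped in a small shell function so that the same command works for both toolchains. This is a sketch under stated assumptions: the function name analyse is hypothetical, it assumes preprocess, lookup or hfst-optimized-lookup, lookup2cg and vislcg3 are on your path, and it uses the per-language dis/dep files rather than the common smi-dep.rle. The language code is passed as an argument instead of being read from $LANG, because LANG is the locale variable in Unix shells.

```shell
#!/bin/sh
# Hypothetical wrapper, not part of gt/: runs the analysis pipelines
# from the usage notes. Run it standing in LANG/, next to bin/.
#
# usage: analyse LANGCODE TOOLCHAIN TEXTFILE [syntax]
#   LANGCODE   language code used in the bin/ file names (e.g. sme);
#              passed explicitly because $LANG is the shell's locale variable
#   TOOLCHAIN  "xerox" or "hfst"
#   syntax     optional; adds lookup2cg and the two vislcg3 passes
analyse() {
    code=$1 toolchain=$2 text=$3 mode=${4:-}
    case $toolchain in
        xerox) fst=bin/$code.fst;     lookup_cmd=lookup ;;
        hfst)  fst=bin/$code.hfst.ol; lookup_cmd=hfst-optimized-lookup ;;
        *)     echo "unknown toolchain: $toolchain" >&2; return 2 ;;
    esac
    # Fail early with a clear message if the input or a compiled file
    # cannot be read, instead of letting the pipeline fail midway.
    for f in "$text" bin/abbr.txt "$fst"; do
        if [ ! -r "$f" ]; then
            echo "cannot read: $f" >&2
            return 1
        fi
    done
    if [ "$mode" = syntax ]; then
        preprocess --abbr=bin/abbr.txt < "$text" | "$lookup_cmd" "$fst" \
            | lookup2cg \
            | vislcg3 -g "bin/$code-dis.bin" \
            | vislcg3 -g "bin/$code-dep.bin"
    else
        preprocess --abbr=bin/abbr.txt < "$text" | "$lookup_cmd" "$fst"
    fi
}
```

For example, standing in LANG/, "analyse sme xerox textfile" gives the morphological analysis only, and "analyse sme hfst textfile syntax" adds disambiguation and dependency.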