Project tools (utility programs, conversion scripts, Xerox tools, etc.)

The project uses a mixture of tools and scripts from Xerox and tools and scripts of our own.

Our home-made tools, and adjustments of public tools

  1. The cgi-bin setup for making the parsers accessible on the web
  2. The web interface to our web demo
  3. Conversion scripts
  4. Testing tools
  5. Emacs for lexicon expansion
  6. Special Emacs modes

Xerox tools for morphological analysis

The project uses the following Xerox tools: twolc (for morphophonology), lexc (for morphology), xfst (for compiling the final transducer), and lookup (for analysis and generation). Some local documentation is available for lexc, and local documentation for the other tools may follow, but the main documentation for the tools will always be the one from Xerox.

The link list below refers to the Xerox documentation pages for these tools. These and other links are found here:

  1. twolc, for phonological and morphophonological rules
  2. lexc, for representing the Sami stems and the affix lexica
  3. xfst, the finite-state transducer tool, for integrating the different parts of the program, and for compiling the preprocessor
  4. tokenize, for tokenization and processing (note that at the moment we do not use tokenize for preprocessing, but Perl)
  5. lookup, an interface to the morphological analyser. NB! cf. our lookup notes

The programs are located in /hmm/bin/, and are started by typing the program name at the prompt, e.g. lexc RETURN. The tools are documented in Beesley / Karttunen, Finite-State Morphology: Xerox Tools and Techniques. An earlier version of the book is found here. The tools may also be installed on your own machine, whether it runs Mac OS X, Linux or Windows. One version of the software is found on the CD accompanying the book. For the latest version, ask Trond.

Disambiguation tools

  1. Morphological disambiguation
  2. lookup2cg, a script to transform Xerox output to CG input
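
To illustrate what a conversion from Xerox output to CG input involves, here is a minimal sketch in Python. It is not the project's lookup2cg script. It assumes that lookup prints one tab-separated "wordform TAB analysis" line per reading, with a blank line between tokens and "+?" marking unknown words, and that the CG input should have a "<wordform>" line followed by one indented reading per line. The function name lookup_to_cg and the sample word are chosen for illustration only.

```python
def lookup_to_cg(lookup_output: str) -> str:
    """Convert lookup-style output to CG-style cohorts (illustrative sketch).

    Assumed input: blocks separated by blank lines, each line of the form
    "wordform<TAB>lemma+Tag+Tag"; unknown words end in "+?".
    """
    out = []
    for block in lookup_output.strip().split("\n\n"):
        lines = [line for line in block.splitlines() if line.strip()]
        if not lines:
            continue
        # The wordform is the first field of every line in the block.
        wordform = lines[0].split("\t")[0]
        out.append(f'"<{wordform}>"')
        for line in lines:
            fields = line.split("\t")
            if fields[-1].endswith("+?"):
                # Unknown word: keep the wordform as lemma, mark with "?".
                out.append(f'\t"{wordform}" ?')
                continue
            analysis = fields[1] if len(fields) > 1 else wordform
            # The first "+"-separated part is the lemma, the rest are tags.
            lemma, *tags = analysis.split("+")
            out.append("\t" + " ".join([f'"{lemma}"', *tags]))
    return "\n".join(out) + "\n"


sample = "viessu\tviessu+N+Sg+Nom\n\nxyz\txyz\t+?\n"
print(lookup_to_cg(sample), end="")
```

Under these assumptions, the sample yields one cohort for viessu with the reading "viessu" N Sg Nom, and one cohort for xyz marked as unknown. The real script may treat compounds, derivation tags and punctuation differently.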


Last modified: Mon Sep 6 13:15:44 2004