A plan for the work

One parser is needed, and it should be built for the Cyrillic alphabet. Texts in the Latin transcription found in Western corpora should be converted to Cyrillic before analysis, and the output should then be converted back to the Latin transcription.
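A minimal Python sketch of this pipeline (convert to Cyrillic, analyse, convert back) might look as follows. Everything in it is an assumption for illustration: `CYR_TO_LAT` is a hypothetical table covering only a few lowercase letters, since the transcription itself is still to be decided, and `dummy_analyser` is a stand-in for the real Cyrillic analyser.

```python
import unicodedata

# Hypothetical Cyrillic -> Latin table: a few lowercase letters only, since the
# actual transcription has not been decided yet. Uppercase is not handled.
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ж": "ž",
    "и": "i", "к": "k", "л": "l", "м": "m", "н": "n", "о": "o",
    "р": "r", "с": "s", "т": "t", "ч": "č", "ш": "š",
}
# Inverting the table is only safe because it is one-to-one.
LAT_TO_CYR = {lat: cyr for cyr, lat in CYR_TO_LAT.items()}


def convert(text, table):
    """Replace characters found in the table; leave all others (tags, digits) unchanged."""
    return "".join(table.get(ch, ch) for ch in text)


def to_cyrillic(latin_text):
    # Normalise to NFC first, so that a decomposed "s" + combining caron
    # becomes the single code point "š" that the table expects.
    return convert(unicodedata.normalize("NFC", latin_text), LAT_TO_CYR)


def to_latin(cyrillic_text):
    return convert(cyrillic_text, CYR_TO_LAT)


def analyse_latin_text(latin_text, analyse):
    """Convert to Cyrillic, run a Cyrillic-only analyser, convert the results back."""
    return [to_latin(a) for a in analyse(to_cyrillic(latin_text))]


# Stand-in analyser (hypothetical): it simply tags every word as a noun.
def dummy_analyser(word):
    return [word + "+N"]


print(analyse_latin_text("dom", dummy_analyser))  # ['dom+N']
```

Because only Cyrillic characters are looked up on the way back, Latin-letter tags such as `+N` pass through the back-conversion untouched.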

While waiting for Cyrillic-capable tools (Emacs, etc.), a version will be made that uses Latin symbols in a one-to-one correspondence with the Cyrillic alphabet.
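Whether a candidate Latin table really is one-to-one can be checked mechanically before any text is converted with it. The sketch below assumes that "one-to-one" means exactly one Latin code point per Cyrillic letter, with no two letters sharing a symbol; `check_one_to_one` and both sample tables are hypothetical.

```python
def check_one_to_one(table):
    """List every reason why a Cyrillic -> Latin table is not one-to-one.

    Assumes "one-to-one" means each Cyrillic letter maps to exactly one
    Latin code point and no two Cyrillic letters share a Latin symbol.
    An empty result means the table can be inverted without loss.
    """
    problems = []
    owner = {}  # Latin symbol -> first Cyrillic letter that claimed it
    for cyr, lat in table.items():
        if len(cyr) != 1 or len(lat) != 1:
            problems.append(f"{cyr!r} -> {lat!r} is not a single-symbol pair")
        if lat in owner:
            problems.append(f"{owner[lat]!r} and {cyr!r} both map to {lat!r}")
        else:
            owner[lat] = cyr
    return problems


# Hypothetical example tables, for illustration only.
GOOD = {"ш": "š", "ч": "č", "е": "e"}
BAD = {"е": "e", "э": "e", "щ": "šč"}

print(check_one_to_one(GOOD))  # []
for problem in check_one_to_one(BAD):
    print(problem)
```

The second table fails twice: two letters collapse onto "e", so the Cyrillic original could not be recovered, and one letter needs two Latin symbols, which breaks the letter-for-letter correspondence.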

Get the localisation infrastructure for Cyrillic in place:
1. Unicode-enabled Xerox tools
2. Fonts
3. Keyboard
4. Linux tools (Emacs, command-line tools, etc.)
Decide on a transcription for the Latin version
Set up the relevant phoneme sets for the Latin and Cyrillic versions
Decide on a grammatical tag set
Write a sketch of the grammar on paper
Write the morphological transducer
Get a lexicon
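The transducer itself will presumably be written with the Xerox tools listed above, but the way a tag set, a lexicon and a morphological transducer fit together can be illustrated with a toy Python stand-in. This is not lexc syntax: it is a hypothetical lexicon of continuation classes in which each path pairs an analysis (stem plus tags) with a surface form, and the stem, suffixes and tags are invented placeholders rather than material from any real grammar.

```python
from collections import defaultdict

# Toy lexicon in the spirit of lexc continuation classes.
# Each entry is (analysis part, surface part, next class); None ends the word.
# The stem "тал" and the suffixes "ат" and "ын" are invented placeholders,
# and the tags are only examples of what a tag set might contain.
LEXICON = {
    "Root": [("тал", "тал", "Noun")],
    "Noun": [("+N", "", "Number")],
    "Number": [("+Sg", "", "Case"), ("+Pl", "ат", "Case")],
    "Case": [("+Nom", "", None), ("+Gen", "ын", None)],
}


def expand(cls="Root", analysis="", surface=""):
    """Yield every (analysis, surface) pair reachable from a class.

    Works only for lexicons without cycles; a real finite-state compiler
    has no such limitation.
    """
    for up, low, nxt in LEXICON[cls]:
        if nxt is None:
            yield analysis + up, surface + low
        else:
            yield from expand(nxt, analysis + up, surface + low)


def build_analyser():
    """Map each surface form to all analyses that produce it."""
    table = defaultdict(list)
    for analysis, surface in expand():
        table[surface].append(analysis)
    return lambda word: table.get(word, [])


analyse = build_analyser()
print(analyse("талатын"))  # ['тал+N+Pl+Gen']
print(analyse("xyz"))      # []
```

Enumerating every path is fine for a toy like this, but it breaks down as soon as the lexicon contains cycles (compounding, for instance), which is one reason to compile the real lexicon into a finite-state transducer instead.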

Trond Trosterud

Last modified: Thu Dec 25 22:31:34 2003