A plan for the work
Only one parser is needed, and it should be built for the Cyrillic
alphabet. Texts in the Latin transcription found in Western corpora
should be converted to Cyrillic before analysis and converted back
to the Latin transcription at output.
While waiting for Cyrillic-based tools (emacs etc.), an interim version
will be made with Latin symbols that stand in a one-to-one relation to
the letters of the Cyrillic alphabet.
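
The one-to-one relation can be sketched as follows. This is only a minimal illustration in Python: the letter pairs are hypothetical placeholders (the actual transcription is still to be decided, see the list below), and the names PAIRS, to_cyrillic and to_latin are invented for this example. The point is that when every Latin symbol corresponds to exactly one Cyrillic letter, converting before analysis and back at output loses no information.

```python
# Hypothetical letter pairs, for illustration only. The real transcription
# has not been decided; each Latin symbol must be a single character.
PAIRS = [
    ("а", "a"), ("б", "b"), ("в", "v"), ("г", "g"), ("д", "d"),
    ("е", "e"), ("з", "z"), ("и", "i"), ("к", "k"), ("л", "l"),
    ("м", "m"), ("н", "n"), ("о", "o"), ("п", "p"), ("р", "r"),
    ("с", "s"), ("т", "t"), ("у", "u"),
    ("ш", "\u0161"),  # s with caron, written as one precomposed code point
]

cyrillic = "".join(c for c, _ in PAIRS)
latin = "".join(l for _, l in PAIRS)

# The relation is one-to-one only if no letter occurs twice on either side.
assert len(set(cyrillic)) == len(cyrillic) == len(set(latin)) == len(latin)

TO_CYRILLIC = str.maketrans(latin, cyrillic)
TO_LATIN = str.maketrans(cyrillic, latin)


def to_cyrillic(text: str) -> str:
    """Convert Latin-transcribed input to Cyrillic before analysis."""
    return text.translate(TO_CYRILLIC)


def to_latin(text: str) -> str:
    """Convert Cyrillic output back to the Latin transcription."""
    return text.translate(TO_LATIN)


if __name__ == "__main__":
    word = "voda"
    print(to_cyrillic(word))
    print(to_latin(to_cyrillic(word)))
```

Characters outside the table (spaces, punctuation, digits) pass through unchanged, so the same two functions can be applied to whole lines of corpus text.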
- Get localisation infrastructure in place for Cyrillic
  - Unicode-enabled Xerox tools
  - Fonts
  - Keyboard
  - Linux tools (emacs, command line tools, etc.)
- Decide upon transcription for the Latin version
- Set up the relevant phoneme sets for the Latin and Cyrillic versions
- Decide on a grammatical tag set
- Write a sketch of the grammar, on paper
- Write the morphological transducer
- Get a lexicon
Trond Trosterud
Last modified: Thu Dec 25 22:31:34 2003