The files here are (where LANG is a variable: fao, kal, sma, sme, smj, ...)

LANG_bible.txt         --- candidate bible texts to LANGcorp.txt
LANG_facta.txt         --- candidate facta texts to LANGcorp.txt
LANG_ficti.txt         --- candidate fiction texts to LANGcorp.txt
LANGcorp.dep.corr.txt  --- a gold standard dependency analysis of LANGcorp.syn.corr.txt
LANGcorp.syn.corr.txt  --- a gold standard syntactic analysis of LANGcorp.txt
LANGcorp.txt           --- an equal share of sentences from bible, facta, ficti.


Note that for the development phase there is also a file

    kalcorp.morph.txt

It is generated as follows:

    cat kalcorp.txt | preprocess | ukal | lookup2cg > kalcorp.morph.txt

.. just because I got tired of waiting for ukal.

Thus, a new kalcorp.syn.corr.txt can be generated by

    cat kalcorp.morph.txt | vislcg3 -g ~/gtsvn/st/kal/src/kal-dis3.rle

-----

The dep and syn files are __first__ generated automatically
(for sme, sma, smj, the directory is gt, and not st):

    cat LANGcorp.txt |
      preprocess --abbr=$GTSVN/st/LANG/bin/abbr.txt |
      lookup $GTSVN/st/LANG/bin/LANG.fst |
      lookup2cg |
      vislcg3 -g $GTSVN/st/LANG/src/LANG-dis.rle > LANGcorp.syn.corr.txt

The resulting LANGcorp.syn.corr.txt should then be manually corrected.

Then the dep.corr file is made from the corrected syn file, as follows:

    cat LANGcorp.syn.corr.txt |
      vislcg3 -g $GTSVN/st/LANG/src/LANG-dep.rle > LANGcorp.dep.corr.txt

or even (using sme-dep.rle as a common dep file):

    cat LANGcorp.syn.corr.txt |
      vislcg3 -g $GTSVN/gt/sme/src/sme-dep.rle > LANGcorp.dep.corr.txt

Then again manual correction.

Errors may be spotted by regenerating the dep analysis and comparing it
with the corrected file:

    cat LANGcorp.syn.corr.txt |
      vislcg3 -g $GTSVN/gt/sme/src/sme-dep.rle > compareLANG.txt
    diff LANGcorp.dep.corr.txt compareLANG.txt
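
The st/gt distinction in the generation commands can be wrapped in a small
shell helper, so that the same pipeline works for every LANG. What follows is
only a sketch: the function names lang_group and make_syn are hypothetical and
not part of this directory, and treating every language other than sme, sma,
smj as st is an assumption based on the note above. It also assumes that
GTSVN is set and that preprocess, lookup, lookup2cg and vislcg3 are on the PATH.

```shell
# Hypothetical helper: pick the repository group for a language code.
# gt for sme, sma, smj; st for everything else (an assumption).
lang_group() {
    case "$1" in
        sme|sma|smj) echo gt ;;
        *)           echo st ;;
    esac
}

# Hypothetical helper: run the first-pass syntactic analysis for one language.
# WARNING: this overwrites LANGcorp.syn.corr.txt, including manual corrections.
make_syn() {
    lang=$1
    dir="$GTSVN/$(lang_group "$lang")/$lang"
    cat "${lang}corp.txt" |
        preprocess --abbr="$dir/bin/abbr.txt" |
        lookup "$dir/bin/$lang.fst" |
        lookup2cg |
        vislcg3 -g "$dir/src/$lang-dis.rle" > "${lang}corp.syn.corr.txt"
}
```

For example, "make_syn sme" would read sme-dis.rle from $GTSVN/gt/sme/src,
while "make_syn kal" would read kal-dis.rle from $GTSVN/st/kal/src. Run it
only before manual correction has started, since it replaces the syn file.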