This is a short description of how to build a TMX file for the CAT tool Autshumato using the gt_sd parallel corpus.

In freecorpus:

1. Get the latest version of the free corpus:

     svn up

2. Extract parallel files for a certain language pair:

     java net.sf.saxon.Transform -it main extract_parallel_corpus.xsl lang1=nob lang2=sme inDir=converted

3. Index sentences:

     corpus-analyze.pl --all --output="./paracosent/$current_lang/$base_n.sent.xml" --only_add_sentences --lang=$current_lang "$file"

4. Align sentences (in gtsvn/tools/alignment-tools/tca2):

     perl sentalign_parallel-corpus.pl --dir=DIR --lang1=LANG1 --lang2=LANG2 --anchor=ANCHOR_FILE

   Example:

     perl sentalign_parallel-corpus.pl --dir=0_sme2nob_test --lang1=sme --lang2=nob --anchor=anchor_files/anchor-smenno.txt

5. Build the TMX file (see gtsvn/mt/autshumato-ite/conv_test):

   5.1 perl sentaligned2tmx.pl --aligned=result_tca2 --lang1=nob --lang2=sme

   5.2 Put all output into a single TMX file:

         java -Xmx2048m net.sf.saxon.Transform -it main unifiyTMX.xsl inDir=tmx-out

============================================
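The indexing command in step 3 uses the variables $current_lang, $base_n and $file without saying where they come from. The sketch below shows one possible way to fill them in by looping over the files of one language. It only prints the commands and does not run them. The input layout (a directory containing one subdirectory per language, with files ending in .xml), the derivation of $base_n as the file name without the .xml suffix, and the function names index_cmd and index_all are all assumptions made for illustration, not part of the original setup.

```shell
#!/bin/sh
# Sketch only: print the step-3 indexing command for every file of one language.
# ASSUMPTION: input files live in "$in_dir/$lang" and end in .xml.
# ASSUMPTION: $base_n is the file name without the .xml suffix.

# Build the corpus-analyze.pl command line for one file
# (flags copied unchanged from step 3).
index_cmd() {
    current_lang=$1
    file=$2
    base_n=$(basename "$file" .xml)
    printf '%s\n' "corpus-analyze.pl --all --output=\"./paracosent/$current_lang/$base_n.sent.xml\" --only_add_sentences --lang=$current_lang \"$file\""
}

# Print one command per matching file of the given language.
index_all() {
    lang=$1
    in_dir=$2
    for f in "$in_dir/$lang"/*.xml; do
        [ -e "$f" ] || continue   # the glob matched nothing
        index_cmd "$lang" "$f"
    done
}
```

Calling index_all with a language code and the directory holding the extracted files (for example index_all sme SOME_DIR, where SOME_DIR is a placeholder) lists the commands for review. Check the printed lines before executing them, since file names containing quotes or other shell metacharacters would need extra care.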