This directory contains some scripts for working with lexical data for oahpa. 1. convert data from csv into oahpa-lexicon xml - 'c' in 'csv' is a very general label here (comma separated values) - this is the reason why the scripts are named 'uusv': underscore underscore separated values field separator = __ (two underscores) separator between items of same type (translations, semantic classes) = , (comma) java -Xmx2024m net.sf.saxon.Transform -it:main uusv2oahpa_xml_extened.xsl inFile=wordlist.csv src_lang=crk tgt_lang=eng ==> result files are generated in the directory defined in the variable outDir (here "xml-out") input format: LEMMA __ POS __ TRANSLATION_1,TRANSLATION_2,TRANSLATION_n __ SEMCLASS_1,SEMCLASS_2, SEMCLASS_n NB 1: the first translation gets the attribute stat="pref" NB 2: all translation get the same set of semantic classes NB 3: if a lemma has different meanings it has to have as many entries as meanings and each e-element has to have an ID denoting its meaning 2. revert the xml file from aaabbb to bbbaaa java -Xmx2024m net.sf.saxon.Transform -it:main revert_oahpa-lexicon.xsl inDir=xml-out slang=LANG-CODE tlang=LANG-CODE ==> result files are generated in the directory defined in the variable outDir (here "_reverted2eng" because 'nob' is defined as target language 'tlang', generally _reverted2TLANG) NB: the parameter inDir should be adapted to whatever the input directory is 3. redistribute the reverted files by the pos values of the reverted entries (some pos values might be different than those of the original entries) java -Xmx2024m net.sf.saxon.Transform -it:main pos-split_reverted-data.xsl inDir=_reverted2SLANG java -Xmx2024m net.sf.saxon.Transform -it:main pos-split_reverted-data.xsl inDir=_reverted2nob ==> result files are generated outDir (CAVEAT: slang is not the original tlang!) 5. merge the possible doublings for each language and each file separately java -Xmx2024m net.sf.saxon.Transform -it:main merge_pos-split-data.xsl slang=SLANG The script processes all the files in the directory pos_redistr_SLANG, e.g. java -Xmx2024m net.sf.saxon.Transform -it:main merge_pos-split-data.xsl slang=nob processes all the files in the directory pos_redistr_nob. ==> result files are written into outDir (here: to_filter_nob) 6. filter away the entries without stat="pref" java -Xmx2024m net.sf.saxon.Transform -it:main stat-filter_merged-data.xsl slang=SLANG The script processes all the files in the directory to_filter_SLANG, e.g. java -Xmx2024m net.sf.saxon.Transform -it:main stat-filter_merged-data.xsl slang=nob processes all the files in the directory to_filter_nob. ==> result files are written into outDir (nobfkv, engcrk, etc., generally: SLANGTLANG) These directories that contain the bilingual reverted Oahpa lexicons should be copied under the directory ped/TLANG (parent directory of language material for TLANG-oahpa, e.g. ped/fkv for fkv-oahpa).