Quasicode for transfer from sme to smj

Overall plan

Goal: Create sme-smj lexical pairs, with sme as a starting point:

A. Cognates
   1. Divide the sme lexicon into homogeneous groups
   2. Translate each group to quasi-smj
   3. Spell-check or analyse the result
   4. Group entries into analysable and non-analysable
   5. Go through the result manually
B. Non-cognates
   1. Always record non-cognate pairs when going through the cognate list
   2. Go through frequency lists and mark non-cognates
   3. Go through parallel text and look for non-cognates
C. The dictionary
   1. Make smesmj lists
   2. Make real dictionary entries
   3. Convert the smesmj lists into xml entries
D. Machine translation

Detailed code for transfer:

According to etymology:
   Recent loans transfer ok, now for checking
   Incorporated Sámi phon

According to stem type:
   Even-syllable stems
   Odd-syllable stems
   Contracted stems
   Closed classes

Report, Oct 4th

Tested out odd-syll verbs:

   cat ../sme/src/verb-sme-lex.txt | egrep '(ALIST|ARVIL|ARVVASJ|BALAT|BEAGASJ|BONJAT|BORGGIST|BOTNJAS|BUOVVAL|COASKKIT|CUOHCIT|CUORPMAST|GEAGAT|HALIID|JORGGIID|LASSAN|MUITAL|MUITTASJ|SUOTNJAL|VUORDIL)' | grep -v '^\!' | cut -d":" -f1 > smeoddverbs
   cat smeoddverbs | lookup -flags mbTT -utf8 ej.fst | grep -v '^$' | l

Report, Oct 8th

Tested out contract verbs:

   cat ~/gt/sme/src/verb-sme-lex.txt | grep LOAN | egrep '(DOHPPE|GILLE|CIRRO|BASSO|BORGE|MUITA|FERTE) ' | grep -v '^\!' | tr -d "^0'" | cut -d":" -f1 | cut -d" " -f1 > smecontrloanverbs
   cat ~/gt/sme/src/verb-sme-lex.txt | grep -v LOAN | egrep '(DOHPPE|GILLE|CIRRO|BASSO|BORGE|MUITA|FERTE) ' | grep -v '^\!' | tr -d "^0'" | cut -d":" -f1 | cut -d" " -f1 > smecontrverbs

Made a working version of the converted smeoddverbs file:

   cat smeoddverbs7 | tr -d "0\'" | lookup -flags mbTT -utf8 ej.fst | grep -v '^$' > src/inc-smesmj-alloddverbs.txt

Next steps:
   1. Split the inc-smesmj-alloddverbs.txt file in two: one part that we can analyse and one that we cannot
   2. Add both parts to the lexicon manually: the recognised entries quickly, the others with care
   3. Rewrite the conversion script
   4. Convert and add the contracted verbs
   5. Then the even-syll verbs
   6. Then nouns
   7. Then adjectives

How to get the words:
   ejodd.fst: sme-to-smj
   jok.fst: smj-to-analysis
   jerr.fst: smj
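
The first of the next steps (splitting the converted list into an analysable and a non-analysable part) could be sketched roughly as below. This is a minimal sketch, not the project's actual script. It assumes that the candidates have already been run through the smj analyser and that lookup marks a failed analysis with "+?" at the end of the line. The function name split_lookup_output, the file names and the demo entries cand1..cand3 are hypothetical placeholders, not real smj words.

```shell
#!/bin/sh
# Sketch: split lookup output into analysed and unanalysed entries.
# Assumption (not from the notes): a failed analysis ends in "+?".

split_lookup_output() {
    # $1 = lookup output, $2 = file for analysed lines, $3 = file for unanalysed lines
    : > "$2"    # create both output files even if one of them stays empty
    : > "$3"
    awk -v ok="$2" -v bad="$3" '
        /\+\?$/ { print > bad; next }   # the analyser gave up on this candidate
                { print > ok }          # everything else counts as analysed
    ' "$1"
}

# Demo on made-up placeholder data (hypothetical entries and tags).
demo_dir="${TMPDIR:-/tmp}/smesmj-split-demo"
mkdir -p "$demo_dir"
{
    printf 'cand1\tcand1+V+Inf\n'
    printf 'cand2\tcand2\t+?\n'
    printf 'cand3\tcand3+V+Inf\n'
} > "$demo_dir/lookup-output.txt"

split_lookup_output "$demo_dir/lookup-output.txt" \
    "$demo_dir/analysed.txt" "$demo_dir/unanalysed.txt"

# Show how many entries ended up in each part.
wc -l "$demo_dir/analysed.txt" "$demo_dir/unanalysed.txt"

# On the real data the input might be produced like this, assuming the second
# tab-separated column of the converted file holds the quasi-smj candidate:
#   cut -f2 src/inc-smesmj-alloddverbs.txt | lookup -flags mbTT -utf8 jok.fst | grep -v '^$' > lookup-output.txt
```

Writing both parts in a single awk pass keeps them complementary: every input line lands in exactly one of the two files, so nothing is lost between the quick pass over the recognised entries and the careful pass over the rest.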