sme-sma-mt meeting 12.8.2013

Francis, Lene, Trond.

!!!Agenda

* Evaluation
* Plan, overall principles
* Analysis
* Linguistic transfer issues
**Px
**Inflected forms
**Numerals
**Lexical selection
**Px
**...
* Generation

!!!Evaluation

The abstract and hence the plan:
* Present sme2sma as a pilot to show that it is feasible. Choose a narrow domain.

Evaluation procedure
# Send text pairs to sma translators: sme2sma and nob (the sme2sma text perhaps enriched with missing words).
## Which is quicker: editing sma MT vs. translating from scratch?
## Method: give two texts, one to translate and one to edit. Two texts to three translators: one nob-only, one nob + smaMT.
# Questions:
## Time the task
## Answer the question: How did you like the smaMT text?
## Hypothesis: smaMT has a less Norwegian syntax, and this can be seen as an asset (?)

There is a similar study evaluating es2pt, giving pt translators an en original and an es2pt MT text. Here is the paper:

"Using the Apertium Spanish-Brazilian Portuguese machine translation system for localization". François Masselot, Petra Ribiczey (both Autodesk) and Gema Ramírez-Sánchez (Prompsit). Annual Conference of the European Association for Machine Translation, 2010.

Content:
* 2 articles, each one or two pages
* 3 translators

!!!Plan, overall principles

Content:
# sme: Improve the analysis (syntactic functions...)
# sme-sma texts: pick words, add words
# sme-sma mt-tests: improve the syntax, morphosyntax
# sma: Improve the generation (double forms, ...)
## Worst-case fix: word1/word2 => word1
# sma and sme: add missing words to fst
# CG rules for lexical selection
# Improve/finish sme/src/smi-syn.rle (the file is temporarily in sme/src/)

Online:
* [https://gtweb.uit.no/mt/]
* Update:
** gtweb: {{/opt/mt/README}}

Apertium Wiki:
* [http://wiki.apertium.org/wiki/Talk:North_S%C3%A1mi_and_South_S%C3%A1mi]

__Deadlines:__
* Find texts
* Find translators
* 30.8. Send texts to the translators
* 15.9. Receive evaluation from the translators
* 26.9. Conference

!!!Analysis

sme-dis.rle vs. Old-sme-dis.rle

Some syntactic tags are missing. Linda used syntactic functions in her rules. Lene will spend a day or two on that.

We do not use dependency.

Evaluate Francis' tag conversion: analyse the same sme text with identical morphology and identical dis, once with gt tags and once with Francis' converted apertium tags. Francis will look into that and report the differences.

!!!Linguistic issues

!!Inflected forms

Two ways of translating positive adjectives in the attributive:
# to adjective (attr -> attr)
# to a noun in the genitive

Here are the cases:

1)

|adj|sme|adj|sma

2)

|adj|sme|n|sma

estehtalaš vs. estetihken

fágalaš vs. faagen

3)

|adv|sme|n|sma

- báikkálaččat vs. byjreskisnie => fst? MT

!!Numerals

{{{
guoktečuođigolbmalogi	guokte#čuođi#golbma#logi+Num+Sg+Nom   <= change the "#" to "+"?

guoktečuođigolbmalogi+Num+Sg+Nom
guoktečuođigolbmalogi+Num+Sg+Nom	guoktečuođigolbmalogi
guoktečuođigolbmalogi+Num+Sg+Nom	guoktečuođigolbmalohki

guokte#čuođi#golbma#logi+Num+Sg+Nom
guokte#čuođi#golbma#logi+Num+Sg+Nom	guoktečuođigolbmalogi
guokte#čuođi#golbma#logi+Num+Sg+Nom	guoktečuođigolbmalohki

guokte#čuođi#golbma#logi+Num+Sg+Nom
göökte#tjuetie#golme#luhkie+Num+Sg+Nom
^göökte+tjuetie+golme+luhkie$
^göökte$ ^tjuetie$ ^golme$ ^luhkie$
}}}

!!Lexical selection

.dix:

{{{

lávettsietsehthmuerjie

lávetprovhkedh

}}}

The default pair is listed in the file:

apertium-sme-sma.sme-sma.lrx:

!transfer/bidix

{{{
<
}}}

* Tag differences for the whole paradigm: bidix
* Tag differences for parts of the paradigm: t1x-files

{{{
input:

^lávet$ -> ^lávet/aaa/bbb$
^lávet$ -> ^lávet/xxx/yyy$

sed 's/lávet/aaa/g'
sed 's/lávet/yyy/g'

vs.

sed 's/lávet/aaa/g'
sed 's/lávet/yyy/g'

rules:

1. select aaa for lávet ;
2. select yyy for lávet ;

l: á: v: e: t: :select(aaa)
l: á: v: e: t: :select(yyy)

vs.

l: á: v: e: t: : :select(aaa)
l: á: v: e: t: : :select(yyy)

result:

input: ^lávet/xxx/yyy$ ; rules-matched: 1, 2
input: ^lávet/aaa/bbb$ ; rules-matched: 1, 2

which rule is chosen? 1 or 2?
}}}

{{{