!!!Plan for common conversion from LexC to speller engines of Aspell type

Three different speller engines:
* Polderland
* Aspell
* OOo speller => HunSpell

Common features and properties:
* not lexc-compatible => they require converting from lexc to native/whatever
* basically list-based, with compounding and morphology (Aspell has no compounding)
* => similar expressive power
* take surface forms as input

Because of the similarities:
* one conversion "engine"/script
* several output formats

!!Output format

The output format varies according to engine, but is basically a full-form word list that can be processed for compression. In Aspell, this processing is called {{munch}}-ing. For Polderland, we have no name yet, but it has a similar processing step with the same goal.

!!Information to be added:
* "inflection" tags: munching of fullform lists
* wordform frequency: extracted and added during compilation, can use full-form lists
* compounding: tags as comments in the LexC format
* style: tags as comments in the LexC format

Pseudocode:
# closed POS: create a transducer containing all and only the rest (everything outside NAVAdv), and xfst:print; convert to desired format
# NAVAdv: For each word:
## read one line from the lexicon files, including Comp and Style comments
## generate the full paradigm, and all compounding forms
## filter the resulting word form list against any Comp and Style restrictions
## add the Comp and Style restrictions to the relevant wordforms (all for Style, 5 for Comp)
## output in the desired format

Implementation points to consider:
* It should be easy to add new output formats
* the transducer(s) used for the conversion should be wrapped into a server
* the same server setup could be used for the CGI-BIN scripts

WHO??? Candidates: Saara, Tomi

The following output was generated to try out different strategies for generating the compounding stem.
{{{
hum-tf4-ans157:~ trond$ lookup -flags mbTT -utf8 gt/sme/bin/isme.fst
0%>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>100%

eadni+N+SgNomCmp#giella+N+Sg+Nom
eadni+N+SgNomCmp#giella+N+Sg+Nom	eadnegiella
eadni+N+SgNomCmp#giella+N+Sg+Nom	eadnegiella
eadni+N+SgNomCmp#giella+N+Sg+Nom	eadnegiella
eadni+N+SgNomCmp#giella+N+Sg+Nom	eadnegiella
eadni+N+SgNomCmp#giella+N+Sg+Nom	eadnegiella
eadni+N+SgNomCmp#giella+N+Sg+Nom	eadnegiella
eadni+N+SgNomCmp#giella+N+Sg+Nom	eadne-giella
eadni+N+SgNomCmp#giella+N+Sg+Nom	eadne-giella

eadni+N+SgNomCmp
eadni+N+SgNomCmp	eadni+N+SgNomCmp	+?

Trond's version:

sealgi+N+SgCmp#eadni+N+Sg+Nom
sealgi+N+SgCmp#eadni+N+Sg+Nom	sealeadni
sealgi+N+SgCmp#eadni+N+Sg+Nom	seal-eadni
sealgi+N+SgCmp#eadni+N+Sg+Nom	sealgeadni
sealgi+N+SgCmp#eadni+N+Sg+Nom	sealg-eadni
sealgi+N+SgCmp#eadni+N+Sg+Nom	sealggeadni
sealgi+N+SgCmp#eadni+N+Sg+Nom	sealgg-eadni

sealgi+N+SgNomCmp#eadni+N+Sg+Nom
sealgi+N+SgNomCmp#eadni+N+Sg+Nom	sealgeeadni
sealgi+N+SgNomCmp#eadni+N+Sg+Nom	sealge-eadni
sealgi+N+SgNomCmp#eadni+N+Sg+Nom	sealgieadni    <==== ?
sealgi+N+SgNomCmp#eadni+N+Sg+Nom	sealgi-eadni   <==== ?

sealgi+N+SgGenCmp#eadni+N+Sg+Nom
sealgi+N+SgGenCmp#eadni+N+Sg+Nom	sealggeeadni
sealgi+N+SgGenCmp#eadni+N+Sg+Nom	sealgge-eadni
sealgi+N+SgGenCmp#eadni+N+Sg+Nom	sealggieadni
sealgi+N+SgGenCmp#eadni+N+Sg+Nom	sealggi-eadni

sealgi+N+PlGenCmp#eadni+N+Sg+Nom
sealgi+N+PlGenCmp#eadni+N+Sg+Nom	selggiideadni
sealgi+N+PlGenCmp#eadni+N+Sg+Nom	selggiid-eadni

dušši+N+SgNomCmp#eadni+N+Sg+Nom
dušši+N+SgNomCmp#eadni+N+Sg+Nom	duššeadni
dušši+N+SgNomCmp#eadni+N+Sg+Nom	dušši-eadni
dušši+N+SgNomCmp#eadni+N+Sg+Nom	duššieadni
}}}
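The NAVAdv steps of the pseudocode in the plan (read a lexicon line, generate forms, filter, annotate, output) could be sketched roughly as below. This is only an illustration under several assumptions, not the planned implementation: the comment convention {{! Comp=... Style=...}}, the continuation lexicon name {{N_EXAMPLE}}, the filter rule (keep only compound forms whose Cmp tag equals the Comp restriction) and both output formats are hypothetical. The generator is a stub with a few forms copied from the output above, standing in for a call to the real transducer, and the "5 for Comp" detail is not modelled.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Hypothetical entry shape: "lemma CONTLEX ; ! Comp=<tag> Style=<tag>"
ENTRY_RE = re.compile(
    r"^\s*(?P<lemma>[^\s:!]+)(?::\S+)?\s+(?P<cont>\S+)\s*;\s*(?:!(?P<comment>.*))?$"
)
TAG_RE = re.compile(r"\b(Comp|Style)=(\S+)")
CMP_RE = re.compile(r"\+(\w*Cmp)#")  # compounding tag of the first part


@dataclass
class Entry:
    lemma: str
    contlex: str
    comp: Optional[str] = None
    style: Optional[str] = None


@dataclass
class WordForm:
    surface: str
    analysis: str
    comp: Optional[str] = None
    style: Optional[str] = None


def parse_entry(line: str) -> Optional[Entry]:
    """Read one lexicon line, including Comp and Style comments."""
    m = ENTRY_RE.match(line)
    if m is None:
        return None  # LEXICON headers, pure comment lines, blank lines
    tags = dict(TAG_RE.findall(m.group("comment") or ""))
    return Entry(m.group("lemma"), m.group("cont"), tags.get("Comp"), tags.get("Style"))


def allowed(entry: Entry, analysis: str) -> bool:
    """Assumed rule: a compound form must match the entry's Comp tag, if any."""
    m = CMP_RE.search(analysis)
    if m is None:
        return True  # not a compound form
    return entry.comp is None or m.group(1) == entry.comp


# Output formats are plain functions in a registry, so adding one is a one-liner.
# Neither of these is a real Aspell, HunSpell or Polderland format.
FORMATTERS: Dict[str, Callable[[WordForm], str]] = {
    "wordlist": lambda wf: wf.surface,
    "tagged": lambda wf: f"{wf.surface}\t{wf.comp or '-'}\t{wf.style or '-'}",
}

Generator = Callable[[Entry], Iterable[Tuple[str, str]]]


def convert(lines: Iterable[str], generate: Generator, fmt: str) -> List[str]:
    render = FORMATTERS[fmt]
    out: List[str] = []
    for line in lines:
        entry = parse_entry(line)
        if entry is None:
            continue
        for analysis, surface in generate(entry):
            if not allowed(entry, analysis):
                continue
            # Style goes on every form; Comp only on compound forms (assumption).
            comp = entry.comp if "#" in analysis else None
            out.append(render(WordForm(surface, analysis, comp, entry.style)))
    return out


# Stub generator: toy data only, a stand-in for the transducer lookup.
TOY_PARADIGMS = {
    "sealgi": [
        ("sealgi+N+Sg+Nom", "sealgi"),  # lemma form
        ("sealgi+N+SgNomCmp#eadni+N+Sg+Nom", "sealgeeadni"),
        ("sealgi+N+SgGenCmp#eadni+N+Sg+Nom", "sealggeeadni"),
        ("sealgi+N+PlGenCmp#eadni+N+Sg+Nom", "selggiideadni"),
    ],
}


def toy_generate(entry: Entry) -> Iterable[Tuple[str, str]]:
    return TOY_PARADIGMS.get(entry.lemma, [])


if __name__ == "__main__":
    lexicon = ["LEXICON Nouns", "sealgi N_EXAMPLE ; ! Comp=SgNomCmp Style=std"]
    for fmt in FORMATTERS:
        print(fmt, convert(lexicon, toy_generate, fmt))
```

Because {{convert}} takes the generator as a parameter, the stub could later be swapped for a client of the transducer server mentioned under the implementation points without touching the filtering or output code.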