!!!Meeting with Polderland 21.11.2006

Participants:
* Peter Beinema
* Sjur Moshagen
* Thomas Omma

!!Agenda
* since last time
* questions and answers

!!Since last time

__Polderland:__
* speller: include large lexicon
** -> adapt "mklex" lexicon compiler for large files (found bug in gcc 3.2)
* mklex: port mklex to Mac OS X, to be delivered to Divvun
** make Sami-specific; leave out some general functionality
** create scripts to run mklex on your side
* hyphenator:
** insert lexicon lookup step before pattern matching
** adapt pattern matching (ASCII-based, not UTF-8/UCS-2)

__Divvun:__
* first test run of PLX data done, including hyphenation points

!!Alpha version

!Spellers
Both {{sme}} and {{smj}}. The {{sme}} version will use the latest, 20Gb lexicon, if possible.

!Hyphenators
Will use only the limited data delivered to Polderland, and use the fallback algorithm for all words not in the lexicon. It will provide a nice test case for the fallback algorithm :-)

{{{
Divvun hyphenation marks:
#  - word boundary
^  - soft hyphen
-  - hard hyphen

Polderland hyphenation marks:
-- hard hyphen
-  soft hyphen
}}}

!!Possible issues

!Clitics
Clitics should be applied to all inflected forms. These are normally marked as rightmost only. How can we specify the clitics such that they can be combined with inflected forms? Or do we have to pregenerate all word forms with clitics? If so, the size of the generated data will increase more than 10 times (> 250 Gb!)

Sample word: {{xyzzy NR}}

Will the {{+}} operator work in this case?
* {{ish +N,A}}
* {{go Vt}}
* {{goes VRI}}
* {{ing +Vt}}
* {{gå +V}}
* {{schaaps NL}} (sheep-)

!!Next meeting
Next Tuesday (28.11.) at the usual time.

!!!TODO:
* continue to improve hyphenation (__Sjur__ and __Thomas__)
* provide new batch of hyphenated data (__Sjur__)
* send clitic problem via e-mail to Polderland (__Sjur__)
* send first round of PLX data to Polderland (__Tomi, Børre__)
* make complete PLX data set (__Tomi__)
* get language codes to work with Mac Office 2004 (and check MacOffice 2007) (__Polderland__)
* deliver Alpha versions (__Polderland__) including mklex + hyphen script
* try to find proper compiler version for Adobe Indesign (old version will probably do)
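
!!!Note on hyphenation marks

The two hyphenation mark conventions listed under the Alpha version heading differ only in which characters are used, so converting hyphenated data from the Divvun notation to the Polderland notation could be a simple character mapping. The following is a minimal sketch under stated assumptions, not an agreed procedure: the function name {{divvun_to_polderland}} and the example strings are made up for illustration, and because the notes list no Polderland counterpart for the word boundary mark {{#}}, the sketch simply passes it through unchanged.

```python
# Mapping taken from the two mark tables in these notes.
# "#" (word boundary) has no listed Polderland counterpart, so leaving it
# unchanged is an assumption made for this sketch.
DIVVUN_TO_POLDERLAND = {
    "^": "-",   # Divvun soft hyphen  -> Polderland soft hyphen
    "-": "--",  # Divvun hard hyphen  -> Polderland hard hyphen
}


def divvun_to_polderland(word: str) -> str:
    """Rewrite Divvun hyphenation marks in `word` as Polderland marks."""
    # Convert character by character, so that a soft hyphen that has just
    # been rewritten to "-" is never mistaken for a Divvun hard hyphen.
    return "".join(DIVVUN_TO_POLDERLAND.get(ch, ch) for ch in word)


if __name__ == "__main__":
    # Made-up example strings, not real lexicon entries.
    for sample in ("ab^cd", "ab-cd", "ab#cd^ef"):
        print(sample, "=>", divvun_to_polderland(sample))
```

A single pass over the characters is used instead of two chained string replacements, because replacing {{^}} with {{-}} first and {{-}} with {{--}} afterwards would turn every soft hyphen into a hard one.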