!!!Meeting with Polderland 26.9.2006 Participants: * Peter Beinema * Thomas Omma * Sjur Moshagen * Tomi Pieski !!Agenda * since last time * hyphenation data * questions and answers !!Since last time __Peter__ sent phonetic speller correction rules, and __Thomas__ has been working on them. __Binary__ version of the speller for Mac not yet available, the old version has to be updated to be Unicode-compatible with Office 2004/Mac. It is in the works. __Test results__: the regression testing didn't cope well with the rather large data set (got into memory swapping:-( ), but it was tested with a smaller sample (the test code will be rewritten to deal better with data in the 600+ Mb size). Test findings upto now: all words tested are OK, except for: {{{ - words containing colons (":"), such as "ABC-company:ai" "ABC-company:aid", ... - words containing asterisk characters ("*"), such as "Juvvelan*gorsa" - this is a typo, the only one, it is already fixed - multi-word expressions, such as "Beakka Hánno gieddila...": }}} !!Hyphenation {{{ # = word boundary ^ = possible hyphenation point - = hard hyphen }}} Hyphenation at Divvun has improved, but there are still issues. Will be investigated. !!!Tasks since last time * send correction/phonetic rules to Thomas (__Peter__) ** done * review the processed data sets (__Thomas__) ** no need to do this, as the delivered data isn't split any more. * make speller lexicon data for Polderland, with POS, compounding properties and hyphenation (__Divvun__, mainly __Tomi__) ** started * try to send a first binary to Sjur (__Peter__) ** we'll wait for the Mac version * check whether hyphen is used when compounding abbreviations (TV-stuolla) (__Thomas__) ** yes, hyphen !!!TODO: * continue to improve hyphenation (__Sjur__ and __Thomas__) * continue with speller data generation/conversion (__Tomi__)